
Scala 3 and Spark: Where Things Stand in 2026

Apache Spark still ships exclusively for Scala 2.13, and official Scala 3 support has no target release. But a practical workaround exists today using Scala 3's forward-compatibility mode. Here's what works, what doesn't, and whether your team should try it.

The Official Status: SPARK-54150

SPARK-54150, filed in November 2025, is the umbrella ticket tracking Scala 3 support in Apache Spark. It's a Major priority issue — but it has no target version and is currently unassigned.

Of the three sub-tasks:

  • SPARK-55118 — Replace ASM Opcodes wildcard imports → Resolved
  • SPARK-55152 — Evaluate serialization strategy for Scala 3 support → Open
  • SPARK-55157 — Replace classutil with ClassGraph in GenerateMIMAIgnore tool → Open

The critical blocker is SPARK-55152: the serialization strategy evaluation. Spark's internal serialization relies on Scala 2 runtime reflection — specifically TypeTags and the Scala 2 metaprogramming API — which was completely redesigned in Scala 3. Figuring out how to handle serialization without runtime reflection is the core engineering challenge, and that work hasn't started in earnest.

The ticket has 11 watchers and has drawn some interest from the Scala team at VirtusLab, but there is no concrete roadmap. At the current pace, official Scala 3 support is not imminent.

What Spark 4.x Ships Today

Spark 4.0 dropped Scala 2.12 and moved to Scala 2.13 only. Spark 4.1 bumped to Scala 2.13.17. There is no Scala 3 build of Spark, and none is planned for the 4.x line as far as the public roadmap shows.

This is the baseline: if you want to use Spark, you're linking against Scala 2.13 artifacts. The question is whether you can write your application code in Scala 3 while still depending on those 2.13 jars.

The Workaround: Scala 3's Forward-Compatibility Mode

Scala 3 can consume Scala 2.13 libraries natively. The Scala 3 compiler includes an unpickler that reads the Scala 2.13 Pickle format, and the Scala 2.13 standard library is actually the official standard library for Scala 3. This means you can set your project to Scala 3 and pull in Spark's 2.13 artifacts directly.

In sbt, the configuration looks like this:

// build.sbt — Scala 3 application with Spark 4.1 (Scala 2.13 artifacts)
scalaVersion := "3.6.4"

libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-core" % "4.1.0" % "provided")
    .cross(CrossVersion.for3Use2_13),
  ("org.apache.spark" %% "spark-sql" % "4.1.0" % "provided")
    .cross(CrossVersion.for3Use2_13)
)

The .cross(CrossVersion.for3Use2_13) modifier tells sbt to resolve the _2.13 variant of each dependency (e.g. spark-core_2.13) even though your project compiles with Scala 3. This is a supported sbt feature, not a hack.

If you're using provided scope for Spark dependencies (and you should be), the % "provided" works the same way it always has — the cluster supplies Spark at runtime.
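One practical wrinkle with provided scope: sbt run won't see the Spark jars locally, because provided dependencies are excluded from the runtime classpath. A common workaround, sketched below using the standard sbt pattern of rewiring run to the compile classpath, restores them for local runs only:

```scala
// build.sbt — include "provided" dependencies when running locally via `sbt run`.
// Compile / fullClasspath contains provided-scope jars; the default run task
// uses the runtime classpath, which does not. The cluster still supplies
// Spark in production, so the published artifact is unaffected.
Compile / run := Defaults
  .runTask(
    Compile / fullClasspath,      // includes spark-core / spark-sql
    Compile / run / mainClass,
    Compile / run / runner
  )
  .evaluated
```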

The Encoder Problem

Here's where it gets complicated. Spark's Encoder instances — the mechanism that converts between JVM objects and Spark's internal Tungsten format — are derived at runtime using Scala 2's reflection API. Scala 3 does not support Scala 2 runtime reflection. This means implicit Encoder derivation for case classes simply doesn't work out of the box.

// This works in Scala 2.13 with import spark.implicits._
case class Order(id: Long, amount: Double)
val ds: Dataset[Order] = df.as[Order]  // Encoder derived via runtime reflection

// In Scala 3, the implicit Encoder[Order] cannot be derived
// You get a compile error: no given instance of type Encoder[Order]

The spark-scala3 library by Vincenzo Bazzucchi provides a workaround. It uses Scala 3's compile-time metaprogramming to derive Encoder instances instead of relying on runtime reflection:

// build.sbt — add spark-scala3 for encoder derivation
libraryDependencies ++= Seq(
  // For Spark 4.x:
  "io.github.vincenzobaz" %% "spark4-scala3-encoders" % "0.3.2",
  "io.github.vincenzobaz" %% "spark4-scala3-udf" % "0.3.2"
  // For Spark 3.x, use "spark-scala3-encoders" and "spark-scala3-udf" instead
)

// In your application code:
import scala3encoders.given

case class Order(id: Long, amount: Double)
val ds: Dataset[Order] = df.as[Order]  // Now works — encoder derived at compile time
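The mechanism behind this is ordinary Scala 3 compile-time metaprogramming. As a minimal, Spark-free illustration (the fieldNames helper below is purely for demonstration and is not part of spark-scala3), here is how a case class's structure can be inspected at compile time via Mirror, with no runtime reflection involved:

```scala
import scala.deriving.Mirror
import scala.compiletime.constValueTuple

// Extract a case class's field names at compile time using Mirror —
// the same flavor of metaprogramming spark-scala3 uses to build Encoders.
// The field names are singleton string types, materialized as values
// by constValueTuple; nothing is looked up at runtime.
inline def fieldNames[A](using m: Mirror.ProductOf[A]): List[String] =
  constValueTuple[m.MirroredElemLabels].toList.map(_.toString)

case class Order(id: Long, amount: Double)

// fieldNames[Order] yields List("id", "amount") with zero runtime reflection.
```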

The library also provides UDF support, since Spark's built-in udf() function relies on TypeTags that don't exist in Scala 3.

Known Limitations

The forward-compatibility approach works for many use cases, but it has real gaps:

Encoder coverage is incomplete. The spark-scala3 library handles common types — case classes, primitives, strings, collections — but not every type Spark's built-in encoders support. Complex nested structures, Option types, and Array[Byte] have known issues. The authors of the VirtusLab blog post documenting early experiments had to convert Array[Byte] to Base64 strings and replace Option[String] with plain String to make things work.

Scala 3 macros don't cross the boundary. Inline definitions, match types, and Scala 3-specific macros cannot be consumed from Scala 2.13 code — the compatibility bridge doesn't support them. Within your own Scala 3 application this is fine, but anything the 2.13-compiled Spark runtime must interact with directly should stay within the subset that interoperates cleanly with Scala 2.13.

Spark 3.3.x has type-check issues. The spark-scala3 library recommends Spark 3.4.1 or later. If you're on Spark 3.3.x, collect() and similar operations can fail unexpectedly due to type-check problems.

It's a third-party dependency. spark-scala3 is maintained by a single developer. It's well-made, but it's not battle-tested at the scale of Spark's own encoder infrastructure. For production pipelines processing critical data, that's a risk your team should weigh.

Should You Try It?

Here's a practical decision framework:

Yes, if:

  • You're starting a new project and want Scala 3 language features (enums, union types, contextual abstractions, improved pattern matching)
  • Your Spark usage is primarily DataFrame API calls (SQL expressions, built-in functions) rather than heavy typed Dataset work
  • You're comfortable depending on spark-scala3 and can work around its encoder gaps
  • You want to position your codebase for the eventual official Scala 3 support
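For concreteness, the Scala 3 language features mentioned above look like this (illustrative snippets only, unrelated to any Spark API):

```scala
// Scala 3 features cited above, in miniature.

// Enums: concise algebraic data types
enum Status:
  case Active, Suspended

// Union types: accept either type without a sealed wrapper hierarchy
def label(x: Int | String): String = x match
  case i: Int    => s"int:$i"
  case s: String => s"str:$s"

// Contextual abstractions: `given`/`using` replace Scala 2's `implicit`
given Status = Status.Active
def describe(using s: Status): String = s"status=$s"

// label(42) == "int:42"; describe == "status=Active"
```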

Wait, if:

  • You have a production pipeline running on Scala 2.13 that works reliably — a language migration adds risk without changing functionality
  • Your code makes heavy use of typed Datasets, custom encoders, or complex UDFs that depend on implicit derivation
  • You need the stability guarantees of a fully supported stack for compliance or SLA reasons

Contribute upstream, if:

  • You have deep expertise in both Scala 3 metaprogramming and Spark internals — the SPARK-55152 serialization strategy evaluation is the bottleneck, and the Spark project has signaled openness to external contributors

The TASTy Bridge Goes Both Ways

One additional detail worth knowing: the compatibility also works in reverse. Scala 2.13.13+ includes a TASTy reader that can consume Scala 3 libraries, enabled with the -Ytasty-reader compiler flag. This means a Scala 2.13 project can depend on libraries published for Scala 3.
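In sbt, the reverse direction mirrors the earlier configuration (the library coordinates below are placeholders, not a real artifact):

```scala
// build.sbt — Scala 2.13 project consuming a Scala 3-published library
// via the TASTy reader
scalaVersion := "2.13.17"
scalacOptions += "-Ytasty-reader"

libraryDependencies +=
  // "com.example" %% "somelib" is a placeholder for any Scala 3-only library
  ("com.example" %% "somelib" % "1.0.0")
    .cross(CrossVersion.for2_13Use3)   // resolve the _3 artifact
```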

This matters for library authors. If you're publishing a Spark utility library and want to support both Scala 2.13 and Scala 3 consumers, you can publish for one version and let the cross-compatibility bridge handle the other direction. The Scala migration guide documents supported and unsupported features in detail — union types, match types, and Scala 3 macros are among the features that don't cross the bridge.

What This Means for the Ecosystem

As we noted in The State of Spark Scala in 2026, the Spark project is actively investing in Scala 2.13. Spark Connect reached full Java client API parity in 4.0, Structured Streaming real-time mode ships for Scala, and the 4.x line continues to evolve.

But the Scala language itself has moved to Scala 3. The broader ecosystem — Cats, ZIO, http4s, Circe — has largely migrated. Spark is one of the last major Scala projects still on 2.13 only, and the gap between the language ecosystem and the data processing ecosystem is widening.

For teams maintaining Spark Scala applications, the practical answer in 2026 is: write Scala 2.13. The forward-compatibility workaround exists and works for some use cases, but official Scala 3 support in Spark requires solving the serialization problem at the engine level, and that work is still in the evaluation phase with no timeline.

Keep an eye on SPARK-54150. When the serialization strategy sub-task moves to "In Progress," that's when things will start to change.

Article Details

Created: 2026-04-06

Last Updated: 2026-04-06 10:44:51 PM