ANSI Mode by Default in Spark 4.0: What Breaks and How to Fix It
Spark 4.0 flipped spark.sql.ansi.enabled from false to true, so invalid casts, divide-by-zero, and out-of-range array indices that used to silently return null (and arithmetic overflow that used to silently wrap) now throw runtime errors. This guide catalogs each failure mode, the exception you'll see, and the try_* function that fixes it without falling back to legacy mode.
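A taste of the fix, as a minimal sketch (column names and values are made up; try_cast, try_divide, and try_element_at are the Spark SQL functions the guide covers):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ansi-try-demo").getOrCreate()
import spark.implicits._

// Hypothetical data: a string to parse, a numerator/denominator pair, an array.
val df = Seq(
  ("12",   10L, 2L, Seq(1, 2, 3)),
  ("oops", 10L, 0L, Seq(1))
).toDF("s", "a", "b", "arr")

// Under ANSI mode the plain cast / division / index lookup would throw;
// the try_* variants return null for the bad rows instead.
df.selectExpr(
  "try_cast(s AS INT)     AS parsed",  // null instead of a cast error
  "try_divide(a, b)       AS ratio",   // null instead of a divide-by-zero error
  "try_element_at(arr, 3) AS third"    // null instead of an index error
).show()
```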
-
DuckDB for Spark Scala Developers: What You Need to Know
DuckDB is an embedded, in-process columnar OLAP engine that runs inside your application — no cluster, no JVM serialization tax, sub-second startup. For Spark Scala developers, the entry point is the DuckDB JDBC driver on Maven Central, and the Scala 3 duck4s wrapper if you want a more idiomatic API. This is not a Spark replacement — it's a complementary tool that fits the gaps Spark was never designed to fill.
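A minimal sketch of that JDBC entry point, assuming the org.duckdb:duckdb_jdbc artifact is on the classpath (the Parquet path is a placeholder):

```scala
import java.sql.DriverManager

// In-memory DuckDB over plain JDBC: no cluster, no session to provision.
val conn = DriverManager.getConnection("jdbc:duckdb:")
val stmt = conn.createStatement()

// DuckDB scans Parquet files directly, no table registration needed.
val rs = stmt.executeQuery(
  "SELECT count(*) AS n FROM read_parquet('data/events/*.parquet')")
while (rs.next()) println(s"rows = ${rs.getLong("n")}")

stmt.close()
conn.close()
```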
-
The JVM Is Not Dead: Why Scala Spark Still Makes Sense
The "PySpark won, Scala is legacy" narrative is half right and half lazy. PySpark genuinely owns notebooks, ML, and the hiring funnel — but Spark itself still runs on the JVM, and Scala code still executes on the engine without a serialization boundary. Here's an honest look at where each language wins in 2026, and why Scala remains the right call for a meaningful slice of production work.
-
Apache Spark on Databricks vs Open Source in 2026
The Databricks vs open-source Spark debate is usually framed as a feature comparison, but for Scala teams shipping production pipelines it's really a question about who owns operational complexity. Here's a practical decision guide for 2026 — where the gap has narrowed, where it persists, and what should actually drive the call.
-
Dependency Confusion Attacks and Your Private Spark Libraries
Five years after Alex Birsan's original dependency confusion research collected more than $130,000 in bug bounties from Apple, Microsoft, PayPal, and Shopify, the same class of supply-chain attack is still landing in JVM builds. Spark Scala teams are an especially easy target — sbt's default resolver behavior, combined with repository managers that auto-proxy Maven Central, can reduce the attack to little more than publishing a single artifact through Sonatype.
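One common mitigation, sketched in sbt terms (repository URL, realm, and environment variable names are placeholders):

```scala
// build.sbt

// Replace sbt's default public resolvers with a single internal proxy, so a
// request for an internal group ID can never fall through to Maven Central.
externalResolvers := Seq(
  "internal-proxy" at "https://repo.example.com/maven/groups/all"
)

// Authenticate against the internal repository; the realm string must match
// the server's HTTP auth realm.
credentials += Credentials(
  "Example Realm",
  "repo.example.com",
  sys.env.getOrElse("REPO_USER", ""),
  sys.env.getOrElse("REPO_PASS", "")
)
```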
-
Choosing a Private Maven Repository for Your Spark Scala Team in 2026
Most comparison guides for artifact repositories are written for Java teams using Maven or Gradle. If your team builds Spark Scala applications with sbt, the landscape looks different — and some popular options have sharp edges that only show up with sbt's dependency resolution.
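For orientation, a hypothetical sbt publish setup against a private repository (host, paths, and the credentials file location are placeholders):

```scala
// build.sbt
publishMavenStyle := true

// Route snapshots and releases to separate hosted repositories.
publishTo := {
  val base = "https://repo.example.com/maven/"
  if (isSnapshot.value) Some("snapshots" at (base + "snapshots"))
  else Some("releases" at (base + "releases"))
}

// Keep credentials out of the build file itself.
credentials += Credentials(Path.userHome / ".sbt" / ".credentials")
```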
-
sbt 2.0 and What It Means for Spark Scala Projects
sbt 2.0 is in its final release candidates with the 2.0.0 milestone fully closed. Build definitions now require Scala 3, all tasks are cached by default with Bazel-compatible remote caching, and the plugin ecosystem is being rebuilt. Here's what Spark Scala teams need to know before upgrading.
-
Apache Iceberg vs Delta Lake: Choosing a Table Format
Both Iceberg and Delta Lake give you ACID transactions, time travel, and schema evolution on top of object storage. But they make different architectural trade-offs that matter when you're building Spark Scala pipelines. Here's a practical comparison to help you choose.
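For context, a sketch of writing the same DataFrame to each format from Scala, assuming the respective runtime jars and catalog configuration are in place (catalog, table, and path names are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("table-format-demo").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Iceberg: DataFrameWriterV2 against a catalog configured for Iceberg
// (requires the iceberg-spark-runtime jar and catalog settings).
df.writeTo("iceberg_catalog.db.events").using("iceberg").createOrReplace()

// Delta Lake: the classic path-based writer
// (requires the delta-spark jar and the recommended session extensions).
df.write.format("delta").mode("overwrite").save("s3://bucket/tables/events")
```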
-
Scala 3 and Spark: Where Things Stand in 2026
Apache Spark still ships exclusively for Scala 2.13, and official Scala 3 support has no target release. But a practical workaround exists today: Scala 3's ability to consume Scala 2.13 artifacts. Here's what works, what doesn't, and whether your team should try it.
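The workaround in build.sbt terms, assuming the usual CrossVersion.for3Use2_13 route (version numbers are examples only):

```scala
// build.sbt
scalaVersion := "3.3.4"

// Compile the application with Scala 3 while depending on the Scala 2.13
// Spark artifacts, which Scala 3 can consume.
libraryDependencies += ("org.apache.spark" %% "spark-sql" % "3.5.1")
  .cross(CrossVersion.for3Use2_13)
```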
-
Spark Declarative Pipelines: First Look from a Scala Dev
Spark 4.1 introduces Spark Declarative Pipelines (SDP) — a framework for building managed ETL pipelines where you declare datasets and Spark handles the rest. The catch for Scala developers: authoring is Python and SQL only, with no JVM support yet.