# Delta Lake 4.0: What Scala Engineers Need to Know
Delta Lake 4.0 shipped in September 2025, requiring Apache Spark 4.0 and dropping Scala 2.12 entirely. Here's what changed, what's new, and what it means for your sbt builds and production pipelines.
## The Platform Shift: Spark 4.0 and Scala 2.13 Only
Delta Lake 4.0 is built on Apache Spark 4.0. That means the same platform prerequisites apply: Scala 2.13 only, JDK 17 minimum, and no backward compatibility with Spark 3.x.
If you haven't upgraded to Spark 4.0 yet, Delta Lake 4.0 forces the issue. The Spark 3.x to 4.0 migration guide covers the full upgrade path — ANSI mode, Scala 2.12 to 2.13, JDK 17, dependency bumps. You'll need to complete that migration before moving to Delta Lake 4.0.
The Maven coordinates for 4.0 are straightforward:
```scala
// build.sbt — Delta Lake 4.0 with Spark 4.0
libraryDependencies += "io.delta" %% "delta-spark" % "4.0.0"
// The artifact resolves to delta-spark_2.13:4.0.0 on Maven Central
```
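The Spark 4.0 baseline also implies a Scala and JDK floor in the build itself. A minimal sketch of those settings, assuming sbt (patch versions are illustrative):

```scala
// build.sbt — platform floor for Delta Lake 4.0 (patch version is illustrative)
ThisBuild / scalaVersion := "2.13.16" // Scala 2.13 only; 2.12 artifacts are no longer published

// Fail fast if the build is launched on a pre-17 JDK
initialize := {
  val feature = java.lang.Runtime.version().feature()
  require(feature >= 17, s"Delta Lake 4.0 / Spark 4.0 require JDK 17+, found JDK $feature")
}
```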
Delta Standalone, Delta Flink, and Delta Hive connectors are no longer released as part of the 4.x line. They've been superseded by Delta Kernel. If you depend on any of these, you'll need to migrate to Kernel-based alternatives or stay on the 3.x maintenance branch for critical security fixes.
## Delta Connect: Delta Operations Over Spark Connect
Delta Connect extends the Spark Connect wire protocol to support Delta-specific operations. If your team is adopting Spark Connect's client-server architecture, Delta Connect means you don't lose Delta functionality in the process.
The practical benefit: Delta APIs — MERGE INTO, UPDATE, DELETE, time travel, and table management — all work through the Connect protocol. The client and server can evolve independently, which is the same decoupling model Spark Connect itself provides.
```scala
// Delta Connect artifacts for client-server architecture
libraryDependencies ++= Seq(
  "io.delta" %% "delta-connect-client" % "4.0.0",
  "io.delta" %% "delta-connect-common" % "4.0.0",
  "io.delta" %% "delta-connect-server" % "4.0.0"
)
```
This is a preview feature in 4.0. It works, but expect the protocol to evolve in subsequent releases.
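A minimal client-side sketch, assuming a Spark Connect server with Delta Connect enabled and reachable at `sc://localhost:15002` (the endpoint and table name are placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import io.delta.tables.DeltaTable

// Connect to a remote Spark Connect server instead of an in-process driver
val spark = SparkSession.builder()
  .remote("sc://localhost:15002") // placeholder endpoint
  .getOrCreate()

// Delta operations are expressed on the client but execute on the server
val events = DeltaTable.forName(spark, "events")
events.delete(col("event_date") < "2024-01-01")

// Time travel also goes over the wire
spark.read.format("delta").option("versionAsOf", 3).table("events").show()
```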
## Catalog-Managed Tables (Preview)
Delta Lake 4.0 introduces catalog-managed tables — a shift from the traditional filesystem-managed model where Delta handles its own transaction log directly on storage. With catalog-managed tables, the catalog becomes the coordinator of table access and the source of truth for table state.
What this enables down the road: multi-table transactions, centralized governance, better observability, and coordinated commits across writers. In 4.0, catalog-managed tables support INSERT, MERGE INTO, UPDATE, and DELETE.
The important caveat: this is a preview feature and not recommended for production tables. The protocol is still under active development, and future releases may not maintain backward compatibility with tables created during the preview. Use it for experimentation and evaluation, not for data you can't recreate.
If you're using Unity Catalog, catalog-managed tables are the direction Delta is heading. Delta Lake 4.0.1 renamed the internal feature from catalogOwned-preview to catalogManaged and updated the table identifier from ucTableId to io.unitycatalog.tableId — if you experimented with early builds, be aware of this breaking change.
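For experimentation only, opting a table into the preview follows Delta's usual `delta.feature.<name> = 'supported'` table-property convention; the feature name below is an assumption based on the 4.0.1 rename (earlier builds used `catalogOwned-preview`), so verify it against your Delta version:

```scala
// Preview only: create a throwaway catalog-managed table, not production data.
// The feature name 'catalogManaged' is an assumption based on the 4.0.1 rename.
spark.sql("""
  CREATE TABLE scratch.cm_demo (id BIGINT, v STRING) USING delta
  TBLPROPERTIES ('delta.feature.catalogManaged' = 'supported')
""")
```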
## Variant Data Type: Schema-on-Read for Semi-Structured Data
Spark 4.0 introduced the VARIANT data type for semi-structured data, and Delta Lake 4.0 brings full support for storing and querying Variant columns in Delta tables.
This matters for any pipeline that handles JSON, event data, or payloads with variable schemas. Instead of storing JSON as a string column and parsing it on every read, Variant uses a high-performance binary encoding that supports efficient querying without upfront schema definitions.
```scala
// Store semi-structured event data in a Delta table with Variant
spark.sql("""
  CREATE TABLE events (
    event_id BIGINT,
    payload VARIANT
  ) USING delta
""")

// Insert JSON data — stored as efficient binary, not a raw string
spark.sql("""
  INSERT INTO events VALUES
    (1, parse_json('{"action": "click", "target": "button_save", "meta": {"browser": "chrome"}}')),
    (2, parse_json('{"action": "purchase", "amount": 49.99, "items": ["widget_a", "widget_b"]}'))
""")

// Query nested fields directly — no JSON parsing on read
spark.sql("SELECT event_id, payload:action, payload:amount FROM events")
```
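When a consumer needs strongly typed values rather than variant results, Spark 4.0's `variant_get` extracts a path and casts it in one step. A sketch against the `events` table above (`try_variant_get` yields NULL where the path is missing or the cast fails):

```scala
// Extract typed values from the variant column instead of raw variant results
spark.sql("""
  SELECT event_id,
         variant_get(payload, '$.action', 'string')     AS action,
         try_variant_get(payload, '$.amount', 'double') AS amount
  FROM events
""").show()
```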
Delta Lake 4.0 also includes a preview of Shredded Variants, which extract specific columns from variant data into dedicated Parquet columns for optimized querying. The Delta team claims up to 20x read performance improvement for queries that target specific fields within variant data. Shredded Variants follow the Parquet Variant Shredding specification, but tables created with this preview feature may lack forward compatibility — treat it as experimental.
## Type Widening: Change Column Types Without Rewriting Data
Type widening graduated from preview to general availability in Delta Lake 4.0. It lets you change a column's data type — say, INT to LONG, or FLOAT to DOUBLE — without rewriting the underlying data files.
```scala
// Widen a column type in place — no data rewrite needed
spark.sql("ALTER TABLE orders CHANGE COLUMN quantity TYPE LONG")

// Works with schema evolution too — INSERT and MERGE can auto-widen
spark.sql("""
  MERGE INTO orders USING updates
  ON orders.id = updates.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
// If 'updates' has a wider type for a column, Delta handles it automatically
```
Type changes can be applied manually with ALTER TABLE CHANGE COLUMN TYPE or automatically through schema evolution in INSERT and MERGE operations. Delta Lake 4.1.0 went further by making automatic type widening the default mode.
This is a practical improvement for any pipeline that evolves over time. Previously, widening a column type required a full table rewrite — which for large tables means significant compute and storage costs. Now it's a metadata operation.
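The reason this can be pure metadata is that widening conversions are lossless: every value of the narrower type maps exactly onto the wider one, so existing Parquet files remain valid and readers simply upcast. A plain-Scala illustration of the INT-to-LONG case:

```scala
// Int -> Long widening is lossless: every Int round-trips through Long unchanged.
// That is why Delta can record the type change as metadata and upcast on read.
def widen(i: Int): Long = i.toLong

val extremes = List(Int.MinValue, -1, 0, 1, Int.MaxValue)
val lossless = extremes.forall(i => widen(i).toInt == i)
println(lossless) // prints: true
```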
One thing to note: reading tables with type widening enabled requires Delta Lake 3.3 or later. If you have downstream consumers on older Delta versions, they won't be able to read these tables.
## Row Tracking: Efficient Change Data Feeds
Row Tracking allows Delta to assign stable identifiers to individual rows across inserts, updates, and deletes. Each row gets a unique row ID and a commit version, enabling efficient tracking of how rows change over time.
The primary use case is Change Data Feed (CDF). If you're building CDC pipelines that need to know exactly which rows changed between two versions of a table, Row Tracking makes this significantly more efficient. Instead of diffing entire file sets, Delta can track row-level provenance directly.
```scala
// Enable row tracking (and the change data feed it accelerates) on a table
spark.sql("""
  CREATE TABLE customers (
    id BIGINT,
    name STRING,
    email STRING
  ) USING delta
  TBLPROPERTIES (
    'delta.enableRowTracking' = 'true',
    'delta.enableChangeDataFeed' = 'true' -- required for readChangeFeed queries
  )
""")

// Query the change data feed to see what changed since version 5
spark.read.format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", 5)
  .table("customers")
  .show()
// Each row includes _change_type, _commit_version, _commit_timestamp
```
Delta Kernel now supports writing to row-tracking-enabled tables, which means third-party connectors built on Kernel can participate in the row tracking protocol. This broadens the ecosystem beyond just Spark.
## Drop Feature Without Truncating History
Previous versions of Delta Lake required truncating table history to remove a table feature. Delta Lake 4.0 introduces a new DROP FEATURE implementation that removes features instantly while preserving the full transaction log history.
```scala
// Drop a table feature without losing history
spark.sql("ALTER TABLE my_table DROP FEATURE deletionVectors")

// If you do need to truncate history (legacy behavior), use the explicit clause
spark.sql("ALTER TABLE my_table DROP FEATURE deletionVectors TRUNCATE HISTORY")
```
This matters for table evolution and client compatibility. If you enabled a feature that a downstream reader doesn't support, you can now drop it without destroying the table's audit trail. The new approach introduces a checkpointProtection writer feature under the hood to maintain log integrity.
## The 4.1.0 Artifact Rename: Update Your sbt Builds
Delta Lake 4.1.0, released in early 2026, introduces a significant change to Maven artifact naming. Artifacts now include a Spark version suffix alongside the Scala version:
```scala
// Delta Lake 4.0.x — artifact includes only the Scala version
libraryDependencies += "io.delta" %% "delta-spark" % "4.0.0"
// Resolves to: delta-spark_2.13:4.0.0

// Delta Lake 4.1.0 — artifact now includes Spark AND Scala version
// For Spark 4.1:
libraryDependencies += "io.delta" % "delta-spark_4.1_2.13" % "4.1.0"
// For Spark 4.0 (backward-compatible build):
libraryDependencies += "io.delta" % "delta-spark_4.0_2.13" % "4.1.0"
```
Notice the change from %% (which appends only the Scala version) to % with the full artifact name including the Spark version. This is because sbt's %% operator only handles Scala cross-building, not Spark version suffixes.
The old unsuffixed artifacts are still published in 4.1.0 for backward compatibility, but this won't last forever. Update your build files now. Delta Lake 4.1.0 is compatible with both Spark 4.1.0 and Spark 4.0.1, so you can choose the artifact that matches your runtime.
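To make the scheme concrete, here is a small hypothetical helper (not part of any Delta or sbt API) that assembles the suffixed artifact name:

```scala
// Hypothetical helper mirroring the delta-spark_<sparkBinary>_<scalaBinary> scheme
def deltaSparkArtifact(sparkBinary: String, scalaBinary: String = "2.13"): String =
  s"delta-spark_${sparkBinary}_$scalaBinary"

println(deltaSparkArtifact("4.1")) // prints: delta-spark_4.1_2.13
println(deltaSparkArtifact("4.0")) // prints: delta-spark_4.0_2.13
```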
## UniForm with Iceberg: Not Yet
One notable gap in Delta Lake 4.0: UniForm with Iceberg is not available. When 4.0 shipped, Apache Iceberg didn't yet support Spark 4.0, so the UniForm interoperability feature couldn't be included. This is expected to be re-enabled in a future release once Iceberg catches up.
If your architecture depends on UniForm to expose Delta tables as Iceberg tables for cross-engine access, you'll need to stay on Delta 3.x or wait for this to land.
## Upgrade Checklist: Delta 3.x to 4.0
If you're moving from Delta Lake 3.x, work through this in order:
- [ ] Upgrade to Spark 4.0 first. Delta 4.0 requires it. Follow the Spark 3 to 4 migration guide — Scala 2.13, JDK 17, ANSI mode, dependency changes
- [ ] Update your Delta dependency. Change `delta-spark` to version `4.0.0` in your build. If jumping straight to 4.1, use the new artifact naming with the Spark version suffix
- [ ] Check for Delta Standalone usage. If you use Delta Standalone, Flink, or Hive connectors, plan a migration to Delta Kernel — these are no longer published in 4.x
- [ ] Audit table features. If you plan to use type widening, Row Tracking, or Variant columns, verify that all downstream readers support Delta 3.3+ (for type widening) or Delta 4.0+ (for newer features)
- [ ] Test catalog-managed tables separately. If you're evaluating Unity Catalog integration, keep preview features on non-production tables until the protocol stabilizes
- [ ] Check UniForm dependencies. If you rely on UniForm with Iceberg, this is blocked in 4.0 — plan accordingly
- [ ] Run your test suite. Delta 4.0's behavioral changes are small compared to Spark 4.0's, but validate end-to-end before promoting to production
The biggest practical change for most Scala teams is the platform requirement: if you're not on Spark 4.0 yet, Delta Lake 4.0 is the push to get there. If you're already on Spark 4.0 or 4.1, the Delta upgrade is straightforward — update the dependency, test, and start evaluating the new features.
For the full release details, see the Delta Lake 4.0 blog post and the GitHub releases page.