Delta Lake UniForm: Write Delta, Read Iceberg

Delta Lake's Universal Format generates Iceberg metadata alongside the Delta transaction log, against the same Parquet files, so Iceberg-native engines can read your Delta tables without conversion or duplication. With Delta 4.0.1 restoring Iceberg compat for Spark 4.0, UniForm is once again a usable option for Scala teams that need cross-engine reads — provided you understand the limitations.

For broader context on the two formats, see Apache Iceberg vs Delta Lake: Choosing a Table Format.

The Problem UniForm Solves

If you write Delta tables with Spark Scala and your only readers are also Spark, you don't need UniForm. The interesting case is when something else — Trino, Athena, Snowflake, BigQuery, Flink — needs to query the same data and only speaks Iceberg. The two historical answers were both bad: maintain two physical copies of the data in different formats, or run an ETL job that periodically converts one to the other. Both burn storage, both introduce staleness, both create reconciliation pain.

UniForm's pitch is that you don't need either. A Delta table with UniForm enabled writes one set of Parquet files and one Delta _delta_log/, then asynchronously emits Iceberg metadata files that point at the same Parquet. Iceberg-native readers see a perfectly normal Iceberg table; Spark Delta writers see a perfectly normal Delta table. No data duplication, no copy job, no scheduled conversion.

The trade-off is that the Iceberg side is read-only — and that's the right trade. Trying to coordinate writes from two different format protocols against shared files is a transactional nightmare nobody wants to debug at 3am.


Where UniForm Stands in 2026

UniForm with Iceberg has been GA since Delta Lake 3.2.0 in 2024, but the timeline got messy with the Spark 4.0 transition. The current state, as of May 2026:

  • Delta Lake 3.2.0–3.3.x — UniForm with Iceberg and Hudi, on Spark 3.x. Mature and widely deployed.
  • Delta Lake 4.0.0 (September 2025) — Iceberg UniForm dropped because Apache Iceberg didn't yet support Spark 4.0. The Delta Lake 4.0 release post called this out as a known gap.
  • Delta Lake 4.0.1+ — Iceberg compatibility restored for Spark 4.0, paired with Iceberg 1.10.x which shipped full Spark 4.0 support.
  • Delta Lake 4.1.x — Iceberg UniForm continues to work with Spark 4.0; Spark 4.1 support is still catching up at the time of writing.

If you're on Delta 4.0.0 specifically and depend on UniForm, the fix is to bump to 4.0.1 or newer. If you're still on Delta 3.x with no Spark 4.0 migration on the horizon, UniForm works as-is and there's nothing to change. The full picture is in the Delta UniForm documentation.
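
If you want to guard a build or a deployment script against the 4.0.0 gap, the timeline above can be encoded as a small version gate. This is a minimal sketch that only restates the list in this section; the object and method names are made up for illustration, and it should be adjusted as newer releases change the picture.

```scala
// Version gate for Iceberg UniForm, encoding only the release timeline listed above.
// UniformSupport and its methods are illustrative names, not part of Delta Lake.
object UniformSupport {

  // Parse "major.minor.patch" into a tuple; anything else yields None.
  def parseVersion(v: String): Option[(Int, Int, Int)] =
    v.trim.split('.') match {
      case Array(a, b, c) if Seq(a, b, c).forall(s => s.nonEmpty && s.forall(_.isDigit)) =>
        Some((a.toInt, b.toInt, c.toInt))
      case _ => None
    }

  // True when the Delta release and Spark version are a pair the timeline lists as working.
  def supportsIcebergUniform(deltaVersion: String, sparkMajor: Int, sparkMinor: Int): Boolean =
    parseVersion(deltaVersion) match {
      case Some((3, minor, _)) => minor >= 2 && sparkMajor == 3                    // 3.2.0 through 3.3.x on Spark 3.x
      case Some((4, 0, patch)) => patch >= 1 && sparkMajor == 4 && sparkMinor == 0 // 4.0.0 lacks Iceberg compat
      case Some((4, _, _))     => sparkMajor == 4 && sparkMinor == 0               // 4.1.x, Spark 4.0 only for now
      case _                   => false
    }
}
```

Calling UniformSupport.supportsIcebergUniform("4.0.0", 4, 0) returns false, which is the case this section warns about.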

For the broader Delta 4.0 upgrade story, see Delta Lake 4.0: What Scala Engineers Need to Know.


Enabling UniForm on a New Table

UniForm with Iceberg requires three table properties and a runtime dependency. The properties enable the Iceberg compatibility writer protocol and turn on metadata generation:

// build.sbt: Delta plus the Iceberg compat artifact.
// Both artifacts use %% so they follow the project's scalaVersion.
libraryDependencies ++= Seq(
  "io.delta" %% "delta-spark"   % "3.3.0",
  "io.delta" %% "delta-iceberg" % "3.3.0"
)

-- Create a Delta table with UniForm Iceberg enabled
CREATE TABLE sales.orders (
  order_id   BIGINT,
  customer   STRING,
  order_date DATE,
  amount     DECIMAL(10, 2)
) USING DELTA
TBLPROPERTIES (
  'delta.columnMapping.mode'              = 'name',
  'delta.enableIcebergCompatV2'           = 'true',
  'delta.universalFormat.enabledFormats'  = 'iceberg'
);

The three properties together do specific work:

  • delta.columnMapping.mode = 'name' — Required. Iceberg tracks columns by stable ID, and column mapping is what gives Delta the ID-based layer Iceberg needs to interoperate. Without it, schema changes on the Delta side wouldn't translate cleanly into Iceberg metadata.
  • delta.enableIcebergCompatV2 = 'true' — Enables the Iceberg compatibility writer feature, which constrains writes so the resulting Parquet files are readable by Iceberg v2 clients (no deletion vectors, no unsupported types).
  • delta.universalFormat.enabledFormats = 'iceberg' — Switches on the metadata generator. After every Delta commit, an Iceberg-compatible metadata.json is written alongside the Delta log.
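
Because all three properties have to be present together, it can help to check a table's properties before relying on the Iceberg side. The sketch below is one way to do that in plain Scala. It compares a property map strictly against the three values shown above; the object name and the message wording are assumptions for illustration.

```scala
// Strict check of a table's properties against the three UniForm settings shown above.
// UniformProperties is an illustrative helper, not a Delta Lake API.
object UniformProperties {

  val required: Map[String, String] = Map(
    "delta.columnMapping.mode"             -> "name",
    "delta.enableIcebergCompatV2"          -> "true",
    "delta.universalFormat.enabledFormats" -> "iceberg"
  )

  // One message per property that is absent or set to a different value.
  // Keys are sorted so the output order is stable.
  def problems(actual: Map[String, String]): Seq[String] =
    required.toSeq.sortBy(_._1).flatMap { case (key, expected) =>
      actual.get(key) match {
        case Some(v) if v.trim.equalsIgnoreCase(expected) => None
        case Some(v) => Some(s"$key is '$v', expected '$expected'")
        case None    => Some(s"$key is not set")
      }
    }
}
```

In a Spark session the input map could be built from the key and value columns returned by SHOW TBLPROPERTIES sales.orders; that wiring is left out so the helper stays free of Spark dependencies.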

A first insert produces both the Delta log entry and the corresponding Iceberg metadata:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("uniform-demo")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

import spark.implicits._

Seq(
  (1L, "alice", java.sql.Date.valueOf("2026-05-01"), BigDecimal("49.99")),
  (2L, "bob",   java.sql.Date.valueOf("2026-05-02"), BigDecimal("12.50"))
).toDF("order_id", "customer", "order_date", "amount")
  .write.format("delta").mode("append").saveAsTable("sales.orders")

After that write, the table directory contains both protocols' metadata side-by-side:

s3://warehouse/sales/orders/
  _delta_log/
    00000000000000000000.json        -- Delta commit
    00000000000000000001.json        -- Delta commit
  metadata/
    v1-uuid.metadata.json            -- Iceberg table metadata
    v2-uuid.metadata.json
    snap-001-uuid.avro               -- Iceberg manifest list
    manifest-001-uuid.avro           -- Iceberg manifest
  part-00000-uuid.snappy.parquet     -- Shared data files
  part-00001-uuid.snappy.parquet

Both the Delta log and the Iceberg metadata reference the same Parquet files. There's no second copy of the data.


Enabling UniForm on an Existing Table

For tables created without UniForm, Delta 3.3+ supports an in-place enable via ALTER TABLE — no data rewrite required, as long as the table's existing data is already Iceberg-compatible:

ALTER TABLE sales.orders SET TBLPROPERTIES (
  'delta.columnMapping.mode'              = 'name',
  'delta.enableIcebergCompatV2'           = 'true',
  'delta.universalFormat.enabledFormats'  = 'iceberg'
);

If the table has features that conflict with Iceberg compat — deletion vectors, unsupported types, partitioning that doesn't translate — the ALTER TABLE will fail with a clear error. In that case you have two options:

  1. Drop the conflicting feature first. For deletion vectors specifically, you can use the DROP FEATURE workflow from Delta 4.0 to remove the feature without truncating history, then enable UniForm.
  2. Use REORG TABLE ... APPLY (UPGRADE UNIFORM(...)) to enable UniForm and rewrite the data files in one step. This costs a full rewrite but resolves all compatibility issues at once.

-- Drop deletion vectors, then enable UniForm
ALTER TABLE sales.orders DROP FEATURE deletionVectors;

ALTER TABLE sales.orders SET TBLPROPERTIES (
  'delta.columnMapping.mode'              = 'name',
  'delta.enableIcebergCompatV2'           = 'true',
  'delta.universalFormat.enabledFormats'  = 'iceberg'
);

-- Or rewrite in one step
REORG TABLE sales.orders APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION = 2));

After enabling, the next Delta write triggers the first Iceberg metadata generation. If you want metadata immediately without waiting for a write, run MSCK REPAIR TABLE sales.orders SYNC METADATA to force generation against the current Delta snapshot.
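
If you enable UniForm across many tables, the three paths in this section (plain ALTER TABLE, drop deletion vectors first, or a REORG rewrite) can be generated from one helper. The following is a sketch under the assumption that you decide the strategy per table yourself. The object and case names are hypothetical, and the table name is interpolated without any validation, so only pass trusted identifiers.

```scala
// Builds the SQL statements for each enablement path described in this section.
// UniformEnablePlan and its strategy names are illustrative, not a Delta Lake API.
object UniformEnablePlan {

  sealed trait Strategy
  case object InPlace extends Strategy             // existing data is already Iceberg-compatible
  case object DropDeletionVectors extends Strategy // drop the feature, then set the properties
  case object Rewrite extends Strategy             // REORG, at the cost of a full data rewrite

  private def setProperties(table: String): String =
    s"""ALTER TABLE $table SET TBLPROPERTIES (
       |  'delta.columnMapping.mode' = 'name',
       |  'delta.enableIcebergCompatV2' = 'true',
       |  'delta.universalFormat.enabledFormats' = 'iceberg'
       |)""".stripMargin

  // Statements are returned in the order they should be executed.
  def statements(table: String, strategy: Strategy): Seq[String] = strategy match {
    case InPlace =>
      Seq(setProperties(table))
    case DropDeletionVectors =>
      Seq(s"ALTER TABLE $table DROP FEATURE deletionVectors", setProperties(table))
    case Rewrite =>
      Seq(s"REORG TABLE $table APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION = 2))")
  }
}
```

Each returned string can be passed to spark.sql(...) in order, for example UniformEnablePlan.statements("sales.orders", UniformEnablePlan.DropDeletionVectors).foreach(spark.sql).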


Reading From the Iceberg Side

Delta tables with UniForm enabled look like ordinary Iceberg tables to any Iceberg-aware engine. The connection details depend on how the engine locates the table:

Engines that speak Hive Metastore + Iceberg — Trino, Spark with the Iceberg catalog, Flink — register UniForm tables through HMS the same way as any other Iceberg table. UniForm writes the table location and metadata pointer into HMS as part of its metadata generation step (which is why HMS is currently a hard requirement on the write side).

-- Trino reading the UniForm table as Iceberg
SELECT order_id, customer, amount
FROM iceberg_catalog.sales.orders
WHERE order_date >= DATE '2026-05-01';

Engines that read metadata files by path — useful for ad-hoc queries or services that don't share a catalog — point at the latest metadata.json directly. UniForm writes the metadata at <table-path>/metadata/v<version>-<uuid>.metadata.json, and the version number increases with each Iceberg-emitted snapshot.

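// Spark reading the UniForm table as Iceberg (via the Iceberg connector).
// The file name below is a placeholder; substitute the newest metadata file under metadata/.
val latestMetadata = "s3://warehouse/sales/orders/metadata/v2-uuid.metadata.json"

val df = spark.read
  .format("iceberg")
  .load(latestMetadata)

df.filter($"amount" > 25).show()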
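
Path-based readers need to know which metadata file is the newest. A minimal sketch of that selection, assuming you already have a listing of the metadata/ directory as plain file names and that the names follow the v<version>-<uuid>.metadata.json pattern described above; the object name is made up for illustration.

```scala
// Picks the newest Iceberg metadata file from a directory listing.
// IcebergMetadataFiles is an illustrative helper, not part of Delta Lake or Iceberg.
object IcebergMetadataFiles {

  // Matches v<version>-<uuid>.metadata.json; the whole file name must match.
  private val MetadataFile = """v(\d+)-(.+)\.metadata\.json""".r

  // Returns the file name with the highest version number, compared numerically
  // so that v10 sorts after v9. Non-matching names (manifests, snapshots) are ignored.
  def latest(fileNames: Seq[String]): Option[String] = {
    val versioned = fileNames.collect {
      case name @ MetadataFile(version, _) => (version.toLong, name)
    }
    if (versioned.isEmpty) None else Some(versioned.maxBy(_._1)._2)
  }
}
```

The numeric comparison is the point of the sketch: sorting the file names as strings would put v10 before v9 and hand readers an old snapshot.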

The catalog requirement is real. As of Delta 3.3, UniForm only auto-generates Iceberg metadata when the table is accessed by name through HMS, not when accessed purely by path. If your writers use .save("s3://...") instead of .saveAsTable("db.name"), no Iceberg metadata gets written. This is the most common configuration mistake — and it fails silently, since the Delta side keeps working fine.

For a deeper look at how Iceberg catalogs work and the broader REST catalog story, see Apache Polaris: The Open Standard Iceberg Catalog.


Async Generation and Read Consistency

UniForm generates Iceberg metadata asynchronously after each Delta commit completes. This is the right choice — synchronous generation would put Iceberg metadata writes in the critical path of every Delta transaction, slowing writes for the sake of readers that may not exist. But it has consequences worth understanding:

  • Iceberg reads can lag Delta writes. A Spark job that writes to a Delta table and a Trino query that immediately reads the Iceberg view may see stale data — the Iceberg snapshot reflects the last successfully generated metadata, not the latest Delta commit.
  • Delta commit bundling. When Delta receives many small commits in quick succession (typical of streaming jobs), UniForm can fold several Delta commits into a single Iceberg snapshot. This keeps the number of Iceberg metadata files down, but it means Iceberg readers see changes at coarser granularity than Delta readers.
  • Failed generation doesn't fail the Delta commit. If Iceberg metadata generation errors out (e.g., HMS unavailable), the Delta write still succeeds. The next successful generation catches up by reflecting the current Delta state — but until then, Iceberg readers are stale.

For most analytical workloads this is fine; the Iceberg side is a read-optimized projection of the Delta source-of-truth, and a few seconds of lag is invisible to dashboards and ad-hoc queries. For tight read-after-write semantics across the format boundary, UniForm is not the right tool — keep both sides on Delta.
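
Since a failed or delayed generation never fails the Delta commit, staleness has to be detected from the outside. The sketch below shows one possible alerting rule: the Iceberg side is considered stale when it trails the latest Delta version and the latest commit has been waiting longer than a grace period. How you obtain the last converted Delta version depends on your platform and is deliberately left as an input; all names here are hypothetical.

```scala
// Staleness rule for the Iceberg side of a UniForm table.
// UniformLag is an illustrative helper; the inputs must come from your own monitoring.
object UniformLag {

  final case class Lag(commitsBehind: Long, stale: Boolean)

  // latestDeltaVersion:        newest version in the Delta log
  // lastConvertedDeltaVersion: Delta version reflected by the newest Iceberg snapshot
  // latestCommitEpochSec:      commit time of latestDeltaVersion
  // nowEpochSec / graceSec:    current time and the lag you are willing to tolerate
  def check(
      latestDeltaVersion: Long,
      lastConvertedDeltaVersion: Long,
      latestCommitEpochSec: Long,
      nowEpochSec: Long,
      graceSec: Long
  ): Lag = {
    val commitsBehind = math.max(0L, latestDeltaVersion - lastConvertedDeltaVersion)
    val waitedSec     = nowEpochSec - latestCommitEpochSec
    // Being behind is normal right after a commit; only flag it once the grace period has passed.
    Lag(commitsBehind, commitsBehind > 0 && waitedSec > graceSec)
  }
}
```

The grace period is what keeps ordinary asynchronous lag from paging anyone: a table that is one commit behind thirty seconds after a write is expected, while one that is still behind an hour later usually means generation is failing.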


What UniForm Doesn't Do

The limitations matter as much as the capabilities. Before enabling UniForm in production, work through this list:

  • No writes from the Iceberg side. Iceberg clients can read; they cannot write. An Iceberg write would create files outside Delta's transaction log and almost certainly corrupt the table. Lock down catalog permissions accordingly.
  • No deletion vectors in v2. IcebergCompatV2 requires deletion vectors to be disabled: Iceberg v2 represents row-level deletes as separate delete files, and UniForm does not translate Delta deletion vectors into them. If your table benefits from deletion vectors (high-frequency point updates, MERGE-heavy workloads), you can't have both. Iceberg v3 supports deletion vectors, so a future UniForm Iceberg v3 mode should resolve this, but that's not shipped yet.
  • No VOID type. Tables with VOID columns can't enable UniForm. Cast or drop the column first.
  • No materialized views or streaming tables. UniForm is for regular Delta tables only.
  • No Change Data Feed via Iceberg. Delta's CDF works for Delta consumers; Iceberg readers don't see change data, only the current snapshot.
  • No Delta Sharing for Iceberg readers. If your data sharing strategy depends on Delta Sharing, Iceberg clients won't pick up the share.
  • HMS dependency. UniForm currently requires Hive Metastore as the configured catalog on the write side. Pure path-based access doesn't trigger metadata generation.

The deletion vectors restriction is the one that bites teams unexpectedly. If your Delta table was originally created with deletion vectors enabled (the default in newer versions), enabling UniForm requires either dropping that feature or a full table rewrite via REORG.
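
The list above lends itself to a pre-flight check that runs before anyone issues the ALTER TABLE. What follows is a sketch only: TableSummary is a hypothetical record you would fill in from your own catalog inspection, and the check assumes VOID columns are reported with the type name "void".

```scala
// Pre-flight check for the UniForm limitations listed above.
// UniformPreflight and TableSummary are illustrative, not a Delta Lake API.
object UniformPreflight {

  final case class TableSummary(
      columnTypes: Map[String, String], // column name -> SQL type name (assumed "void" for VOID)
      deletionVectorsEnabled: Boolean,
      isMaterializedView: Boolean,
      isStreamingTable: Boolean,
      registeredInMetastore: Boolean    // false for tables only ever accessed by path
  )

  // Returns one message per limitation the table runs into; empty means no known blocker.
  def blockers(t: TableSummary): Seq[String] = {
    val voidColumns = t.columnTypes.collect {
      case (name, tpe) if tpe.trim.equalsIgnoreCase("void") => name
    }.toSeq.sorted
    val voidList = voidColumns.mkString(", ")

    Seq(
      if (t.deletionVectorsEnabled) Some("deletion vectors are enabled") else None,
      if (voidColumns.nonEmpty) Some(s"VOID columns: $voidList") else None,
      if (t.isMaterializedView) Some("materialized views are not supported") else None,
      if (t.isStreamingTable) Some("streaming tables are not supported") else None,
      if (!t.registeredInMetastore) Some("table is not registered in Hive Metastore") else None
    ).flatten
  }
}
```

An empty result does not prove the table is compatible, since unsupported types and partitioning schemes are not modelled here; it only rules out the blockers named in this section.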


When UniForm Is the Right Answer

UniForm is a bridge, not a destination. It makes sense when:

  • Your writers are all Spark + Delta, and you can't migrate them. Maybe you have years of MERGE INTO logic, a stable Databricks setup, or downstream Delta consumers (Delta Live Tables, Delta Sharing) you don't want to break.
  • Your readers include engines that don't speak Delta well. Trino, Athena, Snowflake, and BigQuery all have stronger Iceberg integrations than Delta integrations. UniForm gets you those reads without rewriting your write path.
  • Read-only Iceberg access is sufficient. No engine on the Iceberg side needs to write back.
  • You can tolerate seconds-to-minutes of write-to-read lag on the Iceberg side.

UniForm is the wrong answer when:

  • You need writes from multiple format protocols — pick one format and standardize.
  • You need deletion vectors and broad cross-engine reads simultaneously — wait for Iceberg v3 UniForm support or use native Iceberg.
  • You're greenfield with no existing Delta investment — pick the format that matches your engine mix directly. If you need multi-engine reads, native Apache Iceberg on a REST catalog is simpler than Delta + UniForm.

Checklist

If you're about to enable UniForm on a production table:

  • [ ] On Delta 3.2.0+ (Spark 3.x) or Delta 4.0.1+ (Spark 4.0). Skip Delta 4.0.0 — Iceberg compat was dropped in that release
  • [ ] Hive Metastore configured as the Spark catalog
  • [ ] Writers access the table by name (.saveAsTable("db.t")), not by path (.save("s3://..."))
  • [ ] Column mapping mode set to name before enabling UniForm
  • [ ] Deletion vectors dropped (or never enabled), or you've planned a REORG rewrite
  • [ ] No VOID columns, materialized views, or streaming tables
  • [ ] Iceberg readers point at the metadata via HMS or by latest metadata.json path
  • [ ] Monitoring in place to alert on Iceberg metadata generation failures — Delta commits succeed regardless, so silent staleness is the risk

UniForm doesn't change the fundamentals of either format. It's a pragmatic interop layer that buys you cross-engine reads without the cost of dual writes. For Spark Scala teams already invested in Delta who need to feed Iceberg-native readers, it's the lowest-friction option available today.

For the full reference, see the Delta UniForm documentation, the Databricks GA announcement, and the Delta Lake GitHub releases.

Article Details

Created: 2026-05-30

Last Updated: 2026-05-30 11:19:25 PM