The State of Native Execution for Spark: Photon, Gluten, and Fabric

There are three credible paths to native C++ execution for Spark in 2026: Databricks Photon (proprietary, mature, locked in), Apache Gluten with Velox (open source, newly graduated to ASF Top-Level Project), and Microsoft Fabric's Native Execution Engine (Gluten-based, managed). The choice maps cleanly onto your deployment model — here's how Scala teams should think about it.

The Thesis

Native execution for Spark is no longer experimental. Photon has been GA on Databricks since 2022 and is now the default on most Databricks SKUs. Apache Gluten graduated to ASF Top-Level Project in March 2026. Microsoft Fabric's Native Execution Engine went GA in Fabric Runtime 1.3. AWS EMR ships a proprietary native engine for some workloads. Apache DataFusion Comet, the Rust-and-Arrow alternative, is gaining traction for Parquet-heavy scan workloads.

The benchmarks are no longer in dispute. Vectorized C++ (or Rust) execution on columnar batches is typically 2-5x faster than JVM whole-stage codegen on analytical workloads. The JVM is great at coordination and terrible at tight inner loops over numeric data.

What is still in flux is which native execution path you should bet on as a Scala team. The answer depends almost entirely on where you run Spark, not on the technical merits of the engines themselves. The three engines are converging on similar performance numbers and similar operator coverage — the meaningful differences are in governance, portability, and what you give up to use each one.

Photon: The Mature One, With Strings Attached

Photon is Databricks' proprietary vectorized C++ execution engine. The 2022 SIGMOD paper explains the design: Catalyst still plans the query, but at execution time, supported operators run in a C++ runtime over columnar batches instead of the JVM whole-stage codegen path. Operators that aren't supported fall back to vanilla Spark, with row/columnar conversions at each boundary between the two engines.
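The fallback behavior is easier to reason about with a toy model. The sketch below is not Photon's (or Gluten's) planner: it assumes a simplified linear pipeline rather than a real plan tree, and the operator names and the supported set are hypothetical. It only shows where conversions appear when a pipeline crosses between native and JVM execution.

```python
def plan_with_transitions(operators, native_supported):
    """Walk a linear pipeline (scan first) and mark each engine crossing.

    operators:        operator names in data-flow order (hypothetical names)
    native_supported: set of names the native engine can run (an assumption)
    """
    out = []
    prev_native = None
    for op in operators:
        is_native = op in native_supported
        # A conversion is needed whenever consecutive operators run on different engines.
        if prev_native is not None and is_native != prev_native:
            out.append("ColumnarToRow" if prev_native else "RowToColumnar")
        out.append(f"{op} [{'native' if is_native else 'jvm'}]")
        prev_native = is_native
    return out


pipeline = ["Scan", "Filter", "PythonUDF", "HashAggregate"]
supported = {"Scan", "Filter", "HashAggregate"}
for step in plan_with_transitions(pipeline, supported):
    print(step)
```

A single unsupported operator in the middle of the pipeline costs two conversions, one on each side, which is why a few fallbacks can erase more of the speedup than their share of the plan suggests.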

The numbers Databricks publishes are real. Most analytical SQL workloads see 2-5x wall-clock speedups; aggregation- and scan-heavy queries can hit 10x. Photon doesn't require code changes — turn it on at the cluster level and your existing Scala JARs get the speedup wherever operators are supported.

What it costs you, beyond money:

  • Lock-in. Photon only runs on Databricks. There is no "self-hosted Photon" option, no plan to open-source it, and no portable artifact you can take with you. Adopting Photon is adopting Databricks.
  • The DBU multiplier. Photon-enabled compute is billed at roughly 2x the DBU rate of the same instance type without Photon. The math only works if your average speedup exceeds 2x on the queries you're actually running. For balanced workloads it usually does; for UDF-heavy or streaming workloads it often doesn't.
  • Opacity. When Photon doesn't accelerate a query, the diagnosis path is "read the Spark UI and look for non-Photon operators." There's no source code to inspect, no upstream issue tracker, and the supported-operator matrix changes between DBR versions without a public changelog you can easily diff.
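The break-even arithmetic behind the DBU multiplier can be sketched in a few lines. This is a back-of-the-envelope model, not a Databricks pricing tool: it assumes the roughly 2x multiplier quoted above, models DBU spend only (cloud instance charges are ignored), and uses Amdahl's law to account for the share of runtime that stays on the JVM. The function names and the sample fractions are illustrative.

```python
def effective_speedup(native_fraction: float, native_speedup: float) -> float:
    """Amdahl's law: only the natively executed share of runtime gets faster."""
    if not 0.0 <= native_fraction <= 1.0:
        raise ValueError("native_fraction must be between 0 and 1")
    if native_speedup <= 0:
        raise ValueError("native_speedup must be positive")
    return 1.0 / ((1.0 - native_fraction) + native_fraction / native_speedup)


def dbu_cost_ratio(speedup: float, dbu_multiplier: float = 2.0) -> float:
    """DBU spend with Photon relative to without; below 1.0 means Photon is cheaper."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return dbu_multiplier / speedup


# Illustrative inputs: a mostly-accelerated SQL job versus a UDF-heavy job,
# both assuming a 4x speedup on the operators that do run natively.
sql_heavy = dbu_cost_ratio(effective_speedup(native_fraction=0.9, native_speedup=4.0))
udf_heavy = dbu_cost_ratio(effective_speedup(native_fraction=0.4, native_speedup=4.0))
print(f"SQL-heavy DBU cost ratio: {sql_heavy:.2f}")
print(f"UDF-heavy DBU cost ratio: {udf_heavy:.2f}")
```

Under these assumed inputs the SQL-heavy job spends about 0.65x the DBUs it would without Photon, while the UDF-heavy job spends about 1.4x. With a 4x native speedup, at least two thirds of the runtime has to be accelerated before the 2x multiplier pays for itself.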

The practical framing: Photon is the most polished native engine for Spark because it is proprietary and tightly coupled to one runtime. If you've already made the Databricks decision, Photon is a no-brainer for the right workloads. If you haven't, Photon is not a reason on its own to make it — the alternatives have closed enough of the gap that the broader Databricks vs open-source trade-off is what should drive the call.

Gluten + Velox: The Open Source One

Apache Gluten is what happens when you ask "can we have Photon, but open source, and running on stock Apache Spark?" The answer turns out to be yes, with caveats. Gluten rewrites Spark's physical plan into Substrait, ships it over JNI to a native backend (Velox by default, ClickHouse for some users), and runs the columnar work in C++ while the JVM keeps coordinating.

The full architecture is covered in the Apache Gluten deep dive. For this piece, the things that matter at the landscape level:

  • It runs on stock Apache Spark. No fork, no custom runtime, no vendor binding. Drop a JAR onto your EMR or Kubernetes cluster, set a handful of spark.* configs, done.
  • Benchmarks land in the 3-4x range on TPC-H/TPC-DS, with peak speedups around 23x on individual queries. That's competitive with Photon on the same workload shapes, though Photon retains an edge on some heavily-optimized Databricks scenarios.
  • ASF governance. Graduating to Top-Level Project in March 2026 matters because it signals the project has the contributor base and governance maturity to be a durable bet. Intel, Kyligence, Alibaba Cloud, Meituan, Microsoft, IBM, and Google are all active contributors.
  • Pluggable backends. Velox is the default and the most mature. ClickHouse is the alternative if your team already has ClickHouse operational expertise. Bolt, ByteDance's LLVM JIT engine, is on the 2026 roadmap as a third option.

Where Gluten loses to Photon today:

  • Operator coverage is still catching up on some edge cases — exotic window-frame specifications, certain ANSI-mode behaviors, and some complex-type writes still fall back. Each release narrows the gap.
  • Streaming. Structured Streaming is not supported, including the new Real-Time Mode. Photon supports streaming workloads.
  • The setup is more involved. Photon is a checkbox. Gluten is dropping a fat JAR, configuring the columnar shuffle manager, sizing off-heap memory, and watching the Spark UI to confirm operators are running natively. Not hard, but more deliberate.

For self-managed Spark on Kubernetes or EMR, Gluten is the obvious native execution choice in 2026. The setup cost is small, the rollback is trivial (remove the JAR), and the downside on workloads it doesn't fit is usually no-op rather than regression.
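To make "a handful of spark.* configs" concrete, here is a minimal sketch that renders them as spark-submit arguments. The config keys and class names follow the Gluten Velox documentation as best recalled and have changed between releases, so treat them as assumptions to verify against the release you deploy. The JAR path and the off-heap size are hypothetical placeholders.

```python
# Hypothetical location of the Gluten + Velox bundle JAR on each node.
GLUTEN_JAR = "/opt/spark/jars/gluten-velox-bundle.jar"


def gluten_confs(jar_path: str = GLUTEN_JAR, offheap_size: str = "8g") -> dict:
    """Spark settings commonly needed to enable Gluten (verify per release)."""
    return {
        # Load the Gluten plugin on the driver and executors.
        "spark.plugins": "org.apache.gluten.GlutenPlugin",
        "spark.driver.extraClassPath": jar_path,
        "spark.executor.extraClassPath": jar_path,
        # Native operators allocate off-heap, so this must be enabled and sized.
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": offheap_size,
        # Keep shuffle data columnar instead of converting back to rows.
        "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
    }


def spark_submit_args(confs: dict) -> list:
    """Flatten a conf dict into ['--conf', 'key=value', ...] for spark-submit."""
    args = []
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(spark_submit_args(gluten_confs())))
```

Rollback in this model is passing an empty dict: with no plugin configured and the JAR off the classpath, the job runs as vanilla Spark.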

Microsoft Fabric: Gluten, but Managed

Microsoft Fabric's Native Execution Engine is the most interesting datapoint of the last 18 months. It went GA in Fabric Runtime 1.3, and the technical foundation is Apache Gluten + Velox — the same open-source stack that runs on stock Spark.

Microsoft's published benchmarks claim ~4x speedups on 1TB TPC-DS and up to 6x in end-to-end customer trials. Those numbers are in the same ballpark as Photon's published numbers, which is the point: when both engines run vectorized C++ over columnar batches, they end up at roughly the same place.

What Fabric adds on top of stock Gluten:

  • Operational management. Fabric runs the engine for you. No JAR placement, no shuffle manager configuration, no off-heap tuning. Spark capacity is the unit of consumption.
  • Spark 3.5 compatibility with both Parquet and Delta. The engine activates transparently — existing Spark queries get faster without code changes.
  • Tighter Power BI / Fabric integration. If you're already living in the Microsoft data stack, the integration story is real.

What's striking is the strategic implication: Microsoft chose to back the open-source native execution stack rather than build a proprietary engine. The Gluten contribution graph reflects this — Microsoft is one of the largest active contributors to Gluten itself. The bet is that managed orchestration is the differentiator, not the engine internals. That bet, if it pays off, also means improvements Microsoft makes for Fabric flow back upstream to anyone running stock Spark with Gluten.

The Fabric trade-off is the same as any managed-cloud trade-off: you pay a margin in exchange for not running it yourself, and you accept cloud and ecosystem lock-in (to Azure, OneLake, and the broader Fabric stack) in exchange for the integration.

Where the Three Paths Sit Side by Side

The honest table for Scala teams in 2026:

| Factor | Photon (Databricks) | Gluten + Velox (OSS) | Fabric NEE (Microsoft) |
| --- | --- | --- | --- |
| License | Proprietary | Apache 2.0 | Proprietary wrapper on Apache 2.0 |
| Where it runs | Databricks only | Stock Apache Spark | Microsoft Fabric only |
| Spark compatibility | DBR (forked Spark) | Spark 3.4, 3.5, 4.0 | Spark 3.5 |
| Streaming | Yes | No | No (yet) |
| Code changes required | None | None (config only) | None |
| Operational overhead | None (managed) | Real (you run it) | None (managed) |
| Vendor lock-in | High | None | High (Azure/Fabric) |
| Cost model | 2x DBU multiplier | Included with Spark | Included in Fabric capacity |
| Diagnose fallbacks | Spark UI | Spark UI + open source code | Spark UI |

The figure everyone fixates on, speedup on TPC-DS, is the least interesting comparison here. The three engines are at roughly the same place on raw throughput. The rows that actually drive the call for Scala teams are where it runs and what lock-in it implies.

Decision Framework for Scala Teams

The native execution choice is largely downstream of the deployment choice, so it tends to resolve quickly once you know your deployment model.

If you run on Databricks: Use Photon, with eyes open about the DBU multiplier. Measure per-workload, because Photon's economics are workload-dependent. Some pipelines break even or save money at 2x DBU; others end up costing more. The Spark UI's Photon section makes the per-stage breakdown clear; use it.

If you run on Microsoft Fabric: Use the Native Execution Engine. It's GA, it's free (within capacity), there's no code change, and there's no realistic reason to leave it off. The interesting thing about Fabric is that you're effectively running open-source Gluten with managed ops — the engineering investment compounds with the upstream project.

If you run on EMR, Dataproc, or self-managed Kubernetes: Pilot Gluten with the Velox backend on a representative analytical workload. The setup cost is one engineer for a day, the speedup is real on Parquet-heavy aggregation and join workloads, and there is no vendor coupling. If the workload pattern doesn't fit (UDF-heavy, streaming-heavy, tiny data, sub-30-second jobs), you'll know quickly from the Spark UI fallback rate.
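A fallback rate can be approximated from the operator names in an executed plan. The sketch below assumes Gluten's convention of naming natively executed operators with a "Transformer" suffix (for example FilterExecTransformer), and assumes the names have already been collected from the Spark UI or an explain output with row/columnar transition nodes excluded. The sample list is hypothetical, not taken from a real job.

```python
def fallback_rate(operator_names, native_marker="Transformer"):
    """Fraction of operators that did NOT run natively (0.0 for an empty plan).

    native_marker is an assumption about how native operators are named;
    adjust it if your engine or version uses a different convention.
    """
    names = list(operator_names)
    if not names:
        return 0.0
    fallbacks = [name for name in names if native_marker not in name]
    return len(fallbacks) / len(names)


# Hypothetical plan: four native operators and one Python UDF that fell back.
sample_plan = [
    "FileSourceScanExecTransformer",
    "FilterExecTransformer",
    "ProjectExecTransformer",
    "BatchEvalPython",
    "HashAggregateExecTransformer",
]
print(f"fallback rate: {fallback_rate(sample_plan):.0%}")
```

An operator count is a crude proxy, since one fallback on the hot path can matter more than several on a tiny dimension table, so it works best as a first filter before looking at per-stage timings.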

If you have a Parquet-scan-dominated workload and don't want the full Gluten footprint: Consider Apache DataFusion Comet. It's a Rust-and-Arrow plugin that targets a narrower slice of the plan than Gluten — primarily scans and filters — and is lighter-weight to enable. Comet is earlier than Gluten but worth watching if your queries are scan-bound rather than join-bound.

The one path that's hard to defend in 2026 is "vanilla Spark with no native execution plugin, on a workload that would benefit from one." The setup cost for Gluten is small, the speedup on analytical workloads is large, and the rollback is trivial, so skipping even a pilot leaves meaningful performance on the table.

The Forward-Looking Take

The most important thing in this landscape is not which engine wins. It's that Microsoft's commitment to Gluten — backing the open-source stack instead of building a proprietary alternative — significantly raises the floor for everyone running non-Databricks Spark.

Five years ago, native execution was a Databricks moat. Three years ago, it was a Databricks moat with a credible open-source response (early Gluten). Today, it's a commodity that runs in three places, with two of those places sharing a code base. That convergence will continue. Expect Gluten and the Fabric NEE to close most of the remaining gap to Photon over the next 18 months, particularly on streaming and ANSI-mode coverage.

For Scala teams choosing a deployment model in 2026, native execution is no longer the tiebreaker it was. The decision shifts back to the things that have always mattered: operational maturity, cost predictability, ecosystem coupling, and how much platform work your team has the appetite to do. The Spark on Kubernetes vs YARN trade-off and the Databricks vs open-source call are still the primary decisions. Native execution is now a feature you turn on once those decisions are made.

For further reading, see the Apache Gluten project site, the Photon SIGMOD paper, and the Microsoft Fabric NEE announcement.

Article Details

Created: 2026-06-12

Last Updated: 2026-06-12 10:30:55 PM