Spark Real-Time Mode vs Apache Flink: Is Spark Finally a True Streaming Engine?

Spark 4.1's Real-Time Mode delivers single-digit millisecond latency for stateless queries and targets sub-300ms p99 on the Databricks runtime, putting Spark within striking distance of Flink for the first time. For most analytics streaming, CDC, and ML feature workloads, RTM closes the gap. For sub-10ms requirements, complex event processing, and true event-at-a-time semantics, Flink still wins — and probably will for a long time.

The Old Story Has Changed

For a decade, the rule for choosing a streaming engine in the JVM world was simple. If you wanted millisecond latency or event-at-a-time semantics, you used Apache Flink. If you wanted Spark's API, ecosystem, and unified batch/streaming model, you accepted Structured Streaming's microbatch floor — typically 100ms to several seconds — and built around it.

That rule held because of architecture, not implementation effort. Structured Streaming chops the input into discrete batches, processes each batch through sequential stages, and writes output. Even with Trigger.ProcessingTime("0 seconds"), you paid for batch boundaries, sequential stage execution, and disk-based shuffle between stages. Flink, designed as a streaming engine from day one, processes events as they arrive through a continuously running dataflow graph. Different model, different latency floor.
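To make the latency floor concrete, here is a toy model in plain Scala. It is a sketch, not Spark's or Flink's actual scheduler: the names `LatencyModel`, `microbatchLatencyMs`, and `perEventLatencyMs` are made up for illustration, and a fixed per-event processing cost is an assumption. The point it shows is that an event arriving partway through a batch interval waits for the next boundary before any work starts, while a per-event pipeline pays only the processing cost.

```scala
object LatencyModel {
  /** Latency of one event in an idealized microbatch engine (assumed model).
    * The event waits for the next batch boundary, then pays the processing cost.
    */
  def microbatchLatencyMs(arrivalMs: Long, batchIntervalMs: Long, processingMs: Long): Long = {
    require(arrivalMs >= 0 && batchIntervalMs > 0, "arrival must be >= 0 and interval > 0")
    val sinceBoundary = arrivalMs % batchIntervalMs
    // An event landing exactly on a boundary does not wait in this model.
    val waitMs = if (sinceBoundary == 0) 0L else batchIntervalMs - sinceBoundary
    waitMs + processingMs
  }

  /** Latency of one event in an idealized event-at-a-time engine. */
  def perEventLatencyMs(processingMs: Long): Long = processingMs
}
```

With a 100 ms interval and 2 ms of processing, an event arriving at t = 130 ms waits 70 ms for the boundary and finishes after 72 ms in this model; the per-event path finishes after 2 ms.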

Spark 4.1 changes this. Real-Time Mode (RTM) ships in open-source Spark 4.1 with continuous data flow, concurrent stage scheduling, and an in-memory streaming shuffle that bypasses disk. The result: single-digit millisecond p99 latency for stateless queries on the open-source release, and sub-300ms p99 for a broad set of stateless and stateful queries on the Databricks runtime, where the full feature set is GA. Databricks's own benchmark claims RTM is up to 92% faster than Flink on the workloads they tested.

That's a real shift. It does not, however, mean Flink is obsolete. The honest read on RTM vs Flink in 2026 is more nuanced than the marketing on either side suggests.

What Vendor Benchmarks Don't Tell You

The "92% faster than Flink" claim deserves the same skepticism any vendor benchmark deserves. Databricks ran the comparison on workloads that played to RTM's strengths — feature encoding, broadcast joins, and group-by-count aggregations on representative ML feature pipelines. Those are exactly the workloads RTM was engineered for, and the result is genuine and reproducible on those shapes. It is not a general claim that Spark beats Flink at streaming.

Independent comparisons and the Flink community's own production deployments tell a more measured story. Flink still operates on a per-event basis and routinely achieves sub-millisecond per-record latency on tuned pipelines. RTM, even at its best, operates in the single-digit to low tens of milliseconds. If your SLA is "p99 under 5ms" or "p99.9 under 20ms," Flink has more headroom. If your SLA is "p99 under 200ms," which covers most analytics streaming, fraud detection, personalization, and dashboarding, both engines comfortably meet it, and the choice comes down to other factors.
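Since the argument hinges on percentile SLAs, a small sketch of how you might check one against measured latencies can help. This uses the nearest-rank definition of a percentile, which is one of several in common use; `LatencySla`, `percentile`, and `meetsSla` are hypothetical names, not part of Spark or Flink.

```scala
object LatencySla {
  /** Nearest-rank percentile: the smallest sample such that at least
    * `pct` percent of all samples are less than or equal to it.
    */
  def percentile(samplesMs: Seq[Double], pct: Double): Double = {
    require(samplesMs.nonEmpty, "need at least one sample")
    require(pct > 0 && pct <= 100, "pct must be in (0, 100]")
    val sorted = samplesMs.sorted
    // Multiply before dividing to keep the rank exact for whole-number inputs.
    val rank = math.ceil(pct * sorted.size / 100.0).toInt
    sorted(math.max(rank, 1) - 1)
  }

  /** True when the chosen percentile fits inside the latency budget. */
  def meetsSla(samplesMs: Seq[Double], pct: Double, budgetMs: Double): Boolean =
    percentile(samplesMs, pct) <= budgetMs
}
```

For 99 samples at 2 ms plus a single 250 ms outlier, `LatencySla.meetsSla(samples, 99, 5)` holds while the maximum (`pct = 100`) does not. That is why a p99 and a p99.9 target can lead to different engine choices for the same pipeline.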

Where Flink Still Wins

Five categories of work are still genuinely better on Flink, and probably will be for years.

Sub-10ms latency requirements. Algorithmic trading, network telemetry, ad bidding at the inner-loop level, high-frequency monitoring. When your tail latency budget is in single-digit milliseconds, the architectural overhead of even the leanest microbatch-derived system shows up. RTM moves Spark closer, but Flink's true event-at-a-time pipeline still has a lower floor.

Complex event processing. FlinkCEP is a mature pattern-matching library with multi-event sequences, time-windowed patterns, NOT/FOLLOWED-BY/UNTIL operators, and the surrounding ecosystem. If you're detecting "user did A, then B within 10 seconds, then NOT C in the next 30 seconds," Flink has a real CEP API; Spark has stateful streaming primitives and a lot of glue code to write yourself.
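To give a feel for the glue code, here is a minimal sketch of that exact pattern in plain Scala, evaluated over an already-collected, time-ordered list of events for one user. It is not FlinkCEP and not a Spark API; `PatternSketch`, `Event`, and `matches` are hypothetical names. A real streaming version is harder: the absence of C can only be confirmed once a timer or watermark passes 30 seconds after B, and that per-key state and timer handling is what you would write yourself.

```scala
object PatternSketch {
  final case class Event(kind: String, tsMs: Long)

  /** Returns the timestamps of A events that complete the pattern:
    * A, then B within `bWithinMs`, then no C within `noCWithinMs` after that B.
    * Assumes `events` belong to a single user and are sorted by timestamp.
    */
  def matches(
      events: Seq[Event],
      bWithinMs: Long = 10000L,
      noCWithinMs: Long = 30000L
  ): Seq[Long] =
    events.filter(_.kind == "A").flatMap { a =>
      // First B strictly after A and inside the allowed window.
      val firstB = events.find { e =>
        e.kind == "B" && e.tsMs > a.tsMs && e.tsMs - a.tsMs <= bWithinMs
      }
      // The match survives only if no C lands in the window after that B.
      firstB
        .filter { b =>
          !events.exists { e =>
            e.kind == "C" && e.tsMs > b.tsMs && e.tsMs - b.tsMs <= noCWithinMs
          }
        }
        .map(_ => a.tsMs)
    }
}
```

This sketch only considers the first B after each A and ignores out-of-order arrival, both of which a production pattern matcher has to handle.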

True event-at-a-time semantics with exactly-once. RTM in open-source Spark 4.1 is at-least-once. The Databricks runtime extends RTM to more operators with exactly-once on supported sinks, but Flink's model, which processes each record as it arrives and coordinates exactly-once through checkpoints across the entire pipeline, is a different design point. For workloads where every duplicate is a real correctness problem and downstream deduplication isn't an option, Flink's checkpoint-aligned exactly-once model is the safer choice.

Stateful operations on stricter SLAs. Flink's RocksDB state backend, incremental checkpointing, and fine-grained state TTL have years of production hardening. Spark's transformWithState API in 4.1 is solid but newer, and the open-source RTM release doesn't support stateful operations yet — that lives in the Databricks runtime. If you're running heavy stateful pipelines on the open-source path, Flink remains more mature.

Native streaming-first ergonomics. Watermarks, event time, allowed lateness, side outputs, broadcast state, operator chaining — these are Flink's first-class API concepts. Spark exposes equivalents, but the API was built around DataFrames first and streaming was layered on. If your team thinks in events and operator graphs, Flink feels native. If they think in tables and queries, Spark feels native.

Where RTM Is Now Good Enough

The flip side is where the conversation has genuinely moved. For the workloads most Spark Scala teams actually run, RTM closes the latency gap to "doesn't matter anymore."

Analytics streaming with sub-second SLAs. Dashboards refreshing every few seconds, near-real-time metrics, hourly windowed aggregations updated continuously. The microbatch floor of "a few hundred ms with tuning" was always good enough here; RTM just makes the latency a non-conversation.

CDC pipelines. Reading from Debezium-style change streams, applying light transformations, writing to a destination warehouse, lake table, or Kafka topic. These pipelines are the textbook RTM use case: stateless or lightly stateful, throughput-sensitive, latency-tolerant in the 50–500ms range. RTM is a single-line trigger change away.

ML feature serving. Computing features over event streams for online inference. The Databricks benchmark targeted this specifically and the numbers are credible — feature encoding and broadcast joins are exactly what Trigger.RealTime() is good at, and the at-least-once semantics are fine because feature writes are idempotent.
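The "idempotent writes make at-least-once fine" claim is easy to check with a small model. The sketch below is an in-memory stand-in for a feature store, not a real sink: `FeatureStoreSketch`, `FeatureWrite`, `upsert`, and `applyAll` are hypothetical names, and last-write-wins by event timestamp is an assumed policy.

```scala
object FeatureStoreSketch {
  final case class FeatureWrite(userId: String, value: Double, eventTsMs: Long)

  /** Last-write-wins upsert keyed by user: an incoming write is ignored only
    * when the stored value has a strictly newer event timestamp. Replaying a
    * write, as at-least-once delivery may do, leaves the store unchanged.
    */
  def upsert(store: Map[String, FeatureWrite], w: FeatureWrite): Map[String, FeatureWrite] =
    store.get(w.userId) match {
      case Some(existing) if existing.eventTsMs > w.eventTsMs => store
      case _                                                  => store.updated(w.userId, w)
    }

  /** Folds a sequence of writes, duplicates included, into the final store. */
  def applyAll(writes: Seq[FeatureWrite]): Map[String, FeatureWrite] =
    writes.foldLeft(Map.empty[String, FeatureWrite])(upsert)
}
```

Applying a batch once or replaying it after a retry yields the same store, so duplicates cost throughput but not correctness. A non-idempotent write, such as incrementing a counter, would double-count under the same replay.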

Real-time personalization and recommendations. Update user state, score against a model, emit a recommendation. Sub-100ms is the bar, RTM clears it, and duplicate emissions are harmless because the downstream system serves the most recent value anyway.

Kafka-to-Kafka ETL. Parse, enrich, filter, route. The bread-and-butter streaming workload. Stateless, throughput-bound, and one of the supported operator shapes in open-source RTM.
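"Stateless" here means each output depends only on the record in hand. A rough sketch of a parse, enrich, filter, route stage as plain Scala functions (Scala 2.13 or later, for `toLongOption`) makes that visible. The record layout, the `EtlSketch` names, and the `orders-<region>` topic names are all invented for illustration, and no Kafka or Spark API is involved.

```scala
object EtlSketch {
  final case class Order(id: String, country: String, amountCents: Long)
  final case class Enriched(order: Order, region: String)

  // Parse: malformed lines yield None instead of failing the pipeline.
  def parse(line: String): Option[Order] =
    line.split(",", -1).map(_.trim) match {
      case Array(id, country, amount) if id.nonEmpty =>
        amount.toLongOption.map(cents => Order(id, country.toUpperCase, cents))
      case _ => None
    }

  // Enrich from a small in-memory lookup table, similar in spirit to a broadcast join.
  def enrich(order: Order, regionByCountry: Map[String, String]): Enriched =
    Enriched(order, regionByCountry.getOrElse(order.country, "unknown"))

  // Filter out non-positive amounts, then route by region to a topic name.
  def route(e: Enriched): Option[(String, Enriched)] =
    if (e.order.amountCents <= 0) None else Some((s"orders-${e.region}", e))

  // The whole stage is a pure function of one record: no state carried between records.
  def process(line: String, regionByCountry: Map[String, String]): Option[(String, Enriched)] =
    parse(line).map(enrich(_, regionByCountry)).flatMap(route)
}
```

Because `process` carries nothing from one record to the next, it can run on any partition in any order, which is what makes this workload shape a natural fit for a low-latency stateless path.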

For every one of these, the previous answer was either "use microbatch and accept the latency" or "stand up a Flink cluster alongside your Spark stack." With RTM, the second option is no longer the default.

The Scala API Question

Both Flink and Spark have JVM APIs, so on paper the language story is a wash. In practice it is not.

Spark's Scala API is the project's native API: Spark itself is written largely in Scala, and the API offers first-class Dataset support and the typed encoders that make refactoring large pipelines tractable. If you're already on Scala for the type system, Spark gives you a native experience.

Flink's Scala API has had a harder road. Flink's Scala APIs, including the DataStream Scala API, were deprecated in Flink 1.17 and removed in Flink 2.0, leaving Java as the only first-party JVM API. The community-maintained flink-extended/flink-scala-api provides a Scala 2.13 / Scala 3 wrapper that many teams use, but it lags upstream Flink and is not part of Flink's release cadence. For Scala-first teams, this is a real strike against Flink in 2026 that would not have been on the list five years ago.

The honest framing: if your team writes Scala and wants Scala to be first-class in the engine they pick, Spark is now the better-supported choice. If you're comfortable on Java for streaming code and want the most mature streaming engine, Flink is still excellent. If your team is mixed and one half writes Python and the other writes Scala, Spark's unified PySpark + Scala API story is hard to beat.

A Decision Framework

Strip away the latency arms race and the choice comes down to a small number of practical questions.

Pick Flink if:

Your tail latency SLA is single-digit milliseconds and you can prove it matters
You need real complex event processing — FlinkCEP-style pattern matching, not group-by aggregations
You need exactly-once semantics end-to-end on operations RTM doesn't yet support in open source
You're already deeply invested in Flink with a team that knows it, and you don't have a Spark batch story that would benefit from unification
Your workload is streaming-first and your batch needs are minimal or handled elsewhere

Pick Spark RTM if:

Your team already runs Spark for batch and you want one engine for both
Your latency target is 50–500ms — which is most production streaming
Your operations are stateless or lightly stateful (Kafka-to-Kafka, CDC, feature pipelines)
You write Scala and want first-class language support in the engine
You depend on Spark's broader ecosystem — Iceberg, Delta, the cloud warehouse connectors, MLflow, the catalog story
You're on Databricks and the full RTM feature set (stateful operators, exactly-once on supported sinks) is available to you

Use both if you have a clear bifurcation: an ultra-low-latency Flink pipeline for a small, high-stakes workload and Spark for everything else. Plenty of organizations run this way. The cost is two engines to operate, two skill sets to maintain, and two ways for things to break. That's worth it if the Flink workload genuinely needs Flink. It is not worth it for a "we might want microsecond latency someday" hypothetical.
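The technical half of the framework above can be written down as a few lines of Scala. This is one possible encoding, not a rule the engines or vendors publish: the `EngineChoice` names are hypothetical, the 10 ms threshold comes from the "single-digit milliseconds" criterion, and treating the Databricks runtime as sufficient for exactly-once is a simplification, since it only applies to supported sinks.

```scala
object EngineChoice {
  sealed trait Engine
  case object Flink extends Engine
  case object SparkRtm extends Engine

  /** The technical criteria from the lists above; team and ecosystem factors are left out. */
  final case class Workload(
      p99BudgetMs: Double,
      needsCep: Boolean,
      needsEndToEndExactlyOnce: Boolean,
      onDatabricks: Boolean
  )

  def recommend(w: Workload): Engine =
    if (w.p99BudgetMs < 10) Flink // single-digit millisecond tail latency
    else if (w.needsCep) Flink // real pattern matching, not group-by aggregations
    else if (w.needsEndToEndExactlyOnce && !w.onDatabricks) Flink // open-source RTM is at-least-once
    else SparkRtm
}
```

The softer criteria, such as existing team skills, an existing batch estate, and ecosystem dependencies, do not reduce to booleans this neatly and still need a human judgment call.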

What Changed and What Didn't

The big change in 2025–2026 is that Spark's streaming latency story is no longer a categorical disadvantage. For most workloads where a team would have evaluated Flink purely because Spark "wasn't fast enough," RTM removes that reason. That is genuinely new.

What didn't change: Flink is still the more mature streaming-first engine, with a deeper bench of streaming-specific abstractions (CEP, side outputs, fine-grained state, operator chaining) and a tighter event-at-a-time latency floor. Spark is still the better unified batch + streaming + ML engine, with a richer ecosystem and a Scala-first API for teams who care about that.

The right question in 2026 is no longer "which engine is faster?" It's "what does my team actually need, and which engine matches it?" For most Spark Scala shops, that answer is Spark with RTM where you need low latency and microbatch where you don't. For a small set of teams with genuinely extreme requirements, Flink is still the right tool. The era of choosing Flink-by-default for any streaming workload, though, is over.