The New Apache Spark Kubernetes Operator: Getting Started
The official Apache Spark Kubernetes Operator launched as an ASF subproject in May 2025, built from scratch instead of forking the aging Kubeflow operator. A year of rapid releases later, it's at 0.9.0 and is the path the Spark community is steering toward for running Scala jobs on Kubernetes.
Why a New Operator at All
If you ran Spark on Kubernetes before 2025, you probably used one of two patterns: raw spark-submit with the built-in Kubernetes resource manager, or the Kubeflow spark-operator (originally Google's GoogleCloudPlatform/spark-on-k8s-operator). The Kubeflow operator was the de facto standard for years, but momentum stalled. Hundreds of open issues piled up, releases got infrequent, and the project never made it to a stable 1.0. It worked, but it was a community project built for Spark 2.3-era assumptions, maintained outside the Spark project itself.
In May 2025, the Apache Spark community made the call to build a new operator from scratch, under ASF governance, as an official Spark subproject. Not a fork of the Kubeflow operator — a clean rebuild aimed at Spark 3.5+ with modern Kubernetes features in mind from day one. The 0.1.0 release shipped on May 8, 2025, and as of 0.9.0 (May 14, 2026) it's had nine releases in twelve months. That release cadence is the part to pay attention to — this is where the official ecosystem is putting its weight.
For Scala teams moving off YARN, this matters. The official operator is what Spark's own committers are building against, which means the long-term integration story (Spark Connect, declarative pipelines, native execution) lands here first.
What the Operator Gives You
The operator manages two custom resources:
- SparkApplication: a single Spark job. Submit it, the operator launches a driver pod, the driver spawns executor pods, the job runs to completion, and the pods clean up. This is the equivalent of running spark-submit once.
- SparkCluster: a long-running Spark cluster (standalone-mode style) with configurable workers. Useful for Spark Connect servers, notebook backends, or any persistent service-style deployment.
Both are Kubernetes-native: you kubectl apply -f a YAML manifest and the operator reconciles to the desired state. No more wrapping spark-submit in a CI script and parsing exit codes.
Prerequisites
Before installing, check that you have:
- Apache Spark 3.5 or newer (4.0 and 4.1 are fully supported)
- Kubernetes 1.34 or newer
- Helm 3.0 or newer
- A container registry the cluster can pull from (for your Spark application image)
The Kubernetes version requirement is worth flagging — 1.34 is a recent release, so older managed clusters may need an upgrade before this works.
Installing the Operator
Installation takes three Helm commands:
helm repo add spark https://apache.github.io/spark-kubernetes-operator
helm repo update
helm install spark spark/spark-kubernetes-operator --namespace spark-operator --create-namespace
That deploys the operator controller pod and registers the SparkApplication and SparkCluster CRDs. Verify it's healthy:
kubectl get pods -n spark-operator
kubectl get crds | grep spark.apache.org
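To script that health check instead of eyeballing it, one option is to block until the operator reports ready. This is a minimal sketch, assuming the chart was installed into the spark-operator namespace as above. The CRD names follow the usual plural.group convention (sparkapplications.spark.apache.org, sparkclusters.spark.apache.org), which is an assumption to confirm against the kubectl get crds output, and wait_for_operator is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until the operator install looks healthy.
wait_for_operator() {
  local ns="${1:-spark-operator}"
  # Every Deployment in the operator namespace must report Available.
  kubectl wait --for=condition=Available deployment --all \
    --namespace "$ns" --timeout=180s || return 1
  # Both CRDs must be Established before manifests can be applied.
  # The CRD names below are assumed from the plural.group convention.
  kubectl wait --for=condition=Established \
    crd/sparkapplications.spark.apache.org \
    crd/sparkclusters.spark.apache.org --timeout=60s
}
```

Calling wait_for_operator with no argument checks the spark-operator namespace; a non-zero exit means something did not become ready within the timeout.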
Submitting Your First Scala Application
The standard "does this work" smoke test is the Spark Pi example. Here's a minimal SparkApplication manifest that runs the example JAR shipped with the Spark image:
apiVersion: spark.apache.org/v1
kind: SparkApplication
metadata:
  name: pi
  namespace: default
spec:
  mainClass: org.apache.spark.examples.SparkPi
  jars: "local:///opt/spark/examples/jars/spark-examples.jar"
  sparkConf:
    spark.kubernetes.container.image: apache/spark:4.0.0
    spark.executor.instances: "2"
    spark.executor.cores: "1"
    spark.executor.memory: "512m"
    spark.driver.cores: "1"
    spark.driver.memory: "512m"
Apply it and watch the pods come up:
kubectl apply -f pi.yaml
kubectl get sparkapplications
kubectl get pods -w
You'll see a driver pod start first, then two executor pods. When the job finishes, the executors terminate and the driver pod remains in Completed state with logs available via kubectl logs.
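For a CI smoke test, the same check can be scripted. The sketch below waits for the driver pod to reach the Succeeded phase and then pulls the result line out of its logs. It relies on the spark-role=driver label that Spark puts on driver pods and assumes there is only one driver pod in the namespace; check_pi_result is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for the SparkPi driver to finish and print its result.
check_pi_result() {
  local ns="${1:-default}"
  # spark-role=driver is the label Spark itself sets on driver pods.
  kubectl wait pod --selector spark-role=driver --namespace "$ns" \
    --for=jsonpath='{.status.phase}'=Succeeded --timeout=300s || return 1
  # With a selector, kubectl logs shows only the last 10 lines by default;
  # --tail=-1 asks for the full log. SparkPi prints "Pi is roughly <value>".
  kubectl logs --selector spark-role=driver --namespace "$ns" --tail=-1 \
    | grep "Pi is roughly"
}
```

The function exits non-zero if the driver never succeeds or the result line is missing, which makes it usable as a pipeline gate.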
Submitting Your Own Scala Job
For a real Scala application, you build a fat JAR with sbt-assembly, package it into a container image that extends apache/spark, and reference your main class. The image build is the standard Spark-on-K8s pattern — nothing operator-specific:
# Dockerfile
FROM apache/spark:4.0.0
COPY target/scala-2.13/my-spark-job-assembly.jar /opt/my-job.jar
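The build-and-push step around that Dockerfile might look like the sketch below. It assumes sbt-assembly is configured so the JAR name matches the COPY line (by default sbt-assembly appends the project version, so assembly / assemblyJarName may need to be set), and it reuses the image tag from the manifest that follows. build_and_push is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the fat JAR, bake it into the image, and push it.
build_and_push() {
  local image="${1:-my-registry.example.com/orders-etl:1.0.0}"
  # sbt-assembly writes the fat JAR under target/scala-2.13/ for a Scala 2.13 build.
  sbt assembly || return 1
  # Run from the project root so the Dockerfile's COPY path resolves.
  docker build --tag "$image" . || return 1
  docker push "$image"
}
```

Each step stops the function on failure, so a broken build never pushes a stale image.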
Then the SparkApplication points at your image and main class:
apiVersion: spark.apache.org/v1
kind: SparkApplication
metadata:
  name: orders-etl
  namespace: default
spec:
  mainClass: com.example.OrdersETL
  jars: "local:///opt/my-job.jar"
  sparkConf:
    spark.kubernetes.container.image: my-registry.example.com/orders-etl:1.0.0
    spark.executor.instances: "10"
    spark.executor.cores: "4"
    spark.executor.memory: "8g"
    spark.driver.memory: "4g"
    spark.sql.shuffle.partitions: "200"
The Scala build doesn't change. Your existing sbt-assembly setup produces the same JAR; the operator just wraps the spark-submit invocation that places it on the cluster.
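Before applying a manifest like this for real, it can help to have the API server validate it against the installed CRD schema. A minimal sketch using kubectl's server-side dry run; the file name passed in and the validate_and_apply helper are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a manifest against the cluster, then apply it.
validate_and_apply() {
  local manifest="$1"
  # Server-side dry run: the API server checks the manifest against the
  # installed CRD schema without persisting anything.
  kubectl apply --dry-run=server -f "$manifest" || return 1
  kubectl apply -f "$manifest"
}
```

For example, validate_and_apply orders-etl.yaml stops at the dry run if a field is misspelled or mistyped, so nothing is created on the cluster.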
Long-Running Clusters and Spark Connect
SparkCluster is the resource you want for long-running services — most notably a Spark Connect server that thin Scala clients connect to over gRPC. Instead of one cluster per job, you stand up a cluster once and reuse it:
apiVersion: spark.apache.org/v1
kind: SparkCluster
metadata:
  name: connect-server
spec:
  workers: 3
  sparkConf:
    spark.kubernetes.container.image: apache/spark:4.0.0
    spark.connect.server.bindAddress: "0.0.0.0"
    spark.connect.server.port: "15002"
This is one of the cleaner deployment patterns the new operator enables: a long-running Connect server on Kubernetes, with thin Scala clients in your applications talking to it. The operator handles worker scaling, pod restarts, and lifecycle.
Apache YuniKorn for Gang Scheduling
The default Kubernetes scheduler treats Spark driver and executor pods as independent units. That's fine for small jobs but breaks down with larger ones — partial scheduling leaves you with a driver waiting on executors that never get resources, while other jobs that could have started don't because the partial allocation is holding nodes.
The operator integrates with Apache YuniKorn, a Kubernetes scheduler designed for batch workloads. YuniKorn does gang scheduling: a Spark application either gets all its pods scheduled or none of them. It also supports hierarchical queues, which is what you actually want for multi-tenant Spark clusters where teams have separate resource budgets.
The repo ships an example showing the integration:
apiVersion: spark.apache.org/v1
kind: SparkApplication
metadata:
  name: pi-on-yunikorn
  annotations:
    yunikorn.apache.org/app-id: "spark-pi-app"
    yunikorn.apache.org/queue: "root.spark"
spec:
  mainClass: org.apache.spark.examples.SparkPi
  jars: "local:///opt/spark/examples/jars/spark-examples.jar"
  sparkConf:
    spark.kubernetes.container.image: apache/spark:4.0.0
    spark.executor.instances: "4"
If you're running Spark on Kubernetes at any meaningful multi-tenant scale, plan on YuniKorn (or an equivalent batch-aware scheduler). The default scheduler is fine for a few jobs at a time; it falls over once you have real contention.
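YuniKorn itself is installed separately from the operator. The sketch below follows the Helm steps from YuniKorn's own install guide; the yunikorn namespace and release name are conventions from that guide rather than anything the operator requires, and install_yunikorn is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: install the YuniKorn scheduler with Helm.
install_yunikorn() {
  helm repo add yunikorn https://apache.github.io/yunikorn-release || return 1
  helm repo update || return 1
  # Installs the scheduler into its own namespace.
  helm install yunikorn yunikorn/yunikorn \
    --namespace yunikorn --create-namespace
}
```

Spark's Kubernetes documentation selects YuniKorn with spark.kubernetes.scheduler.name=yunikorn. Whether a given manifest needs that entry in sparkConf depends on how YuniKorn's admission controller is set up, so it is worth checking the schedulerName on the resulting pods.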
Day-to-Day Operations
The day-to-day surface is just kubectl:
# List all Spark applications
kubectl get sparkapplications
# Detailed view including state and pod counts
kubectl describe sparkapplication orders-etl
# Tail driver logs (confirm the driver pod's actual name with kubectl get pods first)
kubectl logs -f sparkapplication-orders-etl-driver
# Delete a running job (terminates all pods)
kubectl delete sparkapplication orders-etl
That's the whole interface for routine operations. If you have existing Kubernetes observability — Prometheus, Grafana, Loki — Spark pods are just regular pods and show up in your existing dashboards.
Moving From YARN
If your team is on YARN today, the shift in mental model is a bigger change than the operator itself. A few things to keep in mind:
- Queues become namespaces or YuniKorn queues. YARN's queue hierarchy doesn't map directly. For simple cases, Kubernetes namespaces with ResourceQuota are enough. For real multi-tenancy, use YuniKorn queues.
- Resource requests are explicit. YARN's elasticity is replaced by Kubernetes' more rigid request/limit model. You'll spend time tuning spark.executor.cores, spark.executor.memory, and the corresponding pod requests.
- Locality goes away. HDFS data locality was a YARN strength. On Kubernetes you typically read from object storage (S3, GCS, ADLS), so plan on cloud storage performance and IO tuning rather than node-local reads.
- Logs live in containers. No more YARN log aggregation — set up Loki, CloudWatch, or your standard Kubernetes log collector before you start migrating production jobs.
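The first two points can be made concrete with some arithmetic. By Spark's documented defaults for JVM jobs, an executor pod requests spark.executor.memory plus an overhead of max(384 MiB, 10% of that memory), and that total is what counts against a namespace ResourceQuota. The sketch below computes it and sizes a quota from it. The quota name spark-quota and both helper names are hypothetical, and the quota covers executors only, so leave headroom for drivers:

```shell
#!/usr/bin/env bash
# Memory one JVM executor pod requests, in MiB, under Spark's default overhead
# rule: heap + max(384 MiB, 10% of heap).
executor_pod_memory_mib() {
  local heap_mib="$1"
  local overhead=$(( heap_mib / 10 ))
  if [ "$overhead" -lt 384 ]; then overhead=384; fi
  echo $(( heap_mib + overhead ))
}

# Hypothetical helper: cap a team's namespace at N executors' worth of resources.
create_team_quota() {
  local ns="$1" executors="$2" cores="$3" heap_mib="$4"
  local per_pod
  per_pod="$(executor_pod_memory_mib "$heap_mib")"
  local mem_mib=$(( executors * per_pod ))
  kubectl create quota spark-quota --namespace "$ns" \
    --hard="requests.cpu=$(( executors * cores )),requests.memory=${mem_mib}Mi"
}

executor_pod_memory_mib 8192   # the 8g executor from orders-etl; prints 9011
```

So the ten 8g, 4-core executors in the orders-etl manifest need roughly 40 CPUs and 90110Mi of requests, not the 80 GiB the heap settings alone suggest.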
The Spark code itself doesn't change. Your DataFrame transformations, your sbt-assembly fat JAR, your test suite — none of that cares whether Spark runs on YARN or Kubernetes. The change is entirely operational.
Should You Use This Operator Today?
Yes, if you're starting fresh on Kubernetes. The 0.x version number is honest, but the foundation is solid, the release cadence is fast, and this is where the official Spark community is investing. Building on the operator now means you'll get Spark Connect, native execution, and other forthcoming features without later having to migrate off a separate, community-maintained operator.
Probably, if you're on the Kubeflow operator already. Migration isn't trivial — the CRD APIs are different — but the Kubeflow operator's slowing development is a real risk. Plan a migration on a 6-12 month horizon rather than treating it as urgent.
Wait, if you're on YARN and it's working. There's no rush. The new operator makes Kubernetes a more credible target than it was, but a working YARN cluster is still a working YARN cluster. Migrate when the broader cloud-native story (autoscaling, multi-tenant resource sharing, cost) is what you actually want, not because the tooling changed.
Quick Checklist
To get a basic working setup:
- Confirm Kubernetes 1.34+, Helm 3.0+, and Spark 3.5+ images available
- helm install the operator into a dedicated namespace
- Submit the Pi example to verify the install
- Build a container image with your fat JAR on top of apache/spark:4.0.0
- Write a SparkApplication manifest and kubectl apply
- For multi-tenant clusters, plan YuniKorn alongside the operator
- For Spark Connect deployments, use SparkCluster for the long-running server
For broader context on Kubernetes vs YARN as a deployment target, see the Spark 4.0 overview and the Spark Connect article for the thin-client pattern this operator enables.