The New Apache Spark Kubernetes Operator: Getting Started

The official Apache Spark Kubernetes Operator launched as an ASF subproject in May 2025, built from scratch instead of forking the aging Kubeflow operator. A year of rapid releases later, it's at 0.9.0 and is the path the Spark community is steering toward for running Scala jobs on Kubernetes.

Why a New Operator at All

If you ran Spark on Kubernetes before 2025, you probably used one of two patterns: raw spark-submit with the built-in Kubernetes resource manager, or the Kubeflow spark-operator (originally Google's GoogleCloudPlatform/spark-on-k8s-operator). The Kubeflow operator was the de facto standard for years, but momentum stalled. Hundreds of open issues piled up, releases became infrequent, and its SparkApplication API never graduated past v1beta2. It worked, but it was a community project built on Spark 2.3-era assumptions and maintained outside the Spark project itself.

Rather than fork it, the Apache Spark community built a new operator from scratch, under ASF governance, as an official Spark subproject: a clean rebuild aimed at Spark 3.5+ with modern Kubernetes features in mind from day one. The 0.1.0 release shipped on May 8, 2025, and with 0.9.0 (May 14, 2026) it has had nine releases in roughly twelve months. That cadence is the part to pay attention to: this is where the official ecosystem is putting its weight.

For Scala teams moving off YARN, this matters. The official operator is what Spark's own committers are building against, which means the long-term integration story (Spark Connect, declarative pipelines, native execution) lands here first.

What the Operator Gives You

The operator manages two custom resources:

SparkApplication — a single Spark job. Submit it, the operator launches a driver pod, the driver spawns executor pods, the job runs to completion, pods clean up. This is the equivalent of running spark-submit once.
SparkCluster — a long-running Spark cluster (standalone-mode style) with configurable workers. Useful for Spark Connect servers, notebook backends, or any persistent service-style deployment.

Both are Kubernetes-native: you kubectl apply -f a YAML manifest and the operator reconciles to the desired state. No more wrapping spark-submit in a CI script and parsing exit codes.

Prerequisites

Before installing, check that you have:

Apache Spark 3.5 or newer (4.0 and 4.1 are fully supported)
Kubernetes 1.34 or newer
Helm 3.0 or newer
A container registry the cluster can pull from (for your Spark application image)

The Kubernetes version requirement is worth flagging — 1.34 is a recent release, so older managed clusters may need an upgrade before this works.
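
A quick way to check these minimums before installing is a dotted-version comparison in shell. The sketch below is only an illustration: version_ge and check_min are hypothetical helper names, and it assumes a sort that supports -V (GNU coreutils, BusyBox, recent macOS).

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when dotted version A is >= B.
# Uses `sort -V` so that 1.9 correctly sorts before 1.34.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME FOUND WANTED: print a one-line verdict for one prerequisite.
check_min() {
    local name=$1 found=$2 wanted=$3
    if version_ge "$found" "$wanted"; then
        echo "ok: $name $found >= $wanted"
    else
        echo "FAIL: $name $found < $wanted"
        return 1
    fi
}

# Usage against a real environment (not executed here):
#   check_min helm "$(helm version --template '{{.Version}}' | sed 's/^v//')" 3.0
#   check_min kubernetes "<server version reported by 'kubectl version'>" 1.34
```

Managed clusters often report versions with vendor suffixes, so strip those before comparing.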

Installing the Operator

Installation takes three Helm commands: add the chart repository, refresh the index, and install the chart:

helm repo add spark https://apache.github.io/spark-kubernetes-operator
helm repo update

helm install spark spark/spark-kubernetes-operator \
  --namespace spark-operator \
  --create-namespace

That deploys the operator controller pod and registers the SparkApplication and SparkCluster CRDs. Verify it's healthy:

kubectl get pods -n spark-operator
kubectl get crds | grep spark.apache.org
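
In a script or CI job you may prefer to block until the install has settled rather than eyeball the output. This is a minimal sketch under two assumptions that are not confirmed by the text above: that the chart runs the operator as a Deployment, and that the CRDs are named sparkapplications.spark.apache.org and sparkclusters.spark.apache.org. The function name wait_for_operator is hypothetical; confirm the CRD names with the kubectl get crds command first.

```shell
#!/usr/bin/env bash
# wait_for_operator [NAMESPACE]: block until the operator's deployments are
# Available and both CRDs are Established, or fail on timeout.
wait_for_operator() {
    local ns=${1:-spark-operator}
    # Assumption: the Helm chart runs the operator as a Deployment.
    kubectl wait --for=condition=Available deployment --all \
        -n "$ns" --timeout=120s || return 1
    # Assumption: CRD names are <plural>.<group>; adjust if yours differ.
    kubectl wait --for=condition=Established \
        crd/sparkapplications.spark.apache.org \
        crd/sparkclusters.spark.apache.org --timeout=60s
}

# Usage (not executed here):
#   wait_for_operator spark-operator && echo "operator ready"
```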

Submitting Your First Scala Application

The standard "does this work" smoke test is the Spark Pi example. Here's a minimal SparkApplication manifest that runs the example JAR shipped with the Spark image:

apiVersion: spark.apache.org/v1
kind: SparkApplication
metadata:
  name: pi
  namespace: default
spec:
  mainClass: org.apache.spark.examples.SparkPi
  jars: "local:///opt/spark/examples/jars/spark-examples.jar"
  sparkConf:
    spark.kubernetes.container.image: apache/spark:4.0.0
    spark.executor.instances: "2"
    spark.executor.cores: "1"
    spark.executor.memory: "512m"
    spark.driver.cores: "1"
    spark.driver.memory: "512m"

Apply it and watch the pods come up:

kubectl apply -f pi.yaml
kubectl get sparkapplications
kubectl get pods -w

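You'll see a driver pod start first, then two executor pods. When the job finishes, the executors terminate. Whether the completed driver pod stays around for kubectl logs depends on the application's resource retain policy (applicationTolerations.resourceRetainPolicy), so don't count on it as your only copy of the logs.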

Submitting Your Own Scala Job

For a real Scala application, you build a fat JAR with sbt-assembly, package it into a container image that extends apache/spark, and reference your main class. The image build is the standard Spark-on-K8s pattern — nothing operator-specific:

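# Dockerfile
FROM apache/spark:4.0.0
# This path assumes assemblyJarName is set in build.sbt; sbt-assembly's
# default file name also includes the project version.
COPY target/scala-2.13/my-spark-job-assembly.jar /opt/my-job.jar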

Then the SparkApplication points at your image and main class:

apiVersion: spark.apache.org/v1
kind: SparkApplication
metadata:
  name: orders-etl
  namespace: default
spec:
  mainClass: com.example.OrdersETL
  jars: "local:///opt/my-job.jar"
  sparkConf:
    spark.kubernetes.container.image: my-registry.example.com/orders-etl:1.0.0
    spark.executor.instances: "10"
    spark.executor.cores: "4"
    spark.executor.memory: "8g"
    spark.driver.memory: "4g"
    spark.sql.shuffle.partitions: "200"

The Scala build doesn't change. Your existing sbt-assembly setup produces the same JAR; the operator just wraps the spark-submit invocation that places it on the cluster.
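
Because the image tag usually changes with every build, one option in CI is to render the manifest from a small shell function and pipe it to kubectl. This is a sketch, not the operator's own tooling: render_spark_app is a hypothetical helper, and it emits only the fields already used in the manifest above.

```shell
#!/usr/bin/env bash
# render_spark_app NAME MAIN_CLASS IMAGE EXECUTORS
# Prints a SparkApplication manifest to stdout; nothing is applied here.
render_spark_app() {
    local name=$1 main_class=$2 image=$3 executors=$4
    cat <<EOF
apiVersion: spark.apache.org/v1
kind: SparkApplication
metadata:
  name: ${name}
  namespace: default
spec:
  mainClass: ${main_class}
  jars: "local:///opt/my-job.jar"
  sparkConf:
    spark.kubernetes.container.image: ${image}
    spark.executor.instances: "${executors}"
EOF
}

# Usage in a pipeline (not executed here):
#   render_spark_app orders-etl com.example.OrdersETL \
#     my-registry.example.com/orders-etl:1.0.0 10 | kubectl apply -f -
```

Keeping the rendering separate from the apply step lets you diff or lint the generated YAML before it reaches the cluster.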

Long-Running Clusters and Spark Connect

SparkCluster is the resource you want for long-running services — most notably a Spark Connect server that thin Scala clients connect to over gRPC. Instead of one cluster per job, you stand up a cluster once and reuse it:

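# Field and config names follow the operator's published examples and Spark's
# Connect settings; verify them against your installed CRD and Spark version.
apiVersion: spark.apache.org/v1
kind: SparkCluster
metadata:
  name: connect-server
spec:
  runtimeVersions:
    sparkVersion: "4.0.0"
  clusterTolerations:
    instanceConfig:
      initWorkers: 3
      minWorkers: 3
      maxWorkers: 3
  sparkConf:
    spark.kubernetes.container.image: apache/spark:4.0.0
    spark.connect.grpc.binding.address: "0.0.0.0"
    spark.connect.grpc.binding.port: "15002"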

This is one of the cleaner deployment patterns the new operator enables: a long-running Connect server on Kubernetes, with thin Scala clients in your applications talking to it. The operator handles worker scaling, pod restarts, and lifecycle.

Apache YuniKorn for Gang Scheduling

The default Kubernetes scheduler treats Spark driver and executor pods as independent units. That's fine for small jobs but breaks down with larger ones — partial scheduling leaves you with a driver waiting on executors that never get resources, while other jobs that could have started don't because the partial allocation is holding nodes.

The operator integrates with Apache YuniKorn, a Kubernetes scheduler designed for batch workloads. YuniKorn does gang scheduling: a Spark application either gets all its pods scheduled or none of them. It also supports hierarchical queues, which is what you actually want for multi-tenant Spark clusters where teams have separate resource budgets.

The repo ships an example showing the integration:

apiVersion: spark.apache.org/v1
kind: SparkApplication
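metadata:
  name: pi-on-yunikorn
spec:
  mainClass: org.apache.spark.examples.SparkPi
  jars: "local:///opt/spark/examples/jars/spark-examples.jar"
  sparkConf:
    spark.kubernetes.container.image: apache/spark:4.0.0
    spark.executor.instances: "4"
    # Hand driver and executor pods to YuniKorn instead of the default scheduler
    spark.kubernetes.scheduler.name: yunikorn
    # YuniKorn reads the queue label and app-id annotation from the pods themselves
    spark.kubernetes.driver.label.queue: root.spark
    spark.kubernetes.executor.label.queue: root.spark
    spark.kubernetes.driver.annotation.yunikorn.apache.org/app-id: "{{APP_ID}}"
    spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id: "{{APP_ID}}"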

If you're running Spark on Kubernetes at any meaningful multi-tenant scale, plan on YuniKorn (or an equivalent batch-aware scheduler). The default scheduler is fine for a few jobs at a time; it falls over once you have real contention.

Day-to-Day Operations

The day-to-day surface is just kubectl:

# List all Spark applications
kubectl get sparkapplications

# Detailed view including state and pod counts
kubectl describe sparkapplication orders-etl

# Tail driver logs (Spark labels driver pods with spark-role=driver)
kubectl get pods -l spark-role=driver
kubectl logs -f <driver-pod-name>

# Delete a running job (terminates all pods)
kubectl delete sparkapplication orders-etl

That's the whole interface for routine operations. If you have existing Kubernetes observability — Prometheus, Grafana, Loki — Spark pods are just regular pods and show up in your existing dashboards.
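
Driver pod names depend on how the operator names its pods, so looking the pod up by label can be more robust than hard-coding a name. The sketch below relies on the spark-role=driver label that Spark sets on driver pods. The helper names driver_pod and tail_driver are hypothetical, and matching on the application name appearing in the pod name is an assumption.

```shell
#!/usr/bin/env bash
# driver_pod APP [NAMESPACE]: print the first driver pod whose name contains APP.
driver_pod() {
    local app=$1 ns=${2:-default}
    kubectl get pods -n "$ns" -l spark-role=driver -o name \
        | sed 's|^pod/||' \
        | grep -- "$app" \
        | head -n 1
}

# tail_driver APP [NAMESPACE]: follow the driver log, or fail if no pod matches.
tail_driver() {
    local pod
    pod=$(driver_pod "$@")
    if [ -z "$pod" ]; then
        echo "no driver pod found for $1" >&2
        return 1
    fi
    kubectl logs -f -n "${2:-default}" "$pod"
}

# Usage (not executed here):
#   tail_driver orders-etl default
```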

Moving From YARN

If your team is on YARN today, the shift in mental model is a bigger move than adopting the operator itself. A few things to keep in mind:

Queues become namespaces or YuniKorn queues. YARN's queue hierarchy doesn't map directly. For simple cases, Kubernetes namespaces with ResourceQuota are enough. For real multi-tenancy, use YuniKorn queues.
Resource requests are explicit. YARN's elasticity is replaced by Kubernetes' more rigid request/limit model. You'll spend time tuning spark.executor.cores, spark.executor.memory, and the corresponding pod requests.
Locality goes away. HDFS data locality was a YARN strength. On Kubernetes you typically read from object storage (S3, GCS, ADLS), so plan on cloud storage performance and IO tuning rather than node-local reads.
Logs live in containers. No more YARN log aggregation — set up Loki, CloudWatch, or your standard Kubernetes log collector before you start migrating production jobs.
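
On the resource-request point, the arithmetic behind a pod's memory request can be sketched in a few lines. This assumes Spark's defaults for JVM jobs, an overhead of max(10% of heap, 384 MiB) added to spark.executor.memory or spark.driver.memory, and no explicit memoryOverhead setting. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# pod_memory_mib HEAP_MIB: heap plus default JVM memory overhead, in MiB.
pod_memory_mib() {
    local heap=$1
    local overhead=$(( heap * 10 / 100 ))   # 10% of heap, rounded down
    if [ "$overhead" -lt 384 ]; then
        overhead=384                        # default minimum overhead
    fi
    echo $(( heap + overhead ))
}

# app_memory_mib EXECUTORS EXECUTOR_HEAP_MIB DRIVER_HEAP_MIB:
# total memory requested by all executor pods plus the driver pod.
app_memory_mib() {
    echo $(( $1 * $(pod_memory_mib "$2") + $(pod_memory_mib "$3") ))
}

pod_memory_mib 8192          # prints 9011 (an 8g executor)
pod_memory_mib 512           # prints 896  (the 384 MiB floor applies)
app_memory_mib 10 8192 4096  # prints 94615 (ten 8g executors, one 4g driver)
```

For the orders-etl manifest that is about 92 GiB, roughly 10% more than the 84 GiB of heap in the sparkConf, and it is the figure a namespace ResourceQuota or YuniKorn queue has to accommodate.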

The Spark code itself doesn't change. Your DataFrame transformations, your sbt-assembly fat JAR, your test suite — none of that cares whether Spark runs on YARN or Kubernetes. The change is entirely operational.

Should You Use This Operator Today?

Yes, if you're starting fresh on Kubernetes. The 0.x version number is honest but the foundation is solid, the release cadence is fast, and this is where the official Spark community is investing. Building on the operator now means you'll get Spark Connect, native execution, and other forthcoming features without migrating off a parallel ecosystem operator.

Probably, if you're on the Kubeflow operator already. Migration isn't trivial — the CRD APIs are different — but the Kubeflow operator's slowing development is a real risk. Plan a migration on a 6-12 month horizon rather than treating it as urgent.

Wait, if you're on YARN and it's working. There's no rush. The new operator makes Kubernetes a more credible target than it was, but a working YARN cluster is still a working YARN cluster. Migrate when the broader cloud-native story (autoscaling, multi-tenant resource sharing, cost) is what you actually want, not because the tooling changed.

Quick Checklist

To get a basic working setup:

Confirm Kubernetes 1.34+ and Helm 3.0+, and that Spark 3.5+ images are available
helm install the operator into a dedicated namespace
Submit the Pi example to verify the install
Build a container image with your fat JAR on top of apache/spark:4.0.0
Write a SparkApplication manifest and kubectl apply
For multi-tenant clusters, plan YuniKorn alongside the operator
For Spark Connect deployments, use SparkCluster for the long-running server

For the broader decision between deployment targets, see Spark on Kubernetes vs YARN in 2026. For the thin-client pattern this operator enables, see the Spark Connect article.