Publishing Spark Scala Libraries from GitHub Actions to a Private Maven Repository
Once your team is sharing internal Spark libraries through a private Nexus, Artifactory, or CodeArtifact, you need a reliable way to publish new versions. Doing it from developer laptops causes version drift and credential sprawl. GitHub Actions can give you tagged, reproducible publishes — but wiring publishTo, credentials, and versioning into a workflow has a few sharp edges.
Why Publish from CI
The first version of an internal library usually ships from a developer's laptop: someone clones the repo, runs sbt publish, and the artifact appears in Nexus. That works exactly once. After that, several problems compound:
- Version drift. Two developers publish slightly different builds of `1.4.0` because their working trees diverged. Whichever upload lands last wins; the team consuming the previous one finds its build subtly broken.
- Untraceable artifacts. A jar in the registry has no link back to a specific commit. When something goes wrong, "which version of the source produced this artifact" becomes guesswork.
- Credential sprawl. Every developer who can publish needs production-write credentials on their laptop. Onboarding and offboarding turn into a manual checklist that nobody updates.
- Forgotten publishes. A merge happens, nobody publishes, downstream projects keep consuming the old version, and a week later somebody asks why their bug fix isn't in production.
CI-driven publishing replaces all of that with a single rule: every published artifact corresponds to a specific git ref, built by a workflow that anyone on the team can audit. The rest of this article walks through the sbt and GitHub Actions configuration to make that work for a Spark Scala library.
Step 1: Configure publishTo in build.sbt
publishTo tells sbt where to send artifacts when you run sbt publish. For an internal repository, point it at the same Nexus or Artifactory host you resolve dependencies from, but at the deploy path:
// build.sbt
publishTo := {
  val nexus = "https://nexus.acme.internal/repository/"
  if (isSnapshot.value)
    Some("Acme Snapshots" at nexus + "maven-snapshots/")
  else
    Some("Acme Releases" at nexus + "maven-releases/")
}
Two things this does:
- Routes by version type. `isSnapshot.value` is `true` when the project version ends in `-SNAPSHOT`. Most internal repos have separate paths for snapshots (mutable, overwritable) and releases (immutable, write-once). Sending a release version to the snapshots path or vice versa typically fails on the server side, so let sbt pick the right one based on the version itself.
- Names the resolver. The string before `at` ("Acme Snapshots", "Acme Releases") is what sbt uses for log output and credential matching.
Also set a few publishing options whose defaults are aimed at Maven Central rather than internal repos:
// build.sbt
publishMavenStyle := true
Test / publishArtifact := false
pomIncludeRepository := { _ => false }
publishMavenStyle := true ensures sbt writes a Maven-layout POM — the format every internal repo expects. Test / publishArtifact := false skips publishing the test jar; most internal libraries don't expose tests. pomIncludeRepository := { _ => false } keeps your internal resolver URLs out of the published POM so consumers don't end up with stale references to your hostnames.
Step 2: Wire Credentials from Environment Variables
The credentials shape mirrors what you'd do for resolving private dependencies, but there's no ~/.sbt/.credentials file in CI. Build the credentials object from environment variables instead:
// build.sbt
credentials += Credentials(
  "Sonatype Nexus Repository Manager",
  "nexus.acme.internal",
  sys.env.getOrElse("NEXUS_USER", ""),
  sys.env.getOrElse("NEXUS_PASSWORD", "")
)
The four fields have the same constraints as the file-based version:
- Realm must match the server's `WWW-Authenticate` response. Find it with `curl -v` if you're not sure. Wrong realm = no credentials sent = a 401 that looks like a missing artifact.
- Host is the bare hostname — no scheme, no path, no trailing slash.
- User and password should come from a service account with publish permission, not a personal account. Use an API token, not a password.
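To read the realm off the server itself, probe a repository path without credentials and look at the 401 response's `WWW-Authenticate` header. A sketch using the hostname from the examples above (a placeholder for your own repo manager; if anonymous read access is enabled, you may need to probe a write operation instead to trigger the 401):

```shell
# Fetch only the response headers and pull out the authenticate challenge.
# The quoted realm string is what the first Credentials(...) argument
# must match exactly.
curl -sI https://nexus.acme.internal/repository/maven-releases/ \
  | grep -i 'www-authenticate'
# Typical Nexus output (your realm string may differ):
#   WWW-Authenticate: BASIC realm="Sonatype Nexus Repository Manager"
```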
For local development, fall back to the file-based credentials so a developer can still run sbt publish manually for hotfixes:
// build.sbt
credentials ++= {
  (sys.env.get("NEXUS_USER"), sys.env.get("NEXUS_PASSWORD")) match {
    case (Some(u), Some(p)) =>
      Seq(Credentials("Sonatype Nexus Repository Manager", "nexus.acme.internal", u, p))
    case _ =>
      Seq(Credentials(Path.userHome / ".sbt" / ".credentials"))
  }
}
When the workflow sets NEXUS_USER and NEXUS_PASSWORD, env-based credentials apply. Anywhere else, the file is consulted.
Step 3: Pick a Versioning Strategy
CI publishes only make sense if every commit produces a unique, traceable version. Two patterns cover almost every team:
Tag-based releases with sbt-dynver. Add the sbt-dynver plugin and let it derive the version from git:
// project/plugins.sbt
addSbtPlugin("com.github.sbt" % "sbt-dynver" % "5.0.1")
With dynver enabled, version becomes a function of the current git state:
- On a tag like `v1.4.0` → version is `1.4.0` (a release).
- N commits past a tag → version is `1.4.0+12-abcdef-SNAPSHOT` (a snapshot, with the SHA embedded).
- Dirty working tree → an additional `+YYYYMMDD-HHmm` suffix is appended.
This means sbt publish on a tagged commit produces a release; the same command on a regular merge to main produces a snapshot, with an embedded SHA so every artifact is traceable to its source commit without any extra bookkeeping.
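Before wiring this into CI, it's worth checking locally what version dynver will assign. A quick sanity check, assuming the plugin is already in `project/plugins.sbt` and the repo has at least one `v`-prefixed tag:

```shell
# What git sees: dynver derives the version from the nearest v-prefixed tag.
git describe --tags --long

# What sbt will publish as: print the derived version setting.
sbt "show version"
```

If `show version` prints something starting with `0.0.0+`, dynver can't see any tags — the same symptom a shallow CI checkout produces.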
Manual versions via a version.sbt file. If you'd rather control versions explicitly, the sbt-release plugin manages a version.sbt file and bumps it during the release task. The CI workflow for this pattern triggers on push to a release branch rather than on tag creation.
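For reference, a minimal sketch of the manual pattern (the plugin version and the initial version number here are illustrative, not prescribed):

```scala
// project/plugins.sbt
addSbtPlugin("com.github.sbt" % "sbt-release" % "1.4.0")

// version.sbt -- sbt-release reads this file and rewrites it
// (bumping the version) as part of the release task.
ThisBuild / version := "1.5.0-SNAPSHOT"
```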
The rest of this article uses the dynver pattern because it requires the least bookkeeping.
Step 4: A Minimal GitHub Actions Workflow
A single workflow that publishes snapshots on every push to main and releases when a v* tag is pushed:
# .github/workflows/publish.yml
name: Publish
on:
  push:
    branches: [main]
    tags: ['v*']
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '11'
          cache: sbt
      - uses: sbt/setup-sbt@v1
      - name: Publish
        env:
          NEXUS_USER: ${{ secrets.NEXUS_USER }}
          NEXUS_PASSWORD: ${{ secrets.NEXUS_PASSWORD }}
        run: sbt +publish
A few specific choices in this workflow:
- `fetch-depth: 0` — sbt-dynver derives the version from `git describe`, which needs the full tag history. The default checkout fetches only the latest commit, which makes dynver fall back to `0.0.0+...` and produce useless versions. This is the single most common mistake in CI publishing.
- Java 11 — required by Spark 3.4.x. If your library targets a different Spark version, match its supported JDK.
- `sbt +publish` — the leading `+` cross-publishes against every Scala version listed in `crossScalaVersions`. Spark libraries usually target both 2.12 and 2.13 because clusters in the wild run both. Drop the `+` only if you're certain you support a single Scala version.
- Secrets via `env:` — GitHub Actions exposes `secrets.*` only when explicitly mapped to environment variables. The names here (`NEXUS_USER`, `NEXUS_PASSWORD`) match what `build.sbt` reads from `sys.env`.
To set up the secrets: in GitHub, go to Settings → Secrets and variables → Actions and add NEXUS_USER and NEXUS_PASSWORD with credentials from your repository's service account.
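If you prefer the command line, the same secrets can be set with the GitHub CLI (the values below are placeholders for your service account's token):

```shell
# Requires gh authenticated against the repo with admin:repo rights.
gh secret set NEXUS_USER --body "ci-publisher"
gh secret set NEXUS_PASSWORD --body "an-api-token-not-a-password"
```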
Step 5: Cross-Compilation for Multiple Scala Versions
Spark libraries usually need to publish for both Scala 2.12 and 2.13 — many production clusters still run 2.12, and Spark 3.4.x supports both:
// build.sbt
scalaVersion := "2.13.11"
crossScalaVersions := Seq("2.12.18", "2.13.11")
val sparkVersion = "3.4.1"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
)
With crossScalaVersions defined, sbt +publish builds and uploads the artifact with both _2.12 and _2.13 suffixes. Consumers pick the matching binary version automatically when they declare the dependency with %%.
The provided scope on Spark itself stays the same regardless of Scala version — your library exposes the Spark API at compile time but doesn't bundle Spark into the published jar.
Step 6: Splitting Snapshots and Releases into Separate Workflows
The single-workflow approach above publishes both flavors but doesn't enforce the difference. For tighter control, split them in two:
# .github/workflows/snapshot.yml
name: Snapshot
on:
  push:
    branches: [main]
jobs:
  publish-snapshot:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: actions/setup-java@v4
        with: { distribution: temurin, java-version: '11', cache: sbt }
      - uses: sbt/setup-sbt@v1
      - env:
          NEXUS_USER: ${{ secrets.NEXUS_USER }}
          NEXUS_PASSWORD: ${{ secrets.NEXUS_PASSWORD }}
        run: sbt +publish
# .github/workflows/release.yml
name: Release
on:
  push:
    tags: ['v*']
jobs:
  publish-release:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: actions/setup-java@v4
        with: { distribution: temurin, java-version: '11', cache: sbt }
      - uses: sbt/setup-sbt@v1
      - env:
          NEXUS_USER: ${{ secrets.NEXUS_USER }}
          NEXUS_PASSWORD: ${{ secrets.NEXUS_PASSWORD }}
        run: sbt +publish
The release workflow gates on environment: production. In GitHub, configure that environment to require manual approval, restrict deployments to specific reviewers, or scope its secrets separately from the default Actions secrets. The snapshot workflow stays unrestricted — any merge to main produces a snapshot artifact.
This split also lets you put the production Nexus credentials (write to maven-releases) in the production environment and use a less-privileged credential for snapshots — useful if your repo manager exposes finer-grained roles.
Step 7: Verify the Published Artifact
After the workflow runs, confirm the artifact actually landed before relying on it:
curl -u "$NEXUS_USER:$NEXUS_PASSWORD" \
  https://nexus.acme.internal/repository/maven-releases/com/acme/spark-data-utils_2.13/1.4.0/spark-data-utils_2.13-1.4.0.pom
A 200 with the POM XML means the publish worked. A 404 means either the workflow failed silently (check the Actions logs) or the path is wrong — releases vs snapshots, wrong Scala version suffix, missing group-id segment.
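The path follows the standard Maven repository layout, so you can compute it instead of hand-assembling it. A small sketch (the `maven_path` helper name is ours, not a standard tool):

```shell
#!/usr/bin/env bash
# Build the repository-relative path for a published artifact:
# group-id dots become directory separators, and the Scala binary-version
# suffix is part of the artifact name.
maven_path() {
  local group="$1" artifact="$2" scala_bv="$3" version="$4"
  echo "${group//.//}/${artifact}_${scala_bv}/${version}/${artifact}_${scala_bv}-${version}.pom"
}

maven_path com.acme spark-data-utils 2.13 1.4.0
# Prints: com/acme/spark-data-utils_2.13/1.4.0/spark-data-utils_2.13-1.4.0.pom
```

Append that to your repository's base URL for the `curl` check; a wrong segment here is the usual cause of a surprising 404.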
For end-to-end confirmation, add a fresh dependency in a downstream project and resolve it:
// downstream build.sbt
libraryDependencies += "com.acme" %% "spark-data-utils" % "1.4.0"
Run sbt update and watch for a download from your internal repo. If resolution succeeds against a consumer service account that's distinct from the publishing account, you've validated both halves of the contract.
Common Mistakes
A handful of CI-publish failure modes that look mysterious until you've seen them once:
- Missing `fetch-depth: 0`. sbt-dynver silently produces a `0.0.0+...` version because `git describe` can't see any tags. The publish "succeeds" against a useless version. The fix is one line in the checkout step.
- Hardcoded `version` in `build.sbt`. If `version := "1.4.0"` is committed to the source, every push attempts to overwrite the same release. Most repos reject this with a server-side 400 or 409. Either use dynver, or gate the release workflow on tag pushes only.
- Realm mismatch in env-based credentials. The `Credentials(realm, host, user, password)` constructor takes the realm as its first argument. A typo there fails the same way as in a credentials file: 401 on PUT, with no obvious indication that the issue is the realm string.
- Snapshot caching on consumers. Most internal repos let you overwrite `1.4.0+12-abcdef-SNAPSHOT` indefinitely, but the consumer side caches snapshots based on the version string. If two snapshots share a version, consumers may keep the stale one. Embed the SHA in the snapshot version (dynver does this automatically) so each snapshot is uniquely addressable.
- `sbt publish` instead of `sbt +publish`. Without the `+`, only the active `scalaVersion` is published. Clusters running a different Scala binary version will fail resolution with "module not found" — the kind of bug that doesn't surface until a downstream team upgrades.
- Service account without write permission. Read-only credentials work for `sbt update` but fail at upload time with a 403. The error usually identifies the offending PUT request, but it's easy to mistake for a network error if you're not looking for it.
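Related to the snapshot-caching point: when consumers resolve through sbt's default Coursier backend, changing (snapshot) artifacts are re-checked only after a TTL, which defaults to 24 hours. A hedged workaround while debugging a stale snapshot, assuming Coursier is the resolver in use:

```shell
# Zero Coursier's TTL for one resolution run so snapshot artifacts are
# re-fetched instead of served from the local cache. Release (non-changing)
# artifacts are unaffected.
COURSIER_TTL=0s sbt update
```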
Once Publishing Works
With CI handling the publish step, the rest of the workflow becomes a discipline rather than a config exercise:
- Tag releases consistently. A protected branch policy that requires release tags to come from a CI-passing commit gives you the audit trail that motivated this whole setup.
- Pin downstream consumers. Once snapshots are flowing automatically, downstream Spark jobs that depend on the library should pin to specific release versions, not float on snapshots. The private Maven repository tutorial covers the consumer-side resolver setup.
- Watch for cluster-side resolution. Publishing the artifact is half the contract; the cluster running `spark-submit` also needs to resolve it. If you go the `--packages` route, see using spark-submit with private Maven dependencies for the Ivy-side credential setup. If you bundle the library into a fat jar instead, merge strategies and `provided` scope are what you'll be configuring next.
The handful of YAML lines and a few build.sbt settings are usually a one-time investment. After that, every merge produces an artifact, every tag produces a release, and "which version is in production" becomes a question with a single, traceable answer.