Configuring sbt for Private Maven Repositories in Spark Scala Projects
Most Spark teams keep shared utility libraries — schema definitions, custom UDFs, internal connectors — in a private Maven repository like Nexus or Artifactory. sbt resolves dependencies from Maven Central by default; pointing it at an internal repo and supplying credentials takes a handful of settings, but the order and the file locations matter.
Why Private Repositories for Spark Code
Once a Spark codebase grows beyond a single project, teams almost always end up extracting common code into shared libraries: row-level encryption helpers, schema registries, feature-store clients, custom data sources. Those artifacts can't go to Maven Central — they reference internal services, contain proprietary logic, or simply aren't open source.
The typical answer is a self-hosted artifact repository: Nexus Repository OSS, JFrog Artifactory, AWS CodeArtifact, or GitHub Packages. These all expose Maven-compatible HTTP endpoints, which means sbt can resolve from them as long as you tell it where they are and how to authenticate.
Step 1: Add the Resolver
A resolver tells sbt about an additional repository to check when resolving dependencies. Add one to build.sbt:
// build.sbt
resolvers += "Acme Internal Releases" at "https://nexus.acme.internal/repository/maven-releases/"
The string before at is the resolver name; it identifies the repository in sbt's log output (credentials, covered in Step 2, are matched by realm and host, not by this name). The URL is the base of the Maven layout — the path that contains com/acme/myartifact/1.0.0/myartifact-1.0.0.jar underneath.
For most internal repositories you'll want both releases and snapshots:
// build.sbt
resolvers ++= Seq(
  "Acme Internal Releases" at "https://nexus.acme.internal/repository/maven-releases/",
  "Acme Internal Snapshots" at "https://nexus.acme.internal/repository/maven-snapshots/",
)
Maven distinguishes between releases (immutable versions like 1.2.3) and snapshots (mutable versions like 1.2.3-SNAPSHOT). Most repositories serve them on different paths, and sbt's caching behavior differs between the two — snapshots are checked for updates, releases are cached forever. Configure both even if you only consume one for now; the cost is zero and you'll need it eventually.
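For illustration (the version below is hypothetical), a snapshot dependency is declared exactly like a release; the -SNAPSHOT suffix is what marks the version as changing, so sbt re-checks the snapshots repository for newer builds:
// build.sbt — hypothetical snapshot version of an internal library
libraryDependencies += "com.acme" %% "spark-data-utils" % "1.5.0-SNAPSHOT"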
Step 2: Supply Credentials
Once the resolver is in place, sbt will try to download from it. If the repository requires authentication (almost always the case for internal repos), unauthenticated requests get a 401 and sbt fails:
[error] failed to download from https://nexus.acme.internal/repository/maven-releases/com/acme/myartifact/1.0.0/myartifact-1.0.0.pom
[error] 401 Unauthorized
Credentials should never live in build.sbt. The standard approach is a separate file in your home directory that sbt loads automatically:
# ~/.sbt/.credentials
realm=Sonatype Nexus Repository Manager
host=nexus.acme.internal
user=jane.doe
password=abc123-personal-token
Then point sbt at that file from build.sbt:
// build.sbt
credentials += Credentials(Path.userHome / ".sbt" / ".credentials")
The four fields all matter:
- realm — must exactly match the realm string the server sends in the WWW-Authenticate header. Nexus typically uses Sonatype Nexus Repository Manager; Artifactory uses Artifactory Realm. If this is wrong, sbt won't send the credentials and you'll see a 401 even though the credentials file exists.
- host — must match the hostname in the resolver URL exactly, with no scheme and no trailing slash: nexus.acme.internal, not https://nexus.acme.internal/.
- user and password — for most modern repositories, the password should be an API token rather than your account password. Nexus and Artifactory both let you generate these from the user profile UI.
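For comparison, an Artifactory-backed setup changes only the realm and host (the hostname and token below are illustrative):
# ~/.sbt/.credentials (Artifactory variant)
realm=Artifactory Realm
host=artifactory.acme.internal
user=jane.doe
password=api-token-from-profile-page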
To find the realm string for your server, run curl -v against any artifact path and look at the WWW-Authenticate response header:
curl -v https://nexus.acme.internal/repository/maven-releases/test 2>&1 | grep -i 'www-authenticate'
# < www-authenticate: BASIC realm="Sonatype Nexus Repository Manager"
Copy the realm string verbatim — case and spacing matter.
Step 3: Verify Resolution
Add an internal dependency and run sbt to confirm everything works:
// build.sbt
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.4.1" % "provided",
  "com.acme" %% "spark-data-utils" % "1.4.0",
  "com.lihaoyi" %% "utest" % "0.8.1" % "test",
)
sbt update
If sbt logs a download from nexus.acme.internal and the build succeeds, the resolver and credentials are wired correctly. If you instead see:
[warn] module not found: com.acme#spark-data-utils_2.13;1.4.0
[warn] ==== Acme Internal Releases: tried
[warn] https://nexus.acme.internal/repository/maven-releases/com/acme/spark-data-utils_2.13/1.4.0/spark-data-utils_2.13-1.4.0.pom
[warn] ==== Maven Central: tried
[warn] https://repo1.maven.org/maven2/com/acme/spark-data-utils_2.13/1.4.0/spark-data-utils_2.13-1.4.0.pom
The resolver is being consulted but the artifact wasn't found. Either the version doesn't exist on the server, or the credentials are missing or wrong and the request was rejected (some servers return 404 instead of 401 for unauthenticated requests to private artifacts, which is more confusing than helpful).
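To tell those cases apart, replay the full log for the failed task. From the interactive sbt shell, last update prints everything the resolution attempt logged, including every URL tried:
sbt                # start the interactive shell
> update           # reproduce the failure
> last update      # replay the full log for the update task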
CI/CD: Credentials from Environment Variables
A per-user ~/.sbt/.credentials file works for individual developers but not for CI. Build agents need credentials supplied via environment variables, secrets managers, or task-scoped tokens.
Use sys.env in build.sbt to construct credentials at runtime when CI environment variables are present, and fall back to the file otherwise:
// build.sbt
credentials ++= {
  val envUser = sys.env.get("NEXUS_USER")
  val envPass = sys.env.get("NEXUS_PASSWORD")
  (envUser, envPass) match {
    case (Some(u), Some(p)) =>
      Seq(Credentials(
        "Sonatype Nexus Repository Manager",
        "nexus.acme.internal",
        u,
        p,
      ))
    case _ =>
      Seq(Credentials(Path.userHome / ".sbt" / ".credentials"))
  }
}
In CI, set NEXUS_USER and NEXUS_PASSWORD from your secrets manager. Local builds keep using the file. Both paths work without anyone needing to remember which one is active.
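In a CI job, that reduces to exporting the two variables before invoking sbt. A sketch, assuming a generic secrets mechanism (the variable names are illustrative):
# CI step: secret injection syntax varies by CI system
export NEXUS_USER="ci-bot"
export NEXUS_PASSWORD="$CI_SECRET_NEXUS_TOKEN"
sbt -batch clean test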
A common variation is to use a separate credentials file path for CI rather than env vars directly — useful when the CI system writes a complete file rather than individual variables:
// build.sbt
credentials += Credentials(
  file(sys.env.getOrElse("SBT_CREDENTIALS", (Path.userHome / ".sbt" / ".credentials").getAbsolutePath))
)
The CI job writes the credentials file to a known location and sets SBT_CREDENTIALS to its path.
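A sketch of that CI step, with an assumed path (any location the build agent can read works):
# CI step: materialize the credentials file, then point sbt at it
cat > /tmp/ci-credentials <<EOF
realm=Sonatype Nexus Repository Manager
host=nexus.acme.internal
user=$NEXUS_USER
password=$NEXUS_PASSWORD
EOF
export SBT_CREDENTIALS=/tmp/ci-credentials
sbt -batch update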
Restricting Resolution to Internal Mirrors
Some organizations require all dependencies — including public ones — to come through their internal repository. This usually means the repo manager is configured as a proxy/mirror of Maven Central, and direct internet access is blocked from build machines.
To force sbt to use only the internal repository, override the default resolver chain instead of appending to it:
// build.sbt
ThisBuild / externalResolvers := Seq(
  "Acme Internal" at "https://nexus.acme.internal/repository/maven-public/",
)
externalResolvers is the full list sbt consults after its internal cache. Setting it (rather than using +=) replaces the defaults entirely, including Maven Central. The internal repo's maven-public group should be configured to proxy Maven Central transparently, so all your normal dependencies still resolve — just through your infrastructure.
This is also the right setup for air-gapped builds where build machines have no internet access. The internal repo is the single source of truth for every artifact.
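One gap to be aware of: externalResolvers governs library dependencies, but sbt also fetches its own launcher artifacts and plugins. To route those through the mirror as well, sbt supports a global repositories file; a sketch, assuming the same maven-public group:
# ~/.sbt/repositories
[repositories]
  local
  acme-internal: https://nexus.acme.internal/repository/maven-public/
Start sbt with -Dsbt.override.build.repos=true to make this list take precedence over resolvers declared in individual builds.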
A Complete build.sbt
Putting it all together, including the provided scope and test JVM settings that every Spark project needs:
// build.sbt
name := "myproject"
scalaVersion := "2.13.11"
val sparkVersion = "3.4.1"
resolvers ++= Seq(
  "Acme Internal Releases" at "https://nexus.acme.internal/repository/maven-releases/",
  "Acme Internal Snapshots" at "https://nexus.acme.internal/repository/maven-snapshots/",
)
credentials ++= {
  (sys.env.get("NEXUS_USER"), sys.env.get("NEXUS_PASSWORD")) match {
    case (Some(u), Some(p)) =>
      Seq(Credentials("Sonatype Nexus Repository Manager", "nexus.acme.internal", u, p))
    case _ =>
      Seq(Credentials(Path.userHome / ".sbt" / ".credentials"))
  }
}
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "com.acme" %% "spark-data-utils" % "1.4.0",
  "com.lihaoyi" %% "utest" % "0.8.1" % "test",
)
testFrameworks += new TestFramework("utest.runner.Framework")
Test / fork := true
Test / javaOptions ++= Seq("-Xms512m", "-Xmx4g")
Test / parallelExecution := false
Common Mistakes
A few configuration errors that look mysterious until you've seen them once:
- Realm string mismatch. sbt sends credentials only when the realm in the credentials file matches the one in the server's WWW-Authenticate header. Mismatched realm = no credentials sent = 401. The server is happy to tell you the right realm via curl -v.
- Trailing slash in host. The host field is a hostname, not a URL. Putting https://nexus.acme.internal/ here silently fails to match against requests to nexus.acme.internal.
- Using a password instead of a token. Most repository managers either require or strongly prefer API tokens for programmatic access. Account passwords often have shorter session timeouts, and some servers reject them outright for non-browser clients.
- Committing build.sbt with hardcoded credentials. Never put Credentials(realm, host, user, password) with literal values directly in build.sbt — even temporarily. Use the file or env var approach from the start.
- Wrong artifact coordinates. "com.acme" %% "spark-data-utils" adds the Scala binary version automatically (spark-data-utils_2.13). If your library was published without a Scala suffix, use % instead of %%, as the sketch after this list shows.
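The same coordinates both ways (plain-java-utils is a made-up Java-only artifact):
// build.sbt
libraryDependencies ++= Seq(
  "com.acme" %% "spark-data-utils" % "1.4.0",  // resolves spark-data-utils_2.13 under scalaVersion 2.13.x
  "com.acme" %  "plain-java-utils" % "2.0.0",  // hypothetical Java-only artifact: name used verbatim
)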
Once Resolution Works
With the resolver and credentials in place, internal artifacts behave exactly like public ones — sbt update, sbt compile, and sbt test resolve them transparently. Beyond this baseline, two things commonly come up next:
- Version conflicts. Internal libraries built against different Spark versions can pull in incompatible transitive dependencies. The fix is the same as for any other dependency conflict: pin versions explicitly. See configuring sbt assembly merge strategies for the related problem of duplicate files in fat jars.
- Deployment. Local resolution being green doesn't mean the cluster can resolve the same artifacts. If you deploy fat jars built with sbt assembly, the internal dependencies are baked into the jar and the cluster doesn't need credentials. If you instead use --packages with spark-submit, the cluster nodes need their own credential configuration; see the sketch after this list.
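A sketch of the --packages path, assuming the driver can reach the repository (the class name, coordinates, and token are illustrative; Spark's --repositories flag accepts additional resolver URLs, and one common pattern embeds a read-only token in the URL):
spark-submit \
  --class com.acme.jobs.Main \
  --packages com.acme:spark-data-utils_2.13:1.4.0 \
  --repositories "https://ci-bot:READ_ONLY_TOKEN@nexus.acme.internal/repository/maven-releases/" \
  myproject.jar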