Spark Scala InitCap
The initcap function converts a string column to title case — capitalizing the first letter of each word and lowercasing the rest. It's useful for normalizing names, addresses, and other text where consistent capitalization matters.
def initcap(e: Column): Column
The function splits on whitespace to determine word boundaries, then capitalizes the first character of each word and lowercases everything else:
val df = Seq(
"alice johnson",
"BOB SMITH",
"charlie brown",
"DIANA PRINCE",
"eve torres",
).toDF("name")
val df2 = df
.withColumn("title_case", initcap(col("name")))
df2.show(false)
// +-------------+-------------+
// |name |title_case |
// +-------------+-------------+
// |alice johnson|Alice Johnson|
// |BOB SMITH |Bob Smith |
// |charlie brown|Charlie Brown|
// |DIANA PRINCE |Diana Prince |
// |eve torres |Eve Torres |
// +-------------+-------------+
Handling Nulls
When initcap encounters a null value, the result is null — no exceptions are thrown:
val df = Seq(
("new york", "united states"),
("san francisco", "united states"),
("rio de janeiro", "brazil"),
(null, "canada"),
("london", null),
).toDF("city", "country")
val df2 = df
.withColumn("city_formatted", initcap(col("city")))
.withColumn("country_formatted", initcap(col("country")))
df2.show(false)
// +--------------+-------------+--------------+-----------------+
// |city |country |city_formatted|country_formatted|
// +--------------+-------------+--------------+-----------------+
// |new york |united states|New York |United States |
// |san francisco |united states|San Francisco |United States |
// |rio de janeiro|brazil |Rio De Janeiro|Brazil |
// |null |canada |null |Canada |
// |london |null |London |null |
// +--------------+-------------+--------------+-----------------+
Word Boundary Behavior
initcap only treats whitespace as a word boundary. Hyphens, underscores, dots, and other punctuation do not trigger capitalization of the following character:
val df = Seq(
"hello-world",
"first_name last_name",
"order.total",
"left/right",
).toDF("value")
val df2 = df
.withColumn("result", initcap(col("value")))
df2.show(false)
// +--------------------+--------------------+
// |value |result |
// +--------------------+--------------------+
// |hello-world |Hello-world |
// |first_name last_name|First_name Last_name|
// |order.total |Order.total |
// |left/right |Left/right |
// +--------------------+--------------------+
Notice that hello-world becomes Hello-world (not Hello-World), and first_name last_name becomes First_name Last_name — only the space triggers a new word. If you need title case that respects hyphens or other delimiters, you'll need a combination of split, initcap, and concat_ws, or a custom UDF.
For other string case functions, see lower and upper for converting to all-lowercase or all-uppercase. For whitespace cleanup before applying initcap, check out trim, ltrim, and rtrim.