Job Board
Consulting

Spark Scala InitCap

The initcap function converts a string column to title case — capitalizing the first letter of each word and lowercasing the rest. It's useful for normalizing names, addresses, and other text where consistent capitalization matters.

def initcap(e: Column): Column

The function splits on whitespace to determine word boundaries, then capitalizes the first character of each word and lowercases everything else:

val df = Seq(
  "alice johnson",
  "BOB SMITH",
  "charlie brown",
  "DIANA PRINCE",
  "eve torres",
).toDF("name")

val df2 = df
  .withColumn("title_case", initcap(col("name")))

df2.show(false)
// +-------------+-------------+
// |name         |title_case   |
// +-------------+-------------+
// |alice johnson|Alice Johnson|
// |BOB SMITH    |Bob Smith    |
// |charlie brown|Charlie Brown|
// |DIANA PRINCE |Diana Prince |
// |eve torres   |Eve Torres   |
// +-------------+-------------+

Handling Nulls

When initcap encounters a null value, the result is null — no exceptions are thrown:

val df = Seq(
  ("new york", "united states"),
  ("san francisco", "united states"),
  ("rio de janeiro", "brazil"),
  (null, "canada"),
  ("london", null),
).toDF("city", "country")

val df2 = df
  .withColumn("city_formatted", initcap(col("city")))
  .withColumn("country_formatted", initcap(col("country")))

df2.show(false)
// +--------------+-------------+--------------+-----------------+
// |city          |country      |city_formatted|country_formatted|
// +--------------+-------------+--------------+-----------------+
// |new york      |united states|New York      |United States    |
// |san francisco |united states|San Francisco |United States    |
// |rio de janeiro|brazil       |Rio De Janeiro|Brazil           |
// |null          |canada       |null          |Canada           |
// |london        |null         |London        |null             |
// +--------------+-------------+--------------+-----------------+

Word Boundary Behavior

initcap only treats whitespace as a word boundary. Hyphens, underscores, dots, and other punctuation do not trigger capitalization of the following character:

val df = Seq(
  "hello-world",
  "first_name last_name",
  "order.total",
  "left/right",
).toDF("value")

val df2 = df
  .withColumn("result", initcap(col("value")))

df2.show(false)
// +--------------------+--------------------+
// |value               |result              |
// +--------------------+--------------------+
// |hello-world         |Hello-world         |
// |first_name last_name|First_name Last_name|
// |order.total         |Order.total         |
// |left/right          |Left/right          |
// +--------------------+--------------------+

Notice that hello-world becomes Hello-world (not Hello-World), and first_name last_name becomes First_name Last_name — only the space triggers a new word. If you need title case that respects hyphens or other delimiters, you'll need a combination of split, initcap, and concat_ws, or a custom UDF.

For other string case functions, see lower and upper for converting to all-lowercase or all-uppercase. For whitespace cleanup before applying initcap, check out trim, ltrim, and rtrim.

Example Details

Created: 2026-03-15 05:00:00 PM

Last Updated: 2026-03-15 05:00:00 PM