Spark Scala Lower and Upper Functions

The lower and upper functions convert string columns to lowercase and uppercase respectively. They're commonly used to normalize data for case-insensitive comparisons and consistent formatting.

def lower(e: Column): Column

def upper(e: Column): Column

Both functions take a single string column and return a new column with the case converted. Here's a quick example showing them side by side:

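import org.apache.spark.sql.functions.{col, lower, upper}
import spark.implicits._ // enables toDF; assumes a SparkSession named spark (spark-shell provides one)

val df = Seq(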
  "Alice Johnson",
  "BOB SMITH",
  "Charlie Brown",
  "diana prince",
  "EVE Torres",
).toDF("name")

val df2 = df
  .withColumn("lowered", lower(col("name")))
  .withColumn("uppered", upper(col("name")))

df2.show(false)
// +-------------+-------------+-------------+
// |name         |lowered      |uppered      |
// +-------------+-------------+-------------+
// |Alice Johnson|alice johnson|ALICE JOHNSON|
// |BOB SMITH    |bob smith    |BOB SMITH    |
// |Charlie Brown|charlie brown|CHARLIE BROWN|
// |diana prince |diana prince |DIANA PRINCE |
// |EVE Torres   |eve torres   |EVE TORRES   |
// +-------------+-------------+-------------+
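Case-insensitive comparison is the most common reason to reach for these functions. The sketch below lowercases both sides before comparing, first in a filter and then in a join. It assumes a local SparkSession (spark-shell already provides one named spark), and the teams lookup table is hypothetical sample data whose keys use a different casing than the names.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

// Assumption: a local SparkSession; spark-shell already provides one named `spark`.
val spark = SparkSession.builder().master("local[*]").appName("lower-upper-compare").getOrCreate()
import spark.implicits._

val people = Seq("Alice Johnson", "BOB SMITH", "diana prince").toDF("name")

// Case-insensitive filter: lowercase the column, compare against a lowercase literal.
val bobs = people.filter(lower(col("name")) === "bob smith")

// Hypothetical lookup table whose keys use a different casing than `people`.
val teams = Seq(("alice johnson", "Data"), ("DIANA PRINCE", "Platform")).toDF("member", "team")

// Case-insensitive join: normalize both sides of the join condition.
val joined = people.join(teams, lower(people("name")) === lower(teams("member")))

joined.select("name", "team").show(false)
```

What matters is applying the same function to both sides of the comparison; upper would work just as well as lower here.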

Handling Nulls

When lower or upper encounters a null value, it returns null; no exception is thrown:

val df = Seq(
  ("Marketing", "ACTIVE"),
  ("engineering", "active"),
  ("SALES", "Inactive"),
  (null, "ACTIVE"),
  ("Support", null),
).toDF("department", "status")

val df2 = df
  .withColumn("dept_lower", lower(col("department")))
  .withColumn("status_upper", upper(col("status")))

df2.show(false)
// +-----------+--------+-----------+------------+
// |department |status  |dept_lower |status_upper|
// +-----------+--------+-----------+------------+
// |Marketing  |ACTIVE  |marketing  |ACTIVE      |
// |engineering|active  |engineering|ACTIVE      |
// |SALES      |Inactive|sales      |INACTIVE    |
// |null       |ACTIVE  |null       |ACTIVE      |
// |Support    |null    |support    |null        |
// +-----------+--------+-----------+------------+
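Null propagation matters most in filters. The sketch below reuses the data above (with a local SparkSession assumed) to show that a not-equal comparison on lower(...) silently drops the null department row, because null =!= "marketing" evaluates to null rather than true. Wrapping the column in coalesce with a placeholder keeps that row; the "unknown" literal is an arbitrary choice for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, lower}

// Assumption: a local SparkSession; spark-shell already provides one named `spark`.
val spark = SparkSession.builder().master("local[*]").appName("lower-upper-nulls").getOrCreate()
import spark.implicits._

val df = Seq(
  ("Marketing", "ACTIVE"),
  ("engineering", "active"),
  ("SALES", "Inactive"),
  (null, "ACTIVE"),
  ("Support", null)
).toDF("department", "status")

// lower(null) is null, and null =!= "marketing" is null, so the filter drops that row.
val notMarketing = df.filter(lower(col("department")) =!= "marketing")

// Substituting a placeholder before lowercasing keeps the null-department row.
val notMarketingKeepNulls =
  df.filter(lower(coalesce(col("department"), lit("unknown"))) =!= "marketing")

println(notMarketing.count())          // 3 (engineering, SALES, Support)
println(notMarketingKeepNulls.count()) // 4 (the three above plus the null row)
```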

SQL Aliases: lcase and ucase

Spark also provides lcase and ucase as SQL aliases for lower and upper. Before Spark 3.5 they aren't available as Scala API functions (3.5 added lcase and ucase to the functions object), but in any version you can call them through expr():

import org.apache.spark.sql.functions.expr

val df = Seq(
  "Alice Johnson",
  "BOB SMITH",
  "diana prince",
).toDF("name")

val df2 = df
  .withColumn("lcase_name", expr("lcase(name)"))
  .withColumn("ucase_name", expr("ucase(name)"))

df2.show(false)
// +-------------+-------------+-------------+
// |name         |lcase_name   |ucase_name   |
// +-------------+-------------+-------------+
// |Alice Johnson|alice johnson|ALICE JOHNSON|
// |BOB SMITH    |bob smith    |BOB SMITH    |
// |diana prince |diana prince |DIANA PRINCE |
// +-------------+-------------+-------------+

The behavior is identical to lower and upper. Use whichever feels more natural — lower/upper are the standard Scala API functions, while lcase/ucase may be familiar if you're coming from a SQL background.
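One way to check the equivalence is to register the DataFrame as a temporary view and call all four names from a spark.sql query, then count rows where an alias disagrees with its canonical function. This is a minimal sketch assuming a local SparkSession; the view name people is arbitrary.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local SparkSession; spark-shell already provides one named `spark`.
val spark = SparkSession.builder().master("local[*]").appName("lcase-ucase-sql").getOrCreate()
import spark.implicits._

val df = Seq("Alice Johnson", "BOB SMITH", "diana prince").toDF("name")
df.createOrReplaceTempView("people") // arbitrary view name for the SQL query

val viaSql = spark.sql(
  """SELECT name,
    |       lower(name) AS l, lcase(name) AS lc,
    |       upper(name) AS u, ucase(name) AS uc
    |FROM people""".stripMargin)

// Rows where an alias differs from its canonical function; expected to be zero.
val mismatches = viaSql.filter(col("l") =!= col("lc") || col("u") =!= col("uc")).count()

viaSql.show(false)
println(mismatches) // 0
```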

For other string manipulation functions, check out initcap for title case conversion; trim, ltrim, and rtrim for whitespace handling; and concat and concat_ws for combining strings.
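These functions are often combined with lower. The sketch below trims and lowercases an email column to build a deduplication key, and uses initcap to produce a display name. It assumes a local SparkSession, and the email and name values are made-up sample data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, initcap, lower, trim}

// Assumption: a local SparkSession; spark-shell already provides one named `spark`.
val spark = SparkSession.builder().master("local[*]").appName("lower-trim-initcap").getOrCreate()
import spark.implicits._

// Made-up sample data: the same address with different casing and stray spaces.
val raw = Seq(
  ("  Alice@Example.com", "alice johnson"),
  ("alice@example.COM  ", "ALICE JOHNSON"),
  ("BOB@example.com", "bob smith")
).toDF("email", "name")

val cleaned = raw
  .withColumn("email_key", lower(trim(col("email")))) // trim first, then lowercase
  .withColumn("display_name", initcap(col("name")))   // "ALICE JOHNSON" -> "Alice Johnson"

// Both Alice rows collapse to the same key.
val uniqueEmails = cleaned.select("email_key").distinct().count()
println(uniqueEmails) // 2
```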