Spark Scala Lower and Upper Functions
The lower and upper functions convert string columns to lowercase and uppercase respectively. They're commonly used to normalize data for case-insensitive comparisons and consistent formatting.
def lower(e: Column): Column
def upper(e: Column): Column
Both functions take a single string column and return a new column with the case converted. Here's a quick example showing them side by side:
import org.apache.spark.sql.functions.{col, lower, upper}
import spark.implicits._ // `spark` is the SparkSession (predefined in spark-shell)
val df = Seq(
  "Alice Johnson",
  "BOB SMITH",
  "Charlie Brown",
  "diana prince",
  "EVE Torres"
).toDF("name")
val df2 = df
  .withColumn("lowered", lower(col("name")))
  .withColumn("uppered", upper(col("name")))
df2.show(false)
// +-------------+-------------+-------------+
// |name         |lowered      |uppered      |
// +-------------+-------------+-------------+
// |Alice Johnson|alice johnson|ALICE JOHNSON|
// |BOB SMITH    |bob smith    |BOB SMITH    |
// |Charlie Brown|charlie brown|CHARLIE BROWN|
// |diana prince |diana prince |DIANA PRINCE |
// |EVE Torres   |eve torres   |EVE TORRES   |
// +-------------+-------------+-------------+
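Since the intro mentions case-insensitive comparisons, here is a minimal sketch of that use: lowercase both the column and the search term so a filter matches regardless of how the name was capitalized. It assumes a local SparkSession (inside spark-shell the builder simply returns the existing `spark`); the sample names and the search term `target` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, lower}

// Local session for the sketch; in spark-shell getOrCreate() returns the existing session.
val spark = SparkSession.builder().master("local[1]").appName("case-insensitive-filter").getOrCreate()
import spark.implicits._

val people = Seq("Alice Johnson", "BOB SMITH", "Bob Smith", "diana prince").toDF("name")

// Lowercase both sides so the equality ignores capitalization.
val target = "bob SMITH" // hypothetical search term
val matches = people.filter(lower(col("name")) === lower(lit(target)))

matches.show(false)
// +---------+
// |name     |
// +---------+
// |BOB SMITH|
// |Bob Smith|
// +---------+

// Collect the matched names for inspection.
val matchedNames = matches.collect().map(_.getString(0)).toSeq
```

Lowercasing both sides (rather than only the column) keeps the filter correct even when the search term itself arrives in mixed case.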
Handling Nulls
When lower or upper encounters a null value, the result is null; no exception is thrown:
val df = Seq(
  ("Marketing", "ACTIVE"),
  ("engineering", "active"),
  ("SALES", "Inactive"),
  (null, "ACTIVE"),
  ("Support", null)
).toDF("department", "status")
val df2 = df
  .withColumn("dept_lower", lower(col("department")))
  .withColumn("status_upper", upper(col("status")))
df2.show(false)
// +-----------+--------+-----------+------------+
// |department |status  |dept_lower |status_upper|
// +-----------+--------+-----------+------------+
// |Marketing  |ACTIVE  |marketing  |ACTIVE      |
// |engineering|active  |engineering|ACTIVE      |
// |SALES      |Inactive|sales      |INACTIVE    |
// |null       |ACTIVE  |null       |ACTIVE      |
// |Support    |null    |support    |null        |
// +-----------+--------+-----------+------------+
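Because a null passes straight through, two follow-ups often matter in practice: substituting a placeholder for missing values, and remembering that a comparison against a null result filters the row out. The sketch below shows both under stated assumptions: a local SparkSession, a trimmed copy of the data above, and the placeholder "unknown", which is an arbitrary choice for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, lower, upper}

val spark = SparkSession.builder().master("local[1]").appName("null-handling").getOrCreate()
import spark.implicits._

val df = Seq(
  ("Marketing", "ACTIVE"),
  (null, "ACTIVE"),
  ("Support", null)
).toDF("department", "status")

// lower(null) is null, so coalesce swaps in a placeholder ("unknown" is hypothetical).
val cleaned = df.withColumn("dept_lower", coalesce(lower(col("department")), lit("unknown")))
val deptValues = cleaned.select("dept_lower").collect().map(_.getString(0)).toSeq

// upper(null) === "ACTIVE" evaluates to null, which filter treats as false,
// so the row with a null status is dropped and only two rows are counted.
val activeCount = df.filter(upper(col("status")) === "ACTIVE").count()
```

If null rows should survive a case-insensitive filter, apply coalesce before comparing rather than after.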
SQL Aliases: lcase and ucase
Spark SQL also provides lcase and ucase as SQL-compatible aliases for lower and upper. Depending on your Spark version, they may not be exposed as Scala API functions in org.apache.spark.sql.functions, but you can always call them through expr():
import org.apache.spark.sql.functions.expr
val df = Seq(
  "Alice Johnson",
  "BOB SMITH",
  "diana prince"
).toDF("name")
val df2 = df
  .withColumn("lcase_name", expr("lcase(name)"))
  .withColumn("ucase_name", expr("ucase(name)"))
df2.show(false)
// +-------------+-------------+-------------+
// |name         |lcase_name   |ucase_name   |
// +-------------+-------------+-------------+
// |Alice Johnson|alice johnson|ALICE JOHNSON|
// |BOB SMITH    |bob smith    |BOB SMITH    |
// |diana prince |diana prince |DIANA PRINCE |
// +-------------+-------------+-------------+
The behavior is identical to lower and upper. Use whichever reads more naturally: lower/upper are the standard Scala API functions, while lcase/ucase may feel familiar if you come from a SQL background.
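One way to check the "identical" claim for yourself is to run the aliases next to lower and upper in a plain SQL query. This is a minimal sketch assuming a local SparkSession; the temp view name `people` is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("lcase-ucase-sql").getOrCreate()
import spark.implicits._

// "people" is a hypothetical temp view name used only for this sketch.
Seq("Alice Johnson", "BOB SMITH", "diana prince").toDF("name").createOrReplaceTempView("people")

// Compare each alias against its standard counterpart, row by row.
val result = spark.sql(
  """SELECT name,
    |       lcase(name) = lower(name) AS lower_matches,
    |       ucase(name) = upper(name) AS upper_matches
    |FROM people""".stripMargin)

result.show(false)
// +-------------+-------------+-------------+
// |name         |lower_matches|upper_matches|
// +-------------+-------------+-------------+
// |Alice Johnson|true         |true         |
// |BOB SMITH    |true         |true         |
// |diana prince |true         |true         |
// +-------------+-------------+-------------+

// True only if every row agrees for both aliases.
val allMatch = result.collect().forall(r => r.getBoolean(1) && r.getBoolean(2))
```

Inside a SQL string, the aliases need no expr() wrapper; expr() is only required when you want them as Column values in the DataFrame API.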
For other string manipulation functions, check out initcap for title case conversion, trim, ltrim, and rtrim for whitespace handling, or concat and concat_ws for combining strings.
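These functions combine naturally with lower when cleaning free-text input. The sketch below, which assumes a local SparkSession and hypothetical sample names and column names (`name_key`, `display_name`), trims stray whitespace, then derives a lowercase key for matching and an initcap version for display. initcap uppercases the first letter of each whitespace-delimited word and lowercases the rest.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, initcap, lower, trim}

val spark = SparkSession.builder().master("local[1]").appName("normalize-names").getOrCreate()
import spark.implicits._

// Hypothetical messy input with inconsistent case and padding.
val raw = Seq("  alice JOHNSON ", "BOB smith", " diana prince").toDF("name")

// Strip leading/trailing spaces once and reuse the expression.
val trimmed = trim(col("name"))
val normalized = raw
  .withColumn("name_key", lower(trimmed))        // canonical form for case-insensitive matching
  .withColumn("display_name", initcap(trimmed))  // title case for presentation

normalized.show(false)

val keys = normalized.select("name_key").collect().map(_.getString(0)).toSeq
val display = normalized.select("display_name").collect().map(_.getString(0)).toSeq
```

Trimming before lowercasing matters for matching: "  alice JOHNSON " and "Alice Johnson" only produce the same key once the padding is gone.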