Spark Scala String Padding Functions
The lpad and rpad functions pad a string column to a specified length by adding characters to the left or right side. They're commonly used for zero-padding numbers, aligning text output, and formatting fixed-width fields.
def lpad(str: Column, len: Int, pad: String): Column
def rpad(str: Column, len: Int, pad: String): Column
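Both functions live in org.apache.spark.sql.functions and take a string column, a target length len, and a padding string pad. If the input is shorter than len, pad is repeated, and cut short if needed, on the left (lpad) or right (rpad) until the result is exactly len characters long. If the input is already longer than len, it is truncated to its first len characters.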
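The rules above can be modelled in plain Scala for a single non-null value. This makes the repeat-then-cut behavior of a multi-character pad string easy to see. It is a sketch of the documented semantics, not Spark's implementation. The object PadModel and its methods are hypothetical names. The model requires a non-empty pad and uses String.length, so it assumes text without surrogate pairs.

```scala
object PadModel {
  // Hypothetical helper: build `missing` characters of padding by repeating
  // `pad` enough times and cutting the repetition to the exact length.
  private def fill(missing: Int, pad: String): String =
    (pad * (missing / pad.length + 1)).take(missing)

  // Hypothetical model of lpad for one non-null value.
  def lpadModel(s: String, len: Int, pad: String): String = {
    require(pad.nonEmpty, "this model assumes a non-empty pad string")
    val missing = len - s.length
    // Already long enough: keep only the first `len` characters.
    if (missing <= 0) s.take(len) else fill(missing, pad) + s
  }

  // Hypothetical model of rpad: same rules, padding goes on the right.
  def rpadModel(s: String, len: Int, pad: String): String = {
    require(pad.nonEmpty, "this model assumes a non-empty pad string")
    val missing = len - s.length
    if (missing <= 0) s.take(len) else s + fill(missing, pad)
  }

  def main(args: Array[String]): Unit = {
    println(lpadModel("42", 6, "0"))    // 000042
    println(rpadModel("Bob", 8, "ab"))  // Bobababa ("ab" repeated, then cut)
    println(lpadModel("Hello", 3, "*")) // Hel
  }
}
```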
Zero-Padding Numbers with lpad
A common use case for lpad is formatting numeric IDs with leading zeros:
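import org.apache.spark.sql.functions._ // col, lpad, rpad
import spark.implicits._ // enables toDF; assumes a SparkSession named spark, as in spark-shell
val df = Seq(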
("INV", 42),
("INV", 1587),
("INV", 3),
("INV", 99012),
("INV", 678),
).toDF("prefix", "number")
val df2 = df
.withColumn("padded_number", lpad(col("number").cast("string"), 6, "0"))
df2.show(false)
// +------+------+-------------+
// |prefix|number|padded_number|
// +------+------+-------------+
// |INV   |42    |000042       |
// |INV   |1587  |001587       |
// |INV   |3     |000003       |
// |INV   |99012 |099012       |
// |INV   |678   |000678       |
// +------+------+-------------+
Note that lpad and rpad operate on strings, so it's clearest to cast numeric columns to string explicitly before padding, as the example does with cast("string").
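Two edge cases are worth checking before relying on lpad for numeric IDs. Under the padding rules described above, a negative number gets its zeros in front of the minus sign, and a number with more than len digits is silently cut. The plain-Scala sketch below models lpad(col("number").cast("string"), 6, "0") for single values. The zeroPad helper is hypothetical.

```scala
object ZeroPadEdgeCases {
  // Hypothetical helper mirroring lpad(number.cast("string"), len, "0") for
  // one non-null number: left-pad with zeros, or keep the first `len`
  // characters if the text is already longer.
  def zeroPad(n: Long, len: Int): String = {
    val s = n.toString
    if (s.length >= len) s.take(len) else "0" * (len - s.length) + s
  }

  def main(args: Array[String]): Unit = {
    println(zeroPad(42L, 6))      // 000042
    println(zeroPad(-42L, 6))     // 000-42: zeros land before the sign
    println(zeroPad(1234567L, 6)) // 123456: the last digit is dropped
    println("%06d".format(-42L))  // -00042: Java-style formatting keeps the sign first
  }
}
```

If signed or oversized values are possible, Spark's format_string with a Java-style pattern such as "%06d" is one alternative worth testing, since that formatter places the sign before the zeros and never truncates.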
Comparing lpad and rpad
Here's a side-by-side comparison showing both functions with space and dot padding:
val df = Seq(
"Alice",
"Bob",
"Charlotte",
"Dan",
"Eve",
).toDF("name")
val df2 = df
.withColumn("left_padded", lpad(col("name"), 12, " "))
.withColumn("right_padded", rpad(col("name"), 12, " "))
.withColumn("left_dot", lpad(col("name"), 12, "."))
.withColumn("right_dot", rpad(col("name"), 12, "."))
df2.show(false)
// +---------+------------+------------+------------+------------+
// |name     |left_padded |right_padded|left_dot    |right_dot   |
// +---------+------------+------------+------------+------------+
// |Alice    |       Alice|Alice       |.......Alice|Alice.......|
// |Bob      |         Bob|Bob         |.........Bob|Bob.........|
// |Charlotte|   Charlotte|Charlotte   |...Charlotte|Charlotte...|
// |Dan      |         Dan|Dan         |.........Dan|Dan.........|
// |Eve      |         Eve|Eve         |.........Eve|Eve.........|
// +---------+------------+------------+------------+------------+
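A typical way to combine the two is a fixed-width record layout. Text fields are left-aligned with rpad and numeric fields right-aligned with lpad. In Spark that could look like concat(rpad(col("name"), 10, " "), lpad(col("qty").cast("string"), 6, " ")). The plain-Scala sketch below models the same layout so the resulting rows can be inspected. The 10/6 field widths and the helpers padLeft, padRight, and formatRow are hypothetical.

```scala
object FixedWidthSketch {
  // Hypothetical single-character-pad helpers following the lpad/rpad rules:
  // pad up to `len`, or keep the first `len` characters if already longer.
  def padLeft(s: String, len: Int, pad: Char): String =
    if (s.length >= len) s.take(len) else pad.toString * (len - s.length) + s

  def padRight(s: String, len: Int, pad: Char): String =
    if (s.length >= len) s.take(len) else s + pad.toString * (len - s.length)

  // Hypothetical layout: a 10-character left-aligned name followed by a
  // 6-character right-aligned quantity, so every row is 16 characters.
  def formatRow(name: String, qty: Int): String =
    padRight(name, 10, ' ') + padLeft(qty.toString, 6, ' ')

  def main(args: Array[String]): Unit =
    Seq(("Alice", 42), ("Charlotte", 1587), ("Bartholomew", 3))
      .map { case (n, q) => formatRow(n, q) }
      .foreach(row => println(s"[$row]")) // brackets make trailing spaces visible
}
```

Because both helpers also truncate, an overlong name such as "Bartholomew" is cut to "Bartholome" rather than pushing the quantity column out of alignment.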
Truncation When the String Is Already Longer
When len is shorter than the input string, both functions truncate from the right: lpad and rpad alike return the first len characters.
val df = Seq(
"Hello",
"Spark",
"Pad",
).toDF("value")
val df2 = df
.withColumn("lpad_short", lpad(col("value"), 3, "*"))
.withColumn("rpad_short", rpad(col("value"), 3, "*"))
df2.show(false)
// +-----+----------+----------+
// |value|lpad_short|rpad_short|
// +-----+----------+----------+
// |Hello|Hel       |Hel       |
// |Spark|Spa       |Spa       |
// |Pad  |Pad       |Pad       |
// +-----+----------+----------+
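This makes lpad and rpad useful for fixed-width output: with a non-empty pad string, every non-null result is exactly len characters long, whether the input was padded or cut. If you only want to cap a string's length without padding shorter values, use substring instead, for example substring(col("value"), 1, 3) (Spark's substring positions are 1-based).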
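Because truncation always keeps the leading characters, lpad will not preserve the end of a long value, even though it pads on the left. The plain-Scala sketch below contrasts that with keeping the last characters. The lpadModel helper is a hypothetical single-character-pad model, not Spark code. In Spark, substring also accepts a negative start position that counts from the end of the string, so substring(col("value"), -3, 3) is one option to try when you want the tail. Check the behavior on your Spark version.

```scala
object TruncationSketch {
  // Hypothetical model of lpad with a single-character pad: an input longer
  // than `len` keeps its FIRST `len` characters, just as rpad would.
  def lpadModel(s: String, len: Int, pad: Char): String =
    if (s.length >= len) s.take(len) else pad.toString * (len - s.length) + s

  def main(args: Array[String]): Unit = {
    val value = "Charlotte"
    println(lpadModel(value, 6, ' ')) // Charlo: the end of the string is dropped
    println(value.takeRight(6))       // rlotte: keeps the end instead
  }
}
```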
Handling Nulls
When lpad or rpad receives a null input, the result is null:
val df = Seq(
"Alice",
null,
"Charlie",
null,
"Eve",
).toDF("name")
val df2 = df
.withColumn("padded", lpad(col("name"), 10, "."))
df2.show(false)
// +-------+----------+
// |name   |padded    |
// +-------+----------+
// |Alice  |.....Alice|
// |null   |null      |
// |Charlie|...Charlie|
// |null   |null      |
// |Eve    |.......Eve|
// +-------+----------+
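If a null should instead become a fully padded placeholder, one common approach in Spark is to replace nulls before padding, for example lpad(coalesce(col("name"), lit("")), 10, "."), which turns null into ten dots. The plain-Scala sketch below models both behaviors, with Option standing in for a nullable column. The object and method names are hypothetical.

```scala
object NullPadSketch {
  // Hypothetical model of lpad with a single-character pad, where None stands
  // in for SQL null: a null input produces a null result.
  def lpadOpt(s: Option[String], len: Int, pad: Char): Option[String] =
    s.map(v => if (v.length >= len) v.take(len) else pad.toString * (len - v.length) + v)

  // Hypothetical model of padding after coalesce(col, lit("")): null becomes
  // "" first, so the result is a full run of padding characters.
  def lpadWithDefault(s: Option[String], len: Int, pad: Char): String =
    lpadOpt(Some(s.getOrElse("")), len, pad).get

  def main(args: Array[String]): Unit = {
    val names: Seq[Option[String]] = Seq(Some("Alice"), None, Some("Eve"))
    names.foreach(n => println(s"${lpadOpt(n, 10, '.')} / ${lpadWithDefault(n, 10, '.')}"))
  }
}
```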
For other string formatting functions, see lower and upper for case conversion, initcap for title case, or trim, ltrim, and rtrim for removing unwanted characters from the ends of strings.