Spark Scala translate
translate performs character-by-character substitution within a string column. Each character in matchingString is replaced by the corresponding character in replaceString. If replaceString is shorter than matchingString, characters without a replacement are deleted entirely.
def translate(src: Column, matchingString: String, replaceString: String): Column
matchingString and replaceString work as a positional mapping: the first character of matchingString maps to the first character of replaceString, the second to the second, and so on. Matching is case-sensitive. Null input returns null.
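The positional rule can be sketched in plain Scala without Spark. This is only an illustrative model, not Spark's implementation: `translateModel` is a hypothetical helper name, and it assumes that when a character repeats in matchingString, its first position decides the mapping.

```scala
// Plain-Scala model of translate's positional mapping (illustration only).
// translateModel is a hypothetical name and is not part of the Spark API.
def translateModel(src: String, matching: String, replace: String): String = {
  if (src == null) return null // null input returns null
  val sb = new StringBuilder
  src.foreach { c =>
    val i = matching.indexOf(c) // first position of c in matching, or -1
    if (i < 0) sb.append(c) // not listed in matching: keep unchanged
    else if (i < replace.length) sb.append(replace.charAt(i)) // positional substitute
    // otherwise the character is listed but has no counterpart, so it is dropped
  }
  sb.toString
}

val bothMapped = translateModel("a-b/c", "-/", "  ")     // two spaces: both separators replaced
val oneDeleted = translateModel("a-b/c", "-/", " ")      // one space: "/" has no counterpart
val caseSensitive = translateModel("ABC abc", "abc", "xyz") // uppercase letters are untouched
```

Under this model, `bothMapped` is "a b c", `oneDeleted` is "a bc", and `caseSensitive` is "ABC xyz".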
translate is a good fit for simple, fixed character substitutions, such as replacing separator characters in product codes or normalizing punctuation. For pattern-based replacement, use regexp_replace instead.
Here's an example that normalizes SKU strings by replacing - and / separators with spaces:
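import org.apache.spark.sql.functions.{col, translate}
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell
val df = Seq(
  "order-A1042/electronics",
  "order-B2201/clothing",
  "order-C3309/furniture",
  "order-D4410/appliances",
).toDF("sku")
// replaceString holds two spaces: one for "-" and one for "/"
val df2 = df
  .withColumn("normalized", translate(col("sku"), "-/", "  "))
df2.show(false)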
// +-----------------------+-----------------------+
// |sku                    |normalized             |
// +-----------------------+-----------------------+
// |order-A1042/electronics|order A1042 electronics|
// |order-B2201/clothing   |order B2201 clothing   |
// |order-C3309/furniture  |order C3309 furniture  |
// |order-D4410/appliances |order D4410 appliances |
// +-----------------------+-----------------------+
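Both - and / are replaced with spaces. The mapping is positional, so replaceString needs two spaces, one for each character in "-/": - maps to the first space and / maps to the second. With a single space, / would have no counterpart and would be deleted.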
Deleting characters
When replaceString is shorter than matchingString, the extra characters in matchingString are deleted rather than substituted. Passing an empty string for replaceString strips all matched characters.
A common use case is stripping formatting from phone numbers to extract the raw digits:
val df = Seq(
  ("Alice", "(555) 867-5309"),
  ("Bob", "555.234.7890"),
  ("Carol", "+1-800-555-0199"),
  ("Dave", "555 444 1234"),
).toDF("name", "phone")
val df2 = df
  .withColumn("digits_only", translate(col("phone"), "()-.+ ", ""))
df2.show(false)
// +-----+---------------+-----------+
// |name |phone          |digits_only|
// +-----+---------------+-----------+
// |Alice|(555) 867-5309 |5558675309 |
// |Bob  |555.234.7890   |5552347890 |
// |Carol|+1-800-555-0199|18005550199|
// |Dave |555 444 1234   |5554441234 |
// +-----+---------------+-----------+
replaceString is "", so every character in matchingString (the parentheses, hyphen, period, plus sign, and space) is deleted wherever it appears in the source column.
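Substitution and deletion can also be combined in a single call by making replaceString shorter than matchingString without making it empty. The sketch below assumes a local SparkSession and uses made-up part codes. The column names and sample values are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, translate}

// Local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("translate-mixed").getOrCreate()
import spark.implicits._

// Hypothetical part codes: "_" should become "-", while "." and "#" should be dropped.
val codes = Seq("AB_12.#9", "CD_7#.1").toDF("code")

// matchingString "_.#" has three characters and replaceString "-" has one:
// "_" maps to "-", while "." and "#" have no counterpart and are deleted.
val cleaned = codes.withColumn("cleaned", translate(col("code"), "_.#", "-"))

val result = cleaned.select("cleaned").as[String].collect().toSeq
```

Here `result` should contain "AB-129" and "CD-71". Characters that need a real replacement must come first in matchingString, because only the trailing characters without a counterpart are deleted.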
Related functions
For pattern-based substitution, see regexp_replace. For trimming characters from the start or end of a string, see the string trim functions (trim, ltrim, and rtrim).
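As a rough illustration of when regexp_replace is the better choice, the sketch below runs both functions on a hypothetical SKU that contains a doubled separator. It assumes a local SparkSession. The input value and column names are invented for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace, translate}

// Local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("translate-vs-regexp").getOrCreate()
import spark.implicits._

// Hypothetical SKU with a doubled "/" separator.
val skus = Seq("order-A1042//electronics").toDF("sku")

val compared = skus
  // translate maps each character on its own, so "//" becomes two spaces
  .withColumn("via_translate", translate(col("sku"), "-/", "  "))
  // the pattern "[-/]+" treats a run of separators as one match, replaced by one space
  .withColumn("via_regexp", regexp_replace(col("sku"), "[-/]+", " "))

val row = compared.select("via_translate", "via_regexp").as[(String, String)].head()
```

translate has no notion of runs or patterns, so collapsing repeated separators is a job for regexp_replace.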