
Spark Scala translate

translate performs character-by-character substitution within a string column. Each occurrence of a character from matchingString is replaced by the character at the same position in replaceString; characters that do not appear in matchingString are left unchanged. If replaceString is shorter than matchingString, characters without a replacement are deleted entirely.

def translate(src: Column, matchingString: String, replaceString: String): Column

matchingString and replaceString work as a positional mapping: the first character of matchingString maps to the first character of replaceString, the second to the second, and so on. Matching is case-sensitive. Null input returns null.
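To make the positional mapping concrete, here is a plain-Scala model of the rules above with no Spark dependency. It is a sketch of the documented behavior, not Spark's implementation. The object TranslateModel and the method translateChars are hypothetical names, and the sketch assumes matchingString contains no repeated characters.

```scala
// Hypothetical plain-Scala model of the rules described above; not Spark's code.
// Assumption: `matching` has no repeated characters.
object TranslateModel {
  def translateChars(src: String, matching: String, replace: String): String =
    if (src == null) null // null input -> null output
    else {
      // Position i of `matching` pairs with position i of `replace`;
      // None marks characters with no partner, which are deleted.
      val mapping: Map[Char, Option[Char]] =
        matching.toList.zipWithIndex.map { case (c, i) =>
          c -> (if (i < replace.length) Some(replace.charAt(i)) else None)
        }.toMap

      // Char lookups are exact, so matching is case-sensitive.
      src.toList.map { c =>
        mapping.get(c) match {
          case Some(Some(r)) => r.toString // substituted
          case Some(None)    => ""         // deleted
          case None          => c.toString // not in `matching`: kept as-is
        }
      }.mkString
    }
}
```

For instance, TranslateModel.translateChars("Aa", "a", "x") yields "Ax": only the lowercase a matches, because matching is case-sensitive.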

translate is a good fit for simple, fixed character substitutions: replacing separator characters in product codes, normalizing punctuation, or similar tasks. For pattern-based replacement, use regexp_replace instead.
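As a rough comparison, a fixed substitution can be written either way. translate treats every character in matchingString literally, while regexp_replace interprets its pattern as a Java regular expression, so a character such as . has to be put in a character class or escaped. This is a sketch assuming Spark on the classpath and a spark-shell or Scala script session; the session settings, the column name raw, and the date strings are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace, translate}

// Local session for illustration; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("translate-vs-regexp")
  .getOrCreate()
import spark.implicits._

val dates = Seq("2024.01.15", "2024/02/20").toDF("raw")

// translate: '.' and '/' are literal characters, each mapped to '-'.
// regexp_replace: '.' sits inside a character class so it matches a literal dot.
val compared = dates
  .withColumn("via_translate", translate(col("raw"), "./", "--"))
  .withColumn("via_regexp", regexp_replace(col("raw"), "[./]", "-"))

val rows = compared
  .select("via_translate", "via_regexp")
  .as[(String, String)]
  .collect()
  .toSeq
// Both columns contain "2024-01-15" and "2024-02-20".
```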

Here's an example that normalizes SKU strings by replacing - and / separators with spaces:

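import org.apache.spark.sql.functions.{col, translate}
import spark.implicits._ // assumes an existing SparkSession named spark, as in spark-shell

val df = Seq(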
  "order-A1042/electronics",
  "order-B2201/clothing",
  "order-C3309/furniture",
  "order-D4410/appliances",
).toDF("sku")

val df2 = df
  .withColumn("normalized", translate(col("sku"), "-/", "  "))

df2.show(false)
// +-----------------------+-----------------------+
// |sku                    |normalized             |
// +-----------------------+-----------------------+
// |order-A1042/electronics|order A1042 electronics|
// |order-B2201/clothing   |order B2201 clothing   |
// |order-C3309/furniture  |order C3309 furniture  |
// |order-D4410/appliances |order D4410 appliances |
// +-----------------------+-----------------------+

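Both - and / are replaced with spaces. The mapping is positional: "-/" → "  " (two spaces), so - maps to the first space and / maps to the second. With a single-space replaceString, - would still become a space, but / would be deleted because it has no partner.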

Deleting characters

When replaceString is shorter than matchingString, the extra characters in matchingString are deleted rather than substituted. Passing an empty string for replaceString strips all matched characters.
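A replaceString that is shorter but not empty mixes both behaviors: characters that have a partner are substituted and the rest are deleted. The sketch below assumes Spark is on the classpath and runs in spark-shell or a Scala script; the session settings, the column name code, and the sample values are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, translate}

// Local session for illustration; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("translate-partial-delete")
  .getOrCreate()
import spark.implicits._

val codes = Seq("a-b/c", "x/y-z").toDF("code")

// "-/" vs "_": '-' maps to '_'; '/' has no partner in "_", so it is deleted.
val partial = codes
  .withColumn("partial", translate(col("code"), "-/", "_"))

val results = partial.select("partial").as[String].collect().toSeq
// results: Seq("a_bc", "xy_z")
```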

A common use case is stripping formatting from phone numbers to extract the raw digits:

val df = Seq(
  ("Alice",   "(555) 867-5309"),
  ("Bob",     "555.234.7890"),
  ("Carol",   "+1-800-555-0199"),
  ("Dave",    "555 444 1234"),
).toDF("name", "phone")

val df2 = df
  .withColumn("digits_only", translate(col("phone"), "()-.+ ", ""))

df2.show(false)
// +-----+---------------+-----------+
// |name |phone          |digits_only|
// +-----+---------------+-----------+
// |Alice|(555) 867-5309 |5558675309 |
// |Bob  |555.234.7890   |5552347890 |
// |Carol|+1-800-555-0199|18005550199|
// |Dave |555 444 1234   |5554441234 |
// +-----+---------------+-----------+

replaceString is "", so each character in matchingString (the characters ( ) - . + and the space) is deleted wherever it appears in the source column.

For pattern-based substitution, see regexp_replace. For trimming characters from the start or end of a string, see the string trim functions (trim, ltrim, rtrim).

Example Details

Created: 2026-03-23 10:52:00 PM

Last Updated: 2026-03-23 10:52:00 PM