Spark Scala Overlay

overlay replaces a portion of a string column starting at a given position with a replacement string. It works like the SQL standard OVERLAY function and is useful for masking, patching, or inserting text at a specific character position.

The overlay function first appeared in Spark 3.0.0 and is defined as:

def overlay(src: Column, replace: Column, pos: Column, len: Column): Column

src is the source string column, replace is the replacement string, pos is the 1-based starting position, and len is the number of characters to remove from the source before inserting the replacement. All parameters must be Column expressions — wrap literals with lit().

Here's an example that masks the first seven characters of a phone number:

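// Assumes a SparkSession named spark is in scope (spark-shell provides one).
import org.apache.spark.sql.functions.{col, lit, overlay}
import spark.implicits._ // enables .toDF on a local Seq

val df = Seq(
  ("Alice", "555-867-5309"),
  ("Bob",   "555-234-7890"),
  ("Carol", "555-444-1234"),
  ("Dave",  "555-999-0001"),
).toDF("name", "phone")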

val df2 = df
  .withColumn("masked", overlay(col("phone"), lit("***-***"), lit(1), lit(7)))

df2.show(false)
// +-----+------------+------------+
// |name |phone       |masked      |
// +-----+------------+------------+
// |Alice|555-867-5309|***-***-5309|
// |Bob  |555-234-7890|***-***-7890|
// |Carol|555-444-1234|***-***-1234|
// |Dave |555-999-0001|***-***-0001|
// +-----+------------+------------+

Starting at position 1, seven characters are removed and replaced with ***-***. The rest of the string (-5309) remains unchanged.
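
One way to reason about the result is as prefix + replacement + suffix. The helper below is a plain-Scala sketch of that idea, not Spark's implementation: overlayModel is a hypothetical name, it works on ordinary strings rather than Columns, and it assumes pos is at least 1 and len is at least 0.

```scala
// Hypothetical helper for illustration only; not part of Spark.
// Assumes pos >= 1 and len >= 0. Scala's take/drop count UTF-16 code units,
// so this mirrors overlay only for text without surrogate pairs.
def overlayModel(src: String, replace: String, pos: Int, len: Int): String = {
  val prefix = src.take(pos - 1)       // everything before the 1-based position
  val suffix = src.drop(pos - 1 + len) // everything after the removed span
  prefix + replace + suffix
}

println(overlayModel("555-867-5309", "***-***", 1, 7)) // ***-***-5309
```

Here the prefix is empty, seven characters are skipped, and the suffix is -5309, which matches the masked column above.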

Replacing characters in the middle

overlay can target any position in the string, not just the beginning. Here it replaces two characters starting at position 4:

val df = Seq(
  "ABCDEFGHIJ",
  "KLMNOPQRST",
  "UVWXYZ1234",
).toDF("code")

val df2 = df
  .withColumn("replaced", overlay(col("code"), lit("xx"), lit(4), lit(2)))

df2.show(false)
// +----------+----------+
// |code      |replaced  |
// +----------+----------+
// |ABCDEFGHIJ|ABCxxFGHIJ|
// |KLMNOPQRST|KLMxxPQRST|
// |UVWXYZ1234|UVWxxZ1234|
// +----------+----------+

Characters at positions 4 and 5 (DE, NO, XY) are replaced with xx.

Inserting text without removing characters

Set len to 0 to insert text at a position without removing anything from the source:

val df = Seq(
  "Hello World",
  "Good Morning",
  "Nice Weather",
).toDF("greeting")

val df2 = df
  .withColumn("inserted", overlay(col("greeting"), lit("Beautiful "), lit(1), lit(0)))

df2.show(false)
// +------------+----------------------+
// |greeting    |inserted              |
// +------------+----------------------+
// |Hello World |Beautiful Hello World |
// |Good Morning|Beautiful Good Morning|
// |Nice Weather|Beautiful Nice Weather|
// +------------+----------------------+

With len set to 0, nothing is removed: the replacement "Beautiful " (including its trailing space) is inserted before the character at position 1.
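
Because pos is a Column, the insertion point does not have to be a literal. The sketch below computes it per row with instr, so the text lands before the second word. The names greetings, secondWord, and inserted are illustrative, and the local SparkSession is an assumption for running outside spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, instr, lit, overlay}

// Local session for experimentation; getOrCreate reuses an existing one.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val greetings = Seq("Hello World", "Good Morning", "Nice Weather").toDF("greeting")

// instr returns the 1-based position of the first space, so adding 1
// points at the first character of the second word.
val secondWord = instr(col("greeting"), " ") + 1

val inserted = greetings
  .withColumn("inserted", overlay(col("greeting"), lit("Big "), secondWord, lit(0)))

inserted.show(false)
// +------------+----------------+
// |greeting    |inserted        |
// +------------+----------------+
// |Hello World |Hello Big World |
// |Good Morning|Good Big Morning|
// |Nice Weather|Nice Big Weather|
// +------------+----------------+
```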

Without the len parameter

There's also a three-argument version that omits len:

def overlay(src: Column, replace: Column, pos: Column): Column

When len is omitted, the number of characters removed equals the length of the replacement string. This swaps the characters one-for-one:

val df = Seq(
  "2024-01-15",
  "2024-06-30",
  "2024-12-25",
).toDF("date_str")

val df2 = df
  .withColumn("new_year", overlay(col("date_str"), lit("2025"), lit(1)))

df2.show(false)
// +----------+----------+
// |date_str  |new_year  |
// +----------+----------+
// |2024-01-15|2025-01-15|
// |2024-06-30|2025-06-30|
// |2024-12-25|2025-12-25|
// +----------+----------+

"2025" is four characters, so four characters starting at position 1 are replaced — effectively swapping the year.
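
To see that default in action, the three-argument call can be compared with a four-argument call that passes the replacement's length explicitly. This is a small sketch with illustrative names (dates, year, compared), and it assumes a local SparkSession.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, length, lit, overlay}

// Local session for experimentation; getOrCreate reuses an existing one.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val dates = Seq("2024-01-15", "2024-06-30").toDF("date_str")
val year = lit("2025")

val compared = dates.select(
  col("date_str"),
  // len omitted: the removed span defaults to the replacement's length.
  overlay(col("date_str"), year, lit(1)).as("implicit_len"),
  // len given explicitly as the replacement's length.
  overlay(col("date_str"), year, lit(1), length(year)).as("explicit_len")
)

compared.show(false)
// +----------+------------+------------+
// |date_str  |implicit_len|explicit_len|
// +----------+------------+------------+
// |2024-01-15|2025-01-15  |2025-01-15  |
// |2024-06-30|2025-06-30  |2025-06-30  |
// +----------+------------+------------+
```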

For character-by-character substitution (mapping individual characters to replacements), see translate. For pattern-based replacement using regular expressions, see regexp_replace. For extracting a portion of a string by position, see substring.
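
For a quick feel of how those three differ from overlay, here is a brief sketch applying each to one phone number. The column aliases and the regular expression are illustrative choices, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace, substring, translate}

// Local session for experimentation; getOrCreate reuses an existing one.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val phones = Seq("555-867-5309").toDF("phone")

val related = phones.select(
  // translate: maps each listed character to the one at the same index.
  translate(col("phone"), "0123456789", "##########").as("translated"),
  // regexp_replace: replaces whatever the pattern matches (the last four digits).
  regexp_replace(col("phone"), "\\d{4}$", "****").as("regex_replaced"),
  // substring: extracts four characters starting at 1-based position 9.
  substring(col("phone"), 9, 4).as("extracted")
)

related.show(false)
// +------------+--------------+---------+
// |translated  |regex_replaced|extracted|
// +------------+--------------+---------+
// |###-###-####|555-867-****  |5309     |
// +------------+--------------+---------+
```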

Example Details

Created: 2026-03-24 10:06:00 PM

Last Updated: 2026-03-24 10:06:00 PM