Spark Scala Repeat and Space
The repeat function concatenates a string column with itself a specified number of times. The space SQL function generates a string of n spaces; it behaves like repeat(lit(" "), n). Together they cover common needs like building separators, indenting text, and padding output.
The repeat function is defined as:
def repeat(str: Column, n: Int): Column
It returns a new string formed by concatenating str with itself n times. If n is zero or negative, the result is an empty string. If str is null, the result is null.
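The zero and negative cases are easy to check directly. The following is a minimal sketch, assuming a local SparkSession; the column names (twice, zero, negative) and the sample value are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, repeat}

// Local session for the sketch; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder().master("local[1]").appName("repeat-edge-cases").getOrCreate()
import spark.implicits._

val edgeCases = Seq("ab").toDF("word")
  .withColumn("twice", repeat(col("word"), 2))     // "abab"
  .withColumn("zero", repeat(col("word"), 0))      // empty string
  .withColumn("negative", repeat(col("word"), -1)) // empty string, not an error

edgeCases.show(false)
```

The zero and negative columns hold empty strings rather than nulls, so they print as blank cells.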
Repeating String Values
// Assumes a SparkSession named spark with spark.implicits._ imported (needed for toDF).
import org.apache.spark.sql.functions._

val df = Seq(
  ("Warning", 3),
  ("Go", 5),
  ("Stop", 1),
  ("Hello", 2),
).toDF("word", "times")

// The count is the literal 3, so every word is repeated three times.
val df2 = df
  .withColumn("repeated", repeat(col("word"), 3))

df2.show(false)
// +-------+-----+---------------------+
// |word   |times|repeated             |
// +-------+-----+---------------------+
// |Warning|3    |WarningWarningWarning|
// |Go     |5    |GoGoGo               |
// |Stop   |1    |StopStopStop         |
// |Hello  |2    |HelloHelloHello      |
// +-------+-----+---------------------+
Note that the n parameter in this signature is a fixed Int, not a column reference, so every row gets the same repeat count. The times column in the example above is not used by repeat at all.
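If the count has to come from a column, one option is to go through the SQL form of the function with expr, where both arguments are ordinary expressions. This is a sketch under that assumption, reusing the word and times columns from the example; the name perRow is made up here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for the sketch; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder().master("local[1]").appName("repeat-per-row").getOrCreate()
import spark.implicits._

val perRow = Seq(("Warning", 3), ("Go", 5), ("Stop", 1), ("Hello", 2))
  .toDF("word", "times")
  // SQL repeat takes the count as an expression, so it can read the times column.
  .withColumn("repeated", expr("repeat(word, times)"))

perRow.show(false)
// repeated values: WarningWarningWarning, GoGoGoGoGo, Stop, HelloHello
```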
Building Separators and Dividers
A handy use for repeat is generating separator lines or visual dividers from a single character:
val df = Seq(
  "-",
  "=",
  "*",
  "#",
).toDF("char")

val df2 = df
  .withColumn("separator", repeat(col("char"), 20))

df2.show(false)
// +----+--------------------+
// |char|separator           |
// +----+--------------------+
// |-   |--------------------|
// |=   |====================|
// |*   |********************|
// |#   |####################|
// +----+--------------------+
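A separator can also be combined with other columns, for example to frame a title. The sketch below assumes a hypothetical title column and uses concat to place a short rule on either side of it; the names rule and banners are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit, repeat}

// Local session for the sketch; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder().master("local[1]").appName("repeat-banner").getOrCreate()
import spark.implicits._

// A literal character must be wrapped in lit(...) because repeat expects a Column.
val rule = repeat(lit("-"), 3)

val banners = Seq("Summary", "Details").toDF("title")
  .withColumn("banner", concat(rule, lit(" "), col("title"), lit(" "), rule))

banners.show(false)
// banner values: "--- Summary ---" and "--- Details ---"
```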
Generating Whitespace with space
The space function isn't available directly in org.apache.spark.sql.functions, but you can call it through expr. It returns a string of n space characters — equivalent to repeat(lit(" "), n):
val df = Seq(
  "Alice",
  "Bob",
  "Charlotte",
).toDF("name")

val df2 = df
  .withColumn("indented", concat(expr("space(4)"), col("name")))

df2.show(false)
// +---------+-------------+
// |name     |indented     |
// +---------+-------------+
// |Alice    |    Alice    |
// |Bob      |    Bob      |
// |Charlotte|    Charlotte|
// +---------+-------------+
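Because space is called through a SQL expression, its argument can be computed per row. The following sketch assumes hypothetical label and depth columns and indents each label by two spaces per level; it also compares space(4) with repeat(lit(" "), 4) to illustrate the equivalence described above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, expr, lit, repeat}

// Local session for the sketch; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder().master("local[1]").appName("space-indent").getOrCreate()
import spark.implicits._

val tree = Seq(("root", 0), ("child", 1), ("leaf", 2)).toDF("label", "depth")
  // Two spaces of indentation per nesting level; space(0) yields an empty string.
  .withColumn("indented", concat(expr("space(depth * 2)"), col("label")))
  // Both sides are four spaces, so the comparison is true on every row.
  .withColumn("same_as_repeat", expr("space(4)") === repeat(lit(" "), 4))

tree.show(false)
// indented values: "root", "  child", "    leaf"
```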
Handling Nulls
When repeat encounters a null input, the result is null — it does not produce an empty string:
val df = Seq(
  ("Alice", null.asInstanceOf[String]),
  ("Bob", "Hey"),
  (null, "World"),
  (null, null.asInstanceOf[String]),
).toDF("name", "greeting")

val df2 = df
  .withColumn("name_repeated", repeat(col("name"), 2))
  .withColumn("greeting_repeated", repeat(col("greeting"), 3))

df2.show(false)
// +-----+--------+-------------+-----------------+
// |name |greeting|name_repeated|greeting_repeated|
// +-----+--------+-------------+-----------------+
// |Alice|null    |AliceAlice   |null             |
// |Bob  |Hey     |BobBob       |HeyHeyHey        |
// |null |World   |null         |WorldWorldWorld  |
// |null |null    |null         |null             |
// +-----+--------+-------------+-----------------+
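If downstream code expects an empty string instead of null, one common approach is to wrap the call in coalesce. This is a sketch of that pattern rather than anything repeat does on its own; the id column and the name safe are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, repeat}

// Local session for the sketch; in spark-shell a `spark` value already exists.
val spark = SparkSession.builder().master("local[1]").appName("repeat-nulls").getOrCreate()
import spark.implicits._

val safe = Seq[(Int, String)]((1, "Bob"), (2, null)).toDF("id", "name")
  // repeat returns null for a null name; coalesce swaps that null for "".
  .withColumn("name_repeated", coalesce(repeat(col("name"), 2), lit("")))

safe.show(false)
// name_repeated values: "BobBob" for id 1, "" for id 2
```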
For other string formatting functions, see lpad and rpad for padding strings to a fixed length, concat and concat_ws for joining strings together, or overlay for replacing characters at a specific position.