Spark Scala Base64 Encoding and Decoding
base64 encodes a binary or string column into a Base64-encoded string. unbase64 does the reverse — it decodes a Base64 string back into binary. Together they let you safely represent binary data as printable text, which is useful when passing data through systems that only handle strings.
def base64(e: Column): Column
base64 takes a column and returns the Base64-encoded representation as a string. When the input is a string column, Spark converts it to bytes using UTF-8 before encoding. Null input returns null.
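Outside Spark, the same transformation can be reproduced on the driver with `java.util.Base64` as a sanity check. This is a minimal sketch (the `reading` value is taken from the example below), not Spark's implementation:

```scala
import java.util.Base64
import java.nio.charset.StandardCharsets

// Spark's base64 on a string column: take the UTF-8 bytes, then Base64-encode them.
val reading = "sensor-A:temp=72.1"
val encoded = Base64.getEncoder.encodeToString(
  reading.getBytes(StandardCharsets.UTF_8))
println(encoded) // c2Vuc29yLUE6dGVtcD03Mi4x
```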
Here's an example encoding sensor readings into Base64:
import org.apache.spark.sql.functions.{base64, col}
import spark.implicits._

val df = Seq(
  "sensor-A:temp=72.1",
  "sensor-B:temp=68.5",
  "sensor-C:temp=75.3",
  "sensor-D:temp=69.8"
).toDF("reading")

val df2 = df
  .withColumn("encoded", base64(col("reading")))

df2.show(false)
// +------------------+------------------------+
// |reading |encoded |
// +------------------+------------------------+
// |sensor-A:temp=72.1|c2Vuc29yLUE6dGVtcD03Mi4x|
// |sensor-B:temp=68.5|c2Vuc29yLUI6dGVtcD02OC41|
// |sensor-C:temp=75.3|c2Vuc29yLUM6dGVtcD03NS4z|
// |sensor-D:temp=69.8|c2Vuc29yLUQ6dGVtcD02OS44|
// +------------------+------------------------+
Each reading is now a safe ASCII string that can be embedded in JSON, XML, or any other text format without worrying about special characters.
Decoding with unbase64
def unbase64(e: Column): Column
unbase64 takes a Base64-encoded string column and returns the decoded value as binary (Array[Byte]). To get a readable string back, cast the result to StringType. Null input returns null.
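The decode-then-cast step can likewise be mirrored on the driver with `java.util.Base64`. A minimal sketch, using one of the encoded values from the example below:

```scala
import java.util.Base64
import java.nio.charset.StandardCharsets

// unbase64 yields raw bytes; cast("string") reads those bytes back as UTF-8.
val decodedBytes = Base64.getDecoder.decode("c2Vuc29yLUE6dGVtcD03Mi4x")
val decodedString = new String(decodedBytes, StandardCharsets.UTF_8)
println(decodedString) // sensor-A:temp=72.1
```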
import org.apache.spark.sql.functions.{col, unbase64}
import spark.implicits._

val df = Seq(
  "c2Vuc29yLUE6dGVtcD03Mi4x",
  "c2Vuc29yLUI6dGVtcD02OC41",
  "c2Vuc29yLUM6dGVtcD03NS4z",
  "c2Vuc29yLUQ6dGVtcD02OS44"
).toDF("encoded")

val df2 = df
  .withColumn("decoded_binary", unbase64(col("encoded")))
  .withColumn("decoded_string", col("decoded_binary").cast("string"))

df2.show(false)
// +------------------------+-------------------------------------------------------+------------------+
// |encoded |decoded_binary |decoded_string |
// +------------------------+-------------------------------------------------------+------------------+
// |c2Vuc29yLUE6dGVtcD03Mi4x|[73 65 6E 73 6F 72 2D 41 3A 74 65 6D 70 3D 37 32 2E 31]|sensor-A:temp=72.1|
// |c2Vuc29yLUI6dGVtcD02OC41|[73 65 6E 73 6F 72 2D 42 3A 74 65 6D 70 3D 36 38 2E 35]|sensor-B:temp=68.5|
// |c2Vuc29yLUM6dGVtcD03NS4z|[73 65 6E 73 6F 72 2D 43 3A 74 65 6D 70 3D 37 35 2E 33]|sensor-C:temp=75.3|
// |c2Vuc29yLUQ6dGVtcD02OS44|[73 65 6E 73 6F 72 2D 44 3A 74 65 6D 70 3D 36 39 2E 38]|sensor-D:temp=69.8|
// +------------------------+-------------------------------------------------------+------------------+
The decoded_binary column shows the raw hex bytes. Casting to "string" interprets those bytes as UTF-8 and gives back the original readable text.
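Because decoding exactly inverts encoding, the round trip is lossless for any valid UTF-8 string. A quick driver-side check (a sketch with `java.util.Base64`, not Spark code):

```scala
import java.util.Base64
import java.nio.charset.StandardCharsets

// Encode then decode: the original string comes back unchanged.
val original = "sensor-A:temp=72.1"
val bytes = original.getBytes(StandardCharsets.UTF_8)
val roundTrip = new String(
  Base64.getDecoder.decode(Base64.getEncoder.encodeToString(bytes)),
  StandardCharsets.UTF_8)
assert(roundTrip == original)
```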
Null handling
Both base64 and unbase64 return null when the input is null. This follows Spark's standard null propagation — no special handling is needed.
import org.apache.spark.sql.functions.{base64, col}
import spark.implicits._

val df = Seq(
  ("Alice", "alice@example.com"),
  ("Bob", "bob@example.com"),
  ("Carol", null),
  ("Dave", "dave@example.com")
).toDF("name", "email")

val df2 = df
  .withColumn("encoded_email", base64(col("email")))

df2.show(false)
// +-----+-----------------+------------------------+
// |name |email |encoded_email |
// +-----+-----------------+------------------------+
// |Alice|alice@example.com|YWxpY2VAZXhhbXBsZS5jb20=|
// |Bob |bob@example.com |Ym9iQGV4YW1wbGUuY29t |
// |Carol|null |null |
// |Dave |dave@example.com |ZGF2ZUBleGFtcGxlLmNvbQ==|
// +-----+-----------------+------------------------+
Carol's null email produces a null encoded value — no exception, no empty string.
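The same null-in, null-out rule can be sketched as a plain Scala function on the driver. `encodeEmail` here is a hypothetical helper for illustration, not part of Spark's API:

```scala
import java.util.Base64
import java.nio.charset.StandardCharsets

// Hypothetical helper mirroring Spark's null propagation:
// null in, null out; otherwise UTF-8 bytes, Base64-encoded.
def encodeEmail(email: String): String =
  if (email == null) null
  else Base64.getEncoder.encodeToString(email.getBytes(StandardCharsets.UTF_8))

println(encodeEmail(null))              // null
println(encodeEmail("bob@example.com")) // Ym9iQGV4YW1wbGUuY29t
```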
Related functions
For cryptographic hashing of data, see the hashing functions (md5, sha1, sha2). For other encoding operations, Spark also provides hex and unhex for hexadecimal conversion.
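To see the size tradeoff between the two text encodings, compare output lengths on the driver. A sketch using `java.util.Base64` and an uppercase hex format (matching the uppercase digits Spark's hex produces):

```scala
import java.util.Base64
import java.nio.charset.StandardCharsets

// Hex doubles the byte count; Base64 grows it by roughly 4/3.
val bytes = "sensor-A:temp=72.1".getBytes(StandardCharsets.UTF_8) // 18 bytes
val hex = bytes.map(b => f"$b%02X").mkString
val b64 = Base64.getEncoder.encodeToString(bytes)
println(s"hex: ${hex.length} chars, base64: ${b64.length} chars") // hex: 36 chars, base64: 24 chars
```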