
Spark Scala Base64 Encoding and Decoding

base64 encodes a binary or string column into a Base64-encoded string. unbase64 does the reverse — it decodes a Base64 string back into binary. Together they let you safely represent binary data as printable text, which is useful when passing data through systems that only handle strings.

Encoding with base64

def base64(e: Column): Column

base64 takes a column and returns the Base64-encoded representation as a string. When the input is a string column, Spark converts it to bytes using UTF-8 before encoding. Null input returns null.

Here's an example encoding sensor readings into Base64:

import org.apache.spark.sql.functions.{base64, col}
import spark.implicits._

val df = Seq(
  "sensor-A:temp=72.1",
  "sensor-B:temp=68.5",
  "sensor-C:temp=75.3",
  "sensor-D:temp=69.8",
).toDF("reading")

val df2 = df
  .withColumn("encoded", base64(col("reading")))

df2.show(false)
// +------------------+------------------------+
// |reading           |encoded                 |
// +------------------+------------------------+
// |sensor-A:temp=72.1|c2Vuc29yLUE6dGVtcD03Mi4x|
// |sensor-B:temp=68.5|c2Vuc29yLUI6dGVtcD02OC41|
// |sensor-C:temp=75.3|c2Vuc29yLUM6dGVtcD03NS4z|
// |sensor-D:temp=69.8|c2Vuc29yLUQ6dGVtcD02OS44|
// +------------------+------------------------+

Each reading is now a safe ASCII string that can be embedded in JSON, XML, or any other text format without worrying about special characters.
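Spark applies standard (RFC 4648) Base64 to the UTF-8 bytes of the input, so the encoding above can be reproduced in plain Scala with `java.util.Base64` — a useful sanity check when debugging values outside of Spark:

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

// Encode the same reading Spark encoded in the first row above.
val reading = "sensor-A:temp=72.1"
val encoded = Base64.getEncoder.encodeToString(reading.getBytes(UTF_8))

println(encoded) // c2Vuc29yLUE6dGVtcD03Mi4x
```

The output matches the `encoded` column for the first row exactly.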

Decoding with unbase64

def unbase64(e: Column): Column

unbase64 takes a Base64-encoded string column and returns the decoded value as binary (Array[Byte]). To get a readable string back, cast the result to StringType. Null input returns null.

import org.apache.spark.sql.functions.{col, unbase64}
import spark.implicits._

val df = Seq(
  "c2Vuc29yLUE6dGVtcD03Mi4x",
  "c2Vuc29yLUI6dGVtcD02OC41",
  "c2Vuc29yLUM6dGVtcD03NS4z",
  "c2Vuc29yLUQ6dGVtcD02OS44",
).toDF("encoded")

val df2 = df
  .withColumn("decoded_binary", unbase64(col("encoded")))
  .withColumn("decoded_string", col("decoded_binary").cast("string"))

df2.show(false)
// +------------------------+-------------------------------------------------------+------------------+
// |encoded                 |decoded_binary                                         |decoded_string    |
// +------------------------+-------------------------------------------------------+------------------+
// |c2Vuc29yLUE6dGVtcD03Mi4x|[73 65 6E 73 6F 72 2D 41 3A 74 65 6D 70 3D 37 32 2E 31]|sensor-A:temp=72.1|
// |c2Vuc29yLUI6dGVtcD02OC41|[73 65 6E 73 6F 72 2D 42 3A 74 65 6D 70 3D 36 38 2E 35]|sensor-B:temp=68.5|
// |c2Vuc29yLUM6dGVtcD03NS4z|[73 65 6E 73 6F 72 2D 43 3A 74 65 6D 70 3D 37 35 2E 33]|sensor-C:temp=75.3|
// |c2Vuc29yLUQ6dGVtcD02OS44|[73 65 6E 73 6F 72 2D 44 3A 74 65 6D 70 3D 36 39 2E 38]|sensor-D:temp=69.8|
// +------------------------+-------------------------------------------------------+------------------+

The decoded_binary column shows the raw hex bytes. Casting to "string" interprets those bytes as UTF-8 and gives back the original readable text.
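The same two-step decode — Base64 string to bytes, then bytes to UTF-8 text — can be sketched in plain Scala, mirroring what `unbase64` plus the string cast do inside Spark:

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

// Step 1: decode the Base64 text back into raw bytes (what unbase64 returns).
val decodedBytes: Array[Byte] = Base64.getDecoder.decode("c2Vuc29yLUE6dGVtcD03Mi4x")

// Step 2: interpret those bytes as UTF-8 (what the cast to "string" does).
val decodedString = new String(decodedBytes, UTF_8)

println(decodedString) // sensor-A:temp=72.1
```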

Null handling

Both base64 and unbase64 return null when the input is null. This follows Spark's standard null propagation — no special handling is needed.

val df = Seq(
  ("Alice", "alice@example.com"),
  ("Bob",   "bob@example.com"),
  ("Carol", null),
  ("Dave",  "dave@example.com"),
).toDF("name", "email")

val df2 = df
  .withColumn("encoded_email", base64(col("email")))

df2.show(false)
// +-----+-----------------+------------------------+
// |name |email            |encoded_email           |
// +-----+-----------------+------------------------+
// |Alice|alice@example.com|YWxpY2VAZXhhbXBsZS5jb20=|
// |Bob  |bob@example.com  |Ym9iQGV4YW1wbGUuY29t    |
// |Carol|null             |null                    |
// |Dave |dave@example.com |ZGF2ZUBleGFtcGxlLmNvbQ==|
// +-----+-----------------+------------------------+

Carol's null email produces a null encoded value — no exception, no empty string.
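The same null-propagation behavior can be mirrored in plain Scala with `Option` — a minimal sketch, where the hypothetical `encodeEmail` helper stands in for the `base64` column expression:

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

// A missing (None) email flows through encoding unchanged, just as a
// null column value flows through base64 unchanged.
def encodeEmail(email: Option[String]): Option[String] =
  email.map(e => Base64.getEncoder.encodeToString(e.getBytes(UTF_8)))

println(encodeEmail(Some("alice@example.com"))) // Some(YWxpY2VAZXhhbXBsZS5jb20=)
println(encodeEmail(None))                      // None
```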

For converting between strings and binary with a specific character set, see decode and encode. For cryptographic hashing of data, see the hashing functions (md5, sha1, sha2). For hexadecimal encoding and decoding, see hex and unhex.
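When choosing between hex and Base64 as a text representation, size is the main trade-off: hex emits two characters per byte, while Base64 emits four characters per three bytes. A quick comparison in plain Scala (the hex encoding here is hand-rolled for illustration, not Spark's `hex` function):

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

val bytes = "sensor-A:temp=72.1".getBytes(UTF_8) // 18 bytes

// Hex: 2 characters per byte.
val hexEncoded = bytes.map(b => f"$b%02x").mkString

// Base64: 4 characters per 3 bytes.
val b64Encoded = Base64.getEncoder.encodeToString(bytes)

println(hexEncoded.length) // 36
println(b64Encoded.length) // 24
```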

Example Details

Created: 2026-03-29 10:33:37 PM

Last Updated: 2026-03-29 10:33:37 PM