Spark Scala Base64 Encoding and Decoding

base64 encodes a binary or string column into a Base64-encoded string. unbase64 does the reverse — it decodes a Base64 string back into binary. Together they let you safely represent binary data as printable text, which is useful when passing data through systems that only handle strings.
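The sketch below runs that round trip on genuinely binary data. It assumes Spark 3.x running locally. The object name Base64RoundTrip, the column names payload, as_text and restored, and the sample bytes are illustrative and do not come from the examples that follow. The code encodes a binary column with base64, decodes it again with unbase64, and checks that the bytes come back unchanged.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{base64, col, unbase64}

// Hypothetical helper: turn a binary column into Base64 text and back again.
object Base64RoundTrip {
  def roundTrip(df: DataFrame): DataFrame =
    df.withColumn("as_text", base64(col("payload")))    // BinaryType -> StringType
      .withColumn("restored", unbase64(col("as_text"))) // StringType -> BinaryType

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("base64-round-trip")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Arbitrary bytes, including values that are not printable characters.
    val original = Array[Byte](0, -1, 16, 127)
    val df = Seq(original).toDF("payload")

    val rows = roundTrip(df).select("as_text", "restored").as[(String, Array[Byte])].collect()
    rows.foreach { case (text, bytes) =>
      println(text)                          // AP8Qfw==
      println(bytes.sameElements(original))  // true
    }
    spark.stop()
  }
}
```

unbase64 returns exactly the original bytes. The as_text column can therefore pass through a string-only system in between without losing information.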

Encoding with base64

def base64(e: Column): Column

base64 takes a column and returns the Base64-encoded representation as a string. When the input is a string column, Spark converts it to bytes using UTF-8 before encoding. Null input returns null.
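The sketch below isolates the UTF-8 step by reproducing the same transformation on the driver with the JDK's java.util.Base64. It assumes that Spark uses the standard RFC 4648 alphabet with = padding, which the outputs in this article are consistent with. It is not Spark's own implementation, and the object name Base64Sketch is hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical driver-side sketch (not Spark code): take the UTF-8 bytes of a
// string, then apply standard Base64 with '=' padding.
object Base64Sketch {
  def encodeUtf8(s: String): String =
    Base64.getEncoder.encodeToString(s.getBytes(StandardCharsets.UTF_8))

  def main(args: Array[String]): Unit = {
    println(encodeUtf8("sensor-A:temp=72.1")) // c2Vuc29yLUE6dGVtcD03Mi4x

    // Non-ASCII characters become multi-byte UTF-8 sequences before encoding:
    // "é" is two bytes, so "café" is 5 bytes rather than 4.
    val cafe = "caf\u00e9"
    println(cafe.getBytes(StandardCharsets.UTF_8).length) // 5
    println(encodeUtf8(cafe))                             // Y2Fmw6k=
  }
}
```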

Here's an example encoding sensor readings into Base64:

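import org.apache.spark.sql.functions.{base64, col, unbase64}
import spark.implicits._ // needed for toDF; spark is the SparkSession (predefined in spark-shell)

val df = Seq(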
  "sensor-A:temp=72.1",
  "sensor-B:temp=68.5",
  "sensor-C:temp=75.3",
  "sensor-D:temp=69.8",
).toDF("reading")

val df2 = df
  .withColumn("encoded", base64(col("reading")))

df2.show(false)
// +------------------+------------------------+
// |reading           |encoded                 |
// +------------------+------------------------+
// |sensor-A:temp=72.1|c2Vuc29yLUE6dGVtcD03Mi4x|
// |sensor-B:temp=68.5|c2Vuc29yLUI6dGVtcD02OC41|
// |sensor-C:temp=75.3|c2Vuc29yLUM6dGVtcD03NS4z|
// |sensor-D:temp=69.8|c2Vuc29yLUQ6dGVtcD02OS44|
// +------------------+------------------------+

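Each reading is now a plain ASCII string built only from the Base64 alphabet (A-Z, a-z, 0-9, + and /, with = as padding). It can therefore be embedded in JSON, XML, or CSV without escaping. URLs are the exception, because + and / have special meanings there.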
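The encoded width is predictable. Base64 turns every 3 input bytes into 4 output characters and pads a final partial group with =. That is why the 18-byte readings above encode to exactly 24 characters. The sketch below writes out that arithmetic and cross-checks it against java.util.Base64. The object and method names are hypothetical.

```scala
import java.util.Base64

// Hypothetical helper for the standard Base64 size rules:
// 3 input bytes -> 4 output characters, last partial group padded with '='.
object Base64Size {
  // Output length for n input bytes: 4 * ceil(n / 3).
  def encodedLength(nBytes: Int): Int = 4 * ((nBytes + 2) / 3)

  // Number of '=' characters appended for n input bytes.
  def paddingChars(nBytes: Int): Int = nBytes % 3 match {
    case 0 => 0 // input fills whole 3-byte groups
    case 1 => 2 // one leftover byte -> 2 data chars + "=="
    case _ => 1 // two leftover bytes -> 3 data chars + "="
  }

  def main(args: Array[String]): Unit = {
    // Cross-check both formulas against the JDK encoder for small inputs.
    for (n <- 0 to 20) {
      val encoded = Base64.getEncoder.encodeToString(Array.fill[Byte](n)(65))
      assert(encoded.length == encodedLength(n))
      assert(encoded.count(_ == '=') == paddingChars(n))
    }
    println(encodedLength(18)) // 24, like the sensor readings
  }
}
```

The same rules explain the email example further down: the 17-byte alice@example.com ends with one =, and the 16-byte dave@example.com ends with two.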

Decoding with unbase64

def unbase64(e: Column): Column

unbase64 takes a Base64-encoded string column and returns the decoded value as binary (Array[Byte]). To get a readable string back, cast the result to StringType. Null input returns null.
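If you only need the text, you can collapse the two steps into one expression. The sketch below assumes Spark 3.x, and the helper name withDecodedText and the output column names are hypothetical. It shows the cast used in this article alongside functions.decode, which names the charset explicitly.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, decode, unbase64}

object Unbase64ToText {
  // Hypothetical helper: decode a Base64 "encoded" column straight to text.
  def withDecodedText(df: DataFrame): DataFrame =
    df
      // Cast BinaryType -> StringType; Spark treats the bytes as UTF-8.
      .withColumn("via_cast", unbase64(col("encoded")).cast("string"))
      // Same result for UTF-8 data, but the charset is stated explicitly.
      .withColumn("via_decode", decode(unbase64(col("encoded")), "UTF-8"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("unbase64-to-text")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    val df = Seq("c2Vuc29yLUE6dGVtcD03Mi4x", "c2Vuc29yLUI6dGVtcD02OC41").toDF("encoded")
    withDecodedText(df).show(false)
    spark.stop()
  }
}
```

For UTF-8 text the two columns match. decode is useful when the original bytes came from a different charset, because you can pass another charset name instead of "UTF-8".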

val df = Seq(
  "c2Vuc29yLUE6dGVtcD03Mi4x",
  "c2Vuc29yLUI6dGVtcD02OC41",
  "c2Vuc29yLUM6dGVtcD03NS4z",
  "c2Vuc29yLUQ6dGVtcD02OS44",
).toDF("encoded")

val df2 = df
  .withColumn("decoded_binary", unbase64(col("encoded")))
  .withColumn("decoded_string", col("decoded_binary").cast("string"))

df2.show(false)
// +------------------------+-------------------------------------------------------+------------------+
// |encoded                 |decoded_binary                                         |decoded_string    |
// +------------------------+-------------------------------------------------------+------------------+
// |c2Vuc29yLUE6dGVtcD03Mi4x|[73 65 6E 73 6F 72 2D 41 3A 74 65 6D 70 3D 37 32 2E 31]|sensor-A:temp=72.1|
// |c2Vuc29yLUI6dGVtcD02OC41|[73 65 6E 73 6F 72 2D 42 3A 74 65 6D 70 3D 36 38 2E 35]|sensor-B:temp=68.5|
// |c2Vuc29yLUM6dGVtcD03NS4z|[73 65 6E 73 6F 72 2D 43 3A 74 65 6D 70 3D 37 35 2E 33]|sensor-C:temp=75.3|
// |c2Vuc29yLUQ6dGVtcD02OS44|[73 65 6E 73 6F 72 2D 44 3A 74 65 6D 70 3D 36 39 2E 38]|sensor-D:temp=69.8|
// +------------------------+-------------------------------------------------------+------------------+

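show() renders the binary decoded_binary column as its raw bytes in hex. For this ASCII text that is one byte per character (73 is "s", 65 is "e", and so on). Casting to "string" interprets those bytes as UTF-8 and gives back the original readable text.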
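The interpretation step can be reproduced on the driver. The sketch below is a driver-side illustration with hypothetical object and method names, assuming well-formed input, since the JDK decoder is strict and may not treat malformed strings the way Spark does. It decodes with java.util.Base64 and prints the bytes in the same uppercase hex style as the decoded_binary column above. It also shows why the charset matters: the bytes for "café" read correctly as UTF-8 but come out garbled when read as ISO-8859-1.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical driver-side sketch of the unbase64 + cast steps.
object DecodeCharsetSketch {
  def decodeBytes(encoded: String): Array[Byte] = Base64.getDecoder.decode(encoded)

  // Format bytes like the decoded_binary column: uppercase hex, space-separated.
  def toHex(bytes: Array[Byte]): String = bytes.map(b => f"${b & 0xff}%02X").mkString(" ")

  def asUtf8(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)
  def asLatin1(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.ISO_8859_1)

  def main(args: Array[String]): Unit = {
    val reading = decodeBytes("c2Vuc29yLUE6dGVtcD03Mi4x")
    println(toHex(reading))  // 73 65 6E 73 6F 72 2D 41 3A 74 65 6D 70 3D 37 32 2E 31
    println(asUtf8(reading)) // sensor-A:temp=72.1

    // "Y2Fmw6k=" encodes the UTF-8 bytes of "café": 63 61 66 C3 A9.
    val cafe = decodeBytes("Y2Fmw6k=")
    println(asUtf8(cafe))    // café
    println(asLatin1(cafe))  // cafÃ© (the two-byte "é" read as two Latin-1 characters)
  }
}
```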

Null handling

Both base64 and unbase64 return null when the input is null. This follows Spark's standard null propagation — no special handling is needed.

val df = Seq(
  ("Alice", "alice@example.com"),
  ("Bob",   "bob@example.com"),
  ("Carol", null),
  ("Dave",  "dave@example.com"),
).toDF("name", "email")

val df2 = df
  .withColumn("encoded_email", base64(col("email")))

df2.show(false)
// +-----+-----------------+------------------------+
// |name |email            |encoded_email           |
// +-----+-----------------+------------------------+
// |Alice|alice@example.com|YWxpY2VAZXhhbXBsZS5jb20=|
// |Bob  |bob@example.com  |Ym9iQGV4YW1wbGUuY29t    |
// |Carol|null             |null                    |
// |Dave |dave@example.com |ZGF2ZUBleGFtcGxlLmNvbQ==|
// +-----+-----------------+------------------------+

Carol's null email produces a null encoded value — no exception, no empty string.
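Null propagation also holds across a full round trip. The sketch below assumes Spark 3.x, and the helper name emailRoundTrip and the decoded_email column are hypothetical. Carol's row stays null at every step, and each non-null email comes back unchanged. The final check uses the null-safe equality operator <=> so that null-to-null counts as a match.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{base64, col, unbase64}

object NullRoundTrip {
  // Hypothetical helper: encode, then decode back to text.
  // A null email stays null through base64, unbase64 and the cast.
  def emailRoundTrip(df: DataFrame): DataFrame =
    df.withColumn("encoded_email", base64(col("email")))
      .withColumn("decoded_email", unbase64(col("encoded_email")).cast("string"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("null-round-trip")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("Alice", "alice@example.com"),
      ("Carol", null)
    ).toDF("name", "email")

    val out = emailRoundTrip(df)
    out.show(false)
    // Count rows whose value changed in the round trip (null-safe comparison).
    println(out.filter(!(col("email") <=> col("decoded_email"))).count()) // 0
    spark.stop()
  }
}
```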

Base64 is an encoding, not encryption: anyone can decode the output, so it provides no secrecy. For hashing, see the hashing functions (md5, sha1, sha2). For hexadecimal conversion, Spark also provides hex and unhex.
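To compare the two text encodings side by side, the sketch below applies hex and base64 to the same reading and measures both results. It assumes Spark 3.x, and the object name HexVsBase64 and the output column names are hypothetical. Hex spends two characters per byte, while Base64 spends four characters per three bytes. The 18-byte reading therefore becomes 36 hex characters but only 24 Base64 characters. unhex reverses hex back to binary in the same way that unbase64 reverses base64.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{base64, col, hex, length, unhex}

object HexVsBase64 {
  // Hypothetical helper: encode the same value as hex and as Base64.
  def compare(df: DataFrame): DataFrame =
    df.withColumn("hex", hex(col("reading")))                   // 2 chars per byte, uppercase
      .withColumn("b64", base64(col("reading")))                // 4 chars per 3 bytes
      .withColumn("hex_len", length(col("hex")))
      .withColumn("b64_len", length(col("b64")))
      .withColumn("from_hex", unhex(col("hex")).cast("string")) // hex -> binary -> text

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hex-vs-base64")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    val df = Seq("sensor-A:temp=72.1").toDF("reading")
    compare(df).select("hex_len", "b64_len", "from_hex").show(false)
    spark.stop()
  }
}
```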

Example Details

Created: 2026-03-29 10:33:37 PM

Last Updated: 2026-03-29 10:33:37 PM