Spark Scala Hex and Unhex
hex converts an integer, string, or binary column to its hexadecimal string representation. unhex does the reverse: it decodes a hex string back to binary. These are useful when working with low-level data formats, color codes, or any system that exchanges hex-encoded data.
Converting integers to hex
def hex(column: Column): Column
When applied to an integer column, hex returns the hexadecimal string representation of that number. This works for any integer type — Int, Long, Short, or Byte.
// Imports used throughout these examples (spark is an active SparkSession).
import org.apache.spark.sql.functions.{col, hex, unhex}
import spark.implicits._

val df = Seq(
  255,
  1024,
  42,
  0,
  65535,
).toDF("value")

val df2 = df
  .withColumn("hex_value", hex(col("value")))
df2.show(false)
// +-----+---------+
// |value|hex_value|
// +-----+---------+
// |255 |FF |
// |1024 |400 |
// |42 |2A |
// |0 |0 |
// |65535|FFFF |
// +-----+---------+
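The same integer-to-hex mapping can be cross-checked in plain Scala, without Spark. This is just an illustrative sketch: `java.lang.Long.toHexString` produces lowercase digits, so we uppercase them to match Spark's output.

```scala
// Plain-Scala cross-check of the integer-to-hex mapping shown above.
val values = Seq(255L, 1024L, 42L, 0L, 65535L)
val hexed = values.map(v => java.lang.Long.toHexString(v).toUpperCase)
println(hexed.mkString(", "))  // FF, 400, 2A, 0, FFFF
```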
Converting strings to hex
When applied to a string column, hex encodes each character as its UTF-8 byte value in hexadecimal. Each byte becomes two hex characters.
val df = Seq(
  "Hello",
  "Spark",
  "Scala",
  "DataFrame",
).toDF("text")

val df2 = df
  .withColumn("hex_text", hex(col("text")))
df2.show(false)
// +---------+------------------+
// |text |hex_text |
// +---------+------------------+
// |Hello |48656C6C6F |
// |Spark |537061726B |
// |Scala |5363616C61 |
// |DataFrame|446174614672616D65|
// +---------+------------------+
Each letter maps to its ASCII hex code — H is 48, e is 65, and so on.
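You can reproduce this byte-level encoding in plain Scala. The helper below is a sketch that mirrors what hex does for strings: take the UTF-8 bytes and format each one as two uppercase hex digits.

```scala
// Encode a string's UTF-8 bytes as uppercase hex, mirroring hex(string).
// (b & 0xFF) widens each byte to an unsigned Int before formatting.
def toHex(s: String): String =
  s.getBytes("UTF-8").map(b => f"${b & 0xFF}%02X").mkString

println(toHex("Hello"))  // 48656C6C6F
```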
Decoding with unhex
def unhex(column: Column): Column
unhex takes a hex-encoded string and returns the decoded value as binary (Array[Byte]). To get a readable string back, cast the result to StringType.
val df = Seq(
  "48656C6C6F",
  "537061726B",
  "5363616C61",
  "446174614672616D65",
).toDF("hex_string")

val df2 = df
  .withColumn("decoded_binary", unhex(col("hex_string")))
  .withColumn("decoded_text", col("decoded_binary").cast("string"))
df2.show(false)
// +------------------+----------------------------+------------+
// |hex_string |decoded_binary |decoded_text|
// +------------------+----------------------------+------------+
// |48656C6C6F |[48 65 6C 6C 6F] |Hello |
// |537061726B |[53 70 61 72 6B] |Spark |
// |5363616C61 |[53 63 61 6C 61] |Scala |
// |446174614672616D65|[44 61 74 61 46 72 61 6D 65]|DataFrame |
// +------------------+----------------------------+------------+
The decoded_binary column shows the raw bytes in hex notation. Casting to "string" interprets those bytes as UTF-8.
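The decode step can also be sketched in plain Scala. The helper below assumes an even-length, valid hex input: it splits the string into two-character pairs, parses each pair as a byte, then reinterprets the bytes as UTF-8, mirroring unhex followed by a string cast.

```scala
// Decode a hex string to bytes, then reinterpret as UTF-8,
// mirroring unhex(...).cast("string"). Assumes valid, even-length hex.
def fromHex(h: String): String = {
  val bytes = h.grouped(2).map(Integer.parseInt(_, 16).toByte).toArray
  new String(bytes, java.nio.charset.StandardCharsets.UTF_8)
}

println(fromHex("48656C6C6F"))  // Hello
```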
Null handling
Both hex and unhex return null when the input is null. This follows Spark's standard null propagation.
val df = Seq(
  ("Alice", "seattle"),
  ("Bob", null),
  ("Carol", "portland"),
  ("Dave", "austin"),
).toDF("name", "city")

val df2 = df
  .withColumn("city_hex", hex(col("city")))
  .withColumn("city_unhex", unhex(col("city_hex")))
  .withColumn("city_restored", col("city_unhex").cast("string"))
df2.show(false)
// +-----+--------+----------------+-------------------------+-------------+
// |name |city |city_hex |city_unhex |city_restored|
// +-----+--------+----------------+-------------------------+-------------+
// |Alice|seattle |73656174746C65 |[73 65 61 74 74 6C 65] |seattle |
// |Bob |null |null |null |null |
// |Carol|portland|706F72746C616E64|[70 6F 72 74 6C 61 6E 64]|portland |
// |Dave |austin |61757374696E |[61 75 73 74 69 6E] |austin |
// +-----+--------+----------------+-------------------------+-------------+
Bob's null city flows through as null for both hex and unhex — no exception, no empty string.
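This null propagation behaves like mapping over an Option: a missing value short-circuits every step. A plain-Scala sketch of the same round trip, using illustrative helpers (not Spark APIs):

```scala
// Illustrative helpers mirroring hex/unhex on strings (not Spark functions).
def toHex(s: String): String =
  s.getBytes("UTF-8").map(b => f"${b & 0xFF}%02X").mkString
def fromHex(h: String): String =
  new String(h.grouped(2).map(Integer.parseInt(_, 16).toByte).toArray, "UTF-8")

// A None city flows through every step untouched, like Spark's null handling.
val cities = Seq(Some("seattle"), None, Some("portland"))
val restored = cities.map(_.map(toHex).map(fromHex))
println(restored)  // List(Some(seattle), None, Some(portland))
```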
Related functions
For Base64 encoding and decoding, see base64 and unbase64. For cryptographic hashing that produces hex output, see the hashing functions (md5, sha1, sha2).