Spark Scala Hex and Unhex
hex converts an integer, string, or binary column to its hexadecimal string representation. unhex does the reverse: it decodes a hex string back to binary. These are useful when working with low-level data formats, color codes, or any system that exchanges hex-encoded data.
Converting integers to hex
def hex(column: Column): Column
When applied to an integer column, hex returns the hexadecimal string representation of that number. This works for any integer type — Int, Long, Short, or Byte.
// Imports used throughout these examples (spark is an active SparkSession).
import org.apache.spark.sql.functions.{col, hex, unhex}
import spark.implicits._

val df = Seq(
  255,
  1024,
  42,
  0,
  65535,
).toDF("value")

val df2 = df
  .withColumn("hex_value", hex(col("value")))
df2.show(false)
// +-----+---------+
// |value|hex_value|
// +-----+---------+
// |255 |FF |
// |1024 |400 |
// |42 |2A |
// |0 |0 |
// |65535|FFFF |
// +-----+---------+
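The same integer-to-hex mapping can be cross-checked in plain Scala, without Spark. This is just an illustrative sketch: `java.lang.Long.toHexString` produces lowercase digits, so we uppercase them to match Spark's output.

```scala
// Plain-Scala cross-check of the integer-to-hex mapping shown above.
val values = Seq(255L, 1024L, 42L, 0L, 65535L)
val hexed = values.map(v => java.lang.Long.toHexString(v).toUpperCase)
println(hexed.mkString(", "))  // FF, 400, 2A, 0, FFFF
```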
Converting strings to hex
When applied to a string column, hex encodes each character as its UTF-8 byte value in hexadecimal. Each byte becomes two hex characters.
val df = Seq(
  "Hello",
  "Spark",
  "Scala",
  "DataFrame",
).toDF("text")

val df2 = df
  .withColumn("hex_text", hex(col("text")))
df2.show(false)
// +---------+------------------+
// |text |hex_text |
// +---------+------------------+
// |Hello |48656C6C6F |
// |Spark |537061726B |
// |Scala |5363616C61 |
// |DataFrame|446174614672616D65|
// +---------+------------------+
Each letter maps to its ASCII hex code — H is 48, e is 65, and so on.
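You can reproduce this byte-level encoding in plain Scala. The helper below is a sketch that mirrors what hex does for strings: take the UTF-8 bytes and format each one as two uppercase hex digits.

```scala
// Encode a string's UTF-8 bytes as uppercase hex, mirroring hex(string).
// (b & 0xFF) widens each byte to an unsigned Int before formatting.
def toHex(s: String): String =
  s.getBytes("UTF-8").map(b => f"${b & 0xFF}%02X").mkString

println(toHex("Hello"))  // 48656C6C6F
```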
Decoding with unhex
def unhex(column: Column): Column
unhex takes a hex-encoded string and returns the decoded value as binary (Array[Byte]). To get a readable string back, cast the result to StringType.
val df = Seq(
  "48656C6C6F",
  "537061726B",
  "5363616C61",
  "446174614672616D65",
).toDF("hex_string")

val df2 = df
  .withColumn("decoded_binary", unhex(col("hex_string")))
  .withColumn("decoded_text", col("decoded_binary").cast("string"))
df2.show(false)
// +------------------+----------------------------+------------+
// |hex_string |decoded_binary |decoded_text|
// +------------------+----------------------------+------------+
// |48656C6C6F |[48 65 6C 6C 6F] |Hello |
// |537061726B |[53 70 61 72 6B] |Spark |
// |5363616C61 |[53 63 61 6C 61] |Scala |
// |446174614672616D65|[44 61 74 61 46 72 61 6D 65]|DataFrame |
// +------------------+----------------------------+------------+
The decoded_binary column shows the raw bytes in hex notation. Casting to "string" interprets those bytes as UTF-8.
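The decode step can also be sketched in plain Scala. The helper below assumes an even-length, valid hex input: it splits the string into two-character pairs, parses each pair as a byte, then reinterprets the bytes as UTF-8, mirroring unhex followed by a string cast.

```scala
// Decode a hex string to bytes, then reinterpret as UTF-8,
// mirroring unhex(...).cast("string"). Assumes valid, even-length hex.
def fromHex(h: String): String = {
  val bytes = h.grouped(2).map(Integer.parseInt(_, 16).toByte).toArray
  new String(bytes, java.nio.charset.StandardCharsets.UTF_8)
}

println(fromHex("48656C6C6F"))  // Hello
```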
Null handling
Both hex and unhex return null when the input is null. This follows Spark's standard null propagation.
val df = Seq(
  ("Alice", "seattle"),
  ("Bob", null),
  ("Carol", "portland"),
  ("Dave", "austin"),
).toDF("name", "city")

val df2 = df
  .withColumn("city_hex", hex(col("city")))
  .withColumn("city_unhex", unhex(col("city_hex")))
  .withColumn("city_restored", col("city_unhex").cast("string"))
df2.show(false)
// +-----+--------+----------------+-------------------------+-------------+
// |name |city |city_hex |city_unhex |city_restored|
// +-----+--------+----------------+-------------------------+-------------+
// |Alice|seattle |73656174746C65 |[73 65 61 74 74 6C 65] |seattle |
// |Bob |null |null |null |null |
// |Carol|portland|706F72746C616E64|[70 6F 72 74 6C 61 6E 64]|portland |
// |Dave |austin |61757374696E |[61 75 73 74 69 6E] |austin |
// +-----+--------+----------------+-------------------------+-------------+
Bob's null city flows through as null for both hex and unhex — no exception, no empty string.
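This null propagation behaves like mapping over an Option: a missing value short-circuits every step. A plain-Scala sketch of the same round trip, using illustrative helpers (not Spark APIs):

```scala
// Illustrative helpers mirroring hex/unhex on strings (not Spark functions).
def toHex(s: String): String =
  s.getBytes("UTF-8").map(b => f"${b & 0xFF}%02X").mkString
def fromHex(h: String): String =
  new String(h.grouped(2).map(Integer.parseInt(_, 16).toByte).toArray, "UTF-8")

// A None city flows through every step untouched, like Spark's null handling.
val cities = Seq(Some("seattle"), None, Some("portland"))
val restored = cities.map(_.map(toHex).map(fromHex))
println(restored)  // List(Some(seattle), None, Some(portland))
```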
Related functions
For Base64 encoding and decoding, see base64 and unbase64. For cryptographic hashing that produces hex output, see the hashing functions (md5, sha1, sha2).