Spark Scala Hex and Unhex

hex converts an integer or string column to its hexadecimal representation. unhex does the reverse — it decodes a hex string back to binary. These are useful when working with low-level data formats, color codes, or any system that uses hex encoding.

Converting integers to hex

def hex(column: Column): Column

When applied to an integer column, hex returns the hexadecimal string representation of that number, using uppercase letters. This works for any integer type — Int, Long, Short, or Byte.

val df = Seq(
  255,
  1024,
  42,
  0,
  65535,
).toDF("value")

val df2 = df
  .withColumn("hex_value", hex(col("value")))

df2.show(false)
// +-----+---------+
// |value|hex_value|
// +-----+---------+
// |255  |FF       |
// |1024 |400      |
// |42   |2A       |
// |0    |0        |
// |65535|FFFF     |
// +-----+---------+
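As a quick sanity check outside Spark, the JVM's own hex formatting produces the same uppercase digits. This is a plain Scala sketch, no SparkSession needed:

```scala
// Plain Scala check: the %X formatter yields the same uppercase hex
// digits that Spark's hex() produces for non-negative integers.
val values = Seq(255L, 1024L, 42L, 0L, 65535L)
val hexed  = values.map(n => f"$n%X")
println(hexed) // List(FF, 400, 2A, 0, FFFF)
```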

Converting strings to hex

When applied to a string column, hex encodes each character as its UTF-8 byte value in hexadecimal. Each byte becomes two hex characters.

val df = Seq(
  "Hello",
  "Spark",
  "Scala",
  "DataFrame",
).toDF("text")

val df2 = df
  .withColumn("hex_text", hex(col("text")))

df2.show(false)
// +---------+------------------+
// |text     |hex_text          |
// +---------+------------------+
// |Hello    |48656C6C6F        |
// |Spark    |537061726B        |
// |Scala    |5363616C61        |
// |DataFrame|446174614672616D65|
// +---------+------------------+

Each letter maps to its ASCII hex code — H is 48, e is 65, and so on.
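The same mapping can be reproduced in plain Scala by encoding the string as UTF-8 bytes and formatting each byte as two uppercase hex digits. This is a sketch of what hex does to strings, not Spark's actual implementation:

```scala
// Encode each UTF-8 byte of the string as two uppercase hex digits,
// mirroring what Spark's hex() does for string columns.
def toHex(s: String): String =
  s.getBytes("UTF-8").map(b => f"$b%02X").mkString

println(toHex("Hello")) // 48656C6C6F
println(toHex("Spark")) // 537061726B
```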

Decoding with unhex

def unhex(column: Column): Column

unhex takes a hex-encoded string and returns the decoded value as binary (Array[Byte]). To get a readable string back, cast the result to StringType. If the input contains characters that aren't valid hex digits, the result is null.

val df = Seq(
  "48656C6C6F",
  "537061726B",
  "5363616C61",
  "446174614672616D65",
).toDF("hex_string")

val df2 = df
  .withColumn("decoded_binary", unhex(col("hex_string")))
  .withColumn("decoded_text", col("decoded_binary").cast("string"))

df2.show(false)
// +------------------+----------------------------+------------+
// |hex_string        |decoded_binary              |decoded_text|
// +------------------+----------------------------+------------+
// |48656C6C6F        |[48 65 6C 6C 6F]            |Hello       |
// |537061726B        |[53 70 61 72 6B]            |Spark       |
// |5363616C61        |[53 63 61 6C 61]            |Scala       |
// |446174614672616D65|[44 61 74 61 46 72 61 6D 65]|DataFrame   |
// +------------------+----------------------------+------------+

The decoded_binary column shows the raw bytes in hex notation. Casting to "string" interprets those bytes as UTF-8.
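The byte-level round trip can be sketched in plain Scala: split the hex string into two-character pairs, parse each pair as a byte, and decode the byte array as UTF-8. This is illustrative only; Spark does it natively with unhex plus a cast:

```scala
// Parse pairs of hex digits into bytes, then decode the bytes as UTF-8,
// mirroring unhex followed by a cast to string.
def fromHex(hex: String): String = {
  val bytes = hex.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray
  new String(bytes, "UTF-8")
}

println(fromHex("537061726B")) // Spark
```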

Null handling

Both hex and unhex return null when the input is null. This follows Spark's standard null propagation.

val df = Seq(
  ("Alice", "seattle"),
  ("Bob",   null),
  ("Carol", "portland"),
  ("Dave",  "austin"),
).toDF("name", "city")

val df2 = df
  .withColumn("city_hex", hex(col("city")))
  .withColumn("city_unhex", unhex(col("city_hex")))
  .withColumn("city_restored", col("city_unhex").cast("string"))

df2.show(false)
// +-----+--------+----------------+-------------------------+-------------+
// |name |city    |city_hex        |city_unhex               |city_restored|
// +-----+--------+----------------+-------------------------+-------------+
// |Alice|seattle |73656174746C65  |[73 65 61 74 74 6C 65]   |seattle      |
// |Bob  |null    |null            |null                     |null         |
// |Carol|portland|706F72746C616E64|[70 6F 72 74 6C 61 6E 64]|portland     |
// |Dave |austin  |61757374696E    |[61 75 73 74 69 6E]      |austin       |
// +-----+--------+----------------+-------------------------+-------------+

Bob's null city flows through as null for both hex and unhex — no exception, no empty string.
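The null-in, null-out behavior is easy to model in plain Scala. Here hexOrNull is a hypothetical helper written for illustration, not a Spark function:

```scala
// Null in, null out: mirrors how Spark's hex() propagates null inputs
// instead of throwing or returning an empty string.
def hexOrNull(s: String): String =
  if (s == null) null
  else s.getBytes("UTF-8").map(b => f"$b%02X").mkString

println(hexOrNull("seattle")) // 73656174746C65
println(hexOrNull(null))      // null
```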

For Base64 encoding and decoding, see base64 and unbase64. For cryptographic hashing that produces hex output, see the hashing functions (md5, sha1, sha2).

Example Details

Created: 2026-03-30 10:59:22 PM

Last Updated: 2026-03-30 10:59:22 PM