Spark Scala Bin and Conv
bin returns the binary string representation of a long integer. conv is the more general tool — it converts a number string from one base to another, covering decimal, hex, octal, binary, or anything in between. Use them when you're working with bit flags, color codes, or any data where the representation matters as much as the value.
Converting longs to binary with bin
def bin(e: Column): Column
def bin(columnName: String): Column
bin takes a long integer column and returns its binary representation as a string. The two overloads are equivalent — one takes a Column, the other takes a column name. There's no leading zero padding; the output is just the minimal binary digits needed.
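import org.apache.spark.sql.functions.{bin, col, conv}
import spark.implicits._ // assumes a SparkSession named spark is in scope, as in spark-shell
val df = Seq(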
0L,
5L,
12L,
255L,
1024L,
).toDF("value")
val df2 = df
.withColumn("binary", bin(col("value")))
df2.show(false)
// +-----+-----------+
// |value|binary |
// +-----+-----------+
// |0 |0 |
// |5 |101 |
// |12 |1100 |
// |255 |11111111 |
// |1024 |10000000000|
// +-----+-----------+
5 becomes 101, 12 becomes 1100, and 255 (the largest value that fits in an unsigned byte) becomes eight ones.
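Because bin emits only the minimal digits, fixed-width output (for example, one byte per row of bit flags) needs explicit padding. The sketch below is one possible approach using lpad from the same functions package; the column name binary8 and the width of 8 are illustrative choices, not part of the original example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{bin, col, lpad}

// Local session for illustration; reuse your existing session if you have one
val spark = SparkSession.builder().master("local[1]").appName("padded-bin").getOrCreate()
import spark.implicits._

val df = Seq(0L, 5L, 12L, 255L).toDF("value")

// Left-pad the binary string with zeros to a width of 8 characters.
// Caution: lpad shortens strings that are longer than the target width,
// so this only makes sense when every value fits in 8 bits.
val padded = df.withColumn("binary8", lpad(bin(col("value")), 8, "0"))

padded.show(false)
// 0   -> 00000000
// 5   -> 00000101
// 12  -> 00001100
// 255 -> 11111111
```

If values can exceed the chosen width, pick a larger width or skip padding for those rows, since a shortened string no longer represents the original number.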
Negative numbers and nulls
bin operates on the underlying 64-bit two's complement representation, so negative numbers come out as 64-bit binary strings. Null inputs produce null outputs.
val df = Seq(
("Alice", Some(7L)),
("Bob", None),
("Carol", Some(-1L)),
("Dave", Some(42L)),
).toDF("name", "flags")
val df2 = df
.withColumn("flags_binary", bin(col("flags")))
df2.show(false)
// +-----+-----+----------------------------------------------------------------+
// |name |flags|flags_binary |
// +-----+-----+----------------------------------------------------------------+
// |Alice|7 |111 |
// |Bob |null |null |
// |Carol|-1 |1111111111111111111111111111111111111111111111111111111111111111|
// |Dave |42 |101010 |
// +-----+-----+----------------------------------------------------------------+
-1 becomes 64 ones — that's the two's complement representation of -1 in a 64-bit signed integer. If you only care about positive values, this is rarely an issue, but it's worth knowing if you might pass negative numbers in.
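To get a feel for two's complement output without starting Spark, the JVM's own Long.toBinaryString can be used as a stand-in. This is plain Scala, not Spark's implementation of bin; it is shown only because it follows the same 64-bit two's complement convention and also omits leading zeros.

```scala
// Plain Scala, no Spark required: inspect 64-bit two's complement bit patterns
val minusOne = java.lang.Long.toBinaryString(-1L)
val minusTwo = java.lang.Long.toBinaryString(-2L)
val seven    = java.lang.Long.toBinaryString(7L)

println(minusOne.length)       // 64: every bit is set for -1
println(minusTwo.takeRight(4)) // 1110: all ones except the lowest bit
println(seven)                 // 111: positive values get minimal digits
```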
Converting between arbitrary bases with conv
def conv(num: Column, fromBase: Int, toBase: Int): Column
conv is the general-purpose base converter. Pass a string column containing a number, the base it's currently in, and the base you want it converted to. Both bases can be anywhere from 2 to 36.
The input is a string column rather than a numeric one because digits in bases above 10 (like A-F for hex) can't be stored in a numeric column.
val df = Seq(
"0",
"10",
"255",
"1024",
"65535",
).toDF("decimal")
val df2 = df
.withColumn("binary", conv(col("decimal"), 10, 2))
.withColumn("hex", conv(col("decimal"), 10, 16))
.withColumn("octal", conv(col("decimal"), 10, 8))
df2.show(false)
// +-------+----------------+----+------+
// |decimal|binary |hex |octal |
// +-------+----------------+----+------+
// |0 |0 |0 |0 |
// |10 |1010 |A |12 |
// |255 |11111111 |FF |377 |
// |1024 |10000000000 |400 |2000 |
// |65535 |1111111111111111|FFFF|177777|
// +-------+----------------+----+------+
The same input produces different representations depending on the target base. Hex output uses uppercase A-F.
Converting hex back to decimal or binary
conv is symmetric — flip the fromBase and toBase arguments to go the other direction. Hex strings are common in color codes, memory addresses, and binary data dumps.
val df = Seq(
"FF",
"1A",
"DEADBEEF",
"100",
).toDF("hex")
val df2 = df
.withColumn("decimal", conv(col("hex"), 16, 10))
.withColumn("binary", conv(col("hex"), 16, 2))
df2.show(false)
// +--------+----------+--------------------------------+
// |hex |decimal |binary |
// +--------+----------+--------------------------------+
// |FF |255 |11111111 |
// |1A |26 |11010 |
// |DEADBEEF|3735928559|11011110101011011011111011101111|
// |100 |256 |100000000 |
// +--------+----------+--------------------------------+
Note that 100 in hex is 256 in decimal — the input is interpreted in the source base, not as the literal string of digits.
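As a small illustration of the color-code use case, the sketch below splits a six-digit RGB hex string into its three channels. The sample data and the column names rgb, r, g, and b are hypothetical; substring comes from the same functions package, and the cast to int is there because conv returns a string.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, conv, substring}

// Local session for illustration; reuse your existing session if you have one
val spark = SparkSession.builder().master("local[1]").appName("hex-colors").getOrCreate()
import spark.implicits._

// Hypothetical data: six-digit RGB color codes without a leading '#'
val colors = Seq("FF8800", "000000", "1A2B3C").toDF("rgb")

// substring positions are 1-based: take two hex digits per channel,
// convert them from base 16 to base 10, then cast the string result to int
val channels = colors
  .withColumn("r", conv(substring(col("rgb"), 1, 2), 16, 10).cast("int"))
  .withColumn("g", conv(substring(col("rgb"), 3, 2), 16, 10).cast("int"))
  .withColumn("b", conv(substring(col("rgb"), 5, 2), 16, 10).cast("int"))

channels.show(false)
// FF8800 -> r=255, g=136, b=0
// 000000 -> r=0,   g=0,   b=0
// 1A2B3C -> r=26,  g=43,  b=60
```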
Converting binary strings
conv also handles binary strings — useful when you have a stored representation and want to recover the original number or convert it to a more compact form.
val df = Seq(
"10",
"1010",
"11111111",
"100000000000",
).toDF("binary")
val df2 = df
.withColumn("decimal", conv(col("binary"), 2, 10))
.withColumn("hex", conv(col("binary"), 2, 16))
df2.show(false)
// +------------+-------+---+
// |binary |decimal|hex|
// +------------+-------+---+
// |10 |2 |2 |
// |1010 |10 |A |
// |11111111 |255 |FF |
// |100000000000|2048 |800|
// +------------+-------+---+
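One way to check the "recover the original number" claim is a round trip: produce the binary string with bin, then read it back with conv. The sketch below assumes non-negative values only, and the column names recovered and matches are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{bin, col, conv}

// Local session for illustration; reuse your existing session if you have one
val spark = SparkSession.builder().master("local[1]").appName("bin-round-trip").getOrCreate()
import spark.implicits._

val df = Seq(0L, 5L, 255L, 1024L).toDF("value")

val roundTrip = df
  // long -> binary string
  .withColumn("binary", bin(col("value")))
  // binary string -> decimal string -> long
  .withColumn("recovered", conv(col("binary"), 2, 10).cast("long"))
  // true when the round trip preserved the value
  .withColumn("matches", col("value") === col("recovered"))

roundTrip.show(false)
```

For these inputs every row should report matches = true. The cast is needed because conv always returns a string, even when the target base is 10.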
bin vs conv
For a long column x holding non-negative values, bin(x) is equivalent to conv(x.cast("string"), 10, 2): both produce the binary representation. The differences:
- Input type: bin takes a long column directly. conv requires a string column.
- Negative numbers: bin uses 64-bit two's complement (negative numbers become long strings of ones). conv doesn't natively handle negative number conversion in the same way; it works on the unsigned interpretation of the digits you give it.
- Flexibility: conv can target any base from 2 to 36; bin only produces binary.
If you have a numeric column and just want binary, use bin. If you need any other base or you're starting from a string, use conv.
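The equivalence for non-negative values can be checked side by side. This is a minimal sketch with illustrative column names (via_bin, via_conv, same, hex); the numeric column is cast to a string before it is handed to conv.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{bin, col, conv}

// Local session for illustration; reuse your existing session if you have one
val spark = SparkSession.builder().master("local[1]").appName("bin-vs-conv").getOrCreate()
import spark.implicits._

val df = Seq(0L, 5L, 12L, 255L, 1024L).toDF("value")

val compared = df
  // bin works on the long column directly
  .withColumn("via_bin", bin(col("value")))
  // conv works on the decimal string form of the same number
  .withColumn("via_conv", conv(col("value").cast("string"), 10, 2))
  .withColumn("same", col("via_bin") === col("via_conv"))
  // a base that bin cannot produce
  .withColumn("hex", conv(col("value").cast("string"), 10, 16))

compared.show(false)
```

For these non-negative inputs the same column should be true on every row, and the hex column shows the extra flexibility conv offers.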
Related functions
For converting integers and strings to hexadecimal specifically, see hex and unhex. For other low-level encoding tasks, see base64 and unbase64 and decode and encode.