Spark Scala ASCII and Char Functions
The ascii function returns the numeric code point of the first character in a string column. The chr and char SQL functions do the reverse — they convert an integer code point back to a character. Together they let you move between characters and their numeric representations.
def ascii(e: Column): Column
ascii takes a string column and returns the numeric value (ASCII/Unicode code point) of its first character as an integer. If the string has multiple characters, only the first one matters. Null input returns null, and an empty string returns 0.
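import org.apache.spark.sql.functions.{ascii, col}
import spark.implicits._ // assumes an active SparkSession named spark

val df = Seq(
  "Alice",
  "Bob",
  "carol",
  "123 Main St",
  "!@#",
).toDF("text")

// ascii reads only the first character of each string
val df2 = df
  .withColumn("ascii_value", ascii(col("text")))

df2.show(false)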
// +-----------+-----------+
// |text       |ascii_value|
// +-----------+-----------+
// |Alice      |65         |
// |Bob        |66         |
// |carol      |99         |
// |123 Main St|49         |
// |!@#        |33         |
// +-----------+-----------+
Uppercase A is 65, B is 66, lowercase c is 99, 1 is 49, and ! is 33. These are standard ASCII values.
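The values above can be cross-checked without Spark. The following is a minimal sketch in plain Scala; the names samples and firstCodes are illustrative only, and it mirrors ascii by looking at nothing but the first character of each string.

```scala
// Plain Scala, no Spark required: Char.toInt gives the character's numeric code.
val samples = Seq("Alice", "Bob", "carol", "123 Main St", "!@#")

// Mirror ascii by reading only the first character of each string.
val firstCodes = samples.map(s => s.head.toInt)

println(firstCodes) // List(65, 66, 99, 49, 33)
```

Unlike ascii, head throws on an empty string, so this sketch covers non-empty input only.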
Converting code points to characters with chr
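The chr and char functions are the inverse of ascii: they take an integer code point and return the corresponding character. Spark's SQL reference notes that for values above 256 the result is equivalent to chr(n % 256), so the inverse relationship holds only for code points below 256. As of Spark 3.4.1 neither function is exposed in the Scala API, but both are reachable through expr():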
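import org.apache.spark.sql.functions.expr

val df = Seq(
  65, 90, 97, 122, 48, 33,
).toDF("code_point")

// chr is a SQL function here, so it is called through an expression string
val df2 = df
  .withColumn("character", expr("chr(code_point)"))

df2.show(false)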
// +----------+---------+
// |code_point|character|
// +----------+---------+
// |65        |A        |
// |90        |Z        |
// |97        |a        |
// |122       |z        |
// |48        |0        |
// |33        |!        |
// +----------+---------+
chr and char behave identically — use whichever you prefer. Both are SQL-only functions in Spark 3.4.1.
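One way to confirm the equivalence is to apply both functions to the same column and count disagreements. This is a sketch under assumptions: it creates a throwaway local SparkSession, and the names codes, both and mismatches are illustrative rather than part of any API.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; reuse your own `spark` if one exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("chr-vs-char")
  .getOrCreate()
import spark.implicits._

val codes = Seq(65, 90, 97, 122, 48, 33).toDF("code_point")

// selectExpr parses each string as a SQL expression, much like expr() does.
val both = codes.selectExpr(
  "code_point",
  "chr(code_point) AS via_chr",
  "char(code_point) AS via_char"
)

// Count rows where the two functions disagree.
val mismatches = both.filter("via_chr <> via_char").count()
println(s"rows where chr and char differ: $mismatches")
```

For these six inputs the count should be zero if the two names really are aliases of the same function.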
Nulls and empty strings
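ascii returns null for null input and 0 for an empty string. Round-tripping with chr(ascii(...)) carries both cases through: chr(null) returns null, and chr(0) returns a value that displays as blank. In Spark's implementation that value is the NUL character (U+0000) rather than a zero-length string, so it may not compare equal to the original empty string: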
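import org.apache.spark.sql.functions.{ascii, col, expr}

val df = Seq(
  "Hello",
  "World",
  null,
  "",
).toDF("word")

// code is the first character's code point; back_to_char converts it back
val df2 = df
  .withColumn("code", ascii(col("word")))
  .withColumn("back_to_char", expr("chr(ascii(word))"))

df2.show(false)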
// +-----+----+------------+
// |word |code|back_to_char|
// +-----+----+------------+
// |Hello|72  |H           |
// |World|87  |W           |
// |null |null|null        |
// |     |0   |            |
// +-----+----+------------+
Note that round-tripping only recovers the first character — ascii discards everything after it.
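To make the loss concrete, the round-tripped value can be compared against both the first character and the full string. The sketch below assumes a throwaway local SparkSession; the column names round_trip, first_char, same_as_first and same_as_word are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, substring}

// Assumption: a throwaway local session; reuse your own `spark` if one exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("ascii-round-trip")
  .getOrCreate()
import spark.implicits._

val words = Seq("Hello", "World").toDF("word")

val compared = words
  .withColumn("round_trip", expr("chr(ascii(word))"))
  // substring positions are 1-based: take one character starting at position 1
  .withColumn("first_char", substring(col("word"), 1, 1))
  .withColumn("same_as_first", col("round_trip") === col("first_char"))
  .withColumn("same_as_word", col("round_trip") === col("word"))

compared.show(false)
```

For these multi-character ASCII words, same_as_first should be true and same_as_word false on every row.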
Related functions
For other character-level string operations, see translate for character-by-character substitution, substring for extracting portions of strings, or split for breaking strings into arrays.
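As a quick orientation, the three functions can be applied side by side to one of the sample strings used earlier. This is a minimal sketch assuming a throwaway local SparkSession; the DataFrame and column names (addresses, related, translated, first_three, parts) are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split, substring, translate}

// Assumption: a throwaway local session; reuse your own `spark` if one exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("related-string-functions")
  .getOrCreate()
import spark.implicits._

val addresses = Seq("123 Main St").toDF("text")

val related = addresses
  // translate maps characters one-for-one: 1 -> a, 2 -> b, 3 -> c
  .withColumn("translated", translate(col("text"), "123", "abc"))
  // substring positions are 1-based: three characters starting at position 1
  .withColumn("first_three", substring(col("text"), 1, 3))
  // split takes a regex pattern; a single space is used here
  .withColumn("parts", split(col("text"), " "))

related.show(false)
```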