The Reference You Need
Spark Scala Examples
Simple Spark Scala examples to help you quickly complete your data ETL pipelines. Save time digging through the Spark Scala function API and instead get right to the code you need...
Page 1 of 4
-
base64 and unbase64 in Spark Scala: Encode and Decode Binary Data in DataFrames
base64 encodes a binary or string column into a Base64-encoded string. unbase64 does the reverse — it decodes a Base64 string back into binary. Together they let you safely represent binary data as printable text, which is useful when passing data through systems that only handle strings.
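A minimal round-trip sketch (the column name and local SparkSession setup are illustrative; `base64` expects a binary column, hence the cast):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("base64-demo").getOrCreate()
import spark.implicits._

val df = Seq("Spark").toDF("s")
df.select(
  base64(col("s").cast("binary")).alias("encoded"),  // "U3Bhcms="
  // decode and cast back to string to recover the original text
  unbase64(base64(col("s").cast("binary"))).cast("string").alias("decoded")
).show()
```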
-
ASCII and Char: Convert Between Characters and Code Points in Spark Scala DataFrames
The ascii function returns the numeric code point of the first character in a string column. The chr and char SQL functions do the reverse — they convert an integer code point back to a character. Together they let you move between characters and their numeric representations.
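A quick sketch of the round trip. `ascii` has a Scala wrapper; `chr` is invoked through `expr` here since the dedicated Scala wrapper only appears in newer Spark releases:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("ascii-demo").getOrCreate()
import spark.implicits._

val df = Seq("Spark").toDF("s")
df.select(
  ascii(col("s")).alias("code"),      // 83, the code point of 'S'
  expr("chr(ascii(s))").alias("ch")   // back to "S"
).show()
```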
-
reverse in Spark Scala: Reverse Strings in a DataFrame Column
The reverse function reverses the character order of a string column. It also works on array columns, reversing the element order — but this article focuses on string usage.
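A one-line sketch on a sample column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("reverse-demo").getOrCreate()
import spark.implicits._

val df = Seq("Spark").toDF("s")
df.select(reverse(col("s")).alias("reversed")).show()  // "krapS"
```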
-
Repeat and Space: Duplicate Strings in Spark Scala DataFrames
The repeat function duplicates a string column a specified number of times. The space SQL function generates a string of spaces — it's shorthand for repeat(" ", n). Together they cover common needs like building separators, indenting text, and padding output.
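A short sketch; `space` is called through `expr` since it is a SQL function without a Scala wrapper in older Spark versions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("repeat-demo").getOrCreate()
import spark.implicits._

val df = Seq("Spark").toDF("s")
df.select(
  repeat(col("s"), 3).alias("tripled"),                 // "SparkSparkSpark"
  concat(expr("space(4)"), col("s")).alias("indented")  // "    Spark"
).show()
```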
-
replace in Spark Scala: Replace Substrings in a DataFrame Column
replace substitutes all occurrences of a substring within a string column. It's the straightforward choice when you need a literal find-and-replace without regular expressions.
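A minimal sketch. `replace` is invoked through `expr` here for compatibility with Spark versions that predate its Scala wrapper:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("replace-demo").getOrCreate()
import spark.implicits._

val df = Seq("2024-01-15").toDF("d")
df.select(expr("replace(d, '-', '/')").alias("slashed")).show()  // "2024/01/15"
```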
-
overlay in Spark Scala: Replace or Insert Characters by Position in a DataFrame Column
overlay replaces a portion of a string column starting at a given position with a replacement string. It works like the SQL standard OVERLAY function and is useful for masking, patching, or inserting text at a specific character position.
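A masking sketch with sample data (positions are 1-based, and the position argument is passed as a column with `lit`):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("overlay-demo").getOrCreate()
import spark.implicits._

val df = Seq("555-867-5309").toDF("phone")
// replace everything from position 5 onward with a mask
df.select(overlay(col("phone"), lit("XXX-XXXX"), lit(5)).alias("masked")).show()  // "555-XXX-XXXX"
```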
-
translate in Spark Scala: Replace or Delete Characters in a DataFrame Column
translate performs character-by-character substitution within a string column. Each character in matchingString is replaced by the corresponding character in replaceString. If replaceString is shorter than matchingString, characters without a replacement are deleted entirely.
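A sketch using the deletion behavior, stripping phone-number punctuation with an empty replacement string:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("translate-demo").getOrCreate()
import spark.implicits._

val df = Seq("(555) 867-5309").toDF("phone")
// '(', ')', ' ', and '-' have no replacement character, so all are deleted
df.select(translate(col("phone"), "() -", "").alias("digits")).show()  // "5558675309"
```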
-
like, ilike, and rlike in Spark Scala DataFrames
like, ilike, and rlike are Column methods that match a string column against a pattern — like uses SQL LIKE wildcards, ilike does the same case-insensitively, and rlike matches against a full regular expression. All three return a boolean column.
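A side-by-side sketch on sample data (note `ilike` requires a reasonably recent Spark release; the other two are long-standing):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("like-demo").getOrCreate()
import spark.implicits._

val df = Seq("Spark", "spark", "flink").toDF("name")
df.select(
  col("name"),
  col("name").like("S%").alias("like_S"),         // SQL wildcard, case-sensitive
  col("name").ilike("s%").alias("ilike_s"),       // same pattern, case-insensitive
  col("name").rlike("^[a-z]+$").alias("all_lower") // full regular expression
).show()
```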
-
contains, startsWith, and endsWith in Spark Scala DataFrames
contains, startsWith, and endsWith are Column methods that check whether a string column contains a substring, begins with a prefix, or ends with a suffix. They each return a boolean column — true, false, or null for null input.
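A short sketch showing all three checks on one sample value:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("contains-demo").getOrCreate()
import spark.implicits._

val df = Seq("spark-sql").toDF("s")
df.select(
  col("s").contains("ark").alias("has_ark"),   // true
  col("s").startsWith("spark").alias("pre"),   // true
  col("s").endsWith("sql").alias("suf")        // true
).show()
```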
-
instr and locate in Spark Scala: Find Substring Position in a DataFrame
instr and locate both find the position of a substring within a string column. They return the same result — the difference is just argument order and the fact that locate has an optional start-position parameter for finding occurrences beyond the first.
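A sketch contrasting the two. Note the flipped argument order — `instr(str, substring)` versus `locate(substring, str)` — and that positions are 1-based:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.master("local[*]").appName("instr-demo").getOrCreate()
import spark.implicits._

val df = Seq("banana").toDF("s")
df.select(
  instr(col("s"), "an").alias("instr_pos"),      // 2
  locate("an", col("s")).alias("locate_pos"),    // 2, same result
  locate("an", col("s"), 3).alias("from_pos_3")  // 4, the next occurrence
).show()
```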