The Reference You Need
Spark Scala Examples
Simple Spark Scala examples to help you quickly complete your data ETL pipelines. Save time digging through the Spark Scala function API and instead get right to the code you need...
-
datediff and date_diff in Spark Scala: Days Between Two Dates in a DataFrame
datediff returns the number of days between two date columns, computed as the end date minus the start date, so the result is negative when the end date comes first. It's the go-to function for computing things like "days to ship", "days since signup", or "age in days". Spark 3.4 added a SQL-only alias, date_diff, that does the same thing; it's useful when you're writing a SQL expression but otherwise interchangeable.
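As a minimal sketch of the "days to ship" case, the snippet below can be pasted into spark-shell. The orders data and column names are made up for illustration, and the date_diff line assumes Spark 3.4 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, datediff, expr, to_date}

// Local session for experimentation; getOrCreate() reuses an existing session (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("datediff-sketch").getOrCreate()
import spark.implicits._

// Hypothetical order data: (order_id, ordered, shipped) as ISO date strings.
val orders = Seq(
  (1, "2024-01-10", "2024-01-14"),
  (2, "2024-02-27", "2024-03-01")
).toDF("order_id", "ordered", "shipped")
  .withColumn("ordered", to_date(col("ordered")))
  .withColumn("shipped", to_date(col("shipped")))

val withDays = orders
  // Argument order is (end, start): shipped minus ordered.
  .withColumn("days_to_ship", datediff(col("shipped"), col("ordered")))
  // Same calculation through the SQL alias (assumes Spark 3.4+).
  .withColumn("days_to_ship_sql", expr("date_diff(shipped, ordered)"))

withDays.show()
```

Swapping the two arguments flips the sign of the result, which is the most common mistake with this function.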
-
date_add, date_sub, and add_months in Spark Scala: Shift Dates in a DataFrame
date_add, date_sub, and add_months shift a date column forward or backward by a fixed number of days or months. They're the bread-and-butter functions for computing things like "30 days from order date", "one month before renewal", or "next billing cycle". Spark 3.4 added a one-word alias, dateadd, that's only reachable through expr().
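A short sketch of the three shifts on a single hypothetical date, written for spark-shell. The column names are illustrative, and the dateadd line assumes Spark 3.4 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{add_months, col, date_add, date_sub, expr, to_date}

// Local session for experimentation; getOrCreate() reuses an existing session (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("date-shift-sketch").getOrCreate()
import spark.implicits._

// Hypothetical single-row input: a month-end date in a leap year.
val renewals = Seq("2024-01-31").toDF("d").withColumn("d", to_date(col("d")))

val shifted = renewals
  .withColumn("plus_30_days", date_add(col("d"), 30))
  .withColumn("minus_7_days", date_sub(col("d"), 7))
  .withColumn("plus_1_month", add_months(col("d"), 1))
  // SQL-only alias, reached through expr() (assumes Spark 3.4+).
  .withColumn("plus_30_days_sql", expr("dateadd(d, 30)"))

shifted.show()
```

Note the difference between the day and month shifts here: 30 days after January 31 is March 1, while one month after January 31 is clamped to the last day of February.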
-
date_part, datepart, and extract in Spark Scala: Generic Date Part Extraction
date_part, datepart, and extract are generic ways to pull a single field out of a date, timestamp, or interval column. They cover the same ground as the dedicated functions like year, month, dayofmonth and hour, minute, second, but with one function call where the field name is a parameter — useful when the field you need is decided at runtime or driven by config.
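The runtime-driven case might look like the sketch below, written for spark-shell. The events data and the field value are hypothetical stand-ins for whatever your config supplies.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, to_timestamp}

// Local session for experimentation; getOrCreate() reuses an existing session (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("date-part-sketch").getOrCreate()
import spark.implicits._

// Hypothetical single-row input.
val events = Seq("2024-03-15 13:45:30").toDF("ts").withColumn("ts", to_timestamp(col("ts")))

// Hypothetical config value deciding which field to extract at runtime.
// It is interpolated into SQL text, so it should only come from a trusted source.
val field = "HOUR"

val parts = events.select(
  expr("date_part('YEAR', ts)").as("year_part"),
  expr("extract(MONTH FROM ts)").as("month_part"),
  expr(s"date_part('$field', ts)").as("configured_part")
)

parts.show()
```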
-
Hour, Minute, Second Extraction in Spark Scala: hour, minute, second
Spark provides hour, minute, and second for pulling the time-of-day components out of a timestamp column. They're the time-side counterparts to year, month, and dayofmonth and are useful for bucketing events by hour of day, filtering work hours, or building time-based features.
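A minimal spark-shell sketch, using a made-up single-row events DataFrame:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, hour, minute, second, to_timestamp}

// Local session for experimentation; getOrCreate() reuses an existing session (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("time-parts-sketch").getOrCreate()
import spark.implicits._

// Hypothetical single-row input.
val events = Seq("2024-03-15 13:45:30").toDF("ts").withColumn("ts", to_timestamp(col("ts")))

val timeParts = events.select(
  hour(col("ts")).as("h"),
  minute(col("ts")).as("m"),
  second(col("ts")).as("s")
)

timeParts.show()
```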
-
Year, Month, Day Extraction in Spark Scala: year, month, dayofmonth, dayofweek, dayofyear, weekofyear, quarter
Spark provides a small family of functions for pulling individual date parts — year, month, day, week, quarter — out of a date or timestamp column. They're the building blocks for grouping by month, filtering by quarter, or partitioning a table by year.
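The whole family can be applied side by side, as in this spark-shell sketch over a hypothetical single date:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, dayofmonth, dayofweek, dayofyear, month, quarter, to_date, weekofyear, year}

// Local session for experimentation; getOrCreate() reuses an existing session (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("date-parts-sketch").getOrCreate()
import spark.implicits._

// Hypothetical single-row input: a Friday in March 2024.
val dates = Seq("2024-03-15").toDF("d").withColumn("d", to_date(col("d")))

val dateParts = dates.select(
  year(col("d")).as("year"),
  month(col("d")).as("month"),
  dayofmonth(col("d")).as("day_of_month"),
  dayofweek(col("d")).as("day_of_week"),   // 1 = Sunday ... 7 = Saturday
  dayofyear(col("d")).as("day_of_year"),
  weekofyear(col("d")).as("week_of_year"), // ISO week number
  quarter(col("d")).as("quarter")
)

dateParts.show()
```

The one to watch is dayofweek: it numbers days from Sunday as 1, so a Friday comes back as 6.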
-
Current Date and Timestamp in Spark Scala: current_date, current_timestamp, localtimestamp, curdate, now, current_timezone
Spark provides a small family of functions for getting the current date, time, and session timezone, useful for tagging records with a load time, calculating ages, or filtering on "today". The date and time functions return the value at the start of query evaluation, so every call within a single query sees the same value.
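The "same value within a query" behavior can be seen in a small spark-shell sketch like the one below. The ids and column names are invented for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{current_date, current_timestamp}

// Local session for experimentation; getOrCreate() reuses an existing session (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("current-time-sketch").getOrCreate()
import spark.implicits._

// Hypothetical records to tag with a load time.
val stamped = Seq(1, 2, 3).toDF("id")
  .withColumn("load_date", current_date())
  .withColumn("load_ts", current_timestamp())
  .withColumn("load_ts_again", current_timestamp())

// Within one query, both timestamp columns hold the same value on every row.
stamped.show(truncate = false)
```

Each action runs a new query, so calling show() twice would print two different timestamps.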
-
elt in Spark Scala: Pick the N-th Value from a List of DataFrame Columns
elt returns the n-th value from a list of column expressions, where n is a 1-based index. It's handy when you have an integer column that points at one of several sibling columns (or literal values) and you want to materialize the chosen value into a single result column.
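A minimal sketch of the "index column points at a sibling column" pattern, for spark-shell. The contact data and column names are hypothetical, and elt is called through expr() here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for experimentation; getOrCreate() reuses an existing session (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("elt-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data: `preferred` is a 1-based index into (email, phone, postal).
val contacts = Seq(
  (1, 1, "ann@example.com", "555-0100", "1 Main St"),
  (2, 3, "bob@example.com", "555-0101", "2 Oak Ave")
).toDF("id", "preferred", "email", "phone", "postal")

val chosen = contacts.withColumn("contact", expr("elt(preferred, email, phone, postal)"))

chosen.select("id", "contact").show(truncate = false)
```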
-
bin and conv in Spark Scala: Number Base Conversion in DataFrames
bin returns the binary string representation of a long integer. conv is the more general tool: it converts a number string from one base to another, covering decimal, hex, octal, binary, or any other base from 2 to 36. Use them when you're working with bit flags, color codes, or any data where the representation matters as much as the value.
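Both functions are available directly in org.apache.spark.sql.functions. The spark-shell sketch below uses made-up values to show them together.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{bin, col, conv}

// Local session for experimentation; getOrCreate() reuses an existing session (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("bin-conv-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data: a long value and an unrelated hex string per row.
val nums = Seq((10L, "FF"), (5L, "1A")).toDF("n", "hex")

val converted = nums.select(
  col("n"),
  bin(col("n")).as("n_binary"),                   // long -> binary string
  conv(col("hex"), 16, 10).as("hex_as_decimal"),  // base 16 -> base 10
  conv(col("hex"), 16, 2).as("hex_as_binary")     // base 16 -> base 2
)

converted.show()
```

conv takes and returns strings, so the decimal result needs a cast before any arithmetic.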
-
Advanced Regex Functions in Spark Scala: regexp_count, regexp_extract_all, regexp_instr, regexp_like, regexp_substr
Beyond the familiar regexp_replace and regexp_extract, Spark 3.4 ships a set of regex helpers — regexp_count, regexp_extract_all, regexp_instr, regexp_like, and regexp_substr — that cover the common "count / extract-all / locate / test / grab-first" jobs that otherwise require chaining helpers or writing UDFs. They're SQL-only, so you reach them through expr() in Spark Scala.
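The five helpers can be compared on one input string, as in this spark-shell sketch. It assumes Spark 3.4 or later, and the sample string is made up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for experimentation; getOrCreate() reuses an existing session (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("regex-sketch").getOrCreate()
import spark.implicits._

// Hypothetical single-row input containing three runs of digits.
val strings = Seq("a1b22c333").toDF("s")

val matches = strings.select(
  expr("regexp_count(s, '[0-9]+')").as("digit_runs"),                // how many matches
  expr("regexp_extract_all(s, '([0-9]+)', 1)").as("all_digit_runs"), // every match of group 1
  expr("regexp_instr(s, '[0-9]+')").as("first_position"),            // 1-based position of first match
  expr("regexp_like(s, '^a')").as("starts_with_a"),                  // boolean test
  expr("regexp_substr(s, '[0-9]+')").as("first_digit_run")           // first match only
)

matches.show(truncate = false)
```

The patterns use character classes rather than backslash escapes, which avoids having to think about SQL string-literal escaping inside expr().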
-
url_encode, url_decode, and parse_url in Spark Scala: Work With URLs in a DataFrame
url_encode converts a string into application/x-www-form-urlencoded format so it can be safely used in a URL. url_decode reverses that transformation. parse_url extracts pieces of a URL — the host, path, query string, or a specific query parameter. All three are Spark SQL functions, so you call them through expr().
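A small spark-shell sketch of all three functions on a hypothetical URL. The url_encode and url_decode calls assume Spark 3.4 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for experimentation; getOrCreate() reuses an existing session (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("url-sketch").getOrCreate()
import spark.implicits._

// Hypothetical single-row input.
val urls = Seq("https://example.com/docs/page?q=spark&lang=en").toDF("url")

val urlParts = urls.select(
  expr("url_encode(url)").as("encoded"),
  expr("url_decode(url_encode(url))").as("round_trip"),   // decoding reverses encoding
  expr("parse_url(url, 'HOST')").as("host"),
  expr("parse_url(url, 'PATH')").as("path"),
  expr("parse_url(url, 'QUERY')").as("query"),
  expr("parse_url(url, 'QUERY', 'lang')").as("lang_param") // one specific query parameter
)

urlParts.show(truncate = false)
```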