The Reference You Need
Spark Scala Examples
Simple Spark Scala examples to help you quickly complete your data ETL pipelines. Save time digging through the Spark Scala function API and get right to the code you need.
-
date_trunc and trunc in Spark Scala: Truncate Date and Timestamp Columns in a DataFrame
date_trunc and trunc round a date or timestamp column down to a coarser unit — the start of the hour, day, month, quarter, or year. They're the go-to functions for bucketing events into time windows for grouping, partitioning, or aligning data to a calendar boundary.
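A minimal sketch of the two call shapes (sample values invented for illustration; note that the two functions take their arguments in opposite orders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("truncExample").getOrCreate()
import spark.implicits._

val df = Seq("2026-03-15 13:45:30").toDF("raw")
  .select(to_timestamp($"raw").as("ts"))

// date_trunc(unit, column) keeps a timestamp, zeroing everything below the unit;
// trunc(column, unit) returns a date snapped to the start of the month or year
val truncated = df.select(
  date_trunc("hour", $"ts").as("hour_start"),   // 2026-03-15 13:00:00
  date_trunc("day", $"ts").as("day_start"),     // 2026-03-15 00:00:00
  trunc($"ts", "month").as("month_start")       // 2026-03-01
)
truncated.show(false)
```

The reversed argument order is the usual tripwire: `date_trunc` takes the unit first, `trunc` takes it last.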
-
date_format in Spark Scala: Format Date and Timestamp Columns in a DataFrame
date_format converts a date, timestamp, or string column into a formatted string column using a pattern made of letters like yyyy, MM, dd, HH, and EEEE. Use it whenever you need a human-readable date label, a custom export format, or a partition-friendly column like 2026-01.
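A quick sketch of a few common patterns (column name and sample timestamp are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("dateFormatExample").getOrCreate()
import spark.implicits._

val df = Seq("2026-01-15 09:30:00").toDF("raw")
  .select(to_timestamp($"raw").as("ts"))

val formatted = df.select(
  date_format($"ts", "yyyy-MM").as("partition_month"),  // "2026-01"
  date_format($"ts", "EEEE").as("weekday"),             // full day name, e.g. "Thursday"
  date_format($"ts", "dd MMM yyyy HH:mm").as("label")   // "15 Jan 2026 09:30"
)
formatted.show(false)
```

Every output column is a plain string, so `partition_month` works directly as a partition key.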
-
datediff and date_diff in Spark Scala: Days Between Two Dates in a DataFrame
datediff returns the number of days between two date columns. It's the go-to function for computing things like "days to ship", "days since signup", or "age in days". Spark 3.4 added a SQL-only alias date_diff that does the same thing — useful when you're writing a SQL expression but otherwise interchangeable.
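A minimal sketch, assuming Spark 3.4+ for the `date_diff` alias (column names and dates invented for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("datediffExample").getOrCreate()
import spark.implicits._

val df = Seq(("2026-01-01", "2026-01-31")).toDF("order_raw", "ship_raw")
  .select(to_date($"order_raw").as("order_date"), to_date($"ship_raw").as("ship_date"))

// datediff(end, start): the end date comes first, and the result can be negative
val diffed = df.select(
  datediff($"ship_date", $"order_date").as("days_to_ship"),  // 30
  expr("date_diff(ship_date, order_date)").as("sql_alias")   // same result via the SQL alias
)
diffed.show(false)
```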
-
date_add, date_sub, and add_months in Spark Scala: Shift Dates in a DataFrame
date_add, date_sub, and add_months shift a date column forward or backward by a fixed number of days or months. They're the bread-and-butter functions for computing things like "30 days from order date", "one month before renewal", or "next billing cycle". Spark 3.4 added a one-word alias, dateadd, that's only reachable through expr().
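All three shifts side by side, as a small sketch (column name and date are invented for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("dateAddExample").getOrCreate()
import spark.implicits._

val df = Seq("2026-01-15").toDF("raw").select(to_date($"raw").as("order_date"))

val shifted = df.select(
  date_add($"order_date", 30).as("due_date"),       // 2026-02-14
  date_sub($"order_date", 7).as("reminder_date"),   // 2026-01-08
  add_months($"order_date", 1).as("next_cycle")     // 2026-02-15
)
shifted.show(false)
```

Note that `add_months($"order_date", 1)` and `date_add($"order_date", 30)` disagree by a day here: month arithmetic follows the calendar, day arithmetic does not.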
-
date_part, datepart, and extract in Spark Scala: Generic Date Part Extraction
date_part, datepart, and extract are generic ways to pull a single field out of a date, timestamp, or interval column. They cover the same ground as the dedicated functions like year, month, dayofmonth and hour, minute, second, but with one function call where the field name is a parameter — useful when the field you need is decided at runtime or driven by config.
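A sketch of the runtime-field idea, going through `expr()` so it works the same across Spark 3.x versions (the `field` value stands in for something read from config):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("datePartExample").getOrCreate()
import spark.implicits._

val df = Seq("2026-03-15 13:45:30").toDF("raw").select(to_timestamp($"raw").as("ts"))

// The field name is just a string, so it can be decided at runtime
val field = "year"  // hypothetical config-driven value
val parts = df.select(
  expr(s"date_part('$field', ts)").as("configured_part"),  // 2026
  expr("extract(MONTH FROM ts)").as("month")               // 3
)
parts.show(false)
```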
-
Hour, Minute, Second Extraction in Spark Scala: hour, minute, second
Spark provides hour, minute, and second for pulling the time-of-day components out of a timestamp column. They're the time-side counterparts to year, month, and day and are useful for bucketing events by hour of day, filtering work hours, or building time-based features.
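All three in one select, as a minimal sketch (column name and timestamp invented for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("hourMinuteSecond").getOrCreate()
import spark.implicits._

val df = Seq("2026-03-15 13:45:30").toDF("raw").select(to_timestamp($"raw").as("ts"))

val timeParts = df.select(
  hour($"ts").as("h"),     // 13
  minute($"ts").as("m"),   // 45
  second($"ts").as("s")    // 30
)
timeParts.show(false)

// Typical use: keep only business-hours events
val workHours = df.filter(hour($"ts").between(9, 17))
```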
-
Year, Month, Day Extraction in Spark Scala: year, month, dayofmonth, dayofweek, dayofyear, weekofyear, quarter
Spark provides a small family of functions for pulling individual date parts — year, month, day, week, quarter — out of a date or timestamp column. They're the building blocks for grouping by month, filtering by quarter, or partitioning a table by year.
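The whole family at a glance, as a sketch (column name and date invented for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("dateParts").getOrCreate()
import spark.implicits._

val df = Seq("2026-03-15").toDF("raw").select(to_date($"raw").as("d"))

val dateParts = df.select(
  year($"d").as("y"),          // 2026
  month($"d").as("mo"),        // 3
  dayofmonth($"d").as("dom"),  // 15
  quarter($"d").as("q"),       // 1
  dayofweek($"d").as("dow"),   // 1 = Sunday .. 7 = Saturday
  dayofyear($"d").as("doy"),   // 74
  weekofyear($"d").as("woy")   // ISO week number
)
dateParts.show(false)
```

Watch `dayofweek`: it is 1-based starting at Sunday, which surprises anyone expecting Monday-first ISO numbering.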
-
Current Date and Timestamp in Spark Scala: current_date, current_timestamp, localtimestamp, curdate, now, current_timezone
Spark provides a small family of functions for getting the current date, time, and session timezone — useful for tagging records with a load time, calculating ages, or filtering on "today". They all return the value at the start of query evaluation, so every call within a single query sees the same value.
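A small sketch of the most common calls, including the one-evaluation-per-query behavior (the `load_` column names are invented for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("currentDateExample").getOrCreate()

val tagged = spark.range(1).select(
  current_date().as("load_date"),
  current_timestamp().as("ts_a"),
  current_timestamp().as("ts_b")  // identical to ts_a: both are fixed at query start
)
tagged.show(false)
```

Because the value is pinned at the start of evaluation, `ts_a` and `ts_b` always match, even on a slow query.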
-
elt in Spark Scala: Pick the N-th Value from a List of DataFrame Columns
elt returns the n-th value from a list of column expressions, where n is a 1-based index. It's handy when you have an integer column that points at one of several sibling columns (or literal values) and you want to materialize the chosen value into a single result column.
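A minimal sketch of the index-into-siblings pattern (column names and values invented for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("eltExample").getOrCreate()
import spark.implicits._

val df = Seq((2, "email", "sms", "push")).toDF("channel_idx", "a", "b", "c")

// The first argument is the 1-based index; the rest are the candidate values
val chosen = df.select(elt($"channel_idx", $"a", $"b", $"c").as("channel"))  // "sms"
chosen.show(false)
```

An index outside 1..n yields null under default settings, so it's worth validating the index column first.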
-
bin and conv in Spark Scala: Number Base Conversion in DataFrames
bin returns the binary string representation of a long integer. conv is the more general tool — it converts a number string from one base to another, covering decimal, hex, octal, binary, or anything in between. Use them when you're working with bit flags, color codes, or any data where the representation matters as much as the value.
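Both functions in one select, as a sketch (the literal values are invented for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("baseConvExample").getOrCreate()

val converted = spark.range(1).select(
  bin(lit(10L)).as("binary"),          // "1010"
  conv(lit("ff"), 16, 10).as("dec"),   // "255"
  conv(lit("255"), 10, 16).as("hex")   // "FF" (conv uppercases hex output)
)
converted.show(false)
```

All three results are strings, which is the point: these functions are about the representation, not the numeric value.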