The Reference You Need
Spark Scala Examples
Simple Spark Scala examples to help you quickly complete your data ETL pipelines. Save time digging through the Spark Scala function API and instead get right to the code you need...
-
make_date and make_timestamp in Spark Scala: Build Dates and Timestamps from Parts in a DataFrame
The make_date and make_timestamp family of functions construct date and timestamp values from separate numeric columns — year, month, day, and (for timestamps) hour, minute, and second. They're the inverse of extracting parts from a date: instead of pulling fields out, you assemble fields into a single value.
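As a rough illustration of assembling values from parts, the sketch below builds a date and a timestamp from integer columns. The column names and sample row are hypothetical, a local SparkSession is assumed, and make_timestamp is called through expr() so the sketch does not rely on the typed Scala wrapper being present in your Spark version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, make_date}

// Assumption: Spark is on the classpath and a local session is acceptable.
val spark = SparkSession.builder().master("local[*]").appName("make-date-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: date and time parts stored as separate integer columns.
val parts = Seq((2024, 3, 15, 9, 30, 0))
  .toDF("year", "month", "day", "hour", "minute", "second")

val built = parts
  // Assemble a DateType column from year, month, and day.
  .withColumn("event_date", make_date(col("year"), col("month"), col("day")))
  // Assemble a timestamp column from all six parts via the SQL function.
  .withColumn("event_ts", expr("make_timestamp(year, month, day, hour, minute, second)"))

built.show(truncate = false)
```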
-
months_between in Spark Scala: Months Between Two Dates in a DataFrame
months_between returns the number of months between two date or timestamp columns as a Double. It's the right tool when you want a months gap rather than a days gap — tenure in months, age of a record, billing cycles, anything where calendar months matter more than raw day counts.
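A minimal sketch of the call, assuming a local SparkSession and hypothetical column names. Note the argument order: the end date comes first, then the start date.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, months_between, to_date}

// Assumption: Spark is on the classpath and a local session is acceptable.
val spark = SparkSession.builder().master("local[*]").appName("months-between-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: a start and end date supplied as ISO strings.
val tenure = Seq(("2024-01-15", "2024-04-15"))
  .toDF("start_str", "end_str")
  .select(to_date(col("start_str")).as("start_date"), to_date(col("end_str")).as("end_date"))

// months_between(end, start) yields a Double; same day of month gives a whole number.
val withGap = tenure.withColumn("months", months_between(col("end_date"), col("start_date")))

withGap.show()
```

Both dates here fall on the 15th, so the result is exactly 3.0; dates on different days of the month produce a fractional value.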
-
last_day and next_day in Spark Scala: Month-End and Next Weekday in a DataFrame
last_day returns the last day of the month that a given date falls in, and next_day returns the first date after a given date that lands on a particular weekday. Both are handy for building reporting periods, billing cycles, and scheduling logic.
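The following sketch shows both functions on a single hypothetical date, assuming a local SparkSession. 2024-02-10 is a Saturday in a leap year, so it exercises both the month-end and the next-weekday logic.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, last_day, next_day, to_date}

// Assumption: Spark is on the classpath and a local session is acceptable.
val spark = SparkSession.builder().master("local[*]").appName("last-next-day-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one invoice date supplied as an ISO string.
val invoices = Seq("2024-02-10").toDF("invoice_str")
  .withColumn("invoice_date", to_date(col("invoice_str")))

val periods = invoices
  // Last calendar day of the month containing invoice_date.
  .withColumn("month_end", last_day(col("invoice_date")))
  // First Monday strictly after invoice_date.
  .withColumn("next_monday", next_day(col("invoice_date"), "Mon"))

periods.show()
```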
-
date_from_unix_date and unix_date in Spark Scala: Convert Between Dates and Day Counts
date_from_unix_date converts a day count (days since 1970-01-01) into a calendar date, and unix_date does the reverse — turning a date into the number of days since the epoch. They're useful when your data uses integer day offsets for compact storage or interoperability with other systems.
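A small round-trip sketch, assuming a local SparkSession and a hypothetical integer column of day offsets. Both functions are called through expr() so the sketch does not depend on the typed Scala wrappers being available in your Spark version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Assumption: Spark is on the classpath and a local session is acceptable.
val spark = SparkSession.builder().master("local[*]").appName("unix-date-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: 19723 days after 1970-01-01 is 2024-01-01.
val offsets = Seq(19723).toDF("days")

val roundTrip = offsets
  // Day count -> calendar date.
  .withColumn("as_date", expr("date_from_unix_date(days)"))
  // Calendar date -> day count again.
  .withColumn("back_to_days", expr("unix_date(as_date)"))

roundTrip.show()
```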
-
from_utc_timestamp, to_utc_timestamp, and convert_timezone in Spark Scala: Shift Timestamps Between Time Zones in a DataFrame
from_utc_timestamp, to_utc_timestamp, and convert_timezone shift a timestamp column between time zones. Use them when your data is stored in one zone (almost always UTC for warehouse data) but you need to report, filter, or join in another — they handle daylight saving transitions for you, so you don't have to do offset math by hand.
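To make the daylight saving behavior concrete, here is a sketch that shifts two hypothetical UTC timestamps to New York time and back. It assumes a local SparkSession and pins the session time zone to UTC so the rendered values are predictable; convert_timezone is not covered here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_utc_timestamp, to_timestamp, to_utc_timestamp}

// Assumption: Spark is on the classpath and a local session is acceptable.
val spark = SparkSession.builder().master("local[*]").appName("tz-shift-sketch").getOrCreate()
import spark.implicits._

// Pin the session time zone so string rendering does not depend on the host machine.
spark.conf.set("spark.sql.session.timeZone", "UTC")

// Hypothetical sample data: one winter and one summer instant, both noon UTC.
val utcEvents = Seq("2024-01-15 12:00:00", "2024-07-01 12:00:00").toDF("utc_str")
  .withColumn("utc_ts", to_timestamp(col("utc_str")))

val shifted = utcEvents
  // Treat utc_ts as UTC and render it as New York wall-clock time.
  .withColumn("ny_ts", from_utc_timestamp(col("utc_ts"), "America/New_York"))
  // Treat ny_ts as New York wall-clock time and shift it back to UTC.
  .withColumn("back_to_utc", to_utc_timestamp(col("ny_ts"), "America/New_York"))

shifted.show(truncate = false)
```

The winter row shifts by five hours and the summer row by four, which is the offset math the functions handle for you.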
-
timestamp_seconds, timestamp_millis, timestamp_micros, unix_seconds, unix_millis, unix_micros in Spark Scala: Convert Between Epoch Numbers and Timestamps in a DataFrame
Six paired functions convert between Unix epoch numbers and timestamp columns at three precision levels: timestamp_seconds, timestamp_millis, and timestamp_micros go from a numeric column to a timestamp, and unix_seconds, unix_millis, and unix_micros go the other way. Reach for them when you're ingesting long epoch values from Kafka, log files, or APIs and need a real timestamp type to filter, format, or join on — or when you need to emit timestamps as numeric values for downstream systems.
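The sketch below converts hypothetical epoch values at two precisions and then goes back to seconds. It assumes a local SparkSession; timestamp_millis and unix_seconds are called through expr() so the sketch does not depend on the typed Scala wrappers being available in your Spark version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, timestamp_seconds}

// Assumption: Spark is on the classpath and a local session is acceptable.
val spark = SparkSession.builder().master("local[*]").appName("epoch-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: the same instant expressed in seconds and in milliseconds.
val epochs = Seq((1700000000L, 1700000000000L)).toDF("epoch_s", "epoch_ms")

val converted = epochs
  // Seconds since the epoch -> timestamp.
  .withColumn("ts_from_s", timestamp_seconds(col("epoch_s")))
  // Milliseconds since the epoch -> timestamp.
  .withColumn("ts_from_ms", expr("timestamp_millis(epoch_ms)"))
  // Timestamp -> seconds since the epoch.
  .withColumn("back_to_s", expr("unix_seconds(ts_from_s)"))

converted.show(truncate = false)
```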
-
unix_timestamp and from_unixtime in Spark Scala: Convert Between Strings and Unix Epoch Seconds in a DataFrame
unix_timestamp and from_unixtime are the bridge between human-readable timestamp strings and Unix epoch seconds. Reach for them when integrating with systems that store time as a long — Kafka payloads, log files, REST APIs — or when you need to do timestamp math that's easier in seconds than in calendar fields. to_unix_timestamp is the SQL-only sibling of unix_timestamp, called through expr().
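A minimal round trip between a timestamp string and epoch seconds might look like the following. The column names and pattern are illustrative, a local SparkSession is assumed, and the session time zone is pinned to UTC because both functions interpret strings in the session time zone.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_unixtime, unix_timestamp}

// Assumption: Spark is on the classpath and a local session is acceptable.
val spark = SparkSession.builder().master("local[*]").appName("unix-timestamp-sketch").getOrCreate()
import spark.implicits._

// Pin the session time zone so the string <-> epoch mapping is deterministic.
spark.conf.set("spark.sql.session.timeZone", "UTC")

val pattern = "yyyy-MM-dd HH:mm:ss"

// Hypothetical sample data: one timestamp supplied as text.
val logs = Seq("2023-11-14 22:13:20").toDF("ts_str")

val bridged = logs
  // String -> epoch seconds (Long).
  .withColumn("epoch_s", unix_timestamp(col("ts_str"), pattern))
  // Epoch seconds -> formatted string.
  .withColumn("ts_again", from_unixtime(col("epoch_s"), pattern))

bridged.show(truncate = false)
```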
-
to_date and to_timestamp in Spark Scala: Parse Strings into Date and Timestamp DataFrame Columns
to_date and to_timestamp parse string columns into proper date and timestamp types. They're the functions you reach for whenever raw data lands as text — CSV files, JSON payloads, or upstream queries — and you need to do anything date-related with it. Spark 3.4 also adds to_timestamp_ntz and to_timestamp_ltz for explicit time-zone semantics.
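As an illustration, the sketch below parses a non-ISO date string and an ISO-style timestamp string with explicit patterns. The column names, sample values, and patterns are hypothetical, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date, to_timestamp}

// Assumption: Spark is on the classpath and a local session is acceptable.
val spark = SparkSession.builder().master("local[*]").appName("parse-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: text columns as they might arrive from a CSV file.
val raw = Seq(("15/03/2024", "2024-03-15T08:30:00")).toDF("date_str", "ts_str")

val parsed = raw
  // Day-first date text -> DateType.
  .withColumn("d", to_date(col("date_str"), "dd/MM/yyyy"))
  // ISO-style text with a literal 'T' -> TimestampType.
  .withColumn("ts", to_timestamp(col("ts_str"), "yyyy-MM-dd'T'HH:mm:ss"))

// The schema shows the new columns with date and timestamp types.
parsed.printSchema()
```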
-
date_trunc and trunc in Spark Scala: Truncate Date and Timestamp Columns in a DataFrame
date_trunc and trunc round a date or timestamp column down to a coarser unit — the start of the hour, day, month, quarter, or year. They're the go-to functions for bucketing events into time windows for grouping, partitioning, or aligning data to a calendar boundary.
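The two functions take their arguments in opposite orders, which the sketch below tries to make visible: date_trunc takes the unit first, while trunc takes the column first. The sample row and column names are hypothetical, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_trunc, to_date, to_timestamp, trunc}

// Assumption: Spark is on the classpath and a local session is acceptable.
val spark = SparkSession.builder().master("local[*]").appName("trunc-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one event timestamp supplied as text.
val events = Seq("2024-05-17 13:45:10").toDF("ts_str")
  .withColumn("ts", to_timestamp(col("ts_str")))

val bucketed = events
  // date_trunc(unit, column) returns a timestamp at the start of the hour.
  .withColumn("hour_start", date_trunc("hour", col("ts")))
  // trunc(column, unit) returns a date at the start of the month.
  .withColumn("month_start", trunc(to_date(col("ts")), "month"))

bucketed.show(truncate = false)
```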
-
date_format in Spark Scala: Format Date and Timestamp Columns in a DataFrame
date_format converts a date, timestamp, or string column into a formatted string column using a pattern made of letters like yyyy, MM, dd, HH, and EEEE. Use it whenever you need a human-readable date label, a custom export format, or a partition-friendly column like 2026-01.
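A short sketch of two common patterns, assuming a local SparkSession and a hypothetical order date column: a partition-style year and month label, and a full weekday name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format, to_date}

// Assumption: Spark is on the classpath and a local session is acceptable.
val spark = SparkSession.builder().master("local[*]").appName("date-format-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one order date supplied as an ISO string.
val orders = Seq("2026-01-05").toDF("order_date_str")
  .withColumn("order_date", to_date(col("order_date_str")))

val labeled = orders
  // Partition-friendly label such as 2026-01.
  .withColumn("year_month", date_format(col("order_date"), "yyyy-MM"))
  // Full weekday name for human-readable reports.
  .withColumn("weekday_name", date_format(col("order_date"), "EEEE"))

labeled.show()
```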