Spark Scala Year, Month, and Day Extraction
Spark provides a small family of functions for pulling individual date parts — year, month, day, week, quarter — out of a date or timestamp column. They're the building blocks for grouping by month, filtering by quarter, or partitioning a table by year.
All of these functions accept a date, timestamp, or ISO-formatted date string and return an integer.
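The examples below assume a SparkSession named spark is in scope, along with the usual imports — a minimal local setup might look like this (the app name is just a placeholder):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("date-parts")
  .master("local[*]")
  .getOrCreate()

// Needed for the .toDF(...) conversions on Seq
import spark.implicits._
```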
Year, Month, and Day of Month
The three most common extractions are year, month, and dayofmonth:
def year(e: Column): Column
def month(e: Column): Column
def dayofmonth(e: Column): Column
val df = Seq(
"2026-01-15",
"2025-07-04",
"2024-12-31",
"2023-03-08",
).toDF("event_date")
val df2 = df
.withColumn("year", year(col("event_date")))
.withColumn("month", month(col("event_date")))
.withColumn("day", dayofmonth(col("event_date")))
df2.show(false)
// +----------+----+-----+---+
// |event_date|year|month|day|
// +----------+----+-----+---+
// |2026-01-15|2026|1 |15 |
// |2025-07-04|2025|7 |4 |
// |2024-12-31|2024|12 |31 |
// |2023-03-08|2023|3 |8 |
// +----------+----+-----+---+
month returns 1 through 12 and dayofmonth returns 1 through 31. The input column can be a date, timestamp, or a string in yyyy-MM-dd format — Spark will parse it for you.
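To illustrate the timestamp case, here is a quick sketch — the column name is just an example:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.functions._
import spark.implicits._

val ts = Seq(
  Timestamp.valueOf("2026-01-15 09:30:00"),
  Timestamp.valueOf("2024-12-31 23:59:59")
).toDF("event_ts")

ts.select(
  col("event_ts"),
  year(col("event_ts")).as("year"),
  month(col("event_ts")).as("month"),
  dayofmonth(col("event_ts")).as("day")
).show(false)
```

The time-of-day portion is ignored: 2024-12-31 23:59:59 still yields year 2024, month 12, day 31.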
Day: the SQL alias for dayofmonth
Spark also exposes a SQL function called day that does the same thing as dayofmonth. It isn't available directly in org.apache.spark.sql.functions, so call it through expr():
def day(date): Column — via expr()
val df = Seq(
"2026-01-15",
"2025-07-04",
"2024-12-31",
).toDF("event_date")
val df2 = df
.withColumn("dayofmonth", dayofmonth(col("event_date")))
.withColumn("day", expr("day(event_date)"))
df2.show(false)
// +----------+----------+---+
// |event_date|dayofmonth|day|
// +----------+----------+---+
// |2026-01-15|15 |15 |
// |2025-07-04|4 |4 |
// |2024-12-31|31 |31 |
// +----------+----------+---+
For new Scala code, prefer dayofmonth — it's the canonical name and doesn't require expr(). The day alias is mainly there for SQL compatibility.
Day of Week, Day of Year, and Week of Year
Beyond the calendar date itself, Spark exposes a few different "day" extractions for working with weeks and seasonal patterns:
def dayofweek(e: Column): Column
def weekday(date): Column — via expr()
def dayofyear(e: Column): Column
def weekofyear(e: Column): Column
The dayofweek function first appeared in version 2.3.0, and weekday first appeared in version 2.4.0.
The two day-of-week functions return different numbering schemes, which is the most common gotcha in this group:
- dayofweek returns 1 for Sunday through 7 for Saturday.
- weekday returns 0 for Monday through 6 for Sunday.
dayofyear returns 1 through 366, and weekofyear returns the ISO 8601 week number — 1 through 53.
val df = Seq(
"2026-01-15",
"2025-07-04",
"2024-12-31",
"2023-03-08",
"2026-12-25",
).toDF("event_date")
val df2 = df
.withColumn("dayofweek", dayofweek(col("event_date")))
.withColumn("weekday", expr("weekday(event_date)"))
.withColumn("dayofyear", dayofyear(col("event_date")))
.withColumn("weekofyear", weekofyear(col("event_date")))
df2.show(false)
// +----------+---------+-------+---------+----------+
// |event_date|dayofweek|weekday|dayofyear|weekofyear|
// +----------+---------+-------+---------+----------+
// |2026-01-15|5 |3 |15 |3 |
// |2025-07-04|6 |4 |185 |27 |
// |2024-12-31|3 |1 |366 |1 |
// |2023-03-08|4 |2 |67 |10 |
// |2026-12-25|6 |4 |359 |52 |
// +----------+---------+-------+---------+----------+
A couple of things worth pointing out from the output:
- December 31, 2024 has dayofyear = 366 because 2024 was a leap year.
- That same date has weekofyear = 1, not 53. ISO 8601 weeks belong to the year that contains their Thursday, so the week of December 31, 2024 (a Tuesday) is the first week of 2025. If you're partitioning data by (year, weekofyear), be aware that the year part of the timestamp might not match the ISO week's year.
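If you need the year the ISO week actually belongs to, Spark SQL's extract supports a YEAROFWEEK field (assuming Spark 3.0+). There's no dedicated helper in org.apache.spark.sql.functions, so go through expr():

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("2024-12-31", "2025-01-01").toDF("event_date")

df.select(
  col("event_date"),
  year(col("event_date")).as("calendar_year"),
  weekofyear(col("event_date")).as("weekofyear"),
  expr("extract(YEAROFWEEK FROM event_date)").as("iso_week_year")
).show(false)
```

Both dates fall in ISO week 1, so pairing iso_week_year with weekofyear keeps the partition key consistent across the year boundary.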
Quarter
quarter returns 1 through 4 based on the calendar month:
def quarter(e: Column): Column
val df = Seq(
"2026-01-15",
"2026-04-30",
"2026-07-04",
"2026-10-12",
"2026-12-31",
).toDF("event_date")
val df2 = df
.withColumn("month", month(col("event_date")))
.withColumn("quarter", quarter(col("event_date")))
df2.show(false)
// +----------+-----+-------+
// |event_date|month|quarter|
// +----------+-----+-------+
// |2026-01-15|1 |1 |
// |2026-04-30|4 |2 |
// |2026-07-04|7 |3 |
// |2026-10-12|10 |4 |
// |2026-12-31|12 |4 |
// +----------+-----+-------+
January through March is 1, April through June is 2, and so on. There's no built-in fiscal-quarter offset — if your fiscal year doesn't start in January, you'll need to derive it yourself from month.
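As a sketch of that derivation: for a fiscal year starting in month start (say April), shift the month before dividing into quarters. The fiscalQuarter helper below is hypothetical, not part of Spark:

```scala
// Fiscal quarter for a fiscal year starting in month `start` (1-12).
// Example: start = 4 (April) puts Apr-Jun in Q1 and Jan-Mar in Q4.
def fiscalQuarter(month: Int, start: Int): Int =
  ((month - start + 12) % 12) / 3 + 1

// The same arithmetic as a Spark column expression, assuming an
// April fiscal-year start (`div` is Spark SQL's integer division):
// df.withColumn("fiscal_quarter",
//   expr("((month(event_date) - 4 + 12) % 12) div 3 + 1"))
```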
Null handling
All of these functions return null when the input is null, so they're safe to use on columns that may have missing dates without extra guarding:
val df = Seq(
Some("2026-01-15"),
None,
Some("2025-07-04"),
).toDF("event_date")
val df2 = df
.withColumn("year", year(col("event_date")))
.withColumn("month", month(col("event_date")))
.withColumn("day", dayofmonth(col("event_date")))
df2.show(false)
// +----------+----+-----+----+
// |event_date|year|month|day |
// +----------+----+-----+----+
// |2026-01-15|2026|1 |15 |
// |null |null|null |null|
// |2025-07-04|2025|7 |4 |
// +----------+----+-----+----+
If the input is a string that can't be parsed as a date, the result is also null rather than an error.
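If your strings are in a non-ISO format, parse them explicitly with to_date before extracting, rather than relying on silent nulls — a sketch, with an example pattern:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("01/15/2026", "07/04/2025").toDF("raw")

df.select(to_date(col("raw"), "MM/dd/yyyy").as("event_date"))
  .withColumn("year", year(col("event_date")))
  .withColumn("month", month(col("event_date")))
  .withColumn("day", dayofmonth(col("event_date")))
  .show(false)
```

This way a row that comes out null is a genuine parse failure rather than a format mismatch you never noticed.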
Related Functions
For getting the current date or timestamp to extract from, see current_date and current_timestamp.