Spark Scala Year, Month, and Day Extraction

Spark provides a small family of functions for pulling individual date parts — year, month, day, week, quarter — out of a date or timestamp column. They're the building blocks for grouping by month, filtering by quarter, or partitioning a table by year.

All of these functions accept a date, timestamp, or ISO-formatted date string and return an integer.

Year, Month, and Day of Month

The three most common extractions are year, month, and dayofmonth:

def year(e: Column): Column

def month(e: Column): Column

def dayofmonth(e: Column): Column

import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(
  "2026-01-15",
  "2025-07-04",
  "2024-12-31",
  "2023-03-08",
).toDF("event_date")

val df2 = df
  .withColumn("year",  year(col("event_date")))
  .withColumn("month", month(col("event_date")))
  .withColumn("day",   dayofmonth(col("event_date")))

df2.show(false)
// +----------+----+-----+---+
// |event_date|year|month|day|
// +----------+----+-----+---+
// |2026-01-15|2026|1    |15 |
// |2025-07-04|2025|7    |4  |
// |2024-12-31|2024|12   |31 |
// |2023-03-08|2023|3    |8  |
// +----------+----+-----+---+

month returns 1 through 12 and dayofmonth returns 1 through 31. The input column can be a date, timestamp, or a string in yyyy-MM-dd format — Spark will parse it for you.
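Putting these to work on the grouping use case from the intro, here's a minimal sketch of a monthly count built from year and month (the dates and the events name are invented for the example):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Monthly event counts: extract year and month, then group on both.
val events = Seq(
  "2026-01-15", "2026-01-20", "2026-02-03", "2025-12-31"
).toDF("event_date")

val monthly = events
  .groupBy(year(col("event_date")).as("year"), month(col("event_date")).as("month"))
  .count()
  .orderBy("year", "month")

monthly.show()
// +----+-----+-----+
// |year|month|count|
// +----+-----+-----+
// |2025|   12|    1|
// |2026|    1|    2|
// |2026|    2|    1|
// +----+-----+-----+
```

Grouping on the extracted integers rather than on formatted strings keeps the sort order numeric and the grouping keys cheap to compare.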

Day: the SQL alias for dayofmonth

Spark also exposes a SQL function called day that does the same thing as dayofmonth. Before Spark 3.5 it isn't available directly in org.apache.spark.sql.functions, so call it through expr():

def day(date): Column — via expr()

val df = Seq(
  "2026-01-15",
  "2025-07-04",
  "2024-12-31",
).toDF("event_date")

val df2 = df
  .withColumn("dayofmonth", dayofmonth(col("event_date")))
  .withColumn("day",        expr("day(event_date)"))

df2.show(false)
// +----------+----------+---+
// |event_date|dayofmonth|day|
// +----------+----------+---+
// |2026-01-15|15        |15 |
// |2025-07-04|4         |4  |
// |2024-12-31|31        |31 |
// +----------+----------+---+

For new Scala code, prefer dayofmonth — it's the canonical name and doesn't require expr(). The day alias is mainly there for SQL compatibility.

Day of Week, Day of Year, and Week of Year

Beyond the calendar date itself, Spark exposes a few different "day" extractions for working with weeks and seasonal patterns:

def dayofweek(e: Column): Column

def weekday(date): Column — via expr()

def dayofyear(e: Column): Column

def weekofyear(e: Column): Column

dayofweek was added in Spark 2.3.0, and the weekday SQL function in Spark 2.4.0.

The two day-of-week functions return different numbering schemes, which is the most common gotcha in this group:

  • dayofweek returns 1 for Sunday through 7 for Saturday.
  • weekday returns 0 for Monday through 6 for Sunday.

dayofyear returns 1 through 366, and weekofyear returns the ISO 8601 week number — 1 through 53.

val df = Seq(
  "2026-01-15",
  "2025-07-04",
  "2024-12-31",
  "2023-03-08",
  "2026-12-25",
).toDF("event_date")

val df2 = df
  .withColumn("dayofweek",  dayofweek(col("event_date")))
  .withColumn("weekday",    expr("weekday(event_date)"))
  .withColumn("dayofyear",  dayofyear(col("event_date")))
  .withColumn("weekofyear", weekofyear(col("event_date")))

df2.show(false)
// +----------+---------+-------+---------+----------+
// |event_date|dayofweek|weekday|dayofyear|weekofyear|
// +----------+---------+-------+---------+----------+
// |2026-01-15|5        |3      |15       |3         |
// |2025-07-04|6        |4      |185      |27        |
// |2024-12-31|3        |1      |366      |1         |
// |2023-03-08|4        |2      |67       |10        |
// |2026-12-25|6        |4      |359      |52        |
// +----------+---------+-------+---------+----------+

A couple of things worth pointing out from the output:

  • December 31, 2024 has dayofyear = 366 because 2024 was a leap year.
  • That same date has weekofyear = 1, not 53. ISO 8601 weeks belong to the year that contains their Thursday, so the week of December 31, 2024 (a Tuesday) is the first week of 2025. If you're partitioning data by (year, weekofyear), be aware that the year part of the timestamp might not match the ISO week's year.
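To partition by week consistently, pair weekofyear with the week-based year instead of the calendar year. One way to get it is SQL extract with the yearofweek field — assumed available via expr() on Spark 3.0+, so verify on your version:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Pair weekofyear with the ISO week-based year, not the calendar year.
// extract(yearofweek from ...) is an assumption for Spark 3.0+; check your version.
val df = Seq("2024-12-31", "2025-01-01").toDF("event_date")

val withIsoWeek = df
  .withColumn("cal_year",  year(col("event_date")))
  .withColumn("week",      weekofyear(col("event_date")))
  .withColumn("week_year", expr("extract(yearofweek from event_date)"))

withIsoWeek.show()
// 2024-12-31: cal_year = 2024, week = 1, week_year = 2025
// 2025-01-01: cal_year = 2025, week = 1, week_year = 2025
```

With (week_year, week) as the partition key, both December 31, 2024 and January 1, 2025 land in the same partition, which is usually what you want for weekly rollups.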

Quarter

quarter returns 1 through 4 based on the calendar month:

def quarter(e: Column): Column

val df = Seq(
  "2026-01-15",
  "2026-04-30",
  "2026-07-04",
  "2026-10-12",
  "2026-12-31",
).toDF("event_date")

val df2 = df
  .withColumn("month",   month(col("event_date")))
  .withColumn("quarter", quarter(col("event_date")))

df2.show(false)
// +----------+-----+-------+
// |event_date|month|quarter|
// +----------+-----+-------+
// |2026-01-15|1    |1      |
// |2026-04-30|4    |2      |
// |2026-07-04|7    |3      |
// |2026-10-12|10   |4      |
// |2026-12-31|12   |4      |
// +----------+-----+-------+

January through March is 1, April through June is 2, and so on. There's no built-in fiscal-quarter offset — if your fiscal year doesn't start in January, you'll need to derive it yourself from month.
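One way to derive a fiscal quarter is to shift the month back to the fiscal year's start, wrap around with a modulo, and bucket into groups of three. A sketch, assuming a fiscal year that starts in April (the fiscalStartMonth value is a made-up example):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Assumption for the example: fiscal year begins in April.
val fiscalStartMonth = 4

val df = Seq("2026-01-15", "2026-04-30", "2026-07-04", "2026-12-31")
  .toDF("event_date")

// Shift months so the fiscal start becomes 0, wrap with % 12,
// then floor-divide into quarters (Column `/` is double division, hence floor).
val withFiscal = df.withColumn(
  "fiscal_quarter",
  floor(((month(col("event_date")) - fiscalStartMonth + 12) % 12) / 3) + 1
)

withFiscal.show()
// 2026-04-30 -> fiscal_quarter 1
// 2026-07-04 -> fiscal_quarter 2
// 2026-12-31 -> fiscal_quarter 3
// 2026-01-15 -> fiscal_quarter 4
```

The + 12 before the modulo keeps the shifted month non-negative for dates that fall before the fiscal start month.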

Null handling

All of these functions return null when the input is null, so they're safe to use on columns that may have missing dates without extra guarding:

val df = Seq(
  Some("2026-01-15"),
  None,
  Some("2025-07-04"),
).toDF("event_date")

val df2 = df
  .withColumn("year",  year(col("event_date")))
  .withColumn("month", month(col("event_date")))
  .withColumn("day",   dayofmonth(col("event_date")))

df2.show(false)
// +----------+----+-----+----+
// |event_date|year|month|day |
// +----------+----+-----+----+
// |2026-01-15|2026|1    |15  |
// |null      |null|null |null|
// |2025-07-04|2025|7    |4   |
// +----------+----+-----+----+

If the input is a string that can't be parsed as a date, the result is also null rather than an error — at least under the default settings. With spark.sql.ansi.enabled set to true, the invalid cast raises an error instead.
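A quick sketch of that behavior with default (non-ANSI) settings — the malformed dates here are invented for the example:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Strings that don't parse as yyyy-MM-dd dates yield null, not an error.
val df = Seq("2026-01-15", "15/01/2026", "not a date").toDF("event_date")

df.withColumn("year", year(col("event_date"))).show()
// +----------+----+
// |event_date|year|
// +----------+----+
// |2026-01-15|2026|
// |15/01/2026|null|
// |not a date|null|
// +----------+----+
```

If you'd rather catch bad input than silently propagate nulls, filter for rows where the extracted value is null before continuing.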

For getting the current date or timestamp to extract from, see current_date and current_timestamp.
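As a closing sketch, the same extractions work on current_date() — no input column needed, just an active SparkSession in scope as spark (output omitted since it depends on the run date):

```scala
import org.apache.spark.sql.functions._

// current_date() takes no arguments; extract parts of today's date.
val today = spark.range(1)
  .withColumn("today", current_date())
  .withColumn("year",  year(col("today")))
  .withColumn("month", month(col("today")))
  .withColumn("day",   dayofmonth(col("today")))

today.show()
```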

Example Details

Created: 2026-05-03 11:55:08 AM

Last Updated: 2026-05-03 11:55:08 AM