
Spark Scala date_part, datepart, and extract

date_part, datepart, and extract are generic ways to pull a single field out of a date, timestamp, or interval column. They cover the same ground as the dedicated functions like year, month, and dayofmonth, or hour, minute, and second, but with one function call where the field name is a parameter — useful when the field you need is decided at runtime or driven by config.

All three are Spark SQL functions. They aren't exposed as named methods in org.apache.spark.sql.functions (at least through Spark 3.4), so you call them through expr():

def date_part(field, source): Column — via expr()

def datepart(field, source): Column — via expr()

def extract(field FROM source): Column — via expr()

date_part and extract first appeared in version 3.0.0; datepart first appeared in version 3.4.0. All three return the same result for the same field — the only differences are the spelling and, for extract, the SQL FROM syntax.

The field argument is a string naming the part to extract. Common values include YEAR, QUARTER, MONTH, WEEK, DAY (alias DAYOFMONTH), DAYOFWEEK, DOW, DOY (alias DAYOFYEAR), HOUR, MINUTE, and SECOND. Field names are case-insensitive.

Extracting fields from a timestamp

Here's the typical use of date_part — pulling several different parts out of a timestamp column in one pass:

import org.apache.spark.sql.functions.expr
import spark.implicits._

val df = Seq(
  "2026-01-15 09:05:30",
  "2025-07-04 14:30:45",
  "2024-12-31 23:59:59",
  "2023-03-08 00:00:00",
).toDF("event_time")

val df2 = df
  .withColumn("year",   expr("date_part('YEAR', event_time)"))
  .withColumn("month",  expr("date_part('MONTH', event_time)"))
  .withColumn("day",    expr("date_part('DAY', event_time)"))
  .withColumn("hour",   expr("date_part('HOUR', event_time)"))
  .withColumn("minute", expr("date_part('MINUTE', event_time)"))
  .withColumn("second", expr("date_part('SECOND', event_time)"))

df2.show(false)
// +-------------------+----+-----+---+----+------+---------+
// |event_time         |year|month|day|hour|minute|second   |
// +-------------------+----+-----+---+----+------+---------+
// |2026-01-15 09:05:30|2026|1    |15 |9   |5     |30.000000|
// |2025-07-04 14:30:45|2025|7    |4  |14  |30    |45.000000|
// |2024-12-31 23:59:59|2024|12   |31 |23  |59    |59.000000|
// |2023-03-08 00:00:00|2023|3    |8  |0   |0     |0.000000 |
// +-------------------+----+-----+---+----+------+---------+

One thing to watch for: SECOND returns a DECIMAL(8, 6) rather than an integer because it carries fractional seconds down to microseconds. If you only want whole seconds, cast the result or use the dedicated second function, which returns an INT.
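The same whole-vs-fractional distinction can be seen outside Spark with java.time — a quick pure-Scala sketch of what SECOND's decimal result carries (the sub-second timestamp here is just an illustration, not from the example above):

```scala
import java.time.LocalDateTime

// Illustrative timestamp with sub-second precision.
val ts = LocalDateTime.parse("2026-01-15T09:05:30.250000")

// Whole seconds, like Spark's dedicated `second` function (INT).
val wholeSeconds = ts.getSecond

// Seconds plus the fractional part, like date_part('SECOND', ...),
// which keeps precision down to microseconds.
val fractionalSeconds = ts.getSecond + ts.getNano / 1e9

println(wholeSeconds)      // 30
println(fractionalSeconds) // 30.25
```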

datepart: the Spark 3.4 alias

datepart was added in Spark 3.4.0 as a one-word alias for date_part. It takes the same arguments and returns the same result — pick whichever spelling reads better in your codebase:

val df = Seq(
  "2026-01-15",
  "2025-07-04",
  "2024-12-31",
  "2023-03-08",
).toDF("event_date")

val df2 = df
  .withColumn("date_part", expr("date_part('QUARTER', event_date)"))
  .withColumn("datepart",  expr("datepart('QUARTER', event_date)"))

df2.show(false)
// +----------+---------+--------+
// |event_date|date_part|datepart|
// +----------+---------+--------+
// |2026-01-15|1        |1       |
// |2025-07-04|3        |3       |
// |2024-12-31|4        |4       |
// |2023-03-08|1        |1       |
// +----------+---------+--------+

If you're targeting Spark 3.0 through 3.3, only date_part is available — the datepart alias didn't exist yet.

extract: the SQL-standard syntax

extract does the same job, but uses the SQL-standard extract(field FROM source) syntax. Notice the field is an unquoted identifier here, not a string literal:

val df = Seq(
  "2026-01-15 09:05:30",
  "2025-07-04 14:30:45",
  "2024-12-31 23:59:59",
).toDF("event_time")

val df2 = df
  .withColumn("year",      expr("extract(YEAR FROM event_time)"))
  .withColumn("month",     expr("extract(MONTH FROM event_time)"))
  .withColumn("day",       expr("extract(DAY FROM event_time)"))
  .withColumn("dayofweek", expr("extract(DAYOFWEEK FROM event_time)"))
  .withColumn("week",      expr("extract(WEEK FROM event_time)"))

df2.show(false)
// +-------------------+----+-----+---+---------+----+
// |event_time         |year|month|day|dayofweek|week|
// +-------------------+----+-----+---+---------+----+
// |2026-01-15 09:05:30|2026|1    |15 |5        |3   |
// |2025-07-04 14:30:45|2025|7    |4  |6        |27  |
// |2024-12-31 23:59:59|2024|12   |31 |3        |1   |
// +-------------------+----+-----+---+---------+----+

A couple of things worth noting from this output:

  • DAYOFWEEK returns 1 for Sunday through 7 for Saturday, matching the dayofweek Scala function.
  • WEEK returns the ISO 8601 week number. December 31, 2024 (a Tuesday) belongs to ISO week 1 of 2025, not week 53 of 2024 — the same gotcha that weekofyear has.
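Both behaviors can be sanity-checked outside Spark with java.time. Note that java.time numbers days Monday=1 through Sunday=7, so the Sunday-based conversion below is my own remapping to Spark's DAYOFWEEK convention:

```scala
import java.time.LocalDate
import java.time.temporal.WeekFields

val d = LocalDate.parse("2024-12-31")

// ISO 8601 week number: Dec 31, 2024 (a Tuesday) falls in week 1 of 2025.
val isoWeek = d.get(WeekFields.ISO.weekOfWeekBasedYear)

// Remap java.time's Monday=1..Sunday=7 to Spark's Sunday=1..Saturday=7.
val sparkDow = d.getDayOfWeek.getValue % 7 + 1

println(isoWeek)  // 1
println(sparkDow) // 3
```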

Use extract when you want SQL-standard syntax (e.g., for portability with Postgres) or when you're writing the expression as part of a larger SQL string. For Scala-side composition, date_part is usually less awkward because the field is a normal string.

Extracting parts from an interval

date_part also works on interval columns, which makes it useful for breaking a duration down into days, hours, and minutes:

val df = Seq(
  ("task_a", "INTERVAL '5 12:30:45' DAY TO SECOND"),
  ("task_b", "INTERVAL '2 03:15:00' DAY TO SECOND"),
  ("task_c", "INTERVAL '0 00:45:30' DAY TO SECOND"),
).toDF("task", "duration_literal")

val df2 = df
  .withColumn("duration", expr("CAST(duration_literal AS INTERVAL DAY TO SECOND)"))
  .withColumn("days",    expr("date_part('DAY', duration)"))
  .withColumn("hours",   expr("date_part('HOUR', duration)"))
  .withColumn("minutes", expr("date_part('MINUTE', duration)"))

df2.select("task", "duration", "days", "hours", "minutes").show(false)
// +------+-----------------------------------+----+-----+-------+
// |task  |duration                           |days|hours|minutes|
// +------+-----------------------------------+----+-----+-------+
// |task_a|INTERVAL '5 12:30:45' DAY TO SECOND|5   |12   |30     |
// |task_b|INTERVAL '2 03:15:00' DAY TO SECOND|2   |3    |15     |
// |task_c|INTERVAL '0 00:45:30' DAY TO SECOND|0   |0    |45     |
// +------+-----------------------------------+----+-----+-------+

The dedicated year, month, hour, etc. functions don't accept interval inputs — date_part and its datepart and extract equivalents are the only way to get at the components of an interval column. The fields you can pull depend on the interval type: a DAY TO SECOND interval supports DAY, HOUR, MINUTE, and SECOND, while a YEAR TO MONTH interval supports YEAR and MONTH.
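The same decomposition can be mirrored in plain Scala with java.time.Duration, which is a handy way to sanity-check interval results. A sketch where the ISO-8601 duration string corresponds to task_a above:

```scala
import java.time.Duration

// "5 days 12:30:45" expressed as an ISO-8601 duration.
val dur = Duration.parse("P5DT12H30M45S")

// Component parts, mirroring date_part('DAY'/'HOUR'/'MINUTE', duration).
val days    = dur.toDaysPart
val hours   = dur.toHoursPart
val minutes = dur.toMinutesPart

println(s"$days $hours $minutes") // 5 12 30
```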

When to reach for date_part vs the dedicated functions

For most code, the dedicated functions like year, month, dayofmonth, hour, minute, second are easier to read and type-check at compile time — there's no string field name to misspell. Reach for date_part, datepart, or extract when:

  • The field to extract is decided at runtime (driven by config, a UI selection, or a column itself).
  • You're working with an interval column, where the dedicated functions don't apply.
  • You want SQL-standard syntax (extract) for compatibility with another engine.
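For the runtime-driven case, one common pattern is to build the SQL expression string from config and hand it to expr(). A minimal sketch — buildDatePartExpr is a hypothetical helper, not a Spark API:

```scala
// Hypothetical helper: turns a config-supplied field name into a
// date_part expression string for expr(). Validating against a
// whitelist makes a bad config value fail fast with a clear message
// instead of surfacing as an analysis error deep in a query plan.
val allowedFields = Set("YEAR", "QUARTER", "MONTH", "WEEK", "DAY",
                        "HOUR", "MINUTE", "SECOND")

def buildDatePartExpr(field: String, column: String): String = {
  val f = field.toUpperCase
  require(allowedFields.contains(f), s"unsupported field: $field")
  s"date_part('$f', $column)"
}

val e = buildDatePartExpr("month", "event_time")
println(e) // date_part('MONTH', event_time)
// In Spark: df.withColumn("part", expr(e))
```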

For the dedicated date-part extractors, see year, month, and day. For the time-of-day equivalents, see hour, minute, and second. For getting the current date or timestamp to extract from, see current_date and current_timestamp.

Example Details

Created: 2026-05-03 10:17:09 PM

Last Updated: 2026-05-03 10:17:09 PM