Job Board
Consulting

Spark Scala date_trunc and trunc

date_trunc and trunc round a date or timestamp column down to a coarser unit — the start of the hour, day, month, quarter, or year. They're the go-to functions for bucketing events into time windows for grouping, partitioning, or aligning data to a calendar boundary.

The two functions cover different inputs and outputs. date_trunc works on a timestamp and returns a timestamp, supporting time-of-day units like hour, minute, and second in addition to date units. trunc works on a date and returns a date, so it only supports date-level units like month, year, week, and quarter.

def date_trunc(format: String, timestamp: Column): Column

def trunc(date: Column, format: String): Column

Note the argument order is different between the two — date_trunc takes the format first, while trunc takes the date first. This ordering matches the underlying SQL functions (date_trunc(unit, ts) and trunc(date, unit)).

The date_trunc function first appeared in version 2.3.0.

Truncating timestamps to common units

date_trunc zeroes out everything finer than the unit you pass. Truncating to hour keeps the date and hour but sets the minutes and seconds to zero. Truncating to month resets the day to the 1st and the time to midnight.

val df = Seq(
  "2026-01-15 09:30:45",
  "2026-02-04 14:05:12",
  "2025-12-20 23:59:59",
  "2025-07-04 00:00:01",
).toDF("event_ts")

val df2 = df
  .withColumn("trunc_hour",  date_trunc("hour",  col("event_ts").cast("timestamp")))
  .withColumn("trunc_day",   date_trunc("day",   col("event_ts").cast("timestamp")))
  .withColumn("trunc_month", date_trunc("month", col("event_ts").cast("timestamp")))
  .withColumn("trunc_year",  date_trunc("year",  col("event_ts").cast("timestamp")))

df2.show(false)
// +-------------------+-------------------+-------------------+-------------------+-------------------+
// |event_ts           |trunc_hour         |trunc_day          |trunc_month        |trunc_year         |
// +-------------------+-------------------+-------------------+-------------------+-------------------+
// |2026-01-15 09:30:45|2026-01-15 09:00:00|2026-01-15 00:00:00|2026-01-01 00:00:00|2026-01-01 00:00:00|
// |2026-02-04 14:05:12|2026-02-04 14:00:00|2026-02-04 00:00:00|2026-02-01 00:00:00|2026-01-01 00:00:00|
// |2025-12-20 23:59:59|2025-12-20 23:00:00|2025-12-20 00:00:00|2025-12-01 00:00:00|2025-01-01 00:00:00|
// |2025-07-04 00:00:01|2025-07-04 00:00:00|2025-07-04 00:00:00|2025-07-01 00:00:00|2025-01-01 00:00:00|
// +-------------------+-------------------+-------------------+-------------------+-------------------+

The return type is always a timestamp, even when you truncate to month or year. If you'd rather have a date column for those cases, cast the result with .cast("date") or use trunc instead.

The event_ts column above is a string, so it's cast to timestamp before truncation. If your column is already a timestamp, you can skip the .cast(...). Spark will also auto-cast strings in standard yyyy-MM-dd HH:mm:ss form, but being explicit avoids surprises if the format ever drifts.

Smaller units: minute, second, week, quarter

date_trunc accepts a wider set of units than trunc. You can drop down to minute and second for sub-day bucketing, or up to week and quarter for calendar bucketing.

val df = Seq(
  "2026-01-15 09:30:45",
  "2026-02-04 14:05:12",
  "2025-12-20 23:59:59",
  "2025-07-04 00:00:01",
).toDF("event_ts")

val df2 = df
  .withColumn("trunc_minute",  date_trunc("minute",  col("event_ts").cast("timestamp")))
  .withColumn("trunc_second",  date_trunc("second",  col("event_ts").cast("timestamp")))
  .withColumn("trunc_week",    date_trunc("week",    col("event_ts").cast("timestamp")))
  .withColumn("trunc_quarter", date_trunc("quarter", col("event_ts").cast("timestamp")))

df2.show(false)
// +-------------------+-------------------+-------------------+-------------------+-------------------+
// |event_ts           |trunc_minute       |trunc_second       |trunc_week         |trunc_quarter      |
// +-------------------+-------------------+-------------------+-------------------+-------------------+
// |2026-01-15 09:30:45|2026-01-15 09:30:00|2026-01-15 09:30:45|2026-01-12 00:00:00|2026-01-01 00:00:00|
// |2026-02-04 14:05:12|2026-02-04 14:05:00|2026-02-04 14:05:12|2026-02-02 00:00:00|2026-01-01 00:00:00|
// |2025-12-20 23:59:59|2025-12-20 23:59:00|2025-12-20 23:59:59|2025-12-15 00:00:00|2025-10-01 00:00:00|
// |2025-07-04 00:00:01|2025-07-04 00:00:00|2025-07-04 00:00:01|2025-06-30 00:00:00|2025-07-01 00:00:00|
// +-------------------+-------------------+-------------------+-------------------+-------------------+

A few things worth noting. week rounds to the start of the ISO week, which is Monday2026-01-15 is a Thursday, so its week starts on Monday 2026-01-12. quarter snaps to the first day of the calendar quarter (Jan 1, Apr 1, Jul 1, Oct 1). And second is essentially a no-op when the input has no sub-second precision.

The full list of supported formats for date_trunc is: year / yyyy / yy, quarter, month / mon / mm, week, day / dd, hour, minute, second, millisecond, microsecond. The format string is case-insensitive.

Truncating dates with trunc

When your column is already a date (no time component), use trunc. The output is a date, not a timestamp, which is usually what you want for grouping by month or quarter.

val df = Seq(
  "2026-01-15",
  "2026-02-04",
  "2025-12-20",
  "2025-07-04",
).toDF("event_date")

val df2 = df
  .withColumn("trunc_month",   trunc(col("event_date"), "month"))
  .withColumn("trunc_year",    trunc(col("event_date"), "year"))
  .withColumn("trunc_week",    trunc(col("event_date"), "week"))
  .withColumn("trunc_quarter", trunc(col("event_date"), "quarter"))

df2.show(false)
// +----------+-----------+----------+----------+-------------+
// |event_date|trunc_month|trunc_year|trunc_week|trunc_quarter|
// +----------+-----------+----------+----------+-------------+
// |2026-01-15|2026-01-01 |2026-01-01|2026-01-12|2026-01-01   |
// |2026-02-04|2026-02-01 |2026-01-01|2026-02-02|2026-01-01   |
// |2025-12-20|2025-12-01 |2025-01-01|2025-12-15|2025-10-01   |
// |2025-07-04|2025-07-01 |2025-01-01|2025-06-30|2025-07-01   |
// +----------+-----------+----------+----------+-------------+

trunc only accepts date-level units: year / yyyy / yy, quarter, month / mon / mm, and week. Time-based units like hour are silently rejected — see the next section.

Invalid formats return null

trunc returns null if you pass a format it doesn't support, rather than throwing an error. This includes any time-based unit like hour or minute, since trunc operates on dates that have no time component.

val df = Seq(
  "2026-01-15",
  "2026-02-04",
  "2025-12-20",
  "2025-07-04",
).toDF("event_date")

val df2 = df
  .withColumn("valid_format",   trunc(col("event_date"), "month"))
  .withColumn("invalid_format", trunc(col("event_date"), "hour"))

df2.show(false)
// +----------+------------+--------------+
// |event_date|valid_format|invalid_format|
// +----------+------------+--------------+
// |2026-01-15|2026-01-01  |null          |
// |2026-02-04|2026-02-01  |null          |
// |2025-12-20|2025-12-01  |null          |
// |2025-07-04|2025-07-01  |null          |
// +----------+------------+--------------+

date_trunc behaves the same way for unrecognized units — silent null, no error. This is worth knowing because a typo in the format string won't fail loudly; you'll just see a column full of nulls. Always run a quick df.show() after adding a truncation to verify.

For formatting the truncated value as a custom string (e.g., 2026-01 for monthly buckets), see date_format. For pulling out individual date components as integers, see year, month, dayofmonth, hour, minute, second, or the more general date_part and extract.

Example Details

Created: 2026-05-07 10:39:56 PM

Last Updated: 2026-05-07 10:39:56 PM