Spark Scala date_add, date_sub, and add_months
date_add, date_sub, and add_months shift a date column forward or backward by a number of days or months. They're the bread-and-butter functions for computing things like "30 days from order date", "one month before renewal", or "next billing cycle". Spark 3.4 added a one-word alias, dateadd, which isn't exposed in the Scala functions API and is reachable only through expr() or SQL strings.
date_add
def date_add(start: Column, days: Int): Column
def date_add(start: Column, days: Column): Column
date_add returns a new date that is days days after start. Pass a literal Int when the offset is the same for every row, or a Column when it varies per row.
Here's the most common shape — a fixed number of days added to every row:
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(
"2026-01-15",
"2025-07-04",
"2024-12-31",
"2023-03-08",
).toDF("event_date")
val df2 = df
.withColumn("plus_7_days", date_add(col("event_date"), 7))
.withColumn("plus_30_days", date_add(col("event_date"), 30))
df2.show(false)
// +----------+-----------+------------+
// |event_date|plus_7_days|plus_30_days|
// +----------+-----------+------------+
// |2026-01-15|2026-01-22 |2026-02-14 |
// |2025-07-04|2025-07-11 |2025-08-03 |
// |2024-12-31|2025-01-07 |2025-01-30 |
// |2023-03-08|2023-03-15 |2023-04-07 |
// +----------+-----------+------------+
Notice that date_add correctly rolls into the next month and year — 2024-12-31 + 7 days becomes 2025-01-07, not 2024-12-38.
The input column can be a string in yyyy-MM-dd format (Spark casts it implicitly), an actual DateType, or a TimestampType — all three work. The return type is always DateType.
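Under the hood this is plain day arithmetic, so the results can be sanity-checked outside Spark with java.time.LocalDate (a non-Spark sketch mirroring the dates above; LocalDate.plusDays rolls month and year boundaries the same way date_add does):

```scala
import java.time.LocalDate

// Same event dates as the Spark example above.
val eventDates = Seq("2026-01-15", "2025-07-04", "2024-12-31", "2023-03-08")
  .map(LocalDate.parse)

// plusDays mirrors date_add: 2024-12-31 + 7 rolls into 2025-01-07.
val plus7  = eventDates.map(_.plusDays(7))
val plus30 = eventDates.map(_.plusDays(30))

plus7.zip(plus30).foreach { case (a, b) => println(s"$a  $b") }
```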
Variable offset per row
When the number of days to add depends on another column, pass that column as the second argument. This is handy for things like computing expected delivery dates from a per-order lead time:
val df = Seq(
("order_a", "2026-01-15", 3),
("order_b", "2026-01-15", 5),
("order_c", "2026-01-15", 14),
("order_d", "2026-01-15", 30),
).toDF("order_id", "ordered_on", "lead_days")
val df2 = df
.withColumn("expected_delivery", date_add(col("ordered_on"), col("lead_days")))
df2.show(false)
// +--------+----------+---------+-----------------+
// |order_id|ordered_on|lead_days|expected_delivery|
// +--------+----------+---------+-----------------+
// |order_a |2026-01-15|3 |2026-01-18 |
// |order_b |2026-01-15|5 |2026-01-20 |
// |order_c |2026-01-15|14 |2026-01-29 |
// |order_d |2026-01-15|30 |2026-02-14 |
// +--------+----------+---------+-----------------+
The Column overload of date_add first appeared in Spark 3.0.0. Before that, only the Int overload existed, so a per-row offset on older Spark meant dropping into expr(), e.g. expr("date_add(ordered_on, lead_days)").
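The per-row arithmetic itself is easy to cross-check outside Spark with java.time (the order IDs and lead times below mirror the example above):

```scala
import java.time.LocalDate

// Each order carries its own lead time in days, like the lead_days column.
val orders    = Seq(("order_a", 3), ("order_b", 5), ("order_c", 14), ("order_d", 30))
val orderedOn = LocalDate.parse("2026-01-15")

val expected = orders.map { case (id, leadDays) => id -> orderedOn.plusDays(leadDays) }

// order_d: 2026-01-15 + 30 days = 2026-02-14
expected.foreach { case (id, d) => println(s"$id -> $d") }
```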
date_sub
def date_sub(start: Column, days: Int): Column
def date_sub(start: Column, days: Column): Column
date_sub is the mirror of date_add — it returns the date days days before start. You could of course pass a negative number to date_add and get the same result, but date_sub reads more clearly when the intent is to look backward in time:
val df = Seq(
"2026-01-15",
"2025-07-04",
"2024-12-31",
"2023-03-08",
).toDF("event_date")
val df2 = df
.withColumn("minus_7_days", date_sub(col("event_date"), 7))
.withColumn("minus_30_days", date_sub(col("event_date"), 30))
df2.show(false)
// +----------+------------+-------------+
// |event_date|minus_7_days|minus_30_days|
// +----------+------------+-------------+
// |2026-01-15|2026-01-08 |2025-12-16 |
// |2025-07-04|2025-06-27 |2025-06-04 |
// |2024-12-31|2024-12-24 |2024-12-01 |
// |2023-03-08|2023-03-01 |2023-02-06 |
// +----------+------------+-------------+
Like date_add, date_sub rolls month and year boundaries cleanly — 2026-01-15 - 30 days lands in December of the previous year.
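As with date_add, the subtraction is plain day arithmetic, so java.time gives a quick non-Spark cross-check of the boundary behavior:

```scala
import java.time.LocalDate

// minusDays mirrors date_sub, borrowing from the previous month/year
// when the subtraction crosses a boundary.
val d = LocalDate.parse("2026-01-15")
println(d.minusDays(7))   // 2026-01-08
println(d.minusDays(30))  // 2025-12-16, back into the previous year
```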
add_months
def add_months(startDate: Column, numMonths: Int): Column
def add_months(startDate: Column, numMonths: Column): Column
add_months shifts a date by a number of months instead of days. Pass a negative value to subtract — there's no subtract_months function; just negate the offset.
val df = Seq(
"2026-01-15",
"2025-07-04",
"2024-12-31",
"2023-03-08",
).toDF("event_date")
val df2 = df
.withColumn("plus_1_month", add_months(col("event_date"), 1))
.withColumn("plus_6_months", add_months(col("event_date"), 6))
.withColumn("minus_3_months", add_months(col("event_date"), -3))
df2.show(false)
// +----------+------------+-------------+--------------+
// |event_date|plus_1_month|plus_6_months|minus_3_months|
// +----------+------------+-------------+--------------+
// |2026-01-15|2026-02-15 |2026-07-15 |2025-10-15 |
// |2025-07-04|2025-08-04 |2026-01-04 |2025-04-04 |
// |2024-12-31|2025-01-31 |2025-06-30 |2024-09-30 |
// |2023-03-08|2023-04-08 |2023-09-08 |2022-12-08 |
// +----------+------------+-------------+--------------+
Look at 2024-12-31 + 6 months — the result is 2025-06-30, not 2025-06-31 (which doesn't exist). When the day-of-month doesn't exist in the target month, add_months clamps it to the last day. Day 15 plus a month is always day 15, but day 31 plus a month is whatever the last day of the target month happens to be.
This end-of-month behavior is worth seeing on its own:
val df = Seq(
"2026-01-31",
"2026-03-31",
"2024-02-29",
"2026-01-30",
).toDF("event_date")
val df2 = df
.withColumn("plus_1_month", add_months(col("event_date"), 1))
df2.show(false)
// +----------+------------+
// |event_date|plus_1_month|
// +----------+------------+
// |2026-01-31|2026-02-28 |
// |2026-03-31|2026-04-30 |
// |2024-02-29|2024-03-29 |
// |2026-01-30|2026-02-28 |
// +----------+------------+
A few things to note:
- 2026-01-31 + 1 month → 2026-02-28. February only has 28 days in 2026, so the result clamps.
- 2024-02-29 + 1 month → 2024-03-29. March has a 29th, so no clamping happens — the leap-day origin doesn't carry forward.
- 2026-01-30 + 1 month → 2026-02-28. Same clamping as the 31st case, because February still has fewer than 30 days.
If you only need the clamping shown above (so that both the 30th and the 31st of January map to the 28th of February), add_months already does what you want. The clamp doesn't stick, though: adding one more month to 2026-02-28 gives 2026-03-28, not the end of March, so if you need results pinned to month-end, combine add_months with last_day. If you instead need to land exactly N days later regardless of month lengths, use date_add.
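java.time.LocalDate.plusMonths clamps the day-of-month the same way add_months does, which makes it handy for checking these edge cases outside Spark; pinning to month-end can be sketched with TemporalAdjusters (a non-Spark sketch, not the Spark API):

```scala
import java.time.LocalDate
import java.time.temporal.TemporalAdjusters

// plusMonths clamps to the last valid day, just like add_months.
println(LocalDate.parse("2026-01-31").plusMonths(1)) // 2026-02-28 (clamped)
println(LocalDate.parse("2024-02-29").plusMonths(1)) // 2024-03-29 (no clamp needed)

// To always land on the last day of the target month regardless of the
// starting day, shift first and then pin to end-of-month.
// (`with` needs backticks in Scala because it's a keyword.)
val pinned = LocalDate.parse("2026-01-15")
  .plusMonths(1)
  .`with`(TemporalAdjusters.lastDayOfMonth())
println(pinned) // 2026-02-28
```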
dateadd: the SQL-only Spark 3.4 alias
dateadd was added in Spark 3.4.0 as a one-word alias for date_add. It isn't exposed in the Scala functions API, so you call it through expr():
dateadd(start_date, num_days) — a SQL function returning a date, called from Scala via expr()
It returns the same result as date_add for the same inputs:
val df = Seq(
"2026-01-15",
"2025-07-04",
"2024-12-31",
).toDF("event_date")
val df2 = df
.withColumn("date_add", date_add(col("event_date"), 7))
.withColumn("dateadd", expr("dateadd(event_date, 7)"))
df2.show(false)
// +----------+----------+----------+
// |event_date|date_add |dateadd |
// +----------+----------+----------+
// |2026-01-15|2026-01-22|2026-01-22|
// |2025-07-04|2025-07-11|2025-07-11|
// |2024-12-31|2025-01-07|2025-01-07|
// +----------+----------+----------+
For Scala code, prefer date_add — it's the native Column function and doesn't require expr(). dateadd is mostly useful when you're writing the expression as a SQL string (e.g., inside a selectExpr or a SQL view definition) and want the shorter spelling familiar from T-SQL or Snowflake.
If you're targeting Spark 3.0 through 3.3, dateadd doesn't exist yet — use date_add.
Null handling
When the input date is null, all three functions return null. Nulls propagate quietly: a null input produces a null output rather than an error:
val df = Seq(
Some("2026-01-15"),
None,
Some("2025-07-04"),
).toDF("event_date")
val df2 = df
.withColumn("plus_7_days", date_add(col("event_date"), 7))
.withColumn("minus_7_days", date_sub(col("event_date"), 7))
.withColumn("plus_1_month", add_months(col("event_date"), 1))
df2.show(false)
// +----------+-----------+------------+------------+
// |event_date|plus_7_days|minus_7_days|plus_1_month|
// +----------+-----------+------------+------------+
// |2026-01-15|2026-01-22 |2026-01-08 |2026-02-15 |
// |null |null |null |null |
// |2025-07-04|2025-07-11 |2025-06-27 |2025-08-04 |
// +----------+-----------+------------+------------+
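This null-in, null-out behavior has the same shape as mapping over an Option in plain Scala (a loose analogy, not Spark code):

```scala
import java.time.LocalDate

// A null date column behaves like None: the shift is skipped and the
// slot stays empty instead of throwing.
val dates: Seq[Option[LocalDate]] =
  Seq(Some(LocalDate.parse("2026-01-15")), None, Some(LocalDate.parse("2025-07-04")))

val shifted = dates.map(_.map(_.plusDays(7)))
println(shifted) // List(Some(2026-01-22), None, Some(2025-07-11))
```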
Related functions
For getting the current date to add or subtract from, see current_date and current_timestamp. For pulling individual fields out of the result, see year, month, and day.