Spark Scala months_between
months_between returns the number of months between two date or timestamp columns as a Double. It's the right tool when you want a months gap rather than a days gap — tenure in months, age of a record, billing cycles, anything where calendar months matter more than raw day counts.
def months_between(end: Column, start: Column): Column
def months_between(end: Column, start: Column, roundOff: Boolean): Column
months_between returns end - start measured in months. Like datediff, the end date comes first and the start date comes second — get them backwards and you'll see negative numbers.
The result is a Double because months don't divide evenly. Whole months count as a full unit; partial months are reported as a fraction of a 31-day month.
The most common shape is two date columns from the same row — for example, a started_on and ended_on pair you want to turn into a duration in months. (The examples throughout assume import org.apache.spark.sql.functions._ and import spark.implicits._ are in scope.)
val df = Seq(
("order_a", "2026-01-15", "2026-04-15"),
("order_b", "2025-11-01", "2026-02-01"),
("order_c", "2024-06-30", "2026-05-31"),
("order_d", "2026-03-10", "2026-03-25"),
).toDF("order_id", "started_on", "ended_on")
val df2 = df
.withColumn("months", months_between(col("ended_on"), col("started_on")))
df2.show(false)
// +--------+----------+----------+----------+
// |order_id|started_on|ended_on |months |
// +--------+----------+----------+----------+
// |order_a |2026-01-15|2026-04-15|3.0 |
// |order_b |2025-11-01|2026-02-01|3.0 |
// |order_c |2024-06-30|2026-05-31|23.0 |
// |order_d |2026-03-10|2026-03-25|0.48387097|
// +--------+----------+----------+----------+
When the two dates land on the same day of the month, the result is a whole number — 2026-01-15 → 2026-04-15 is exactly 3.0. When they don't, Spark reports the fractional part as extra_days / 31, regardless of how many days are actually in the months involved. That's why 2026-03-10 → 2026-03-25 is 15 / 31 ≈ 0.48387097.
The input columns can be strings in yyyy-MM-dd format (Spark casts them implicitly), DateType, or TimestampType. With timestamps, the time-of-day feeds into the fractional part — but only when the two timestamps fall on different days of the month. When the days of the month match (or both are month-end), Spark returns a whole number and ignores the time entirely.
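A quick sketch of that interaction, assuming a SparkSession with spark.implicits._ in scope:

```scala
import org.apache.spark.sql.functions._

val ts = Seq(
  ("2026-03-10 20:00:00", "2026-03-11 08:00:00"), // 12h apart, across midnight
  ("2026-03-10 00:00:00", "2026-03-10 12:00:00")  // 12h apart, same calendar day
).toDF("start_ts", "end_ts")

ts.withColumn("months", months_between(col("end_ts"), col("start_ts"))).show(false)
// Across midnight the days of the month differ, so the 12 hours show up
// as 0.5 / 31 ≈ 0.01612903; on the same day of the month the result is 0.0.
```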
End-of-month behavior
months_between has one important quirk: when both dates are the last day of their respective months, the result is a whole number — even though the months have different lengths. This avoids spurious fractions in cases like billing cycles that fall on month-end:
val df = Seq(
("2026-01-15", "2026-03-01"),
("2026-01-31", "2026-02-28"),
("2026-02-28", "2026-03-31"),
("2026-01-01", "2026-04-10"),
).toDF("start", "end")
val df2 = df
.withColumn("rounded", months_between(col("end"), col("start")))
.withColumn("unrounded", months_between(col("end"), col("start"), false))
df2.show(false)
// +----------+----------+----------+------------------+
// |start |end |rounded |unrounded |
// +----------+----------+----------+------------------+
// |2026-01-15|2026-03-01|1.5483871 |1.5483870967741935|
// |2026-01-31|2026-02-28|1.0 |1.0 |
// |2026-02-28|2026-03-31|1.0 |1.0 |
// |2026-01-01|2026-04-10|3.29032258|3.2903225806451615|
// +----------+----------+----------+------------------+
2026-01-31 to 2026-02-28 is 1.0 because both are end-of-month. 2026-02-28 to 2026-03-31 is 1.0 for the same reason. Without the end-of-month rule, the first pair would instead come out as 1 - 3/31 ≈ 0.90; in practice the whole-number result is what you want for billing and tenure logic.
The unrounded column above demonstrates the three-argument signature.
def months_between(end: Column, start: Column, roundOff: Boolean): Column
Passing false for roundOff skips the default rounding to 8 decimal places and returns the full-precision Double. The default (true) is almost always what you want — 8 decimals is plenty — but if you're plugging the result into a downstream calculation where every digit matters, pass false.
Months from a fixed reference date
When you want to compute months against a single anchor — like "today" or a contract start — wrap the literal date in lit() and pass it as the end argument:
val df = Seq(
("alice", "2025-01-01"),
("bob", "2024-08-15"),
("carol", "2025-11-20"),
("dave", "2026-04-30"),
).toDF("user_id", "signed_up_on")
val df2 = df
.withColumn(
"months_since_signup",
months_between(lit("2026-05-15"), col("signed_up_on")),
)
df2.show(false)
// +-------+------------+-------------------+
// |user_id|signed_up_on|months_since_signup|
// +-------+------------+-------------------+
// |alice |2025-01-01 |16.4516129 |
// |bob |2024-08-15 |21.0 |
// |carol |2025-11-20 |5.83870968 |
// |dave |2026-04-30 |0.51612903 |
// +-------+------------+-------------------+
For "months since signup as of right now", swap lit("2026-05-15") for current_date() and Spark will use the current date at query time.
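As a sketch of that variant (current_date() resolves once per query, so every row gets the same anchor):

```scala
// Same frame as above, anchored to the day the query runs.
val asOfToday = df.withColumn(
  "months_since_signup",
  months_between(current_date(), col("signed_up_on"))
)
```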
If you only need a whole-number months count, wrap the result in floor() or cast it to an integer. The Double return type gives you the choice — round, truncate, or keep the fraction.
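For example, a whole-months version of the column above could look like this (a sketch; note that floor rounds toward negative infinity, so a gap of -0.5 becomes -1):

```scala
val wholeMonths = df2.withColumn(
  "whole_months_since_signup",
  floor(col("months_since_signup")).cast("int") // 16.4516129 -> 16, 0.51612903 -> 0
)
```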
Null handling
When either input is null, the result is null. Null in, null out — no surprises:
val df = Seq(
(Some("2026-01-15"), Some("2026-04-15")),
(None, Some("2026-04-15")),
(Some("2026-01-15"), None),
(None, None),
).toDF("started_on", "ended_on")
val df2 = df
.withColumn("months", months_between(col("ended_on"), col("started_on")))
df2.show(false)
// +----------+----------+------+
// |started_on|ended_on |months|
// +----------+----------+------+
// |2026-01-15|2026-04-15|3.0 |
// |null |2026-04-15|null |
// |2026-01-15|null |null |
// |null |null |null |
// +----------+----------+------+
If you'd rather treat nulls as a fallback date, use coalesce on the input column before passing it to months_between.
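A sketch of that fallback, where a null ended_on is treated as still open as of a cutoff date (the cutoff here is invented for illustration):

```scala
val withFallback = df.withColumn(
  "months",
  months_between(
    coalesce(col("ended_on"), lit("2026-06-30")), // hypothetical cutoff for open records
    col("started_on")
  )
)
```

Rows where started_on is null still come out null, which is usually the safer behavior.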
Related functions
For the gap measured in days instead of months, see datediff and date_diff. To shift a date forward or backward by a whole number of months, see add_months; for day-based shifts, see date_add and date_sub. For the current date as an anchor, see current_date and current_timestamp.
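To make the days-versus-months contrast concrete, a minimal sketch on one pair (assuming the usual functions and implicits imports):

```scala
val pair = Seq(("2026-01-15", "2026-04-15")).toDF("start", "end")

pair
  .withColumn("days", datediff(col("end"), col("start")))
  .withColumn("months", months_between(col("end"), col("start")))
  .show(false)
// days is 90, months is 3.0: the same gap in two different units
```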