Spark Scala months_between

months_between returns the number of months between two date or timestamp columns as a Double. It's the right tool when you want a months gap rather than a days gap — tenure in months, age of a record, billing cycles, anything where calendar months matter more than raw day counts.

def months_between(end: Column, start: Column): Column
def months_between(end: Column, start: Column, roundOff: Boolean): Column

months_between returns end - start measured in months. Like datediff, the end date comes first and the start date comes second — get them backwards and you'll see negative numbers.

The result is a Double because months don't divide evenly. Whole months count as a full unit; partial months are reported as a fraction of a 31-day month.

The most common shape is two date columns from the same row — for example, a started_on and ended_on pair you want to turn into a duration in months:

val df = Seq(
  ("order_a", "2026-01-15", "2026-04-15"),
  ("order_b", "2025-11-01", "2026-02-01"),
  ("order_c", "2024-06-30", "2026-05-31"),
  ("order_d", "2026-03-10", "2026-03-25"),
).toDF("order_id", "started_on", "ended_on")

val df2 = df
  .withColumn("months", months_between(col("ended_on"), col("started_on")))

df2.show(false)
// +--------+----------+----------+----------+
// |order_id|started_on|ended_on  |months    |
// +--------+----------+----------+----------+
// |order_a |2026-01-15|2026-04-15|3.0       |
// |order_b |2025-11-01|2026-02-01|3.0       |
// |order_c |2024-06-30|2026-05-31|23.0      |
// |order_d |2026-03-10|2026-03-25|0.48387097|
// +--------+----------+----------+----------+

When the two dates land on the same day of the month, the result is a whole number — 2026-01-15 → 2026-04-15 is exactly 3.0. When they don't, Spark reports the fractional part as extra_days / 31, regardless of how many days are actually in the months involved. That's why 2026-03-10 → 2026-03-25 is 15 / 31 ≈ 0.48387097.

The input columns can be strings in yyyy-MM-dd format (Spark casts them implicitly), DateType, or TimestampType. With timestamps, the time-of-day participates in the calculation — two timestamps 12 hours apart will produce a slightly non-zero result.

End-of-month behavior

months_between has one important quirk: when both dates are the last day of their respective months, the result is a whole number — even though the months have different lengths. This avoids spurious fractions in cases like billing cycles that fall on month-end:

val df = Seq(
  ("2026-01-15", "2026-03-01"),
  ("2026-01-31", "2026-02-28"),
  ("2026-02-28", "2026-03-31"),
  ("2026-01-01", "2026-04-10"),
).toDF("start", "end")

val df2 = df
  .withColumn("rounded",   months_between(col("end"), col("start")))
  .withColumn("unrounded", months_between(col("end"), col("start"), false))

df2.show(false)
// +----------+----------+----------+------------------+
// |start     |end       |rounded   |unrounded         |
// +----------+----------+----------+------------------+
// |2026-01-15|2026-03-01|1.5483871 |1.5483870967741935|
// |2026-01-31|2026-02-28|1.0       |1.0               |
// |2026-02-28|2026-03-31|1.0       |1.0               |
// |2026-01-01|2026-04-10|3.29032258|3.2903225806451615|
// +----------+----------+----------+------------------+

2026-01-31 to 2026-02-28 is 1.0 because both are end-of-month. 2026-02-28 to 2026-03-31 is 1.0 for the same reason. If you actually want the day count divided by 31, you'd get something different — but in practice the end-of-month rule is what you want for billing and tenure logic.

The second column above demonstrates the second signature.

def months_between(end: Column, start: Column, roundOff: Boolean): Column

Passing false for roundOff skips the default rounding to 8 decimal places and returns the full-precision Double. The default (true) is almost always what you want — 8 decimals is plenty — but if you're plugging the result into a downstream calculation where every digit matters, pass false.

Months from a fixed reference date

When you want to compute months against a single anchor — like "today" or a contract start — wrap the literal date in lit() and pass it as the end argument:

val df = Seq(
  ("alice", "2025-01-01"),
  ("bob",   "2024-08-15"),
  ("carol", "2025-11-20"),
  ("dave",  "2026-04-30"),
).toDF("user_id", "signed_up_on")

val df2 = df
  .withColumn(
    "months_since_signup",
    months_between(lit("2026-05-15"), col("signed_up_on")),
  )

df2.show(false)
// +-------+------------+-------------------+
// |user_id|signed_up_on|months_since_signup|
// +-------+------------+-------------------+
// |alice  |2025-01-01  |16.4516129         |
// |bob    |2024-08-15  |21.0               |
// |carol  |2025-11-20  |5.83870968         |
// |dave   |2026-04-30  |0.51612903         |
// +-------+------------+-------------------+

For "months since signup as of right now", swap lit("2026-05-15") for current_date() and Spark will use the current date at query time.

If you only need a whole-number months count, wrap the result in floor() or cast it to an integer. The Double return type gives you the choice — round, truncate, or keep the fraction.

Null handling

When either input is null, the result is null. Null in, null out — no surprises:

val df = Seq(
  (Some("2026-01-15"), Some("2026-04-15")),
  (None,               Some("2026-04-15")),
  (Some("2026-01-15"), None),
  (None,               None),
).toDF("started_on", "ended_on")

val df2 = df
  .withColumn("months", months_between(col("ended_on"), col("started_on")))

df2.show(false)
// +----------+----------+------+
// |started_on|ended_on  |months|
// +----------+----------+------+
// |2026-01-15|2026-04-15|3.0   |
// |null      |2026-04-15|null  |
// |2026-01-15|null      |null  |
// |null      |null      |null  |
// +----------+----------+------+

If you'd rather treat nulls as a fallback date, use coalesce on the input column before passing it to months_between.

For the gap measured in days instead of months, see datediff and date_diff. To shift a date forward or backward by a whole number of months, see date_add, date_sub, and add_months. For the current date as an anchor, see current_date and current_timestamp.

Spark Scala months_between

End-of-month behavior

Months from a fixed reference date

Null handling

Related functions