Job Board
Consulting

Spark Scala make_date and make_timestamp

The make_date and make_timestamp family of functions construct date and timestamp values from separate numeric columns — year, month, day, and (for timestamps) hour, minute, and second. They're the inverse of extracting parts from a date: instead of pulling fields out, you assemble fields into a single value.

make_date

make_date is the only function in this group that exists directly in org.apache.spark.sql.functions. It takes three integer columns and returns a DateType column.

def make_date(year: Column, month: Column, day: Column): Column

The make_date function first appeared in version 3.3.0 and is defined as:

val df = Seq(
  (2024, 1, 15),
  (2023, 12, 31),
  (2025, 7, 4),
  (2020, 2, 29),
).toDF("year", "month", "day")

val df2 = df
  .withColumn("date", make_date(col("year"), col("month"), col("day")))

df2.show(false)
// +----+-----+---+----------+
// |year|month|day|date      |
// +----+-----+---+----------+
// |2024|1    |15 |2024-01-15|
// |2023|12   |31 |2023-12-31|
// |2025|7    |4  |2025-07-04|
// |2020|2    |29 |2020-02-29|
// +----+-----+---+----------+

Notice that 2020-02-29 is accepted as a valid date because 2020 was a leap year.

Invalid inputs

By default (with spark.sql.ansi.enabled = false), make_date returns null for invalid inputs like a month of 13 or a day of 30 in February. With ANSI mode enabled, the function throws an error instead.

val df = Seq(
  (2024, 1, 15),
  (2024, 2, 30),
  (2024, 13, 1),
  (2024, 4, 0),
).toDF("year", "month", "day")

val df2 = df
  .withColumn("date", make_date(col("year"), col("month"), col("day")))

df2.show(false)
// +----+-----+---+----------+
// |year|month|day|date      |
// +----+-----+---+----------+
// |2024|1    |15 |2024-01-15|
// |2024|2    |30 |null      |
// |2024|13   |1  |null      |
// |2024|4    |0  |null      |
// +----+-----+---+----------+

make_timestamp

make_timestamp builds a timestamp from year, month, day, hour, minute, and second fields, with an optional timezone. It's a Spark SQL function — it isn't available in the org.apache.spark.sql.functions object, so you call it through expr().

def make_timestamp(year, month, day, hour, min, sec): Column — via expr()

def make_timestamp(year, month, day, hour, min, sec, timezone): Column — via expr()

The sec parameter is a decimal — it accepts fractional seconds up to microsecond precision. The result type depends on the spark.sql.timestampType configuration (TIMESTAMP_LTZ by default).

The make_timestamp function first appeared in version 3.0.0.

val df = Seq(
  (2024, 1, 15, 9, 30, 0.0),
  (2024, 6, 21, 14, 45, 30.5),
  (2024, 12, 31, 23, 59, 59.999),
).toDF("year", "month", "day", "hour", "min", "sec")

val df2 = df
  .withColumn(
    "timestamp",
    expr("make_timestamp(year, month, day, hour, min, sec)")
  )

df2.show(false)
// +----+-----+---+----+---+------+-----------------------+
// |year|month|day|hour|min|sec   |timestamp              |
// +----+-----+---+----+---+------+-----------------------+
// |2024|1    |15 |9   |30 |0.0   |2024-01-15 09:30:00    |
// |2024|6    |21 |14  |45 |30.5  |2024-06-21 14:45:30.5  |
// |2024|12   |31 |23  |59 |59.999|2024-12-31 23:59:59.999|
// +----+-----+---+----+---+------+-----------------------+

With a timezone

Pass a seventh argument to specify the source timezone. The wall-clock time is interpreted in that timezone, then converted to the session timezone for display. In the example below, the session timezone is UTC, so the resulting timestamps are shifted from their original timezones into UTC.

val df = Seq(
  (2024, 1, 15, 9, 30, 0.0, "America/New_York"),
  (2024, 6, 21, 14, 45, 30.0, "Europe/London"),
  (2024, 12, 31, 23, 59, 59.0, "Asia/Tokyo"),
).toDF("year", "month", "day", "hour", "min", "sec", "tz")

val df2 = df
  .withColumn(
    "timestamp",
    expr("make_timestamp(year, month, day, hour, min, sec, tz)")
  )

df2.show(false)
// +----+-----+---+----+---+----+----------------+-------------------+
// |year|month|day|hour|min|sec |tz              |timestamp          |
// +----+-----+---+----+---+----+----------------+-------------------+
// |2024|1    |15 |9   |30 |0.0 |America/New_York|2024-01-15 09:30:00|
// |2024|6    |21 |14  |45 |30.0|Europe/London   |2024-06-21 09:45:30|
// |2024|12   |31 |23  |59 |59.0|Asia/Tokyo      |2024-12-31 09:59:59|
// +----+-----+---+----+---+----+----------------+-------------------+

make_timestamp_ntz and make_timestamp_ltz

If you want explicit control over the result type — regardless of the session configuration — use one of the two typed variants:

def make_timestamp_ntz(year, month, day, hour, min, sec): Column — via expr()

def make_timestamp_ltz(year, month, day, hour, min, sec, [timezone]): Column — via expr()

make_timestamp_ntz returns a TIMESTAMP_NTZ (no timezone) value — the wall-clock time as-is, with no timezone conversion. make_timestamp_ltz returns a TIMESTAMP_LTZ (local timezone) value, which is stored as an instant and rendered in the session timezone.

Both functions first appeared in version 3.4.0.

val df = Seq(
  (2024, 1, 15, 9, 30, 0.0),
  (2024, 6, 21, 14, 45, 30.5),
  (2024, 12, 31, 23, 59, 59.0),
).toDF("year", "month", "day", "hour", "min", "sec")

val df2 = df
  .withColumn(
    "ntz",
    expr("make_timestamp_ntz(year, month, day, hour, min, sec)")
  )
  .withColumn(
    "ltz",
    expr("make_timestamp_ltz(year, month, day, hour, min, sec)")
  )

df2.show(false)
// +----+-----+---+----+---+----+---------------------+---------------------+
// |year|month|day|hour|min|sec |ntz                  |ltz                  |
// +----+-----+---+----+---+----+---------------------+---------------------+
// |2024|1    |15 |9   |30 |0.0 |2024-01-15 09:30:00  |2024-01-15 09:30:00  |
// |2024|6    |21 |14  |45 |30.5|2024-06-21 14:45:30.5|2024-06-21 14:45:30.5|
// |2024|12   |31 |23  |59 |59.0|2024-12-31 23:59:59  |2024-12-31 23:59:59  |
// +----+-----+---+----+---+----+---------------------+---------------------+

The displayed values look identical because the session timezone matches the wall-clock time, but the underlying types differ. ntz will always render the same digits regardless of session timezone, while ltz will shift when the session timezone changes.

To go the other direction — extracting parts from an existing date or timestamp — see year, month, and day. For parsing strings into dates and timestamps, see to_date and to_timestamp.

Example Details

Created: 2026-05-17 10:49:31 PM

Last Updated: 2026-05-17 10:49:31 PM