Spark Scala to_date and to_timestamp
to_date and to_timestamp parse string columns into proper date and timestamp types. They're the functions you reach for whenever raw data lands as text — CSV files, JSON payloads, or upstream queries — and you need to do anything date-related with it. Spark 3.4 also adds to_timestamp_ntz and to_timestamp_ltz for explicit time-zone semantics.
Parsing dates with to_date
The to_date function converts a string column to a date column. It has two signatures: one that assumes the standard ISO format (yyyy-MM-dd), and one that accepts a custom format pattern.
def to_date(e: Column): Column
def to_date(e: Column, fmt: String): Column
When the input is already in yyyy-MM-dd format, you don't need to pass a format string:
import org.apache.spark.sql.functions._ // to_date, to_timestamp, col, expr
import spark.implicits._ // enables .toDF on local collections (spark is the active SparkSession)

val df = Seq(
"2026-01-15",
"2025-12-20",
"2025-07-04",
"2024-02-29",
).toDF("event_date_str")
val df2 = df
.withColumn("event_date", to_date(col("event_date_str")))
df2.show(false)
// +--------------+----------+
// |event_date_str|event_date|
// +--------------+----------+
// |2026-01-15 |2026-01-15|
// |2025-12-20 |2025-12-20|
// |2025-07-04 |2025-07-04|
// |2024-02-29 |2024-02-29|
// +--------------+----------+
df2.printSchema()
// root
// |-- event_date_str: string (nullable = true)
// |-- event_date: date (nullable = true)
The output looks identical to the input string, but the schema confirms the column is now a date rather than a string. That distinction matters — once the column is a real date, it works with date arithmetic, datediff, date_format, and every other date/time function.
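For instance, once event_date is a real date column, datediff and date_add apply directly. A small sketch reusing the df2 from above (the 2025-01-01 cutoff is an arbitrary illustration):

```scala
import org.apache.spark.sql.functions._

// Date arithmetic works now that event_date is a real date column.
// The 2025-01-01 cutoff here is arbitrary.
val withMath = df2
  .withColumn("days_since_cutoff", datediff(col("event_date"), lit("2025-01-01")))
  .withColumn("next_day", date_add(col("event_date"), 1))

withMath.select("event_date", "days_since_cutoff", "next_day").show(false)
```

None of this would work against the raw string column; after to_date, it all does.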
Parsing non-ISO date formats
Real-world data rarely arrives in ISO format. Pass a datetime pattern (Spark's pattern syntax, modeled on Java's DateTimeFormatter) as the second argument to handle other formats:
val df = Seq(
"01/15/2026",
"12/20/2025",
"07/04/2025",
"02/29/2024",
).toDF("event_date_str")
val df2 = df
.withColumn("event_date", to_date(col("event_date_str"), "MM/dd/yyyy"))
df2.show(false)
// +--------------+----------+
// |event_date_str|event_date|
// +--------------+----------+
// |01/15/2026 |2026-01-15|
// |12/20/2025 |2025-12-20|
// |07/04/2025 |2025-07-04|
// |02/29/2024 |2024-02-29|
// +--------------+----------+
The format pattern uses the same tokens as date_format — yyyy for the four-digit year, MM for the zero-padded month, dd for the zero-padded day. See the date_format examples for a full tour of the pattern syntax.
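One more illustration: a hypothetical feed that uses day, abbreviated month name, and year (the sample values here are made up; MMM matches abbreviated English month names under the default locale):

```scala
import org.apache.spark.sql.functions._

// Hypothetical input shaped like "15-Jan-2026": dd-MMM-yyyy.
val raw = Seq("15-Jan-2026", "04-Jul-2025").toDF("event_date_str")

val parsed = raw.withColumn("event_date", to_date(col("event_date_str"), "dd-MMM-yyyy"))
parsed.show(false)
```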
Parsing timestamps with to_timestamp
to_timestamp works the same way as to_date, but produces a timestamp column that includes time-of-day information.
def to_timestamp(s: Column): Column
def to_timestamp(s: Column, fmt: String): Column
Without a format argument, to_timestamp expects ISO-style input of the form yyyy-MM-dd HH:mm:ss (fractional seconds and a time-zone offset are also accepted):
val df = Seq(
"2026-01-15 09:30:45",
"2025-12-20 23:59:59",
"2025-07-04 00:00:01",
"2024-02-29 12:00:00",
).toDF("event_ts_str")
val df2 = df
.withColumn("event_ts", to_timestamp(col("event_ts_str")))
df2.show(false)
// +-------------------+-------------------+
// |event_ts_str |event_ts |
// +-------------------+-------------------+
// |2026-01-15 09:30:45|2026-01-15 09:30:45|
// |2025-12-20 23:59:59|2025-12-20 23:59:59|
// |2025-07-04 00:00:01|2025-07-04 00:00:01|
// |2024-02-29 12:00:00|2024-02-29 12:00:00|
// +-------------------+-------------------+
df2.printSchema()
// root
// |-- event_ts_str: string (nullable = true)
// |-- event_ts: timestamp (nullable = true)
For timestamps in any other shape, supply a format pattern:
val df = Seq(
"Jan 15, 2026 9:30 AM",
"Dec 20, 2025 11:59 PM",
"Jul 4, 2025 12:01 AM",
"Feb 29, 2024 12:00 PM",
).toDF("event_ts_str")
val df2 = df
.withColumn("event_ts", to_timestamp(col("event_ts_str"), "MMM d, yyyy h:mm a"))
df2.show(false)
// +---------------------+-------------------+
// |event_ts_str |event_ts |
// +---------------------+-------------------+
// |Jan 15, 2026 9:30 AM |2026-01-15 09:30:00|
// |Dec 20, 2025 11:59 PM|2025-12-20 23:59:00|
// |Jul 4, 2025 12:01 AM |2025-07-04 00:01:00|
// |Feb 29, 2024 12:00 PM|2024-02-29 12:00:00|
// +---------------------+-------------------+
MMM matches the abbreviated month name, h is the 12-hour clock, and a matches AM/PM.
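And once the string is a real timestamp, the time-of-day components are directly extractable. A sketch reusing the df2 just above:

```scala
import org.apache.spark.sql.functions._

// Pull fields out of the parsed timestamp; this needs a real
// timestamp column, not the original string.
df2
  .withColumn("hr", hour(col("event_ts")))
  .withColumn("min", minute(col("event_ts")))
  .select("event_ts_str", "hr", "min")
  .show(false)
```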
to_timestamp_ntz and to_timestamp_ltz
Spark 3.4 adds two SQL-only variants that parse into specific timestamp types:
to_timestamp_ntz parses into TIMESTAMP_NTZ, a timestamp without a time zone, treated as a wall-clock value. to_timestamp_ltz parses into TIMESTAMP_LTZ, a timestamp with the session local time zone; this is the same type returned by to_timestamp.
As of Spark 3.4, neither has a Scala API entry in org.apache.spark.sql.functions, so you call them through expr():
to_timestamp_ntz(timestamp_str[, fmt]) — via expr()
to_timestamp_ltz(timestamp_str[, fmt]) — via expr()
val df = Seq(
"2026-01-15 09:30:45",
"2025-12-20 23:59:59",
"2025-07-04 00:00:01",
).toDF("event_ts_str")
val df2 = df
.withColumn("ts_ntz", expr("to_timestamp_ntz(event_ts_str)"))
.withColumn("ts_ltz", expr("to_timestamp_ltz(event_ts_str)"))
df2.show(false)
// +-------------------+-------------------+-------------------+
// |event_ts_str |ts_ntz |ts_ltz |
// +-------------------+-------------------+-------------------+
// |2026-01-15 09:30:45|2026-01-15 09:30:45|2026-01-15 09:30:45|
// |2025-12-20 23:59:59|2025-12-20 23:59:59|2025-12-20 23:59:59|
// |2025-07-04 00:00:01|2025-07-04 00:00:01|2025-07-04 00:00:01|
// +-------------------+-------------------+-------------------+
df2.printSchema()
// root
// |-- event_ts_str: string (nullable = true)
// |-- ts_ntz: timestamp_ntz (nullable = true)
// |-- ts_ltz: timestamp (nullable = true)
The values are identical for these inputs because no time-zone offset is supplied — the difference is in the schema. ts_ntz is timestamp_ntz and is never adjusted for time zones. ts_ltz is the regular timestamp type, which adjusts to the session time zone when the input includes an offset (e.g. 2026-01-15 09:30:45+05:00).
Reach for to_timestamp_ntz when you want to record a wall-clock time that means the same thing everywhere — appointment times, scheduled events, log timestamps already normalized to UTC. Reach for to_timestamp_ltz (or just to_timestamp) when the timestamp represents an absolute instant that should shift with the viewer's time zone.
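To see that LTZ adjustment in action, here's a sketch that pins the session time zone and feeds to_timestamp (the LTZ path) a string carrying an explicit offset. The pinned zone and sample value are arbitrary:

```scala
import org.apache.spark.sql.functions._

// Pin the session time zone so the displayed value is deterministic.
spark.conf.set("spark.sql.session.timeZone", "UTC")

val offsets = Seq("2026-01-15 09:30:45+05:00").toDF("event_ts_str")
  .withColumn("event_ts", to_timestamp(col("event_ts_str")))

offsets.show(false)
// 09:30:45 at +05:00 is 04:30:45 UTC, so event_ts renders as
// 2026-01-15 04:30:45 under the UTC session zone.
```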
Handling invalid input
Strings that don't match the expected format produce null rather than throwing an exception (under Spark's default settings; with spark.sql.ansi.enabled on, unparseable strings raise an error instead). The same goes for null inputs:
val df = Seq(
"2026-01-15",
"not-a-date",
null,
"2026-13-99",
).toDF("event_date_str")
val df2 = df
.withColumn("event_date", to_date(col("event_date_str")))
.withColumn("event_ts", to_timestamp(col("event_date_str")))
df2.show(false)
// +--------------+----------+-------------------+
// |event_date_str|event_date|event_ts |
// +--------------+----------+-------------------+
// |2026-01-15 |2026-01-15|2026-01-15 00:00:00|
// |not-a-date |null |null |
// |null |null |null |
// |2026-13-99 |null |null |
// +--------------+----------+-------------------+
A few things to note: to_timestamp happily accepts a date-only string and zero-pads the time. Garbage strings and impossible dates (month 13, day 99) both yield null instead of an error, so you'll want to filter those out (or count them) downstream rather than trusting the result blindly.
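A minimal downstream check along those lines, reusing the df2 from above: rows where the raw string is present but the parsed value came back null are the bad records.

```scala
import org.apache.spark.sql.functions._

// Parse failures: raw string is non-null but the parsed value is null.
val bad = df2.filter(col("event_date_str").isNotNull && col("event_date").isNull)

println(s"unparseable rows: ${bad.count()}")
bad.select("event_date_str").show(false)
```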
Related functions
To go the other direction — turning a date or timestamp back into a formatted string — use date_format. To extract individual parts (year, month, day, hour, etc.) from the parsed value, see year, month, and day and hour, minute, and second.