Spark Scala Unix Timestamp Functions: unix_timestamp, to_unix_timestamp, and from_unixtime
unix_timestamp and from_unixtime are the bridge between human-readable timestamp strings and Unix epoch seconds. Reach for them when integrating with systems that store time as a long — Kafka payloads, log files, REST APIs — or when you need to do timestamp math that's easier in seconds than in calendar fields. to_unix_timestamp is the SQL-only sibling of unix_timestamp, called through expr().
The examples on this page set the session time zone to UTC so the epoch seconds are reproducible, and assume the standard imports are in scope:
import org.apache.spark.sql.functions._
import spark.implicits._

spark.conf.set("spark.sql.session.timeZone", "UTC")
Without that, results depend on whichever time zone the driver is running in — the same string converts to different epoch seconds in different zones. Always be explicit when serializing or comparing Unix timestamps.
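To make that concrete, the same effect can be reproduced off-Spark with plain java.time, which Spark 3 uses under the hood; the zone choice here is just illustrative:

```scala
import java.time.{LocalDateTime, ZoneId}

// The same wall-clock string...
val local = LocalDateTime.parse("2026-01-15T09:30:45")

// ...maps to different instants depending on which zone interprets it.
val utcEpoch = local.atZone(ZoneId.of("UTC")).toEpochSecond
val nyEpoch  = local.atZone(ZoneId.of("America/New_York")).toEpochSecond

println(utcEpoch)           // 1768469445
// New York is UTC-5 in January, so the same string lands 18000 seconds later.
println(nyEpoch - utcEpoch) // 18000
```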
Parsing strings to Unix seconds with unix_timestamp
unix_timestamp has three overloads — one with no arguments (current time), one that parses a column using the default pattern yyyy-MM-dd HH:mm:ss, and one that takes a custom format pattern:
def unix_timestamp(): Column
def unix_timestamp(s: Column): Column
def unix_timestamp(s: Column, p: String): Column
The single-argument form is the one you'll use most often. As long as your strings match yyyy-MM-dd HH:mm:ss, no pattern is needed:
val df = Seq(
"2026-01-15 09:30:45",
"2025-12-20 23:59:59",
"2025-07-04 00:00:01",
"2024-02-29 12:00:00",
).toDF("event_ts_str")
val df2 = df
.withColumn("unix_ts", unix_timestamp(col("event_ts_str")))
df2.show(false)
// +-------------------+----------+
// |event_ts_str |unix_ts |
// +-------------------+----------+
// |2026-01-15 09:30:45|1768469445|
// |2025-12-20 23:59:59|1766275199|
// |2025-07-04 00:00:01|1751587201|
// |2024-02-29 12:00:00|1709208000|
// +-------------------+----------+
df2.printSchema()
// root
// |-- event_ts_str: string (nullable = true)
// |-- unix_ts: long (nullable = true)
The output is a long — the number of seconds since 1970-01-01 00:00:00 UTC. Pass a format pattern (Spark 3's datetime pattern syntax, modeled on java.time's DateTimeFormatter) as the second argument for any other input shape:
val df = Seq(
"01/15/2026 09:30:45",
"12/20/2025 23:59:59",
"07/04/2025 00:00:01",
"02/29/2024 12:00:00",
).toDF("event_ts_str")
val df2 = df
.withColumn("unix_ts", unix_timestamp(col("event_ts_str"), "MM/dd/yyyy HH:mm:ss"))
df2.show(false)
// +-------------------+----------+
// |event_ts_str |unix_ts |
// +-------------------+----------+
// |01/15/2026 09:30:45|1768469445|
// |12/20/2025 23:59:59|1766275199|
// |07/04/2025 00:00:01|1751587201|
// |02/29/2024 12:00:00|1709208000|
// +-------------------+----------+
The numeric results are identical to the first example — same instants in time, just parsed from a different string format.
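This is where epoch seconds earn their keep: duration math is plain subtraction. A sketch with illustrative column names, assuming a SparkSession with spark.implicits._ and org.apache.spark.sql.functions._ in scope:

```scala
val events = Seq(
  ("2026-01-15 09:30:45", "2026-01-15 10:00:00"),
  ("2025-12-20 23:59:59", "2025-12-21 00:05:00"),
).toDF("start_str", "end_str")

// Subtracting two parsed columns yields a duration in whole seconds,
// with no calendar-field arithmetic required.
val withDuration = events
  .withColumn("duration_s",
    unix_timestamp(col("end_str")) - unix_timestamp(col("start_str")))

withDuration.show(false)
// duration_s: 1755 for the first row, 301 for the second
```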
Current time with unix_timestamp()
The no-argument form returns the current time as Unix seconds. It's evaluated once per query — every row sees the same value:
val df = Seq("row_1").toDF("example")
val df2 = df
.withColumn("now_unix", unix_timestamp())
df2.show(false)
// +-------+----------+
// |example|now_unix |
// +-------+----------+
// |row_1 |1778453103|
// +-------+----------+
This is useful for stamping a load time onto every row of a DataFrame in epoch-seconds form. If you want a timestamp column instead, reach for current_timestamp.
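A load-stamp sketch, assuming a SparkSession; rawDf is a placeholder for whatever DataFrame you are loading:

```scala
// Sketch: stamp a load time onto every row, in both representations.
val stamped = rawDf
  .withColumn("load_unix", unix_timestamp())                      // long, epoch seconds
  .withColumn("load_ts", current_timestamp())                     // timestamp type
  .withColumn("load_unix_alt", current_timestamp().cast("long"))  // also epoch seconds
```

Both long columns should carry the same value, since both expressions are fixed at query start — the choice is purely about what type downstream consumers expect.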
to_unix_timestamp via expr()
to_unix_timestamp does the same thing as the two-argument form of unix_timestamp, but it's a SQL-only function. There's no entry in org.apache.spark.sql.functions, so you call it through expr():
to_unix_timestamp(timeExp[, fmt]) — via expr()
val df = Seq(
"2026-01-15 09:30:45",
"2025-12-20 23:59:59",
"2025-07-04 00:00:01",
).toDF("event_ts_str")
val df2 = df
.withColumn("unix_ts_default", expr("to_unix_timestamp(event_ts_str)"))
.withColumn("unix_ts_custom", expr("to_unix_timestamp(event_ts_str, 'yyyy-MM-dd HH:mm:ss')"))
df2.show(false)
// +-------------------+---------------+--------------+
// |event_ts_str |unix_ts_default|unix_ts_custom|
// +-------------------+---------------+--------------+
// |2026-01-15 09:30:45|1768469445 |1768469445 |
// |2025-12-20 23:59:59|1766275199 |1766275199 |
// |2025-07-04 00:00:01|1751587201 |1751587201 |
// +-------------------+---------------+--------------+
For most cases you'll prefer unix_timestamp because it's a typed Scala function — better autocomplete, no string-embedded SQL, and the same output. to_unix_timestamp is mainly useful when you're already inside a SQL expression and don't want to break out of it.
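Where that pays off is when the whole computation already lives in a SQL string. A sketch reusing the df above — flag rows older than an arbitrary cutoff without leaving the expression:

```scala
// Sketch: both sides of the comparison stay inside one SQL expression.
val flagged = df.withColumn(
  "is_old",
  expr("to_unix_timestamp(event_ts_str) < to_unix_timestamp('2026-01-01 00:00:00')")
)
// true for the two 2025 rows, false for 2026-01-15
```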
Converting Unix seconds back to strings with from_unixtime
from_unixtime is the inverse of unix_timestamp — it takes a long of seconds since the epoch and renders it as a string in the session time zone.
def from_unixtime(ut: Column): Column
def from_unixtime(ut: Column, f: String): Column
The single-argument form uses the default pattern yyyy-MM-dd HH:mm:ss:
val df = Seq(
1768469445L,
1766275199L,
1751587201L,
1709208000L,
).toDF("unix_ts")
val df2 = df
.withColumn("event_ts_str", from_unixtime(col("unix_ts")))
df2.show(false)
// +----------+-------------------+
// |unix_ts |event_ts_str |
// +----------+-------------------+
// |1768469445|2026-01-15 09:30:45|
// |1766275199|2025-12-20 23:59:59|
// |1751587201|2025-07-04 00:00:01|
// |1709208000|2024-02-29 12:00:00|
// +----------+-------------------+
df2.printSchema()
// root
// |-- unix_ts: long (nullable = false)
// |-- event_ts_str: string (nullable = true)
Note that the output is a string, not a timestamp. If you need a real timestamp type, run the result through to_timestamp, or use timestamp_seconds to skip the string step entirely.
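A sketch of both routes to a real timestamp column, reusing the unix_ts column above (timestamp_seconds is available in the Scala functions API from Spark 3.1):

```scala
val typed = df2
  .withColumn("ts_direct", timestamp_seconds(col("unix_ts")))                // long -> timestamp, one step
  .withColumn("ts_via_string", to_timestamp(from_unixtime(col("unix_ts")))) // same instant, via a string
```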
Pass a format pattern as the second argument to render in a different shape:
val df = Seq(
1768469445L,
1766275199L,
1751587201L,
1709208000L,
).toDF("unix_ts")
val df2 = df
.withColumn("formatted", from_unixtime(col("unix_ts"), "MMM d, yyyy h:mm a"))
df2.show(false)
// +----------+---------------------+
// |unix_ts |formatted |
// +----------+---------------------+
// |1768469445|Jan 15, 2026 9:30 AM |
// |1766275199|Dec 20, 2025 11:59 PM|
// |1751587201|Jul 4, 2025 12:00 AM |
// |1709208000|Feb 29, 2024 12:00 PM|
// +----------+---------------------+
The pattern syntax matches date_format — see the date_format examples for a complete tour.
Round-tripping strings through unix_timestamp and from_unixtime
A common pattern is converting timestamp strings to Unix seconds for storage or arithmetic, then back to a readable string for display. As long as the session time zone is the same on both ends, the round-trip is lossless for second-resolution timestamps:
val df = Seq(
"2026-01-15 09:30:45",
"2025-12-20 23:59:59",
"2025-07-04 00:00:01",
).toDF("event_ts_str")
val df2 = df
.withColumn("unix_ts", unix_timestamp(col("event_ts_str")))
.withColumn("round_trip", from_unixtime(unix_timestamp(col("event_ts_str"))))
df2.show(false)
// +-------------------+----------+-------------------+
// |event_ts_str |unix_ts |round_trip |
// +-------------------+----------+-------------------+
// |2026-01-15 09:30:45|1768469445|2026-01-15 09:30:45|
// |2025-12-20 23:59:59|1766275199|2025-12-20 23:59:59|
// |2025-07-04 00:00:01|1751587201|2025-07-04 00:00:01|
// +-------------------+----------+-------------------+
Sub-second precision is lost — these functions deal in whole seconds only. If you need milliseconds or microseconds, see timestamp_seconds, timestamp_millis, and timestamp_micros instead.
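For example, a sketch of the millisecond route — the timestamp_millis SQL function (Spark 3.1+, called through expr here) keeps the fraction that a seconds-based round-trip would drop:

```scala
val ms = Seq(1768469445123L).toDF("epoch_ms")

// Epoch milliseconds straight to a timestamp, fraction intact:
val withTs = ms.withColumn("event_ts", expr("timestamp_millis(epoch_ms)"))
// event_ts carries 2026-01-15 09:30:45.123
```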
Handling invalid and null input
Strings that don't match the expected pattern produce null rather than throwing, as do null inputs and impossible dates. (That's the default behavior; with ANSI mode enabled via spark.sql.ansi.enabled, malformed input raises an error instead.)
val df = Seq(
"2026-01-15 09:30:45",
"not-a-timestamp",
null,
"2026-13-99 99:99:99",
).toDF("event_ts_str")
val df2 = df
.withColumn("unix_ts", unix_timestamp(col("event_ts_str")))
df2.show(false)
// +-------------------+----------+
// |event_ts_str |unix_ts |
// +-------------------+----------+
// |2026-01-15 09:30:45|1768469445|
// |not-a-timestamp |null |
// |null |null |
// |2026-13-99 99:99:99|null |
// +-------------------+----------+
Silent null substitution means parse failures won't blow up your job, but they also won't surface unless you look. Count or filter the nulls downstream rather than trusting that every row converted cleanly.
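A quick audit sketch along those lines — count rows where the source string was present but parsing still returned null:

```scala
val parsed = df.withColumn("unix_ts", unix_timestamp(col("event_ts_str")))

// Null unix_ts with a non-null source string means the parse failed.
val parseFailures = parsed
  .filter(col("unix_ts").isNull && col("event_ts_str").isNotNull)
  .count()
// 2 for the sample above: the garbage string and the impossible date
```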
Related functions
To parse strings directly into timestamp or date types instead of long, see to_date and to_timestamp. To format a timestamp column as a string without going through epoch seconds, see date_format. For the current time as a timestamp rather than long, see current_date and current_timestamp.