
Spark Scala Timestamp Conversion Functions

Six paired functions convert between Unix epoch numbers and timestamp columns at three precision levels: timestamp_seconds, timestamp_millis, and timestamp_micros go from a numeric column to a timestamp, and unix_seconds, unix_millis, and unix_micros go the other way. Reach for them when you're ingesting long epoch values from Kafka, log files, or APIs and need a real timestamp type to filter, format, or join on — or when you need to emit timestamps as numeric values for downstream systems.

These functions return or accept a true timestamp column. That's the key difference from unix_timestamp and from_unixtime, which produce a long and a string respectively. Use the functions on this page when you want the result typed as a timestamp, or when you need sub-second precision.
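
A quick schema check makes the difference visible. This is a minimal sketch; the column names are arbitrary:

import spark.implicits._
import org.apache.spark.sql.functions.{col, from_unixtime, timestamp_seconds}

val typed = Seq(1768469445L).toDF("epoch")
  .withColumn("as_string",    from_unixtime(col("epoch")))      // string column
  .withColumn("as_timestamp", timestamp_seconds(col("epoch")))  // timestamp column

typed.printSchema()
// epoch is long, as_string is string, as_timestamp is timestamp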

The examples below set the session time zone to UTC so the epoch numbers are reproducible:

spark.conf.set("spark.sql.session.timeZone", "UTC")

Without that, results depend on whichever zone the driver runs in.
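
To make that concrete, here's a small sketch rendering the same epoch second under a different session zone (the zone name is just an example):

import org.apache.spark.sql.functions.{lit, timestamp_seconds}

spark.conf.set("spark.sql.session.timeZone", "America/New_York")

spark.range(1)
  .select(timestamp_seconds(lit(1768469445L)).as("event_ts"))
  .show(false)
// +-------------------+
// |event_ts           |
// +-------------------+
// |2026-01-15 04:30:45|
// +-------------------+

spark.conf.set("spark.sql.session.timeZone", "UTC")  // restore UTC for the rest of the page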

timestamp_seconds

timestamp_seconds takes a column of long seconds since the Unix epoch and returns a timestamp. It's the only one of the six functions in this group that's exposed directly in the Scala functions API — the rest are SQL-only.

The timestamp_seconds function first appeared in version 3.1.0 and is defined as:

def timestamp_seconds(e: Column): Column

import spark.implicits._   // for .toDF on local collections
import org.apache.spark.sql.functions.{col, expr, timestamp_seconds, to_timestamp}   // used across the examples below

val df = Seq(
  1768469445L,
  1766275199L,
  1751587201L,
  1709208000L
).toDF("unix_seconds")

val df2 = df
  .withColumn("event_ts", timestamp_seconds(col("unix_seconds")))

df2.show(false)
// +------------+-------------------+
// |unix_seconds|event_ts           |
// +------------+-------------------+
// |1768469445  |2026-01-15 09:30:45|
// |1766275199  |2025-12-20 23:59:59|
// |1751587201  |2025-07-04 00:00:01|
// |1709208000  |2024-02-29 12:00:00|
// +------------+-------------------+

df2.printSchema()
// root
//  |-- unix_seconds: long (nullable = false)
//  |-- event_ts: timestamp (nullable = false)

Note the timestamp type on the output — that's what makes this function more useful than from_unixtime when you plan to do further timestamp work. You can call date_format, year, hour, date_add, or any other date/time function directly on it.
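
For example, building on df2 from above, calendar fields and a formatted string come straight off the converted column (a quick sketch; the derived column names are arbitrary):

import org.apache.spark.sql.functions.{date_format, hour, to_date, year}

val enriched = df2
  .withColumn("event_date", to_date(col("event_ts")))
  .withColumn("event_year", year(col("event_ts")))
  .withColumn("event_hour", hour(col("event_ts")))
  .withColumn("formatted",  date_format(col("event_ts"), "yyyy-MM-dd HH:mm"))

enriched.select("event_ts", "event_date", "event_year", "event_hour", "formatted").show(false)
// the first row yields 2026-01-15, 2026, 9, and 2026-01-15 09:30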

timestamp_millis and timestamp_micros via expr()

The millisecond and microsecond variants aren't exposed in the Scala API, so you call them through expr():

timestamp_millis(milliseconds) — via expr()

timestamp_micros(microseconds) — via expr()

Both functions first appeared in version 3.1.0. They're useful when your epoch numbers carry sub-second precision — for instance, JavaScript's Date.now() returns milliseconds, and many trace/log systems emit microseconds:

val df = Seq(
  (1768469445123L, 1768469445123456L),
  (1766275199500L, 1766275199500000L),
  (1751587201000L, 1751587201000001L),
).toDF("unix_millis", "unix_micros")

val df2 = df
  .withColumn("ts_from_millis", expr("timestamp_millis(unix_millis)"))
  .withColumn("ts_from_micros", expr("timestamp_micros(unix_micros)"))

df2.show(false)
// +-------------+----------------+-----------------------+--------------------------+
// |unix_millis  |unix_micros     |ts_from_millis         |ts_from_micros            |
// +-------------+----------------+-----------------------+--------------------------+
// |1768469445123|1768469445123456|2026-01-15 09:30:45.123|2026-01-15 09:30:45.123456|
// |1766275199500|1766275199500000|2025-12-20 23:59:59.5  |2025-12-20 23:59:59.5     |
// |1751587201000|1751587201000001|2025-07-04 00:00:01    |2025-07-04 00:00:01.000001|
// +-------------+----------------+-----------------------+--------------------------+

Two things to notice. First, Spark trims trailing zeros when it renders a timestamp as a string: 23:59:59.500 displays as 23:59:59.5, and an exact second shows no fractional part at all. The underlying value still carries microsecond precision; it's purely a display quirk. Second, timestamp_micros preserves the single trailing microsecond in 1751587201000001 (00:00:01.000001), demonstrating that Spark's timestamp type stores microsecond resolution natively.
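
If the trimmed rendering is a problem for reports or exports, date_format can force a fixed number of fractional digits. A sketch using the millisecond column from above:

import org.apache.spark.sql.functions.date_format

df2.select(date_format(col("ts_from_millis"), "yyyy-MM-dd HH:mm:ss.SSS").as("fixed_width")).show(false)
// +-----------------------+
// |fixed_width            |
// +-----------------------+
// |2026-01-15 09:30:45.123|
// |2025-12-20 23:59:59.500|
// |2025-07-04 00:00:01.000|
// +-----------------------+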

unix_seconds, unix_millis, and unix_micros via expr()

Going the other direction, unix_seconds, unix_millis, and unix_micros take a timestamp and return a long. All three are SQL-only:

unix_seconds(timestamp) — via expr()

unix_millis(timestamp) — via expr()

unix_micros(timestamp) — via expr()

All three first appeared in version 3.1.0. unix_seconds and unix_millis truncate any precision finer than what they return — unix_seconds(2026-01-15 09:30:45.999) returns 1768469445, not 1768469446. unix_micros returns the full microsecond value Spark has stored:

val df = Seq(
  "2026-01-15 09:30:45.123456",
  "2025-12-20 23:59:59.500000",
  "2025-07-04 00:00:01.000001",
).toDF("event_ts_str")

val df2 = df
  .withColumn("event_ts",    to_timestamp(col("event_ts_str")))
  .withColumn("unix_secs",   expr("unix_seconds(event_ts)"))
  .withColumn("unix_millis", expr("unix_millis(event_ts)"))
  .withColumn("unix_micros", expr("unix_micros(event_ts)"))

df2.show(false)
// +--------------------------+--------------------------+----------+-------------+----------------+
// |event_ts_str              |event_ts                  |unix_secs |unix_millis  |unix_micros     |
// +--------------------------+--------------------------+----------+-------------+----------------+
// |2026-01-15 09:30:45.123456|2026-01-15 09:30:45.123456|1768469445|1768469445123|1768469445123456|
// |2025-12-20 23:59:59.500000|2025-12-20 23:59:59.5     |1766275199|1766275199500|1766275199500000|
// |2025-07-04 00:00:01.000001|2025-07-04 00:00:01.000001|1751587201|1751587201000|1751587201000001|
// +--------------------------+--------------------------+----------+-------------+----------------+

Look at the third row: the timestamp 00:00:01.000001 produces unix_millis = 1751587201000 (the trailing microsecond is dropped) and unix_micros = 1751587201000001 (the trailing microsecond survives). Pick the function that matches the precision the consumer expects — if downstream is a JavaScript Date or a Kafka header that stores milliseconds, use unix_millis.

Round-tripping through epoch micros

A common pattern is serializing timestamps as long for storage or wire transfer, then reconstructing them on read. With unix_micros and timestamp_micros, the round-trip is lossless for any value Spark's timestamp type can represent:

val df = Seq(
  "2026-01-15 09:30:45.123456",
  "2025-12-20 23:59:59.500000",
  "2025-07-04 00:00:01.000001",
).toDF("event_ts_str")

val df2 = df
  .withColumn("event_ts",   to_timestamp(col("event_ts_str")))
  .withColumn("unix_micros", expr("unix_micros(event_ts)"))
  .withColumn("round_trip", expr("timestamp_micros(unix_micros(event_ts))"))

df2.show(false)
// +--------------------------+--------------------------+----------------+--------------------------+
// |event_ts_str              |event_ts                  |unix_micros     |round_trip                |
// +--------------------------+--------------------------+----------------+--------------------------+
// |2026-01-15 09:30:45.123456|2026-01-15 09:30:45.123456|1768469445123456|2026-01-15 09:30:45.123456|
// |2025-12-20 23:59:59.500000|2025-12-20 23:59:59.5     |1766275199500000|2025-12-20 23:59:59.5     |
// |2025-07-04 00:00:01.000001|2025-07-04 00:00:01.000001|1751587201000001|2025-07-04 00:00:01.000001|
// +--------------------------+--------------------------+----------------+--------------------------+

Choosing unix_seconds/timestamp_seconds or unix_millis/timestamp_millis for the round-trip silently drops sub-second or sub-millisecond detail — fine if you don't care about that precision, but worth being deliberate about.
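
To see that loss in action, here's a sketch sending the same timestamps through the millisecond pair instead:

val lossyTrip = df
  .withColumn("event_ts",    to_timestamp(col("event_ts_str")))
  .withColumn("millis_trip", expr("timestamp_millis(unix_millis(event_ts))"))

lossyTrip.select("event_ts", "millis_trip").show(false)
// +--------------------------+-----------------------+
// |event_ts                  |millis_trip            |
// +--------------------------+-----------------------+
// |2026-01-15 09:30:45.123456|2026-01-15 09:30:45.123|
// |2025-12-20 23:59:59.5     |2025-12-20 23:59:59.5  |
// |2025-07-04 00:00:01.000001|2025-07-04 00:00:01    |
// +--------------------------+-----------------------+

Only the microsecond pair reproduces the stored values exactly.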

Nulls and edge cases

Null inputs produce null outputs across all six functions. Negative values are accepted and represent instants before the epoch; -86400 is exactly one day before 1970-01-01 00:00:00:

val df = Seq(
  Some(1768469445L),
  None,
  Some(0L),
  Some(-86400L),
).toDF("unix_seconds")

val df2 = df
  .withColumn("event_ts", timestamp_seconds(col("unix_seconds")))

df2.show(false)
// +------------+-------------------+
// |unix_seconds|event_ts           |
// +------------+-------------------+
// |1768469445  |2026-01-15 09:30:45|
// |null        |null               |
// |0           |1970-01-01 00:00:00|
// |-86400      |1969-12-31 00:00:00|
// +------------+-------------------+

The same behavior applies to timestamp_millis, timestamp_micros, and the three unix_* functions when called on null timestamps.
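
To confirm, a small sketch in the reverse direction with a null timestamp (column names are arbitrary):

val withNulls = Seq("2026-01-15 09:30:45", null).toDF("ts_str")
  .withColumn("event_ts",  to_timestamp(col("ts_str")))
  .withColumn("unix_secs", expr("unix_seconds(event_ts)"))

withNulls.show(false)
// +-------------------+-------------------+----------+
// |ts_str             |event_ts           |unix_secs |
// +-------------------+-------------------+----------+
// |2026-01-15 09:30:45|2026-01-15 09:30:45|1768469445|
// |null               |null               |null      |
// +-------------------+-------------------+----------+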

For converting between long epoch seconds and formatted strings (rather than timestamp columns), see unix_timestamp and from_unixtime. To parse strings directly into timestamp or date types, see to_date and to_timestamp. For getting the current time as a timestamp, see current_date and current_timestamp. To extract calendar fields from a timestamp after conversion, see year, month, and day, as well as hour, minute, and second.
