Spark Scala UTC Timezone Functions
from_utc_timestamp, to_utc_timestamp, and convert_timezone shift a timestamp column between time zones. Use them when your data is stored in one zone (almost always UTC for warehouse data) but you need to report, filter, or join in another — they handle daylight saving transitions for you, so you don't have to do offset math by hand.
The examples below assume the standard imports and set the session time zone to UTC so the rendered output is reproducible:

import org.apache.spark.sql.functions._
import spark.implicits._

spark.conf.set("spark.sql.session.timeZone", "UTC")
Without that, Spark renders timestamps in whatever zone the driver runs in, which makes results depend on the developer's laptop.
from_utc_timestamp
from_utc_timestamp takes a timestamp that represents a moment in UTC and returns the same moment expressed as a local time in the target zone. It's the function to reach for when your warehouse stores UTC and you need to render it for a user, region, or downstream system that thinks in a different zone.
The from_utc_timestamp function is defined as:
def from_utc_timestamp(ts: Column, tz: String): Column
def from_utc_timestamp(ts: Column, tz: Column): Column
The first overload takes the target zone as a String literal. The second takes a Column, which is useful when each row has its own zone — for example, a users table where every user has a timezone field.
val df = Seq(
"2026-01-15 14:30:00",
"2026-06-21 08:00:00",
"2026-12-25 23:00:00",
).toDF("utc_ts_str")
val df2 = df
.withColumn("utc_ts", to_timestamp(col("utc_ts_str")))
.withColumn("new_york", from_utc_timestamp(col("utc_ts"), "America/New_York"))
.withColumn("tokyo", from_utc_timestamp(col("utc_ts"), "Asia/Tokyo"))
df2.show(false)
// +-------------------+-------------------+-------------------+-------------------+
// |utc_ts_str |utc_ts |new_york |tokyo |
// +-------------------+-------------------+-------------------+-------------------+
// |2026-01-15 14:30:00|2026-01-15 14:30:00|2026-01-15 09:30:00|2026-01-15 23:30:00|
// |2026-06-21 08:00:00|2026-06-21 08:00:00|2026-06-21 04:00:00|2026-06-21 17:00:00|
// |2026-12-25 23:00:00|2026-12-25 23:00:00|2026-12-25 18:00:00|2026-12-26 08:00:00|
// +-------------------+-------------------+-------------------+-------------------+
Notice the January row: New York is -5 (EST) and Tokyo is +9. In June, New York shifts to -4 (EDT) because the function honors daylight saving rules. December 25 in Tokyo crosses midnight into December 26 — the date changes, not just the time.
Use full IANA zone IDs (America/New_York, Europe/London, Asia/Tokyo) rather than abbreviations like EST or JST. The abbreviations are ambiguous and don't carry DST rules; the IANA IDs do.
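The Column overload is handy when the zone varies per row. Here's a minimal sketch, assuming a hypothetical users dataset where each row carries its own timezone field:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Hypothetical per-user data: each row records its own display zone.
val users = Seq(
  ("alice", "2026-01-15 14:30:00", "America/New_York"),
  ("kenji", "2026-01-15 14:30:00", "Asia/Tokyo")
).toDF("user", "utc_ts_str", "timezone")

val localized = users
  .withColumn("utc_ts", to_timestamp(col("utc_ts_str")))
  // Column overload: the target zone is read from each row's timezone field.
  .withColumn("local_ts", from_utc_timestamp(col("utc_ts"), col("timezone")))

localized.show(false)
// alice sees 09:30 (UTC-5), kenji sees 23:30 (UTC+9) — the same instant.
```

One scan of the DataFrame localizes every row to its own zone; no join or per-zone branching needed.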
to_utc_timestamp
to_utc_timestamp is the inverse: it takes a timestamp that represents a local time in some zone and returns the equivalent moment in UTC. It's what you want when ingesting timestamps from a system that stores local time without a zone offset, and you need to normalize everything to UTC before further processing.
def to_utc_timestamp(ts: Column, tz: String): Column
def to_utc_timestamp(ts: Column, tz: Column): Column
val df = Seq(
("2026-01-15 09:30:00", "America/New_York"),
("2026-06-21 12:00:00", "Europe/London"),
("2026-12-25 18:00:00", "Asia/Tokyo"),
).toDF("local_ts_str", "tz")
val df2 = df
.withColumn("local_ts", to_timestamp(col("local_ts_str")))
.withColumn("utc_ts", to_utc_timestamp(col("local_ts"), col("tz")))
df2.show(false)
// +-------------------+----------------+-------------------+-------------------+
// |local_ts_str |tz |local_ts |utc_ts |
// +-------------------+----------------+-------------------+-------------------+
// |2026-01-15 09:30:00|America/New_York|2026-01-15 09:30:00|2026-01-15 14:30:00|
// |2026-06-21 12:00:00|Europe/London |2026-06-21 12:00:00|2026-06-21 11:00:00|
// |2026-12-25 18:00:00|Asia/Tokyo |2026-12-25 18:00:00|2026-12-25 09:00:00|
// +-------------------+----------------+-------------------+-------------------+
This example uses the Column overload — each row's zone comes from the tz column, which is what you'd do when ingesting events from a multi-region system where each event records its own local zone.
The June row shows London at +1 (BST), not +0 — to_utc_timestamp correctly applies British Summer Time. Tokyo at +9 shifts December 25 6pm local to December 25 9am UTC.
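Because the two functions are inverses, a to_utc_timestamp followed by a from_utc_timestamp in the same zone should hand back the original local time. A quick sanity-check sketch:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(("2026-06-21 12:00:00", "Europe/London")).toDF("local_ts_str", "tz")

val roundTrip = df
  .withColumn("local_ts", to_timestamp(col("local_ts_str")))
  .withColumn("utc_ts", to_utc_timestamp(col("local_ts"), col("tz")))  // 11:00 UTC (BST is +1)
  .withColumn("back", from_utc_timestamp(col("utc_ts"), col("tz")))    // 12:00 local again
  .withColumn("matches", col("back") === col("local_ts"))

roundTrip.show(false)
// matches is true for this row
```

Treat the round trip as a sanity check rather than an invariant: a local time that falls inside a spring-forward gap doesn't exist in its zone, so it can't survive the trip unchanged.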
Daylight Saving Transitions
The interesting behavior of from_utc_timestamp and to_utc_timestamp shows up at DST boundaries. In 2026, US DST starts on March 8 at 2am local time — clocks spring forward to 3am. The function returns whatever the local zone says the time was at that UTC instant:
val df = Seq(
"2026-03-08 06:30:00",
"2026-03-08 07:30:00",
"2026-11-01 06:30:00",
).toDF("utc_ts_str")
val df2 = df
.withColumn("utc_ts", to_timestamp(col("utc_ts_str")))
.withColumn("new_york", from_utc_timestamp(col("utc_ts"), "America/New_York"))
df2.show(false)
// +-------------------+-------------------+-------------------+
// |utc_ts_str |utc_ts |new_york |
// +-------------------+-------------------+-------------------+
// |2026-03-08 06:30:00|2026-03-08 06:30:00|2026-03-08 01:30:00|
// |2026-03-08 07:30:00|2026-03-08 07:30:00|2026-03-08 03:30:00|
// |2026-11-01 06:30:00|2026-11-01 06:30:00|2026-11-01 01:30:00|
// +-------------------+-------------------+-------------------+
At UTC 06:30 on March 8 the offset is still -5 (EST), so New York reads 01:30. One hour later at UTC 07:30 the offset has shifted to -4 (EDT), and New York reads 03:30 — the 02:00–03:00 local hour was skipped. November 1 is the fall-back, where the offset returns to -5 and 06:30 UTC lands at 01:30 local. You'd have to write a lot of branching code to reproduce this by hand.
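The fall-back transition has a subtlety going the other direction: on November 1, the local times 01:00–02:00 occur twice in America/New_York, so to_utc_timestamp must pick one of two valid UTC instants for them. A sketch to probe the behavior on your version — how ambiguous local times resolve is an implementation detail of the underlying timezone library, so check rather than assume:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// 01:30 local happens twice on 2026-11-01 in New York: once at 05:30 UTC
// (still EDT, -4) and once at 06:30 UTC (back on EST, -5).
val df = Seq("2026-11-01 01:30:00").toDF("local_ts_str")

df.withColumn("local_ts", to_timestamp(col("local_ts_str")))
  .withColumn("utc_ts", to_utc_timestamp(col("local_ts"), "America/New_York"))
  .show(false)
// The ambiguous local time maps to exactly one of the two candidate UTC instants.
```

If the distinction matters for your data, carry an explicit offset or a UTC column through ingestion instead of relying on zone-name resolution.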
convert_timezone
convert_timezone is more general than the UTC-specific functions: it converts a timestamp from any source zone to any target zone in one call. It was added as a SQL function in Spark 3.4.0 and didn't reach the Scala functions object until 3.5.0, so on 3.4 you call it through expr():
convert_timezone(sourceTz, targetTz, sourceTs)
convert_timezone(targetTz, sourceTs)

The three-argument form names the source zone explicitly. The two-argument form falls back to the session time zone for the source.
val df = Seq(
"2026-01-15 09:30:00",
"2026-06-21 12:00:00",
"2026-12-25 18:00:00",
).toDF("source_ts_str")
val df2 = df
.withColumn("source_ts", to_timestamp(col("source_ts_str")))
.withColumn("converted", expr("convert_timezone('America/New_York', 'Asia/Tokyo', source_ts)"))
df2.show(false)
// +-------------------+-------------------+-------------------+
// |source_ts_str |source_ts |converted |
// +-------------------+-------------------+-------------------+
// |2026-01-15 09:30:00|2026-01-15 09:30:00|2026-01-15 23:30:00|
// |2026-06-21 12:00:00|2026-06-21 12:00:00|2026-06-22 01:00:00|
// |2026-12-25 18:00:00|2026-12-25 18:00:00|2026-12-26 08:00:00|
// +-------------------+-------------------+-------------------+
January: New York 09:30 (-5) is Tokyo 23:30 (+9) — a +14-hour shift. June: New York 12:00 (-4 EDT) is Tokyo 01:00 the next day (+9) — a +13-hour shift. The difference between the two rows is DST, applied automatically.
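On Spark 3.5.0 or later, convert_timezone is also exposed in the Scala functions object, so the expr() call can be replaced with a typed call — zone names go in as lit() columns. A sketch, assuming the 3.5 API:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("2026-01-15 09:30:00").toDF("source_ts_str")

// Spark 3.5.0+ only: convert_timezone from the Scala functions object.
val df2 = df
  .withColumn("source_ts", to_timestamp(col("source_ts_str")))
  .withColumn("converted",
    convert_timezone(lit("America/New_York"), lit("Asia/Tokyo"), col("source_ts")))

df2.show(false)
// 09:30 in New York (-5) renders as 23:30 the same day in Tokyo (+9).
```

The typed form catches misspelled column names at analysis time, while the zone strings themselves are still only validated at run time in either form.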
The two-argument form uses the session time zone as the source, which is convenient when your DataFrame already represents times in the session zone:
val df = Seq(
"2026-01-15 09:30:00",
"2026-06-21 12:00:00",
"2026-12-25 18:00:00",
).toDF("source_ts_str")
val df2 = df
.withColumn("source_ts", to_timestamp(col("source_ts_str")))
.withColumn("in_tokyo", expr("convert_timezone('Asia/Tokyo', source_ts)"))
df2.show(false)
// +-------------------+-------------------+-------------------+
// |source_ts_str |source_ts |in_tokyo |
// +-------------------+-------------------+-------------------+
// |2026-01-15 09:30:00|2026-01-15 09:30:00|2026-01-15 18:30:00|
// |2026-06-21 12:00:00|2026-06-21 12:00:00|2026-06-21 21:00:00|
// |2026-12-25 18:00:00|2026-12-25 18:00:00|2026-12-26 03:00:00|
// +-------------------+-------------------+-------------------+
Since the session zone is UTC here, convert_timezone('Asia/Tokyo', source_ts) shifts each timestamp by +9 hours.
Which function to use
The three functions overlap, but they're not interchangeable:
- Use from_utc_timestamp when your column is already in UTC and you want it in a non-UTC zone.
- Use to_utc_timestamp when your column is in some other zone and you want it in UTC.
- Use convert_timezone for arbitrary zone-to-zone conversions, or when you want a single function that doesn't care which zone is "from" and which is "to".
If you're starting fresh in Spark 3.4 or later, convert_timezone covers all three use cases with one API. from_utc_timestamp and to_utc_timestamp predate it (both have been around since Spark 1.5.0) and are still the right choice if you need to support older Spark versions.
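The overlap can be made concrete: pin UTC as one side and convert_timezone reproduces the older function. A sketch of the equivalence (assuming Spark 3.4+ for convert_timezone; with the session zone set to UTC as in the examples above):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("2026-01-15 14:30:00").toDF("s")
  .withColumn("ts", to_timestamp(col("s")))

val compared = df
  .withColumn("via_from_utc", from_utc_timestamp(col("ts"), "Asia/Tokyo"))
  .withColumn("via_convert",  expr("convert_timezone('UTC', 'Asia/Tokyo', ts)"))

compared.show(false)
// Both columns should render the same local wall-clock time (23:30 Tokyo).
```

One wrinkle to be aware of: the two code paths don't necessarily return the same column type (convert_timezone is defined over timestamps without time zone), so eyeball the rendered values rather than comparing the columns with ===.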
Related Functions
For converting between Unix epoch numbers and timestamp columns, see timestamp_seconds, unix_seconds, and related functions. For formatting a timestamp as a string with a custom pattern, see date_format. For pulling a specific field (year, month, hour) out of a timestamp, see date_part and extract.