Spark Scala UTC Timezone Functions
from_utc_timestamp, to_utc_timestamp, and convert_timezone shift a timestamp column between time zones. Use them when your data is stored in one zone (almost always UTC for warehouse data) but you need to report, filter, or join in another — they handle daylight saving transitions for you, so you don't have to do offset math by hand.
The examples below assume the standard imports and set the session time zone to UTC so the rendered output is reproducible:

import org.apache.spark.sql.functions._
import spark.implicits._

spark.conf.set("spark.sql.session.timeZone", "UTC")
Without that, Spark renders timestamps in whatever zone the driver runs in, which makes results depend on the developer's laptop.
from_utc_timestamp
from_utc_timestamp takes a timestamp that represents a moment in UTC and returns the same moment expressed as a local time in the target zone. It's the function to reach for when your warehouse stores UTC and you need to render it for a user, region, or downstream system that thinks in a different zone.
The from_utc_timestamp function is defined as:
def from_utc_timestamp(ts: Column, tz: String): Column
def from_utc_timestamp(ts: Column, tz: Column): Column
The first overload takes the target zone as a String literal. The second takes a Column, which is useful when each row has its own zone — for example, a users table where every user has a timezone field.
val df = Seq(
"2026-01-15 14:30:00",
"2026-06-21 08:00:00",
"2026-12-25 23:00:00",
).toDF("utc_ts_str")
val df2 = df
.withColumn("utc_ts", to_timestamp(col("utc_ts_str")))
.withColumn("new_york", from_utc_timestamp(col("utc_ts"), "America/New_York"))
.withColumn("tokyo", from_utc_timestamp(col("utc_ts"), "Asia/Tokyo"))
df2.show(false)
// +-------------------+-------------------+-------------------+-------------------+
// |utc_ts_str |utc_ts |new_york |tokyo |
// +-------------------+-------------------+-------------------+-------------------+
// |2026-01-15 14:30:00|2026-01-15 14:30:00|2026-01-15 09:30:00|2026-01-15 23:30:00|
// |2026-06-21 08:00:00|2026-06-21 08:00:00|2026-06-21 04:00:00|2026-06-21 17:00:00|
// |2026-12-25 23:00:00|2026-12-25 23:00:00|2026-12-25 18:00:00|2026-12-26 08:00:00|
// +-------------------+-------------------+-------------------+-------------------+
Notice the January row: New York is -5 (EST) and Tokyo is +9. In June, New York shifts to -4 (EDT) because the function honors daylight saving rules. December 25 in Tokyo crosses midnight into December 26 — the date changes, not just the time.
Use full IANA zone IDs (America/New_York, Europe/London, Asia/Tokyo) rather than abbreviations like EST or JST. The abbreviations are ambiguous and don't carry DST rules; the IANA IDs do.
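The Column overload is handy when the zone varies per row. Here's a minimal sketch, assuming a hypothetical users dataset where each row carries its own timezone field:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Hypothetical per-user data: each row records its own display zone.
val users = Seq(
  ("alice", "2026-01-15 14:30:00", "America/New_York"),
  ("kenji", "2026-01-15 14:30:00", "Asia/Tokyo")
).toDF("user", "utc_ts_str", "timezone")

val localized = users
  .withColumn("utc_ts", to_timestamp(col("utc_ts_str")))
  // Column overload: the target zone is read from each row's timezone field.
  .withColumn("local_ts", from_utc_timestamp(col("utc_ts"), col("timezone")))

localized.show(false)
// alice sees 09:30 (UTC-5), kenji sees 23:30 (UTC+9) — the same instant.
```

One scan of the DataFrame localizes every row to its own zone; no join or per-zone branching needed.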
to_utc_timestamp
to_utc_timestamp is the inverse: it takes a timestamp that represents a local time in some zone and returns the equivalent moment in UTC. It's what you want when ingesting timestamps from a system that stores local time without a zone offset, and you need to normalize everything to UTC before further processing.
def to_utc_timestamp(ts: Column, tz: String): Column
def to_utc_timestamp(ts: Column, tz: Column): Column
val df = Seq(
("2026-01-15 09:30:00", "America/New_York"),
("2026-06-21 12:00:00", "Europe/London"),
("2026-12-25 18:00:00", "Asia/Tokyo"),
).toDF("local_ts_str", "tz")
val df2 = df
.withColumn("local_ts", to_timestamp(col("local_ts_str")))
.withColumn("utc_ts", to_utc_timestamp(col("local_ts"), col("tz")))
df2.show(false)
// +-------------------+----------------+-------------------+-------------------+
// |local_ts_str |tz |local_ts |utc_ts |
// +-------------------+----------------+-------------------+-------------------+
// |2026-01-15 09:30:00|America/New_York|2026-01-15 09:30:00|2026-01-15 14:30:00|
// |2026-06-21 12:00:00|Europe/London |2026-06-21 12:00:00|2026-06-21 11:00:00|
// |2026-12-25 18:00:00|Asia/Tokyo |2026-12-25 18:00:00|2026-12-25 09:00:00|
// +-------------------+----------------+-------------------+-------------------+
This example uses the Column overload — each row's zone comes from the tz column, which is what you'd do when ingesting events from a multi-region system where each event records its own local zone.
The June row shows London at +1 (BST), not +0 — to_utc_timestamp correctly applies British Summer Time. Tokyo at +9 shifts December 25 6pm local to December 25 9am UTC.
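Because the two functions are inverses, a to_utc_timestamp followed by a from_utc_timestamp in the same zone should hand back the original local time. A quick sanity-check sketch:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(("2026-06-21 12:00:00", "Europe/London")).toDF("local_ts_str", "tz")

val roundTrip = df
  .withColumn("local_ts", to_timestamp(col("local_ts_str")))
  .withColumn("utc_ts", to_utc_timestamp(col("local_ts"), col("tz")))  // 11:00 UTC (BST is +1)
  .withColumn("back", from_utc_timestamp(col("utc_ts"), col("tz")))    // 12:00 local again
  .withColumn("matches", col("back") === col("local_ts"))

roundTrip.show(false)
// matches is true for this row
```

Treat the round trip as a sanity check rather than an invariant: a local time that falls inside a spring-forward gap doesn't exist in its zone, so it can't survive the trip unchanged.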
Daylight Saving Transitions
The interesting behavior of from_utc_timestamp and to_utc_timestamp shows up at DST boundaries. In 2026, US DST starts on March 8 at 2am local time — clocks spring forward to 3am. The function returns whatever the local zone says the time was at that UTC instant:
val df = Seq(
"2026-03-08 06:30:00",
"2026-03-08 07:30:00",
"2026-11-01 06:30:00",
).toDF("utc_ts_str")
val df2 = df
.withColumn("utc_ts", to_timestamp(col("utc_ts_str")))
.withColumn("new_york", from_utc_timestamp(col("utc_ts"), "America/New_York"))
df2.show(false)
// +-------------------+-------------------+-------------------+
// |utc_ts_str |utc_ts |new_york |
// +-------------------+-------------------+-------------------+
// |2026-03-08 06:30:00|2026-03-08 06:30:00|2026-03-08 01:30:00|
// |2026-03-08 07:30:00|2026-03-08 07:30:00|2026-03-08 03:30:00|
// |2026-11-01 06:30:00|2026-11-01 06:30:00|2026-11-01 01:30:00|
// +-------------------+-------------------+-------------------+
At UTC 06:30 on March 8 the offset is still -5 (EST), so New York reads 01:30. One hour later at UTC 07:30 the offset has shifted to -4 (EDT), and New York reads 03:30 — the 02:00–03:00 local hour was skipped. November 1 is the fall-back, where the offset returns to -5 and 06:30 UTC lands at 01:30 local. You'd have to write a lot of branching code to reproduce this by hand.
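The fall-back transition has a subtlety going the other direction: on November 1, the local times 01:00–02:00 occur twice in America/New_York, so to_utc_timestamp must pick one of two valid UTC instants for them. A sketch to probe the behavior on your version — how ambiguous local times resolve is an implementation detail of the underlying timezone library, so check rather than assume:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

// 01:30 local happens twice on 2026-11-01 in New York: once at 05:30 UTC
// (still EDT, -4) and once at 06:30 UTC (back on EST, -5).
val df = Seq("2026-11-01 01:30:00").toDF("local_ts_str")

df.withColumn("local_ts", to_timestamp(col("local_ts_str")))
  .withColumn("utc_ts", to_utc_timestamp(col("local_ts"), "America/New_York"))
  .show(false)
// The ambiguous local time maps to exactly one of the two candidate UTC instants.
```

If the distinction matters for your data, carry an explicit offset or a UTC column through ingestion instead of relying on zone-name resolution.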
convert_timezone
convert_timezone is more general than the UTC-specific functions: it converts a timestamp from any source zone to any target zone in one call. It was added as a SQL function in Spark 3.4.0 and didn't reach the Scala functions object until 3.5.0, so on 3.4 you call it through expr():
convert_timezone(sourceTz, targetTz, sourceTs)
convert_timezone(targetTz, sourceTs)

The three-argument form names the source zone explicitly. The two-argument form falls back to the session time zone for the source.
val df = Seq(
"2026-01-15 09:30:00",
"2026-06-21 12:00:00",
"2026-12-25 18:00:00",
).toDF("source_ts_str")
val df2 = df
.withColumn("source_ts", to_timestamp(col("source_ts_str")))
.withColumn("converted", expr("convert_timezone('America/New_York', 'Asia/Tokyo', source_ts)"))
df2.show(false)
// +-------------------+-------------------+-------------------+
// |source_ts_str |source_ts |converted |
// +-------------------+-------------------+-------------------+
// |2026-01-15 09:30:00|2026-01-15 09:30:00|2026-01-15 23:30:00|
// |2026-06-21 12:00:00|2026-06-21 12:00:00|2026-06-22 01:00:00|
// |2026-12-25 18:00:00|2026-12-25 18:00:00|2026-12-26 08:00:00|
// +-------------------+-------------------+-------------------+
January: New York 09:30 (-5) is Tokyo 23:30 (+9) — a +14-hour shift. June: New York 12:00 (-4 EDT) is Tokyo 01:00 the next day (+9) — a +13-hour shift. The difference between the two rows is DST, applied automatically.
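On Spark 3.5.0 or later, convert_timezone is also exposed in the Scala functions object, so the expr() call can be replaced with a typed call — zone names go in as lit() columns. A sketch, assuming the 3.5 API:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("2026-01-15 09:30:00").toDF("source_ts_str")

// Spark 3.5.0+ only: convert_timezone from the Scala functions object.
val df2 = df
  .withColumn("source_ts", to_timestamp(col("source_ts_str")))
  .withColumn("converted",
    convert_timezone(lit("America/New_York"), lit("Asia/Tokyo"), col("source_ts")))

df2.show(false)
// 09:30 in New York (-5) renders as 23:30 the same day in Tokyo (+9).
```

The typed form catches misspelled column names at analysis time, while the zone strings themselves are still only validated at run time in either form.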
The two-argument form uses the session time zone as the source, which is convenient when your DataFrame already represents times in the session zone:
val df = Seq(
"2026-01-15 09:30:00",
"2026-06-21 12:00:00",
"2026-12-25 18:00:00",
).toDF("source_ts_str")
val df2 = df
.withColumn("source_ts", to_timestamp(col("source_ts_str")))
.withColumn("in_tokyo", expr("convert_timezone('Asia/Tokyo', source_ts)"))
df2.show(false)
// +-------------------+-------------------+-------------------+
// |source_ts_str |source_ts |in_tokyo |
// +-------------------+-------------------+-------------------+
// |2026-01-15 09:30:00|2026-01-15 09:30:00|2026-01-15 18:30:00|
// |2026-06-21 12:00:00|2026-06-21 12:00:00|2026-06-21 21:00:00|
// |2026-12-25 18:00:00|2026-12-25 18:00:00|2026-12-26 03:00:00|
// +-------------------+-------------------+-------------------+
Since the session zone is UTC here, convert_timezone('Asia/Tokyo', source_ts) shifts each timestamp by +9 hours.
Which function to use
The three functions overlap, but they're not interchangeable:
- Use from_utc_timestamp when your column is already in UTC and you want it in a non-UTC zone.
- Use to_utc_timestamp when your column is in some other zone and you want it in UTC.
- Use convert_timezone for arbitrary zone-to-zone conversions, or when you want a single function that doesn't care which zone is "from" and which is "to".
If you're starting fresh in Spark 3.4 or later, convert_timezone covers all three use cases with one API. from_utc_timestamp and to_utc_timestamp predate it (both have been around since Spark 1.5.0) and are still the right choice if you need to support older Spark versions.
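The overlap can be made concrete: pin UTC as one side and convert_timezone reproduces the older function. A sketch of the equivalence (assuming Spark 3.4+ for convert_timezone; with the session zone set to UTC as in the examples above):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("2026-01-15 14:30:00").toDF("s")
  .withColumn("ts", to_timestamp(col("s")))

val compared = df
  .withColumn("via_from_utc", from_utc_timestamp(col("ts"), "Asia/Tokyo"))
  .withColumn("via_convert",  expr("convert_timezone('UTC', 'Asia/Tokyo', ts)"))

compared.show(false)
// Both columns should render the same local wall-clock time (23:30 Tokyo).
```

One wrinkle to be aware of: the two code paths don't necessarily return the same column type (convert_timezone is defined over timestamps without time zone), so eyeball the rendered values rather than comparing the columns with ===.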
Related Functions
For converting between Unix epoch numbers and timestamp columns, see timestamp_seconds, unix_seconds, and related functions. For formatting a timestamp as a string with a custom pattern, see date_format. For pulling a specific field (year, month, hour) out of a timestamp, see date_part and extract.