Spark Scala date_format

date_format converts a date, timestamp, or string column into a formatted string column using a pattern made of letters like yyyy, MM, dd, HH, and EEEE. Use it whenever you need a human-readable date label, a custom export format, or a partition-friendly column like 2026-01.

def date_format(dateExpr: Column, format: String): Column

The first argument is the date or timestamp column. The second argument is a format pattern — a string literal, not a column. The return type is always a string. If the input is a string column in yyyy-MM-dd or yyyy-MM-dd HH:mm:ss form, Spark casts it implicitly before formatting.

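import org.apache.spark.sql.functions.{col, date_format}
import spark.implicits._ // needed for toDF; already in scope in spark-shell

val df = Seq(
  "2026-01-15",
  "2026-02-04",
  "2025-12-20",
  "2025-07-04",
).toDF("event_date")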

val df2 = df
  .withColumn("us_format",   date_format(col("event_date"), "MM/dd/yyyy"))
  .withColumn("long_format", date_format(col("event_date"), "MMMM d, yyyy"))
  .withColumn("short_day",   date_format(col("event_date"), "EEE, MMM d"))

df2.show(false)
// +----------+----------+-----------------+-----------+
// |event_date|us_format |long_format      |short_day  |
// +----------+----------+-----------------+-----------+
// |2026-01-15|01/15/2026|January 15, 2026 |Thu, Jan 15|
// |2026-02-04|02/04/2026|February 4, 2026 |Wed, Feb 4 |
// |2025-12-20|12/20/2025|December 20, 2025|Sat, Dec 20|
// |2025-07-04|07/04/2025|July 4, 2025     |Fri, Jul 4 |
// +----------+----------+-----------------+-----------+

A few things to notice. MM produces a zero-padded number (01, 02), while M would produce 1, 2. MMMM is the full month name and MMM is the three-letter abbreviation. The day-of-week patterns EEEE and EEE work the same way. And d is the day of month without zero-padding, while dd is padded.
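To see the padded and unpadded forms side by side, here is a minimal sketch that formats a single date with both variants. The local SparkSession setup and the column aliases are assumptions made so the snippet stands alone; in spark-shell the session already exists.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}

// Assumed local session for the sketch; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("date-format-padding").getOrCreate()
import spark.implicits._

// One date whose month and day are both single-digit, so padding is visible.
val padding = Seq("2026-02-04").toDF("event_date")
  .select(
    date_format(col("event_date"), "M/d/yy").as("unpadded"),        // 2/4/26
    date_format(col("event_date"), "MM/dd/yyyy").as("padded"),      // 02/04/2026
    date_format(col("event_date"), "EEE MMM").as("short_names"),    // Wed Feb
    date_format(col("event_date"), "EEEE MMMM").as("full_names")    // Wednesday February
  )

val row = padding.head()
padding.show(false)
```

A date like 2026-02-04 is a better test value than 2026-01-15 here, because d and dd only differ for days 1 through 9.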

Common pattern letters

These are the patterns you'll reach for most often. The full set is documented in the Spark datetime patterns reference.

  • Year: yyyy (2026), yy (26)
  • Month: MM (01), MMM (Jan), MMMM (January)
  • Day of month: dd (04, zero-padded), d (4, no pad)
  • Day of week: EEE (Thu), EEEE (Thursday)
  • Hour: HH (09, 24-hour), h (9, 12-hour)
  • Minute / second: mm (05), ss (45)
  • AM/PM marker: a (AM, PM)
  • Quarter: QQQ (Q1, Q4)

Punctuation and spaces in the pattern are rendered as literals, which is why MM/dd/yyyy has slashes and MMMM d, yyyy has a comma. The exceptions are the single quote, which starts quoted text, and square brackets, which mark an optional section. To include a literal letter (not as a pattern), wrap it in single quotes, like 'T' in an ISO-style timestamp.
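The quoting rule also covers whole words, not just single letters. The sketch below embeds the words Due and at in a label; the session setup, sample value, and column names are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}

// Assumed local session for the sketch; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("date-format-literals").getOrCreate()
import spark.implicits._

val labelled = Seq("2026-01-15 09:30:45").toDF("event_ts")
  .select(
    // Quoted words pass through verbatim; everything unquoted is a pattern.
    date_format(col("event_ts"), "'Due' MMM d, yyyy 'at' HH:mm").as("label"),
    // A single quoted letter between the date and time parts.
    date_format(col("event_ts"), "yyyy-MM-dd'T'HH:mm:ss").as("iso_like")
  )

val row = labelled.head()
labelled.show(false)
```

Without the quotes, the letters in Due and at would be read as pattern letters rather than text, so quoting any literal word is the safe habit.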

Formatting timestamps

When the input has a time component, you can format both the date and time portions. The same function handles both:

val df = Seq(
  "2026-01-15 09:30:45",
  "2026-02-04 14:05:00",
  "2025-12-20 23:59:59",
  "2025-07-04 00:00:01",
).toDF("event_ts")

val df2 = df
  .withColumn("date_only",   date_format(col("event_ts"), "yyyy-MM-dd"))
  .withColumn("time_only",   date_format(col("event_ts"), "HH:mm:ss"))
  .withColumn("twelve_hour", date_format(col("event_ts"), "h:mm a"))
  .withColumn("iso_compact", date_format(col("event_ts"), "yyyyMMdd'T'HHmmss"))

df2.show(false)
// +-------------------+----------+---------+-----------+---------------+
// |event_ts           |date_only |time_only|twelve_hour|iso_compact    |
// +-------------------+----------+---------+-----------+---------------+
// |2026-01-15 09:30:45|2026-01-15|09:30:45 |9:30 AM    |20260115T093045|
// |2026-02-04 14:05:00|2026-02-04|14:05:00 |2:05 PM    |20260204T140500|
// |2025-12-20 23:59:59|2025-12-20|23:59:59 |11:59 PM   |20251220T235959|
// |2025-07-04 00:00:01|2025-07-04|00:00:01 |12:00 AM   |20250704T000001|
// +-------------------+----------+---------+-----------+---------------+

The iso_compact example shows how single quotes are used for literal letters: 'T' is rendered as a T between the date and time portions, instead of being interpreted as a pattern letter. Anything you wrap in single quotes is passed through verbatim.

Note h versus HH: h is 12-hour (so 14:05 becomes 2:05 PM) and HH is 24-hour. If you use h without a, you'll lose the AM/PM distinction and end up with ambiguous values like 2:05.
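The ambiguity is easy to reproduce. In this sketch, two timestamps twelve hours apart collapse to the same string when h is used without a; the session setup and column names are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}

// Assumed local session for the sketch; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("date-format-hours").getOrCreate()
import spark.implicits._

// Same minute, twelve hours apart.
val times = Seq("2026-02-04 02:05:00", "2026-02-04 14:05:00").toDF("event_ts")
  .select(
    date_format(col("event_ts"), "h:mm").as("ambiguous"),          // no AM/PM marker
    date_format(col("event_ts"), "h:mm a").as("twelve_hour"),      // 12-hour with marker
    date_format(col("event_ts"), "HH:mm").as("twenty_four_hour")   // 24-hour
  )

val rows = times.collect().map(r => (r.getString(0), r.getString(1), r.getString(2))).toSeq
times.show(false)
```

Both rows produce 2:05 in the ambiguous column, while the other two columns keep them distinct.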

Extracting parts as strings

date_format is also a quick way to pull out individual date components when you want them as strings. This is handy for partitioning or grouping — for example, an hourly bucket column or a year_month field for monthly aggregates:

val df = Seq(
  "2026-01-15",
  "2026-02-04",
  "2025-12-20",
  "2025-07-04",
).toDF("event_date")

val df2 = df
  .withColumn("year",     date_format(col("event_date"), "yyyy"))
  .withColumn("month",    date_format(col("event_date"), "MMMM"))
  .withColumn("day_name", date_format(col("event_date"), "EEEE"))
  .withColumn("quarter",  date_format(col("event_date"), "QQQ"))
  .withColumn("year_mo",  date_format(col("event_date"), "yyyy-MM"))

df2.show(false)
// +----------+----+--------+---------+-------+-------+
// |event_date|year|month   |day_name |quarter|year_mo|
// +----------+----+--------+---------+-------+-------+
// |2026-01-15|2026|January |Thursday |Q1     |2026-01|
// |2026-02-04|2026|February|Wednesday|Q1     |2026-02|
// |2025-12-20|2025|December|Saturday |Q4     |2025-12|
// |2025-07-04|2025|July    |Friday   |Q3     |2025-07|
// +----------+----+--------+---------+-------+-------+
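As a sketch of the grouping use case mentioned above, the following counts events per month using a yyyy-MM key. The sample rows, the session setup, and the year_mo column name are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}

// Assumed local session for the sketch; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("date-format-grouping").getOrCreate()
import spark.implicits._

val events = Seq("2026-01-15", "2026-01-28", "2026-02-04", "2025-12-20").toDF("event_date")

// Derive a string month key, then aggregate on it.
val monthly = events
  .withColumn("year_mo", date_format(col("event_date"), "yyyy-MM"))
  .groupBy("year_mo")
  .count()
  .orderBy("year_mo")

val counts = monthly.collect().map(r => (r.getString(0), r.getLong(1))).toSeq
monthly.show(false)
```

Because yyyy-MM is zero-padded and ordered from the largest unit to the smallest, sorting the strings also sorts the months chronologically.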

If you need the result as a number rather than a string, prefer the dedicated extractors — year, month, dayofmonth for date parts, hour, minute, second for time parts, or the more general date_part and extract. Those return integers; date_format always returns a string.
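The type difference shows up directly in the schema. This sketch puts date_format next to year and month on the same input; the session setup and column aliases are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format, month, year}

// Assumed local session for the sketch; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("date-format-vs-extractors").getOrCreate()
import spark.implicits._

val parts = Seq("2026-01-15").toDF("event_date")
  .select(
    date_format(col("event_date"), "yyyy").as("year_str"),  // string column
    year(col("event_date")).as("year_int"),                 // integer column
    date_format(col("event_date"), "MM").as("month_str"),   // string column, zero-padded
    month(col("event_date")).as("month_int")                // integer column
  )

// Column name paired with its Spark SQL type name.
val types = parts.schema.fields.map(f => f.name -> f.dataType.simpleString).toSeq
val row = parts.head()
parts.printSchema()
```

The string versions are fine for labels and keys, but the integer versions are the ones to use for arithmetic or numeric comparisons.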

Null handling

If the input column is null, the result is null. The format string itself isn't checked against the input — Spark only computes a value when there's something to format:

val df = Seq(
  Some("2026-01-15"),
  None,
  Some("2025-07-04"),
  None,
).toDF("event_date")

val df2 = df
  .withColumn("formatted", date_format(col("event_date"), "MMM d, yyyy"))

df2.show(false)
// +----------+------------+
// |event_date|formatted   |
// +----------+------------+
// |2026-01-15|Jan 15, 2026|
// |null      |null        |
// |2025-07-04|Jul 4, 2025 |
// |null      |null        |
// +----------+------------+

Null in, null out — there's no built-in default. If you'd rather render nulls as a placeholder string like "unknown", wrap the result in coalesce with a lit("unknown") fallback.
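A minimal sketch of that fallback, assuming a local session and the same event_date column as in the example above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, date_format, lit}

// Assumed local session for the sketch; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("date-format-nulls").getOrCreate()
import spark.implicits._

val df = Seq(Some("2026-01-15"), None).toDF("event_date")

// coalesce returns its first non-null argument, so the literal is used
// only where date_format produced null.
val withFallback = df.withColumn(
  "formatted",
  coalesce(date_format(col("event_date"), "MMM d, yyyy"), lit("unknown"))
)

val labels = withFallback.select("formatted").as[String].collect().toSeq
withFallback.show(false)
```

The event_date column itself stays null; only the formatted column gets the placeholder.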

date_format is the formatter; the parsers going in the other direction are to_date and to_timestamp, which take a string and a format pattern and produce a date or timestamp. For getting individual date parts as integers, see year, month, dayofmonth and hour, minute, second. For getting "today" or "now" as the input, see current_date and current_timestamp.
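The two directions combine naturally when the input string is not in the default yyyy-MM-dd form. This sketch parses a US-style string with to_date and then reformats it with date_format; the raw column, the sample value, and the session setup are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format, to_date}

// Assumed local session for the sketch; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("date-format-roundtrip").getOrCreate()
import spark.implicits._

val parsed = Seq("01/15/2026").toDF("raw")
  // Parse: string plus pattern in, date out.
  .withColumn("event_date", to_date(col("raw"), "MM/dd/yyyy"))
  // Format: date plus pattern in, string out.
  .withColumn("iso", date_format(col("event_date"), "yyyy-MM-dd"))
  .withColumn("label", date_format(col("event_date"), "EEEE, MMMM d"))

val row = parsed.head()
val eventDateType = parsed.schema("event_date").dataType.simpleString
parsed.show(false)
```

The same pattern letters are used in both directions, so a pattern that formats a value can generally be reused to parse it back.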

Example Details

Created: 2026-05-05 10:19:22 PM

Last Updated: 2026-05-05 10:19:22 PM