Spark Scala date_format
date_format converts a date, timestamp, or string column into a formatted string column using a pattern made of letters like yyyy, MM, dd, HH, and EEEE. Use it whenever you need a human-readable date label, a custom export format, or a partition-friendly column like 2026-01.
def date_format(dateExpr: Column, format: String): Column
The first argument is the date or timestamp column. The second argument is a format pattern — a string literal, not a column. The return type is always a string. If the input is a string column in yyyy-MM-dd or yyyy-MM-dd HH:mm:ss form, Spark casts it implicitly before formatting.
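A quick way to confirm the return type is to format the same value from a string, a date, and a timestamp column and inspect the schema. This is a minimal sketch that assumes a local SparkSession; the session settings and the names `typed`, `from_string`, `from_date`, and `from_timestamp` are illustrative and not part of the API.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}

// Assumption: a throwaway local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("date-format-types")
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()
import spark.implicits._

// Format the same value from three different input types.
val typed = Seq("2026-01-15 09:30:45").toDF("raw").select(
  date_format(col("raw"), "yyyy-MM-dd").as("from_string"),
  date_format(col("raw").cast("date"), "yyyy-MM-dd").as("from_date"),
  date_format(col("raw").cast("timestamp"), "yyyy-MM-dd").as("from_timestamp"))

// Every output column is reported as a string, whatever the input type was.
typed.printSchema()
```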
import org.apache.spark.sql.functions.{col, date_format}
// Assumes a SparkSession named `spark` is in scope, as in spark-shell or a notebook.
import spark.implicits._

val df = Seq(
  "2026-01-15",
  "2026-02-04",
  "2025-12-20",
  "2025-07-04",
).toDF("event_date")

val df2 = df
  .withColumn("us_format", date_format(col("event_date"), "MM/dd/yyyy"))
  .withColumn("long_format", date_format(col("event_date"), "MMMM d, yyyy"))
  .withColumn("short_day", date_format(col("event_date"), "EEE, MMM d"))
df2.show(false)
// +----------+----------+-----------------+-----------+
// |event_date|us_format |long_format      |short_day  |
// +----------+----------+-----------------+-----------+
// |2026-01-15|01/15/2026|January 15, 2026 |Thu, Jan 15|
// |2026-02-04|02/04/2026|February 4, 2026 |Wed, Feb 4 |
// |2025-12-20|12/20/2025|December 20, 2025|Sat, Dec 20|
// |2025-07-04|07/04/2025|July 4, 2025     |Fri, Jul 4 |
// +----------+----------+-----------------+-----------+
A few things to notice. MM produces a zero-padded number (01, 02), while M would produce 1, 2. MMMM is the full month name and MMM is the three-letter abbreviation. The day-of-week patterns EEEE and EEE work the same way. And d is the day of month without zero-padding, while dd is padded.
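To see the width rules side by side, the sketch below applies each short and long variant to a single date with one-digit month and day values. It assumes a local SparkSession; the names `patterns` and `rendered` are only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}

// Assumption: a throwaway local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("pattern-widths").getOrCreate()
import spark.implicits._

// Short and long forms of the month, day-of-month, and day-of-week letters.
val patterns = Seq("M", "MM", "MMM", "MMMM", "d", "dd", "EEE", "EEEE")

// One output column per pattern, all computed from the same date (a Friday).
val row = Seq("2025-07-04").toDF("event_date")
  .select(patterns.map(p => date_format(col("event_date"), p).as(p)): _*)
  .head()

// Pair each pattern with the text it produced.
val rendered = patterns.zip(row.toSeq.map(_.toString))
rendered.foreach { case (p, v) => println(s"$p -> $v") }
// M -> 7
// MM -> 07
// MMM -> Jul
// MMMM -> July
// d -> 4
// dd -> 04
// EEE -> Fri
// EEEE -> Friday
```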
Common pattern letters
These are the patterns you'll reach for most often. The full set is documented in the Spark datetime patterns reference.
- Year: yyyy (2026), yy (26)
- Month: MM (01), MMM (Jan), MMMM (January)
- Day of month: dd (04, zero-padded), d (4, no pad)
- Day of week: EEE (Thu), EEEE (Thursday)
- Hour: HH (09, 24-hour), h (9, 12-hour)
- Minute / second: mm (05), ss (45)
- AM/PM marker: a (AM, PM)
- Quarter: QQQ (Q1, Q4)
Common punctuation in the pattern (slashes, commas, hyphens, colons, periods, and spaces) is rendered as a literal, which is why MM/dd/yyyy has slashes and MMMM d, yyyy has a comma. To include a literal letter (not as a pattern), wrap it in single quotes, like 'T' in an ISO-style timestamp. To emit a single quote itself, write two in a row ('').
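The quoting rule can be sketched with a few patterns that mix punctuation, a quoted letter, and quoted words. This assumes a local SparkSession with the session time zone pinned to UTC; the column names `iso_like`, `labelled`, and `dotted` are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}

// Assumption: a throwaway local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("pattern-literals")
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()
import spark.implicits._

val quoted = Seq("2026-01-15 09:30:45").toDF("event_ts").select(
  // 'T' is quoted, so it is emitted as text rather than read as a pattern letter.
  date_format(col("event_ts"), "yyyy-MM-dd'T'HH:mm:ss").as("iso_like"),
  // Whole words can be quoted the same way.
  date_format(col("event_ts"), "'Week of' MMM d, yyyy").as("labelled"),
  // Periods, spaces, and colons need no quoting.
  date_format(col("event_ts"), "yyyy.MM.dd 'at' HH:mm").as("dotted"))

quoted.show(false)
// +-------------------+--------------------+-------------------+
// |iso_like           |labelled            |dotted             |
// +-------------------+--------------------+-------------------+
// |2026-01-15T09:30:45|Week of Jan 15, 2026|2026.01.15 at 09:30|
// +-------------------+--------------------+-------------------+
```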
Formatting timestamps
When the input has a time component, you can format both the date and time portions. The same function handles both:
val df = Seq(
  "2026-01-15 09:30:45",
  "2026-02-04 14:05:00",
  "2025-12-20 23:59:59",
  "2025-07-04 00:00:01",
).toDF("event_ts")

val df2 = df
  .withColumn("date_only", date_format(col("event_ts"), "yyyy-MM-dd"))
  .withColumn("time_only", date_format(col("event_ts"), "HH:mm:ss"))
  .withColumn("twelve_hour", date_format(col("event_ts"), "h:mm a"))
  .withColumn("iso_compact", date_format(col("event_ts"), "yyyyMMdd'T'HHmmss"))
df2.show(false)
// +-------------------+----------+---------+-----------+---------------+
// |event_ts           |date_only |time_only|twelve_hour|iso_compact    |
// +-------------------+----------+---------+-----------+---------------+
// |2026-01-15 09:30:45|2026-01-15|09:30:45 |9:30 AM    |20260115T093045|
// |2026-02-04 14:05:00|2026-02-04|14:05:00 |2:05 PM    |20260204T140500|
// |2025-12-20 23:59:59|2025-12-20|23:59:59 |11:59 PM   |20251220T235959|
// |2025-07-04 00:00:01|2025-07-04|00:00:01 |12:00 AM   |20250704T000001|
// +-------------------+----------+---------+-----------+---------------+
The iso_compact example shows how single quotes are used for literal letters: 'T' is rendered as a T between the date and time portions, instead of being interpreted as a pattern letter. Anything you wrap in single quotes is passed through verbatim.
Note h versus HH: h is 12-hour (so 14:05 becomes 2:05 PM) and HH is 24-hour. If you use h without a, you'll lose the AM/PM distinction and end up with ambiguous values like 2:05.
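The ambiguity is easy to reproduce with two timestamps twelve hours apart. The sketch below assumes a local SparkSession with the session time zone pinned to UTC; the column names `h_only`, `h_with_marker`, and `hh_24` are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}

// Assumption: a throwaway local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("twelve-vs-twenty-four")
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()
import spark.implicits._

// Two timestamps exactly twelve hours apart.
val clock = Seq("2026-02-04 02:05:00", "2026-02-04 14:05:00").toDF("event_ts").select(
  col("event_ts"),
  date_format(col("event_ts"), "h:mm").as("h_only"),          // collides
  date_format(col("event_ts"), "h:mm a").as("h_with_marker"), // disambiguated by AM/PM
  date_format(col("event_ts"), "HH:mm").as("hh_24"))          // unambiguous on its own

clock.show(false)
// +-------------------+------+-------------+-----+
// |event_ts           |h_only|h_with_marker|hh_24|
// +-------------------+------+-------------+-----+
// |2026-02-04 02:05:00|2:05  |2:05 AM      |02:05|
// |2026-02-04 14:05:00|2:05  |2:05 PM      |14:05|
// +-------------------+------+-------------+-----+
```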
Extracting parts as strings
date_format is also a quick way to pull out individual date components when you want them as strings. This is handy for partitioning or grouping — for example, an hourly bucket column or a year_month field for monthly aggregates:
val df = Seq(
  "2026-01-15",
  "2026-02-04",
  "2025-12-20",
  "2025-07-04",
).toDF("event_date")

val df2 = df
  .withColumn("year", date_format(col("event_date"), "yyyy"))
  .withColumn("month", date_format(col("event_date"), "MMMM"))
  .withColumn("day_name", date_format(col("event_date"), "EEEE"))
  .withColumn("quarter", date_format(col("event_date"), "QQQ"))
  .withColumn("year_mo", date_format(col("event_date"), "yyyy-MM"))
df2.show(false)
// +----------+----+--------+---------+-------+-------+
// |event_date|year|month   |day_name |quarter|year_mo|
// +----------+----+--------+---------+-------+-------+
// |2026-01-15|2026|January |Thursday |Q1     |2026-01|
// |2026-02-04|2026|February|Wednesday|Q1     |2026-02|
// |2025-12-20|2025|December|Saturday |Q4     |2025-12|
// |2025-07-04|2025|July    |Friday   |Q3     |2025-07|
// +----------+----+--------+---------+-------+-------+
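The hourly bucket mentioned above can be sketched the same way: format the timestamp down to the hour and group on the resulting string. This assumes a local SparkSession with the session time zone pinned to UTC; the sample rows and the names `events`, `hourly`, and `hour_bucket` are invented for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format}

// Assumption: a throwaway local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("hourly-buckets")
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()
import spark.implicits._

// Hypothetical events: two in the 09:00 hour, one in the 10:00 hour.
val events = Seq(
  "2026-01-15 09:05:00",
  "2026-01-15 09:45:10",
  "2026-01-15 10:00:00"
).toDF("event_ts")

// Truncate to the hour as a string, then count events per bucket.
val hourly = events
  .withColumn("hour_bucket", date_format(col("event_ts"), "yyyy-MM-dd HH"))
  .groupBy("hour_bucket")
  .count()
  .orderBy("hour_bucket")

hourly.show(false)
// +-------------+-----+
// |hour_bucket  |count|
// +-------------+-----+
// |2026-01-15 09|2    |
// |2026-01-15 10|1    |
// +-------------+-----+
```

Because every field in the bucket is zero-padded and ordered from largest unit to smallest, sorting the strings also sorts the buckets chronologically.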
If you need the result as a number rather than a string, prefer the dedicated extractors: year, month, dayofmonth for date parts, hour, minute, second for time parts, or the more general date_part and extract. The named extractors return integers; date_format always returns a string.
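The type difference shows up directly in the schema. The sketch below assumes a local SparkSession and parses the string with to_date first, so the extractors receive a real date column; the column names are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format, dayofmonth, month, to_date, year}

// Assumption: a throwaway local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("string-vs-int-parts").getOrCreate()
import spark.implicits._

val parts = Seq("2025-07-04").toDF("raw")
  .withColumn("event_date", to_date(col("raw")))
  .select(
    date_format(col("event_date"), "M").as("month_text"), // string "7"
    month(col("event_date")).as("month_num"),             // integer 7
    year(col("event_date")).as("year_num"),
    dayofmonth(col("event_date")).as("day_num"))

// month_text is a string column; the other three are integer columns.
parts.printSchema()
```

The distinction matters for sorting and comparisons: as strings, "10" sorts before "9", while the integer columns order numerically.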
Null handling
If the input column is null, the result is null. The pattern describes only the output and is never matched against the input, so a null value simply leaves Spark with nothing to format:
val df = Seq(
  Some("2026-01-15"),
  None,
  Some("2025-07-04"),
  None,
).toDF("event_date")

val df2 = df
  .withColumn("formatted", date_format(col("event_date"), "MMM d, yyyy"))
df2.show(false)
// +----------+------------+
// |event_date|formatted   |
// +----------+------------+
// |2026-01-15|Jan 15, 2026|
// |null      |null        |
// |2025-07-04|Jul 4, 2025 |
// |null      |null        |
// +----------+------------+
Null in, null out — there's no built-in default. If you'd rather render nulls as a placeholder string like "unknown", wrap the result in coalesce with a lit("unknown") fallback.
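A minimal sketch of that fallback, assuming a local SparkSession; the placeholder text "unknown" and the name `labelled` are arbitrary choices for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, date_format, lit}

// Assumption: a throwaway local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("null-placeholder").getOrCreate()
import spark.implicits._

val labelled = Seq(Some("2026-01-15"), None, Some("2025-07-04"))
  .toDF("event_date")
  // coalesce returns its first non-null argument, so nulls fall through to the literal.
  .withColumn(
    "formatted",
    coalesce(date_format(col("event_date"), "MMM d, yyyy"), lit("unknown")))

// The formatted column now holds "Jan 15, 2026", "unknown", "Jul 4, 2025".
labelled.show(false)
```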
Related functions
date_format is the formatter; the parsers going in the other direction are to_date and to_timestamp, which take a string and a format pattern and produce a date or timestamp. For getting individual date parts as integers, see year, month, dayofmonth and hour, minute, second. For getting "today" or "now" as the input, see current_date and current_timestamp.
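The parser and the formatter combine naturally when converting between string layouts. The sketch below assumes a local SparkSession and US-style input strings; the names `roundTrip`, `us_date`, `parsed`, and `iso` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format, to_date}

// Assumption: a throwaway local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("parse-then-format").getOrCreate()
import spark.implicits._

val roundTrip = Seq("01/15/2026", "07/04/2025").toDF("us_date")
  // to_date parses the string with the given pattern into a date column.
  .withColumn("parsed", to_date(col("us_date"), "MM/dd/yyyy"))
  // date_format renders that date back out in a different layout.
  .withColumn("iso", date_format(col("parsed"), "yyyy-MM-dd"))

roundTrip.show(false)
// +----------+----------+----------+
// |us_date   |parsed    |iso       |
// +----------+----------+----------+
// |01/15/2026|2026-01-15|2026-01-15|
// |07/04/2025|2025-07-04|2025-07-04|
// +----------+----------+----------+
```

Note that parsed is a date column and iso is a string column, even though they print identically here.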