
ANSI Mode by Default in Spark 4.0: What Breaks and How to Fix It

Spark 4.0 flipped spark.sql.ansi.enabled from false to true, so invalid casts, arithmetic overflow, divide-by-zero, and bad array indices that used to silently return null now throw runtime errors. This guide catalogs each failure mode with the exception you'll see and the try_* function that fixes it without falling back to legacy mode.

If you want the broader upgrade context first, see Upgrading from Spark 3.x to Spark 4.0: A Practical Guide and What's New in Spark 4.0 for Scala Developers. This article zooms in on ANSI mode — the single change most likely to surface real bugs in working pipelines.


The Right Mental Model

ANSI mode is not a new feature. It is the SQL standard. Spark's pre-4.0 default of returning null on arithmetic overflow or invalid casts was the deviation. The change in Spark 4.0 brings the default into line with PostgreSQL, MySQL (in strict mode), and basically every other ANSI-compliant SQL engine.

That framing matters because it changes what you do when you hit a failure. The wrong instinct is "set spark.sql.ansi.enabled=false and move on." The right instinct is "the pipeline was producing silent garbage on this row — now I can see it." Each failure is a data quality issue you previously did not know you had.

For most failures, there is a try_* function that returns null for the exact case you wanted to tolerate, while keeping ANSI semantics everywhere else. That's the move: opt out per expression, not globally.


Failure 1: Invalid String → Numeric Cast

Pipelines that ingest CSVs, JSON strings, or upstream DataFrames with string-typed numeric fields will hit this first. Anywhere you have col("amount").cast("int") and the source is dirty, Spark 4.0 will now throw.

// Spark 3.x: returns null
// Spark 4.0: throws SparkNumberFormatException
//   [CAST_INVALID_INPUT] The value 'abc' of the type "STRING" cannot be cast to "INT"
import org.apache.spark.sql.functions._

df.select(col("amount").cast("int"))

// Fix — use try_cast (returns null for unparseable values)
df.select(expr("try_cast(amount AS INT)").as("amount"))

// SQL form
spark.sql("SELECT try_cast(amount AS INT) FROM source")

try_cast accepts the same source and target types as cast, with the same precision and format rules. The only difference is what happens on failure: null instead of an exception. If your pipeline has an upstream contract that the column is already valid, leave it as cast and let the failure surface bad data. If the column is known-dirty by design, switch to try_cast.
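
Before deciding, it helps to know how many rows are actually affected. A minimal sketch, assuming the same string-typed amount column: rows that are non-null as strings but null after try_cast are exactly the ones Spark 4.0 would throw on.

import org.apache.spark.sql.functions._

// Rows that will fail the cast: present as a string, unparseable as an int
val badRows = df.filter(col("amount").isNotNull && expr("try_cast(amount AS INT)").isNull)
println(s"rows that would fail under ANSI: ${badRows.count()}")
badRows.show(20, truncate = false)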

Failure 2: Numeric Cast Overflow

Casting a BIGINT to INT outside the int range used to silently truncate. In Spark 4.0 it throws.

// Spark 3.x: returns the truncated value
// Spark 4.0: throws SparkArithmeticException
//   [CAST_OVERFLOW] The value 2147483648L cannot be cast to "INT" due to an overflow
spark.sql("SELECT CAST(2147483648L AS INT)")

// Fix — try_cast returns null on overflow
spark.sql("SELECT try_cast(2147483648L AS INT)")  // null

This same class of exception surfaces on inserts when spark.sql.storeAssignmentPolicy is ANSI (the default since Spark 3.0). Inserting a BIGINT value into an INT column now produces [CAST_OVERFLOW_IN_TABLE_INSERT] instead of a silently truncated row. The fix is to widen the destination column or try_cast the input expression to match the column type before insert, as sketched below.
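
A sketch of both fixes, using hypothetical events and events_staging tables with a user_id column; the ALTER TABLE form assumes a table format or catalog that supports column type changes (Delta, Iceberg, or another v2 catalog).

// Option A: match the destination type explicitly; overflowing ids become null
spark.sql("""
  INSERT INTO events
  SELECT try_cast(user_id AS INT) AS user_id, event_type
  FROM events_staging
""")

// Option B: widen the destination column so nothing overflows (preferred when possible)
spark.sql("ALTER TABLE events ALTER COLUMN user_id TYPE BIGINT")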

Failure 3: Integer Arithmetic Overflow

Addition, subtraction, and multiplication on INT and BIGINT columns now throw on overflow rather than wrapping around.

// Spark 3.x: wraps around silently to -2147483648
// Spark 4.0: throws SparkArithmeticException
//   [ARITHMETIC_OVERFLOW] integer overflow.
//   Use 'try_add' to tolerate overflow and return NULL instead.
spark.sql("SELECT 2147483647 + 1")

// Fix — use try_add / try_subtract / try_multiply
spark.sql("SELECT try_add(2147483647, 1)")       // null
spark.sql("SELECT try_subtract(-2147483648, 1)") // null
spark.sql("SELECT try_multiply(1000000000, 3)")  // null

The error message itself names the function to use, which is a nice touch from the Spark team. In practice, most production overflow bugs come from aggregations on row counts or unbounded counters that nobody expected to exceed Int.MaxValue. The cleanest long-term fix is usually to widen the column to BIGINT, not to wrap the expression in try_add. Reach for try_add when the input is genuinely bounded but you cannot trust upstream data.
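
A minimal sketch of the widening fix, using a hypothetical INT counter column that gets incremented by a daily delta: once both operands are BIGINT, the addition has headroom and try_add is unnecessary.

import org.apache.spark.sql.functions._

// Widen both sides before adding; the result column is BIGINT
val updated = df.withColumn(
  "lifetime_events",
  col("lifetime_events").cast("bigint") + col("daily_events").cast("bigint")
)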

Failure 4: Division by Zero

Division and modulo by zero used to return null. Now they throw.

// Spark 3.x: returns null
// Spark 4.0: throws SparkArithmeticException
//   [DIVIDE_BY_ZERO] Division by zero
spark.sql("SELECT 1 / 0")
spark.sql("SELECT 10 % 0")

// Fix — try_divide and try_mod
spark.sql("SELECT try_divide(1, 0)")  // null
spark.sql("SELECT try_mod(10, 0)")    // null

This is one of the cases where the behavior change is most likely to be silently masking a real bug. A divide-by-zero in a metric calculation almost always means an empty denominator (zero rows in a group, no matched join key) — surfacing it as an exception is usually what you want. If you genuinely need null on empty denominators, try_divide is correct.
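
A sketch of the per-expression opt-out in a metric calculation, with hypothetical revenue and orders columns: groups with zero orders produce a null average order value instead of failing the whole job.

import org.apache.spark.sql.functions._

val aov = df
  .groupBy("region")
  .agg(expr("try_divide(sum(revenue), sum(orders))").as("avg_order_value"))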

Failure 5: Decimal Overflow in Aggregates

Sum and average over DECIMAL columns can exceed precision limits. ANSI mode throws; the legacy default returned null.

// Spark 3.x: returns null on overflow
// Spark 4.0: throws SparkArithmeticException on decimal overflow
spark.sql("SELECT sum(amount) FROM payments")

// Fix — try_sum and try_avg
spark.sql("SELECT try_sum(amount), try_avg(amount) FROM payments")

If you hit this regularly, the underlying issue is usually a DECIMAL(p, s) declaration with too small a precision for the scale of the data. Bumping precision is the durable fix; try_sum is the safe override.
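
One way to buy headroom without touching the table definition is to widen the working precision in the query itself. Spark sizes a decimal sum at roughly the input precision plus ten (capped at 38), so casting up before aggregating usually clears the overflow. The payments table and amount column are the same hypothetical names as above.

// Widen the input so the sum's result type has room to grow
spark.sql("SELECT sum(CAST(amount AS DECIMAL(28, 2))) AS total FROM payments")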

Failure 6: Array and Map Index Out of Bounds

element_at on an out-of-bounds index, and the array subscript operator arr[i], now throw under ANSI mode.

// Spark 3.x: returns null
// Spark 4.0: throws SparkArrayIndexOutOfBoundsException
spark.sql("SELECT element_at(array(1, 2, 3), 10)")

// Fix — try_element_at
spark.sql("SELECT try_element_at(array(1, 2, 3), 10)")  // null

try_element_at works on both arrays and maps. For maps, the lookup-key-not-present case becomes null instead of an exception, which is what most code paths expect. If you build map columns from map_from_arrays and look up keys that may not exist, swap to try_element_at everywhere.
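
A minimal sketch with hypothetical keys and values array columns: build the map, then look up a key that may be absent.

import org.apache.spark.sql.functions._

val withMap  = df.withColumn("attrs", map_from_arrays(col("keys"), col("values")))
// null if the 'plan' key is missing, instead of an ANSI-mode exception
val withPlan = withMap.withColumn("plan", expr("try_element_at(attrs, 'plan')"))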

Failure 7: Date and Timestamp Parsing

to_date, to_timestamp, make_date, and make_timestamp now throw on unparseable inputs instead of returning null.

// Spark 3.x: returns null
// Spark 4.0: throws on invalid dates
spark.sql("SELECT to_date('2020-13-01')")     // invalid month
spark.sql("SELECT to_timestamp('not a date')")

// Fix — try_to_date, try_to_timestamp, try_make_date, try_make_timestamp
spark.sql("SELECT try_to_date('2020-13-01')")        // null
spark.sql("SELECT try_to_timestamp('not a date')")   // null
spark.sql("SELECT try_make_timestamp(2020, 13, 1, 0, 0, 0)")  // null

For the mechanics of these functions in normal use, see to_date and to_timestamp in Spark Scala. The try_* variants are drop-in replacements when your input is known-messy — for example, parsing a timestamp column from CSV where some rows have garbage values.
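
A sketch with a hypothetical string event_time column read from CSV: parse what you can, and keep the rows that failed so they can be inspected or quarantined.

import org.apache.spark.sql.functions._

val parsed = df.withColumn("event_time", expr("try_to_timestamp(event_time, 'yyyy-MM-dd HH:mm:ss')"))
// null after parsing: garbage values, plus rows that were already null upstream
val dirty  = parsed.filter(col("event_time").isNull)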

Failure 8: parse_url and Other Coercions

parse_url and a handful of other string-to-structured-data functions also moved from "return null" to "throw" under ANSI mode.

// Spark 3.x: returns null
// Spark 4.0: throws IllegalArgumentException
spark.sql("SELECT parse_url('not a valid url', 'HOST')")

// Fix — try_parse_url
spark.sql("SELECT try_parse_url('not a valid url', 'HOST')")  // null

The Practical Migration Recipe

Here's the workflow that has held up across actual upgrades:

  1. Disable ANSI mode and ship the upgrade first. Set spark.sql.ansi.enabled=false in your application config, run your test suite, and ship Spark 4.0. This decouples the runtime upgrade from the data-quality cleanup (a config sketch for this and the re-enable steps follows the list).

  2. Re-enable ANSI mode in a non-prod environment. Run your full pipeline against a representative sample of production data with ANSI on. Capture every exception. Each one is a row that was previously corrupted silently.

  3. Triage each failure. For each exception class, decide:
    • Keep the throw. If the failure points at upstream dirty data that should be rejected, leave the expression as-is and add a data-quality check or fail-fast in the loader.
    • Switch to try_*. If null is the genuinely correct semantic for that path, replace cast with try_cast, + with try_add, etc. Do this at the expression level, not globally.
    • Widen the type. Most overflow failures are best fixed by widening INT to BIGINT or bumping DECIMAL precision. Reaching for try_add to suppress overflow is usually masking the real bug.
  4. Re-enable ANSI in production. Once the test environment is clean, flip spark.sql.ansi.enabled back to its default. Keep it on going forward — it's the safety net for the next class of dirty data nobody anticipated.
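
One way to wire steps 1, 2, and 4 is sketched below; the builder shape and app name are illustrative, and the same config can be passed on the spark-submit command line instead.

import org.apache.spark.sql.SparkSession

// Step 1: ship the upgrade with ANSI off, pinned in application config
val spark = SparkSession.builder()
  .appName("my-pipeline")
  .config("spark.sql.ansi.enabled", "false")
  .getOrCreate()

// Step 2: in the non-prod run, turn it back on for the session under test
spark.conf.set("spark.sql.ansi.enabled", "true")

// Or from the command line, without code changes:
//   spark-submit --conf spark.sql.ansi.enabled=false ...

// Step 4: delete the override entirely so the Spark 4.0 default (true) applies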

The escape hatch — setting spark.sql.ansi.enabled=false permanently — is available, but you're sacrificing the most valuable behavior change in Spark 4.0 to avoid two days of data triage. Pay the migration cost once.


Quick Reference: Operation → try_* Function

Operation                              | Throws under ANSI                                  | Use this instead
CAST(x AS T) (invalid input)           | SparkNumberFormatException                         | try_cast(x AS T)
CAST(x AS T) (overflow)                | SparkArithmeticException [CAST_OVERFLOW]           | try_cast(x AS T)
a + b, a - b, a * b (overflow)         | SparkArithmeticException [ARITHMETIC_OVERFLOW]     | try_add, try_subtract, try_multiply
a / b, a % b (zero divisor)            | SparkArithmeticException [DIVIDE_BY_ZERO]          | try_divide, try_mod
sum(decimal), avg(decimal) (overflow)  | SparkArithmeticException                           | try_sum, try_avg
element_at(arr, i), arr[i]             | SparkArrayIndexOutOfBoundsException                | try_element_at
to_date, to_timestamp (unparseable)    | parsing/format exception                           | try_to_date, try_to_timestamp
make_date, make_timestamp (invalid)    | IllegalArgumentException                           | try_make_date, try_make_timestamp
parse_url (invalid URL)                | IllegalArgumentException                           | try_parse_url
INSERT value into smaller column       | [CAST_OVERFLOW_IN_TABLE_INSERT]                    | try_cast input, or widen column

The Bottom Line

ANSI mode by default is the right call. Silent nulls and silent overflow are how data quality bugs survive for years in production pipelines. Spark 4.0 forces the conversation, and the try_* family gives you per-expression control to keep the safety net everywhere except the specific paths where null is the real semantic.

For the authoritative reference on every behavior covered by ANSI mode, see the Spark ANSI Compliance documentation and the SQL migration guide.

Article Details

Created: 2026-05-09

Last Updated: 2026-05-09 11:07:35 PM