Spark Scala Absolute Value: abs
The abs function returns the absolute value of a numeric column — the value with its sign stripped. It works on any numeric type (integers, longs, doubles, decimals) and is most often used when you care about the magnitude of a number but not its direction.
def abs(e: Column): Column
The return type matches the input type: pass in an IntegerType column and you get back an IntegerType column; pass in a DoubleType and you get a DoubleType.
import org.apache.spark.sql.functions.{abs, col}
import spark.implicits._ // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq(
  -42,
  -7,
  0,
  13,
  256,
).toDF("value")

val df2 = df
  .withColumn("absolute", abs(col("value")))

df2.show(false)
// +-----+--------+
// |value|absolute|
// +-----+--------+
// |-42  |42      |
// |-7   |7       |
// |0    |0       |
// |13   |13      |
// |256  |256     |
// +-----+--------+
Positive numbers and zero are returned unchanged. Negative numbers have their sign flipped.
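The claim that the return type matches the input type can be checked by inspecting the output schema. The following is a minimal sketch, assuming a local SparkSession; the column names (i, l, d, dec) and the app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{abs, col}

// Local session for experimentation; in spark-shell `spark` already exists
// and getOrCreate() simply returns it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("abs-types") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// One row holding an Int, a Long, a Double and a BigDecimal value.
val typed = Seq((-1, -2L, -3.5, BigDecimal("-4.25")))
  .toDF("i", "l", "d", "dec")

val result = typed.select(
  abs(col("i")).as("abs_i"),
  abs(col("l")).as("abs_l"),
  abs(col("d")).as("abs_d"),
  abs(col("dec")).as("abs_dec")
)

// Print each output column next to its Spark data type.
result.schema.fields.foreach(f => println(s"${f.name}: ${f.dataType}"))
```

Each abs_* column should report the same data type as the column it was computed from in typed.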
Floating-Point Values
abs works the same way on Double and Float columns:
val df = Seq(
  -3.14,
  -0.5,
  0.0,
  2.71828,
  99.999,
).toDF("value")

val df2 = df
  .withColumn("absolute", abs(col("value")))

df2.show(false)
// +-------+--------+
// |value  |absolute|
// +-------+--------+
// |-3.14  |3.14    |
// |-0.5   |0.5     |
// |0.0    |0.0     |
// |2.71828|2.71828 |
// |99.999 |99.999  |
// +-------+--------+
Computing Absolute Differences
A common use is computing the unsigned difference between two columns — for example, the error between an expected and actual value, where you don't care whether the actual was over or under:
val df = Seq(
  ("Alice", 100.00, 95.50),
  ("Bob", 80.25, 92.75),
  ("Carol", 60.00, 60.00),
  ("Dave", 45.10, 50.00),
  ("Eve", 110.00, 88.00),
).toDF("name", "expected", "actual")

val df2 = df
  .withColumn("error", abs(col("expected") - col("actual")))

df2.show(false)
// +-----+--------+------+-----------------+
// |name |expected|actual|error            |
// +-----+--------+------+-----------------+
// |Alice|100.0   |95.5  |4.5              |
// |Bob  |80.25   |92.75 |12.5             |
// |Carol|60.0    |60.0  |0.0              |
// |Dave |45.1    |50.0  |4.899999999999999|
// |Eve  |110.0   |88.0  |22.0             |
// +-----+--------+------+-----------------+
Without abs, Bob's error would be -12.5 and Eve's would be +22.0, and any aggregation like the mean would let positive and negative errors cancel each other out. Wrapping the difference in abs keeps every error contributing its true magnitude.
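To make the cancellation effect concrete, here is a small sketch that averages the signed and the absolute differences side by side for the same five rows. It assumes a local SparkSession; the names diff, mean_signed_error and mean_absolute_error are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{abs, avg, col}

// Local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("abs-mean-error") // hypothetical app name
  .getOrCreate()
import spark.implicits._

val errors = Seq(
  ("Alice", 100.00, 95.50),
  ("Bob", 80.25, 92.75),
  ("Carol", 60.00, 60.00),
  ("Dave", 45.10, 50.00),
  ("Eve", 110.00, 88.00)
).toDF("name", "expected", "actual")

val summary = errors
  .withColumn("diff", col("expected") - col("actual")) // signed difference
  .agg(
    avg(col("diff")).as("mean_signed_error"),       // positives and negatives cancel
    avg(abs(col("diff"))).as("mean_absolute_error") // every row counts by magnitude
  )

summary.show(false)
```

For this data the signed mean comes out at roughly 1.82 while the mean absolute error is roughly 8.78, so the signed figure understates the typical miss by a wide margin.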
Note Dave's error: 4.899999999999999 instead of 4.9. That comes from IEEE 754 double-precision arithmetic (45.1 has no exact binary representation, so the subtraction is slightly off), not from anything abs does. If you need exact decimal arithmetic, cast to DecimalType before subtracting, or round the result.
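Both workarounds can be sketched on Dave's row. This assumes a local SparkSession; the DecimalType(10, 2) precision and scale, the two-decimal rounding, and the column names error_decimal and error_rounded are choices made for this illustration, so adjust them to your data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{abs, col, round}
import org.apache.spark.sql.types.DecimalType

// Local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("abs-decimal") // hypothetical app name
  .getOrCreate()
import spark.implicits._

val dave = Seq(("Dave", 45.10, 50.00)).toDF("name", "expected", "actual")

val money = DecimalType(10, 2) // assumed precision and scale

val fixed = dave
  // Option 1: cast both operands to a decimal type, then subtract.
  .withColumn(
    "error_decimal",
    abs(col("expected").cast(money) - col("actual").cast(money))
  )
  // Option 2: keep doubles and round the result to two decimal places.
  .withColumn(
    "error_rounded",
    round(abs(col("expected") - col("actual")), 2)
  )

fixed.show(false)
```

The decimal variant keeps the arithmetic exact at the chosen scale, while the rounded variant stays a Double and only tidies up the final value.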
Nulls
abs propagates nulls: a null input produces a null output rather than an error:
val df = Seq(
  Some(-5),
  Some(3),
  None,
  Some(0),
  Some(-100),
).toDF("value")

val df2 = df
  .withColumn("absolute", abs(col("value")))

df2.show(false)
// +-----+--------+
// |value|absolute|
// +-----+--------+
// |-5   |5       |
// |3    |3       |
// |null |null    |
// |0    |0       |
// |-100 |100     |
// +-----+--------+
If you need to treat nulls as zero, combine with coalesce: abs(coalesce(col("value"), lit(0))).
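Applied to the same five values, the coalesce variant might look like the sketch below. It assumes a local SparkSession; the DataFrame names withNulls and filled are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{abs, coalesce, col, lit}

// Local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("abs-nulls") // hypothetical app name
  .getOrCreate()
import spark.implicits._

val withNulls = Seq[Option[Int]](
  Some(-5),
  Some(3),
  None,
  Some(0),
  Some(-100)
).toDF("value")

// coalesce replaces null with 0 first, so abs never sees a null.
val filled = withNulls
  .withColumn("absolute", abs(coalesce(col("value"), lit(0))))

filled.show(false)
```

The row whose value is null now gets 0 in the absolute column; the value column itself is left untouched.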