Spark Scala Exp and Expm1
Spark provides two exponential functions: exp computes e^x and expm1 computes e^x - 1. They're the inverses of log and log1p respectively, and you'll reach for them whenever you need to undo a log transform, compute compound growth, or work with continuous decay.
exp
exp computes e raised to the power of the input column, where e ≈ 2.71828 is Euler's number:
def exp(e: Column): Column
def exp(columnName: String): Column
The result is a Double. exp(0) is 1, exp(1) is e, and the function grows quickly as the input increases:
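import org.apache.spark.sql.functions._
// Assumes a SparkSession named spark is in scope, as in spark-shell or a notebook.
import spark.implicits._

val df = Seq(
  0.0,
  1.0,
  2.0,
  -1.0,
  0.5,
).toDF("value")
val df2 = df
  .withColumn("exp", exp(col("value")))
df2.show(false)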
// +-----+-------------------+
// |value|exp |
// +-----+-------------------+
// |0.0 |1.0 |
// |1.0 |2.7182818284590455 |
// |2.0 |7.38905609893065 |
// |-1.0 |0.36787944117144233|
// |0.5 |1.6487212707001282 |
// +-----+-------------------+
Negative inputs produce values between 0 and 1 — exp(-1) is 1/e. Fractional inputs land between integer powers — exp(0.5) is the square root of e.
Inverse of log
exp and log are inverses: exp(log(x)) = x for any positive x. This is the most common reason to reach for exp in practice — you log-transformed a column to stabilize variance or fit a model, and now you need to bring predictions back to the original scale:
val df = Seq(
  1.0,
  2.718281828459045,
  10.0,
  100.0,
  0.5,
).toDF("value")
val df2 = df
  .withColumn("ln", log(col("value")))
  .withColumn("exp_of_ln", exp(log(col("value"))))
df2.show(false)
// +-----------------+-------------------+------------------+
// |value |ln |exp_of_ln |
// +-----------------+-------------------+------------------+
// |1.0 |0.0 |1.0 |
// |2.718281828459045|1.0 |2.7182818284590455|
// |10.0 |2.302585092994046 |10.000000000000002|
// |100.0 |4.605170185988092 |100.00000000000004|
// |0.5 |-0.6931471805599453|0.5 |
// +-----------------+-------------------+------------------+
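The small drift on 10.0 and 100.0 (showing 10.000000000000002 and 100.00000000000004) is normal IEEE 754 rounding: log and exp each round their result to the nearest Double, and the two rounding errors do not always cancel. If you need clean values for display, round to a fixed number of decimal places or cast to a decimal type with a fixed scale.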
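As a minimal sketch of the rounding approach, the snippet below repeats the round trip and adds a rounded column. The local SparkSession, the column name exp_of_ln_rounded, and the choice of 6 decimal places are assumptions made for illustration; in spark-shell or a notebook a session named spark usually already exists.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, exp, log, round}

// Local session for illustration only; getOrCreate reuses an existing one if present.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("exp-round-trip")
  .getOrCreate()
import spark.implicits._

val roundTrip = Seq(1.0, 10.0, 100.0, 0.5).toDF("value")
  // log then exp: mathematically the identity, but each step rounds to a Double.
  .withColumn("exp_of_ln", exp(log(col("value"))))
  // Round to 6 decimal places (an arbitrary choice) to hide the last-bit drift.
  .withColumn("exp_of_ln_rounded", round(col("exp_of_ln"), 6))

roundTrip.show(false)
```

Rounding changes the stored value, so it is probably best kept to the final presentation step rather than applied in the middle of a calculation.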
expm1
expm1 computes e^x - 1. The point of having a dedicated function rather than writing exp(x) - 1 is numerical precision when x is very close to zero:
def expm1(e: Column): Column
def expm1(columnName: String): Column
For small x, e^x is just barely larger than 1, and subtracting 1 from a result like 1.0001000050001667 throws away leading digits — you lose precision exactly where you need it most. expm1 computes the difference in a way that preserves those digits:
val df = Seq(
  0.0,
  0.0001,
  0.01,
  1.0,
  5.0,
).toDF("value")
val df2 = df
  .withColumn("exp_minus_1", exp(col("value")) - lit(1.0))
  .withColumn("expm1", expm1(col("value")))
df2.show(false)
// +------+--------------------+---------------------+
// |value |exp_minus_1 |expm1 |
// +------+--------------------+---------------------+
// |0.0 |0.0 |0.0 |
// |1.0E-4|1.000050001667141E-4|1.0000500016667084E-4|
// |0.01 |0.010050167084167949|0.010050167084168058 |
// |1.0 |1.7182818284590455 |1.718281828459045 |
// |5.0 |147.4131591025766 |147.4131591025766 |
// +------+--------------------+---------------------+
At x = 0.0001, the two columns diverge in the 13th significant digit: expm1 keeps the digits 1.0000500016667... while exp(x) - 1 rounds to 1.0000500016671.... The gap looks small here, but it compounds over many operations and matters for log-likelihood calculations, continuous-compounding interest, and any "small percentage change" arithmetic. As x grows, the advantage disappears and the columns converge: at x = 1 they differ only in the last printed digit, and by x = 5 they're identical.
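The effect gets more pronounced as x shrinks. The sketch below uses plain Scala (scala.math on ordinary Doubles, not Spark columns) to compare the two approaches at x = 1e-10, a hypothetical tiny per-period rate. The helper names taylorExpm1 and relativeError are made up for this example, and the three-term Taylor series serves as a reference value that is accurate for inputs this small.

```scala
// Reference value from the Taylor series x + x^2/2 + x^3/6, accurate for tiny x.
def taylorExpm1(x: Double): Double = x + x * x / 2 + x * x * x / 6

def relativeError(approx: Double, exact: Double): Double =
  math.abs(approx - exact) / math.abs(exact)

val x = 1e-10                    // hypothetical tiny per-period rate
val naive = math.exp(x) - 1.0    // subtracts two nearly equal numbers
val stable = math.expm1(x)       // avoids the cancellation
val reference = taylorExpm1(x)

println(f"naive relative error: ${relativeError(naive, reference)}%.3e")
println(f"expm1 relative error: ${relativeError(stable, reference)}%.3e")
```

The naive version loses roughly half of its significant digits at this scale, while expm1 stays accurate to about machine precision.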
expm1 is also the inverse of log1p: expm1(log1p(x)) = x. If you used log1p to handle a column that could contain zeros, use expm1 to invert it on the way back out.
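A minimal sketch of that round trip, assuming a hypothetical count-like column that includes zeros. The local SparkSession and the column names count, log1p, and restored are illustrative choices, not part of any existing pipeline.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expm1, log1p}

// Local session for illustration only; getOrCreate reuses an existing one if present.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("log1p-round-trip")
  .getOrCreate()
import spark.implicits._

// Hypothetical counts, including a zero that a plain log could not handle.
val counts = Seq(0.0, 1.0, 9.0, 99.0).toDF("count")

val restored = counts
  .withColumn("log1p", log1p(col("count")))     // forward transform: ln(1 + x)
  .withColumn("restored", expm1(col("log1p")))  // inverse transform: e^y - 1

restored.show(false)
```

The restored column should match count up to floating-point rounding in the last digits, so compare with a tolerance rather than strict equality.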
Special Inputs: Nulls, Overflow, and Underflow
Nulls pass through both functions unchanged, as with most Spark SQL functions. For non-null inputs, exp follows the standard IEEE 754 rules for doubles: very negative inputs underflow to 0, and very large positive inputs overflow to positive infinity. expm1 behaves the same way except that very negative inputs give -1 (since e^(-large) - 1 approaches -1 rather than 0):
val df = Seq(
  Some(0.0),
  Some(1.0),
  Some(-1000.0),
  Some(1000.0),
  None,
).toDF("value")
val df2 = df
  .withColumn("exp", exp(col("value")))
  .withColumn("expm1", expm1(col("value")))
df2.show(false)
// +-------+------------------+-----------------+
// |value |exp |expm1 |
// +-------+------------------+-----------------+
// |0.0 |1.0 |0.0 |
// |1.0 |2.7182818284590455|1.718281828459045|
// |-1000.0|0.0 |-1.0 |
// |1000.0 |Infinity |Infinity |
// |null |null |null |
// +-------+------------------+-----------------+
Unlike log, neither function produces null for in-range inputs — every finite Double has a defined exponential. The only way to get null out is to put null in. The Infinity result for 1000.0 is real positive infinity; downstream arithmetic on it will follow IEEE 754 rules (e.g. Infinity + 1 = Infinity, Infinity - Infinity = NaN).
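The same edge cases can be checked on scalar values. This is a plain Scala sketch using scala.math on JVM Doubles rather than Spark columns, on the assumption that Spark's results follow the same double arithmetic, which is consistent with the table above. All value names are illustrative.

```scala
// Plain JVM doubles, no Spark: the same IEEE 754 edge cases on scalar values.
val overflowed = math.exp(1000.0)      // too large for a Double
val underflowed = math.exp(-1000.0)    // too small to represent
val expm1Floor = math.expm1(-1000.0)   // e^x vanishes, leaving -1

// Arithmetic on infinity keeps following IEEE 754 rules.
val stillInfinite = overflowed + 1.0
val notANumber = overflowed - overflowed

// The overflow boundary sits near ln(Double.MaxValue), roughly 709.78.
val boundary = math.log(Double.MaxValue)

println(s"exp(1000) = $overflowed, exp(-1000) = $underflowed, expm1(-1000) = $expm1Floor")
println(s"Infinity + 1 = $stillInfinite, Infinity - Infinity = $notANumber")
println(s"overflow boundary is about $boundary")
```

If inputs can approach that boundary, it may be worth clamping them or checking results for infinity before they feed into sums or averages, since a single Infinity or NaN will dominate an aggregate.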
Related Functions
For the inverse direction — natural and arbitrary-base logarithms — see log, log2, log10, log1p, and ln. For raising values to other powers, see pow, sqrt, and cbrt.