Spark Scala Factorial: factorial
The factorial function returns the factorial of an integer column — the product of all positive integers up to and including the input value (n! = n × (n-1) × ... × 2 × 1). It's useful anywhere you need to compute permutations, combinations, or other counting expressions inline in a DataFrame.
def factorial(e: Column): Column
The function takes a single integer column and returns a LongType column. By convention 0! = 1 (the empty product), so inputs of 0 and 1 both return 1.
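import org.apache.spark.sql.functions.{col, factorial}
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark, as in spark-shell

val df = Seq(
  0,
  1,
  3,
  5,
  7,
  10,
).toDF("n")

val df2 = df
  .withColumn("n_factorial", factorial(col("n")))

df2.show(false)
// +---+-----------+
// |n  |n_factorial|
// +---+-----------+
// |0  |1          |
// |1  |1          |
// |3  |6          |
// |5  |120        |
// |7  |5040       |
// |10 |3628800    |
// +---+-----------+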
Factorials grow very quickly: 10! is already over 3.6 million, and 20! is roughly 2.4 × 10^18, within a factor of four of the largest 64-bit signed integer (Long.MaxValue, about 9.2 × 10^18). The next value, 21!, is about 5.1 × 10^19 and no longer fits.
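To see where the 64-bit limit falls, the following sketch computes exact factorials with BigInt in plain Scala (no Spark involved) and compares them with Long.MaxValue. The helper name exactFactorial is hypothetical and exists only for this illustration.

```scala
// Plain Scala, no Spark required: exact factorials via arbitrary-precision BigInt.
// exactFactorial is a hypothetical helper, not part of Spark.
def exactFactorial(n: Int): BigInt =
  (1 to n).foldLeft(BigInt(1))(_ * _) // empty range for n = 0 leaves 1

val longMax = BigInt(Long.MaxValue)  // 9223372036854775807

val twenty = exactFactorial(20)      // 2432902008176640000
val twentyOne = exactFactorial(21)   // 51090942171709440000

println(twenty <= longMax)    // true: 20! fits in a Long
println(twentyOne <= longMax) // false: 21! does not
```

The snippet can be pasted into spark-shell or any Scala REPL as written.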
Input Range, Negatives, and Nulls
Because the result must fit in a LongType, factorial only accepts inputs in the range [0, 20]. Anything outside that range — values greater than 20 or any negative number — returns null instead of overflowing or throwing. Null inputs also produce null outputs:
val df = Seq(
  Some(19),
  Some(20),
  Some(21),
  Some(-3),
  None,
).toDF("n")

val df2 = df
  .withColumn("n_factorial", factorial(col("n")))

df2.show(false)
// +----+-------------------+
// |n   |n_factorial        |
// +----+-------------------+
// |19  |121645100408832000 |
// |20  |2432902008176640000|
// |21  |null               |
// |-3  |null               |
// |null|null               |
// +----+-------------------+
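Because a null result can mean either a null input or an out-of-range input, it can help to record which case applied. The sketch below is one possible way to do that with when/otherwise. The status column and its labels are hypothetical names, and the local SparkSession is an assumption for running outside spark-shell (where spark already exists).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, factorial, when}

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("factorial-range-check")
  .getOrCreate()
import spark.implicits._

val checked = Seq(Some(20), Some(21), Some(-3), None)
  .toDF("n")
  .withColumn("n_factorial", factorial(col("n")))
  // "status" is a hypothetical column explaining why n_factorial may be null.
  .withColumn(
    "status",
    when(col("n").isNull, "null input")
      .when(col("n") < 0 || col("n") > 20, "out of range")
      .otherwise("ok")
  )

checked.show(false)
```

Checking the null case first matters: comparisons against a null n evaluate to null rather than true or false, so the range test alone would not label that row.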
20! is the largest factorial that fits in a Long. If you need larger values, you'll have to compute them outside of factorial, for example by working in log space (summing ln(i) for i from 1 to n and applying exp only at the end, if at all), or by switching to a different representation entirely.
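As a rough illustration of the log-space idea, the sketch below uses plain Scala rather than Spark column functions. The names logFactorial and logBinomial are hypothetical, and the results are Double approximations, not exact integers.

```scala
// Plain Scala, no Spark required. Hypothetical helpers for illustration.
// ln(n!) = ln(1) + ln(2) + ... + ln(n); the empty sum for n = 0 gives ln(0!) = 0.
def logFactorial(n: Int): Double = {
  require(n >= 0, "factorial is undefined for negative integers")
  (1 to n).map(i => math.log(i.toDouble)).sum
}

// ln(C(n, k)) = ln(n!) - ln(k!) - ln((n-k)!), which stays small
// even when n! itself would overflow a Long.
def logBinomial(n: Int, k: Int): Double =
  logFactorial(n) - logFactorial(k) - logFactorial(n - k)

// 25! does not fit in a Long, but its Double approximation is about 1.55e25.
val approx25 = math.exp(logFactorial(25))

// Rounding recovers C(30, 15) = 155117520 from the log-space value.
val c30_15 = math.round(math.exp(logBinomial(30, 15)))
```

If the same logic is needed per row, one option is to wrap a function like logFactorial in a Spark UDF. That wiring is left out here to keep the sketch free of dependencies.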
Computing Combinations
A common use of factorial is computing the number of ways to choose k items from n — the binomial coefficient C(n, k) = n! / (k! × (n-k)!). Each piece of the formula is a factorial, so you can express it directly with column arithmetic:
val df = Seq(
  ("Pick 2 from 5", 5, 2),
  ("Pick 3 from 6", 6, 3),
  ("Pick 4 from 10", 10, 4),
  ("Pick 5 from 8", 8, 5),
).toDF("scenario", "n", "k")

val df2 = df
  .withColumn(
    "combinations",
    factorial(col("n")) / (factorial(col("k")) * factorial(col("n") - col("k"))),
  )

df2.show(false)
// +--------------+---+---+------------+
// |scenario      |n  |k  |combinations|
// +--------------+---+---+------------+
// |Pick 2 from 5 |5  |2  |10.0        |
// |Pick 3 from 6 |6  |3  |20.0        |
// |Pick 4 from 10|10 |4  |210.0       |
// |Pick 5 from 8 |8  |5  |56.0        |
// +--------------+---+---+------------+
Spark's / operator returns a Double even when both operands are integral, which is why the values are displayed as 10.0, 20.0, and so on. Wrap the expression in .cast("long") if you want integer output. The formula is also bound by factorial's input range: if n > 20, or if k > n (which makes n - k negative), one of the factorials is null and so is the result. For larger inputs, cancel the common factors of n! and (n-k)! by hand rather than computing each factorial in full.
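One way to cancel terms is the multiplicative form C(n, k) = ((n-k+1)/1) × ((n-k+2)/2) × ... × (n/k), which never builds a full factorial. The sketch below is plain Scala under stated assumptions: binomial is a hypothetical helper, it returns None for invalid inputs (mirroring the null behaviour described above), and it does not guard against Long overflow for very large results.

```scala
// Plain Scala, no Spark required. Hypothetical helper for illustration.
// Computes C(n, k) without forming n!, k!, or (n-k)!.
def binomial(n: Int, k: Int): Option[Long] =
  if (n < 0 || k < 0 || k > n) None
  else {
    // C(n, k) == C(n, n - k), so iterate over the smaller of the two.
    val m = math.min(k, n - k)
    // After step i the accumulator equals C(n - m + i, i), so the
    // integer division by i is always exact.
    Some((1 to m).foldLeft(1L)((acc, i) => acc * (n - m + i) / i))
  }

println(binomial(10, 4))  // Some(210)
println(binomial(30, 15)) // Some(155117520), although 30! overflows a Long
println(binomial(5, 7))   // None, because k > n
```

To use this per row, the function could be wrapped in a UDF, for example udf((n: Int, k: Int) => binomial(n, k)). Treat that as an assumption to verify against your Spark version rather than a tested recipe.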