Spark Scala Hypot

The hypot function computes sqrt(a² + b²) for two numeric inputs without the intermediate overflow or underflow that a naive implementation would produce. It's the standard tool for distances between points, vector magnitudes, and anywhere the Pythagorean theorem applies.

def hypot(l: Column, r: Column): Column

Besides the two-Column form shown above, overloads accept a column name (a String) in place of either Column, or a literal Double in place of either argument. The result is always a Double column, whatever the numeric type of the inputs.

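import org.apache.spark.sql.functions.{col, hypot}
// Needed for toDF; assumes a SparkSession named spark (already in scope in spark-shell).
import spark.implicits._

val df = Seq(
  (3.0, 4.0),
  (5.0, 12.0),
  (8.0, 15.0),
  (1.0, 1.0),
  (0.0, 7.0),
).toDF("a", "b")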

val df2 = df
  .withColumn("hypotenuse", hypot(col("a"), col("b")))

df2.show(false)
// +---+----+------------------+
// |a  |b   |hypotenuse        |
// +---+----+------------------+
// |3.0|4.0 |5.0               |
// |5.0|12.0|13.0              |
// |8.0|15.0|17.0              |
// |1.0|1.0 |1.4142135623730951|
// |0.0|7.0 |7.0               |
// +---+----+------------------+

The first three rows are well-known Pythagorean triples, (3, 4, 5), (5, 12, 13), and (8, 15, 17), so the hypotenuse comes out as a whole number. When one input is 0, the result is just the absolute value of the other input.
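
The other argument shapes can be tried the same way. The snippet below is a minimal sketch, assuming a local SparkSession; the DataFrame name sides and the column names by_name, mixed, and with_literal are made up for illustration. It uses integer inputs to show that the result column is still a Double.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, hypot}

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("hypot-overloads").getOrCreate()
import spark.implicits._

// Integer inputs: Spark casts them to Double before computing the result.
val sides = Seq((3, 4), (5, 12)).toDF("a", "b")

val overloads = sides
  .withColumn("by_name", hypot("a", "b"))            // both columns referenced by name
  .withColumn("mixed", hypot(col("a"), "b"))         // a Column and a column name
  .withColumn("with_literal", hypot(col("a"), 4.0))  // a Column and a literal Double

overloads.printSchema() // the three new columns are reported as double
overloads.show(false)
```

For the first row, (3, 4), all three new columns hold 5.0.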

Distance from Origin

A common application is computing the Euclidean distance from the origin to a point in the 2D plane. Because the result depends only on the magnitudes of the inputs, negative coordinates need no extra handling:

val df = Seq(
  ("origin", 0.0, 0.0),
  ("point_a", 3.0, 4.0),
  ("point_b", -6.0, 8.0),
  ("point_c", 1.5, 2.5),
  ("point_d", -10.0, -10.0),
).toDF("label", "x", "y")

val df2 = df
  .withColumn("distance_from_origin", hypot(col("x"), col("y")))

df2.show(false)
// +-------+-----+-----+--------------------+
// |label  |x    |y    |distance_from_origin|
// +-------+-----+-----+--------------------+
// |origin |0.0  |0.0  |0.0                 |
// |point_a|3.0  |4.0  |5.0                 |
// |point_b|-6.0 |8.0  |10.0                |
// |point_c|1.5  |2.5  |2.9154759474226504  |
// |point_d|-10.0|-10.0|14.142135623730951  |
// +-------+-----+-----+--------------------+

For the distance between two arbitrary points (x1, y1) and (x2, y2), pass the differences: hypot(col("x2") - col("x1"), col("y2") - col("y1")).
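
As a minimal sketch of that two-point form, assuming a local SparkSession and a made-up DataFrame named segments with columns x1, y1, x2, and y2:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, hypot}

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("hypot-distance").getOrCreate()
import spark.implicits._

// Each row holds the two endpoints of a line segment (illustrative data).
val segments = Seq(
  ("s1", 0.0, 0.0, 3.0, 4.0),
  ("s2", 1.0, 1.0, 4.0, 5.0),
  ("s3", -2.0, 3.0, 4.0, -5.0)
).toDF("label", "x1", "y1", "x2", "y2")

// Pass the per-axis differences; their signs do not affect the result.
val withLength = segments
  .withColumn("length", hypot(col("x2") - col("x1"), col("y2") - col("y1")))

withLength.show(false)
// length is 5.0 for s1 and s2 (differences 3 and 4) and 10.0 for s3 (differences 6 and -8)
```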

Why Not Just sqrt(a*a + b*b)?

hypot exists because it's numerically more stable than the obvious formula. When a or b is very large (above roughly 1.3e154 in magnitude), squaring it overflows a Double to Infinity even when the final result would fit. When both are very small, squaring can underflow to zero. hypot rescales internally to avoid both problems. For everyday inputs the two approaches give the same answer, but for extreme values prefer hypot over writing the formula yourself with sqrt and pow.
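
The difference is easy to see in plain Scala, without Spark. The sketch below uses scala.math.hypot as a stand-in, on the assumption that Spark's hypot behaves the same way on Doubles; the helper name naive is made up for illustration.

```scala
// The obvious formula: squares each input before adding.
def naive(a: Double, b: Double): Double = math.sqrt(a * a + b * b)

val large = 1e200   // large * large exceeds Double.MaxValue (about 1.8e308)
val small = 1e-200  // small * small is below the smallest positive Double

val naiveLarge  = naive(large, large)       // Infinity: the squares overflow
val stableLarge = math.hypot(large, large)  // about 1.414e200

val naiveSmall  = naive(small, small)       // 0.0: the squares underflow
val stableSmall = math.hypot(small, small)  // about 1.414e-200

println(s"large inputs: naive = $naiveLarge, hypot = $stableLarge")
println(s"small inputs: naive = $naiveSmall, hypot = $stableSmall")
```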

Null Handling

If either input is null, the result is null:

val df = Seq(
  (Some(3.0), Some(4.0)),
  (Some(5.0), None),
  (None, Some(9.0)),
  (None, None),
).toDF("a", "b")

val df2 = df
  .withColumn("hypotenuse", hypot(col("a"), col("b")))

df2.show(false)
// +----+----+----------+
// |a   |b   |hypotenuse|
// +----+----+----------+
// |3.0 |4.0 |5.0       |
// |5.0 |null|null      |
// |null|9.0 |null      |
// |null|null|null      |
// +----+----+----------+

If you want nulls to be treated as zero (so a missing coordinate contributes nothing to the distance), wrap the inputs in coalesce(col("a"), lit(0.0)) before passing them to hypot.
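
A minimal sketch of that coalesce pattern, assuming a local SparkSession and reusing the data from the null example; the name filled is made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, hypot, lit}

// Assumption: a local session; in spark-shell getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("hypot-nulls").getOrCreate()
import spark.implicits._

val df = Seq[(Option[Double], Option[Double])](
  (Some(3.0), Some(4.0)),
  (Some(5.0), None),
  (None, Some(9.0)),
  (None, None)
).toDF("a", "b")

// Replace each null with 0.0 before calling hypot, so a missing input contributes nothing.
val filled = df.withColumn(
  "hypotenuse",
  hypot(coalesce(col("a"), lit(0.0)), coalesce(col("b"), lit(0.0)))
)

filled.show(false)
// hypotenuse is 5.0, 5.0, 9.0 and 0.0 for the four rows; a and b keep their nulls
```

Note that a row where both inputs are missing now reports a distance of 0.0, which is indistinguishable from a genuine point at the origin. If that matters, keep the null result for that case.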

Example Details

Created: 2026-05-31 10:53:11 PM

Last Updated: 2026-05-31 10:53:11 PM