Spark Scala Trigonometric Functions

Spark Scala exposes a full set of trigonometric functions as DataFrame column functions, built largely on java.lang.Math: the basics (sin, cos, tan), their inverses (asin, acos, atan, atan2), the reciprocals (cot, csc, sec), and the hyperbolic functions with their inverses (sinh, cosh, tanh, asinh, acosh, atanh). Angles are always in radians, never degrees, both as inputs and as return values; use the radians function to convert if your data is in degrees.
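As a minimal sketch of that conversion step, the snippet below assumes a hypothetical column named degrees and a local SparkSession created just for experimentation (in spark-shell, spark already exists). The radians and sin calls are the standard functions from org.apache.spark.sql.functions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, radians, sin}

// Assumption: a throwaway local session; skip this in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("trig-degrees").getOrCreate()
import spark.implicits._

// Hypothetical input: angles stored in degrees.
val degreesDf = Seq(0.0, 30.0, 90.0).toDF("degrees")

// Convert to radians first; sin(col("degrees")) would read 30 as 30 radians.
val withSin = degreesDf
  .withColumn("radians", radians(col("degrees")))
  .withColumn("sin", sin(col("radians")))

withSin.show(false)
```

The degrees function performs the opposite conversion when you want to present an inverse-function result in degrees.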

sin, cos, and tan

The three basic trig functions take an angle in radians and return its sine, cosine, or tangent:

def sin(e: Column): Column

def cos(e: Column): Column

def tan(e: Column): Column

Each has a String overload that accepts a column name instead of a Column. The result is a Double: sin and cos always fall in [-1, 1], while tan is unbounded.

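import org.apache.spark.sql.functions._
// Assumes a SparkSession named spark is in scope, as in spark-shell.
import spark.implicits._

val df = Seq(
  0.0,
  math.Pi / 6,
  math.Pi / 4,
  math.Pi / 3,
  math.Pi / 2,
).toDF("radians")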

val df2 = df
  .withColumn("sin", sin(col("radians")))
  .withColumn("cos", cos(col("radians")))
  .withColumn("tan", tan(col("radians")))

df2.show(false)
// +------------------+-------------------+---------------------+--------------------+
// |radians           |sin                |cos                  |tan                 |
// +------------------+-------------------+---------------------+--------------------+
// |0.0               |0.0                |1.0                  |0.0                 |
// |0.5235987755982988|0.49999999999999994|0.8660254037844387   |0.5773502691896257  |
// |0.7853981633974483|0.7071067811865475 |0.7071067811865476   |0.9999999999999999  |
// |1.0471975511965976|0.8660254037844386 |0.5000000000000001   |1.7320508075688767  |
// |1.5707963267948966|1.0                |6.123233995736766E-17|1.633123935319537E16|
// +------------------+-------------------+---------------------+--------------------+

These are the standard angles 0, π/6, π/4, π/3, π/2, that is, 0°, 30°, 45°, 60°, 90°. The expected exact values are 0, 1/2, √2/2, √3/2, 1 for sine, and the same list in reverse order for cosine. The visible drift (0.49999999999999994 instead of 0.5, or 6.12E-17 instead of 0 for cos(π/2)) is normal IEEE 754 rounding: π is irrational, so none of these angles except 0 is exactly representable as a Double. The functions receive a slightly-off input and return a result that is accurate for that input rather than for the exact angle.

The big number in the last row of tan — 1.633…E16 — is tan(π/2), which is mathematically infinite. The function doesn't return Infinity because the input π/2 is slightly off from the true π/2, so the result is just a very large finite number. If you see implausibly huge tangents in production data, suspect inputs near π/2 + nπ.
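The rounding behavior described above can be checked without Spark at all, since scala.math delegates to java.lang.Math. The sketch below is illustrative only: approxEqual, nearTanSingularity, and their tolerance values are hypothetical helpers, not part of any Spark or Scala API.

```scala
// Plain Scala, no Spark required.
val halfPi = math.Pi / 2

// tan at the Double nearest to π/2 is huge but finite, not Infinity.
val tanHalfPi = math.tan(halfPi)
println(tanHalfPi.isInfinity)   // false
println(tanHalfPi > 1e15)       // true

// Hypothetical helper: compare against exact values with a tolerance, not ==.
def approxEqual(a: Double, b: Double, tol: Double = 1e-12): Boolean =
  math.abs(a - b) <= tol

println(approxEqual(math.sin(math.Pi / 6), 0.5))  // true

// Hypothetical guard: an angle is "too close" to a tan singularity
// when its cosine is within eps of zero.
def nearTanSingularity(theta: Double, eps: Double = 1e-9): Boolean =
  math.abs(math.cos(theta)) < eps

println(nearTanSingularity(halfPi))       // true
println(nearTanSingularity(math.Pi / 4))  // false
```

In a DataFrame, the same guard could be expressed as a filter on abs(cos(col("radians"))) with whatever threshold suits your data.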

asin, acos, and atan

The inverse functions go the other direction — given a ratio, return the angle in radians:

def asin(e: Column): Column

def acos(e: Column): Column

def atan(e: Column): Column

asin and acos accept inputs in [-1, 1] and return angles in [-π/2, π/2] and [0, π] respectively. Inputs outside [-1, 1] produce NaN. atan accepts any real number and returns an angle in (-π/2, π/2).

val df = Seq(
  -1.0,
  -0.5,
  0.0,
  0.5,
  1.0,
).toDF("value")

val df2 = df
  .withColumn("asin", asin(col("value")))
  .withColumn("acos", acos(col("value")))
  .withColumn("atan", atan(col("value")))

df2.show(false)
// +-----+-------------------+------------------+-------------------+
// |value|asin               |acos              |atan               |
// +-----+-------------------+------------------+-------------------+
// |-1.0 |-1.5707963267948966|3.141592653589793 |-0.7853981633974483|
// |-0.5 |-0.5235987755982989|2.0943951023931957|-0.4636476090008061|
// |0.0  |0.0                |1.5707963267948966|0.0                |
// |0.5  |0.5235987755982989 |1.0471975511965979|0.4636476090008061 |
// |1.0  |1.5707963267948966 |0.0               |0.7853981633974483 |
// +-----+-------------------+------------------+-------------------+

The acos(0) value 1.5707963267948966 is π/2, and acos(-1) is π. asin(±1) returns ±π/2, the edges of its range.
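Because inputs outside [-1, 1] produce NaN, ratios that overshoot the range by a single rounding step can silently turn into NaN. One possible way to guard against that is to clamp before calling asin or acos, sketched here with the greatest and least functions. The column names and the local SparkSession are assumptions for illustration, and NaN inputs are not handled by this clamp, so check for them separately.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asin, col, greatest, least, lit}

// Assumption: a throwaway local session; skip this in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("trig-clamp").getOrCreate()
import spark.implicits._

// Hypothetical ratios; the last one overshoots 1.0 by one rounding step.
val ratios = Seq(-1.0, 0.5, 1.0000000000000002).toDF("ratio")

val clamped = ratios
  // Unclamped: the out-of-range row becomes NaN.
  .withColumn("asin_raw", asin(col("ratio")))
  // Clamp into [-1, 1] before taking the inverse.
  .withColumn("ratio_clamped", least(lit(1.0), greatest(lit(-1.0), col("ratio"))))
  .withColumn("asin_clamped", asin(col("ratio_clamped")))

clamped.show(false)
```

Clamping is only appropriate when the overshoot is known to be rounding noise; a genuinely out-of-range value usually signals a bug upstream and is better filtered or flagged.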

atan2

atan2 takes two arguments — a y and an x — and returns the angle to the point (x, y) from the positive x-axis. Unlike plain atan(y / x), it preserves quadrant information by looking at the signs of both arguments separately:

def atan2(y: Column, x: Column): Column

The Scala API has eight overloads in total, with String/Column/Double variants for each argument so you can mix column references with constants. The return value is in [-π, π]; -π itself only arises in the edge case where y is negative zero and x is negative.

val df = Seq(
  (1.0, 1.0),
  (1.0, -1.0),
  (-1.0, -1.0),
  (-1.0, 1.0),
  (1.0, 0.0),
).toDF("y", "x")

val df2 = df
  .withColumn("atan2", atan2(col("y"), col("x")))
  .withColumn("atan_y_over_x", atan(col("y") / col("x")))

df2.show(false)
// +----+----+-------------------+-------------------+
// |y   |x   |atan2              |atan_y_over_x      |
// +----+----+-------------------+-------------------+
// |1.0 |1.0 |0.7853981633974483 |0.7853981633974483 |
// |1.0 |-1.0|2.356194490192345  |-0.7853981633974483|
// |-1.0|-1.0|-2.356194490192345 |0.7853981633974483 |
// |-1.0|1.0 |-0.7853981633974483|-0.7853981633974483|
// |1.0 |0.0 |1.5707963267948966 |null               |
// +----+----+-------------------+-------------------+

Look at the second row: (y=1, x=-1) is in the second quadrant, so the correct angle is 3π/4 ≈ 2.356. atan2 returns it. atan(y/x) computes atan(-1) and returns -π/4 ≈ -0.785, because the division collapses the two signs into one and the quadrant information is lost. The third row fails the same way in the opposite direction. Whenever you're computing angles from coordinate pairs (bearings, vectors, complex number arguments), reach for atan2, not atan.

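The last row shows the other practical advantage: atan2(1, 0) correctly returns π/2 for the point straight up, while atan(1 / 0) divides by zero, which Spark SQL turns into null when ANSI mode is off. With spark.sql.ansi.enabled set to true, the division raises an error instead of returning null.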
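For the coordinate-pair use case, a heading calculation might look like the sketch below. It exercises three of the atan2 overloads (Column/Column, String/String, and Column/Double) and converts one result to degrees. The dx and dy column names and the local SparkSession are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{atan2, col, degrees}

// Assumption: a throwaway local session; skip this in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("trig-atan2").getOrCreate()
import spark.implicits._

// Hypothetical displacement vectors (dx, dy).
val moves = Seq((1.0, 1.0), (-1.0, 1.0), (0.0, -1.0)).toDF("dx", "dy")

val headings = moves
  // Column/Column overload: angle of (dx, dy) from the positive x-axis, in radians.
  .withColumn("angle_rad", atan2(col("dy"), col("dx")))
  // String/String overload: same angle by column name, converted to degrees.
  .withColumn("angle_deg", degrees(atan2("dy", "dx")))
  // Column/Double overload: x held constant at 1.0.
  .withColumn("slope_angle", atan2(col("dy"), 1.0))

headings.show(false)
```

Note the argument order: y comes first, then x, which is easy to get backwards when translating from (x, y) coordinates.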

cot, csc, and sec

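The three reciprocal trig functions are the multiplicative reciprocals of tan, sin, and cos respectively: cot = 1/tan, csc = 1/sin, sec = 1/cos. Don't confuse them with the inverse functions atan, asin, and acos from the previous sections: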

def cot(e: Column): Column

def csc(e: Column): Column

def sec(e: Column): Column

The cot, csc, and sec functions first appeared in version 3.3.0. Each takes a Column (no String overload).

val df = Seq(
  math.Pi / 6,
  math.Pi / 4,
  math.Pi / 3,
).toDF("radians")

val df2 = df
  .withColumn("cot", cot(col("radians")))
  .withColumn("csc", csc(col("radians")))
  .withColumn("sec", sec(col("radians")))

df2.show(false)
// +------------------+------------------+------------------+------------------+
// |radians           |cot               |csc               |sec               |
// +------------------+------------------+------------------+------------------+
// |0.5235987755982988|1.7320508075688774|2.0000000000000004|1.1547005383792515|
// |0.7853981633974483|1.0000000000000002|1.4142135623730951|1.414213562373095 |
// |1.0471975511965976|0.577350269189626 |1.1547005383792517|1.9999999999999996|
// +------------------+------------------+------------------+------------------+

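At π/6 (30°), csc is 1/sin(π/6) = 1/0.5 = 2, and cot is 1/tan(π/6) = √3 ≈ 1.732. These exist in the API so you don't have to write lit(1.0) / sin(...) yourself. Away from the singularities the two forms agree. At a singularity such as csc(0), the hand-written version goes through Spark SQL division, which yields null for a zero divisor when ANSI mode is off, so check which behavior you need there. They're convenient, not magical: if you'd rather express the math in terms of the basic functions you already use, that works too, as long as you account for the zero-divisor case.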
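To see how the built-in and the hand-written reciprocal compare on your own Spark version, a side-by-side sketch like the following may help. It assumes Spark 3.3.0 or later (for csc), a local SparkSession, and ANSI mode explicitly switched off so that the zero-divisor row yields null instead of raising an error. The column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, csc, lit, sin}

// Assumption: a throwaway local session; skip this in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("trig-reciprocal").getOrCreate()
import spark.implicits._

// Assumption: ANSI mode off, so division by zero returns null rather than failing.
spark.conf.set("spark.sql.ansi.enabled", "false")

// 0.0 is a singularity of csc; π/6 is an ordinary point.
val angles = Seq(0.0, math.Pi / 6).toDF("radians")

val compared = angles
  .withColumn("csc_builtin", csc(col("radians")))            // built-in, Spark 3.3.0+
  .withColumn("csc_manual", lit(1.0) / sin(col("radians")))  // Spark SQL division

compared.show(false)
```

The π/6 row should agree in both columns; the 0.0 row is the one worth inspecting, since that is where the two approaches can differ.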

sinh, cosh, and tanh

The hyperbolic versions use the same names with an h suffix. They take real numbers (not angles) and return values based on e^x and e^-x:

def sinh(e: Column): Column

def cosh(e: Column): Column

def tanh(e: Column): Column

Each has a String overload. sinh and cosh grow exponentially. tanh is bounded in (-1, 1) — it's a popular activation function in machine learning because of that S-curve shape.

val df = Seq(
  -2.0,
  -1.0,
  0.0,
  1.0,
  2.0,
).toDF("value")

val df2 = df
  .withColumn("sinh", sinh(col("value")))
  .withColumn("cosh", cosh(col("value")))
  .withColumn("tanh", tanh(col("value")))

df2.show(false)
// +-----+-------------------+------------------+-------------------+
// |value|sinh               |cosh              |tanh               |
// +-----+-------------------+------------------+-------------------+
// |-2.0 |-3.626860407847019 |3.7621956910836314|-0.9640275800758169|
// |-1.0 |-1.1752011936438014|1.543080634815244 |-0.7615941559557649|
// |0.0  |0.0                |1.0               |0.0                |
// |1.0  |1.1752011936438014 |1.543080634815244 |0.7615941559557649 |
// |2.0  |3.626860407847019  |3.7621956910836314|0.9640275800758169 |
// +-----+-------------------+------------------+-------------------+

Notice that sinh is odd (the output flips sign with the input) and cosh is even (symmetric around zero): cosh(-2) and cosh(2) are identical. tanh saturates fast: by ±2 it's already past ±0.96, and by ±5 it's within about 1E-4 of ±1. It only becomes indistinguishable from ±1 in Double precision at roughly ±20.
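The relationship to e^x and e^-x, the symmetry, and the saturation of tanh can all be checked in plain Scala without Spark. This is a sketch using the textbook definitions; it makes no claim about how the JVM computes these functions internally.

```scala
// Plain Scala, no Spark required.
val x = 1.0

// Textbook definitions in terms of e^x and e^-x.
val sinhFromExp = (math.exp(x) - math.exp(-x)) / 2
val coshFromExp = (math.exp(x) + math.exp(-x)) / 2

println(math.abs(sinhFromExp - math.sinh(x)) < 1e-12)  // true
println(math.abs(coshFromExp - math.cosh(x)) < 1e-12)  // true

// Symmetry: sinh is odd, cosh is even.
println(math.sinh(-2.0) == -math.sinh(2.0))  // true
println(math.cosh(-2.0) == math.cosh(2.0))   // true

// Saturation: how far tanh still is from 1.0.
val gapAt5 = 1.0 - math.tanh(5.0)    // about 9.1e-5, clearly nonzero
val gapAt20 = 1.0 - math.tanh(20.0)  // below 1e-15
println(gapAt5)
println(gapAt20)
```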

asinh, acosh, and atanh

The inverse hyperbolic functions reverse sinh, cosh, and tanh:

def asinh(e: Column): Column

def acosh(e: Column): Column

def atanh(e: Column): Column

The asinh, acosh, and atanh functions first appeared in version 3.1.0. Each has a String overload.

Domain restrictions matter here:

asinh accepts any real number.
acosh requires inputs ≥ 1 — anything smaller returns NaN.
atanh requires inputs in (-1, 1) — exactly ±1 returns ±Infinity, and outside that range returns NaN.

val df = Seq(
  0.0,
  0.5,
  1.0,
  2.0,
).toDF("value")

val df2 = df
  .withColumn("asinh", asinh(col("value")))
  .withColumn("acosh", acosh(col("value")))
  .withColumn("atanh", atanh(col("value")))

df2.show(false)
// +-----+-------------------+------------------+------------------+
// |value|asinh              |acosh             |atanh             |
// +-----+-------------------+------------------+------------------+
// |0.0  |0.0                |NaN               |0.0               |
// |0.5  |0.48121182505960347|NaN               |0.5493061443340548|
// |1.0  |0.8813735870195429 |0.0               |Infinity          |
// |2.0  |1.4436354751788103 |1.3169578969248166|NaN               |
// +-----+-------------------+------------------+------------------+

The NaN cells aren't errors; they're the IEEE 754 way of saying "the input is outside the function's domain." If you're not sure your inputs are in range, filter or clip them first rather than catching NaN downstream. Similarly, the Infinity for atanh(1) will propagate through subsequent arithmetic. To detect these cases, use isnan for NaN and an === comparison against Double.PositiveInfinity or Double.NegativeInfinity for the infinities.
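A minimal sketch of both strategies, filtering up front and flagging afterwards, might look like this. It assumes Spark 3.1.0 or later (for atanh) and a local SparkSession; the column names is_nan and is_pos_inf are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{atanh, col, isnan}

// Assumption: a throwaway local session; skip this in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("trig-nan").getOrCreate()
import spark.implicits._

val values = Seq(0.5, 1.0, 2.0).toDF("value")

// Option 1: keep only inputs inside atanh's open domain (-1, 1).
val inRange = values.filter(col("value") > -1.0 && col("value") < 1.0)

// Option 2: compute everything, then flag the special values.
val flagged = values
  .withColumn("atanh", atanh(col("value")))
  .withColumn("is_nan", isnan(col("atanh")))
  .withColumn("is_pos_inf", col("atanh") === Double.PositiveInfinity)

flagged.show(false)
```

Filtering first keeps special values out of later aggregations entirely; flagging is more useful when you need to report which rows were out of range.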

For natural and arbitrary-base logarithms, the building blocks of the inverse hyperbolic functions, see log, log2, log10, log1p, and ln. For the exponential function e^x, which the hyperbolic functions themselves are built from, see exp and expm1.
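The connection between logarithms and the inverse hyperbolic functions can be sketched in plain Scala using the textbook identities. The helper names below are hypothetical, and this is not presented as Spark's actual implementation; it only illustrates why the domain restrictions fall out of log and sqrt.

```scala
// Plain Scala, no Spark required. Textbook identities, hypothetical helper names.

// asinh(x) = ln(x + sqrt(x^2 + 1)), defined for every real x.
def asinhViaLog(x: Double): Double = math.log(x + math.sqrt(x * x + 1.0))

// acosh(x) = ln(x + sqrt(x^2 - 1)); the sqrt of a negative number gives NaN for x < 1.
def acoshViaLog(x: Double): Double = math.log(x + math.sqrt(x * x - 1.0))

// atanh(x) = (ln(1 + x) - ln(1 - x)) / 2; log1p(-1.0) is -Infinity, so atanh(1) is Infinity.
def atanhViaLog(x: Double): Double = 0.5 * (math.log1p(x) - math.log1p(-x))

println(asinhViaLog(1.0))
println(acoshViaLog(2.0))
println(atanhViaLog(0.5))
println(acoshViaLog(0.5).isNaN)          // true
println(atanhViaLog(1.0).isPosInfinity)  // true
```

The first three printed values should match the asinh(1.0), acosh(2.0), and atanh(0.5) cells in the table above to within rounding error.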