The Reference You Need
Spark Scala Examples
Simple Spark Scala examples to help you quickly complete your data ETL pipelines. Save time digging through the Spark Scala function API and get right to the code you need.
-
degrees and radians in Spark Scala: Convert Between Angle Units on DataFrame Columns
The degrees and radians functions convert DataFrame columns between the two ways of measuring angles. radians turns degrees into radians; degrees does the reverse. They're the unit-conversion helpers you reach for whenever your data is in degrees but you need to feed it into Spark's trig functions, which all expect radians.
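As a rough sketch of that workflow, assuming Spark 3.x on the classpath and a local SparkSession (the DataFrame and its column names are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session; reuse your own `spark` if you have one.
val spark = SparkSession.builder().master("local[*]").appName("angles").getOrCreate()
import spark.implicits._

// Hypothetical input: angles stored in degrees.
val angles = Seq(0.0, 90.0, 180.0).toDF("angle_deg")

val converted = angles
  .withColumn("angle_rad", radians(col("angle_deg")))    // degrees -> radians
  .withColumn("sine", sin(col("angle_rad")))             // trig functions expect radians
  .withColumn("back_to_deg", degrees(col("angle_rad")))  // radians -> degrees

converted.show()
```

Calling sin directly on angle_deg would run without error but treat 90 as 90 radians, which is the usual reason to convert first.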
-
Trigonometric Functions in Spark Scala: sin, cos, tan and More on DataFrame Columns
Spark Scala exposes a full set of trigonometric functions as DataFrame column functions, most of them backed by java.lang.Math: the basics (sin, cos, tan), their inverses (asin, acos, atan, atan2), the reciprocals (cot, csc, sec), and the hyperbolic versions (sinh, cosh, tanh, plus their inverses). Angles are always in radians, never degrees: the forward functions take radians and the inverse functions return them. Use the radians function to convert if your data is in degrees.
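For instance, here is a small sketch that turns a direction vector into a heading with atan2, assuming a local SparkSession; the x and y columns are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session for the example.
val spark = SparkSession.builder().master("local[*]").appName("trig").getOrCreate()
import spark.implicits._

// Hypothetical direction vectors.
val vectors = Seq((1.0, 1.0), (0.0, -1.0)).toDF("x", "y")

// atan2 takes (y, x) and returns radians, so wrap it in degrees for a readable heading.
val headings = vectors
  .withColumn("heading_rad", atan2(col("y"), col("x")))
  .withColumn("heading_deg", degrees(col("heading_rad")))

headings.show()
```

The vector (1, 1) lands at roughly 45 degrees and (0, -1) at roughly -90 degrees, up to floating-point rounding.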
-
exp and expm1 in Spark Scala: Exponential Functions on DataFrame Columns
Spark provides two exponential functions: exp computes e^x and expm1 computes e^x - 1. They're the inverses of log and log1p respectively, and you'll reach for them whenever you need to undo a log transform, compute compound growth, or work with continuous decay.
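A minimal sketch of undoing a log1p transform with expm1, assuming a local SparkSession and an illustrative column x:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session for the example.
val spark = SparkSession.builder().master("local[*]").appName("exp").getOrCreate()
import spark.implicits._

// Hypothetical values, including one very close to zero.
val raw = Seq(0.0, 1e-10, 2.5).toDF("x")

val roundTrip = raw
  .withColumn("log1p_x", log1p(col("x")))             // forward transform
  .withColumn("restored", expm1(col("log1p_x")))      // inverse: e^y - 1, accurate near zero
  .withColumn("naive", exp(col("log1p_x")) - lit(1))  // same formula, but can lose precision for tiny y

roundTrip.show(truncate = false)
```

The restored column should match x to within floating-point rounding; the naive column is there only to compare against for the value near zero.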
-
log, log2, log10, log1p, and ln in Spark Scala: Logarithms in a DataFrame
Spark provides a family of logarithm functions: log for natural log or an arbitrary base, log2 and log10 for the two most common bases, log1p for accurate results near zero, and ln (SQL-only) as an alias for the natural log. They all return Double and treat inputs outside their domain (non-positive values, or values at or below -1 for log1p) as null rather than raising errors.
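Here is a short sketch of the family side by side, assuming a local SparkSession; the input values are picked to include out-of-domain cases:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session for the example.
val spark = SparkSession.builder().master("local[*]").appName("logs").getOrCreate()
import spark.implicits._

// Hypothetical values: two ordinary inputs, plus zero and a negative number.
val values = Seq(100.0, 8.0, 0.0, -1.0).toDF("x")

val logs = values.select(
  col("x"),
  log(col("x")).as("ln_x"),             // natural log
  log(2.0, col("x")).as("log_base_2"),  // arbitrary base, passed first
  log2(col("x")).as("log2_x"),
  log10(col("x")).as("log10_x"),
  log1p(col("x")).as("log1p_x")         // ln(1 + x)
)

logs.show()
```

The rows for 0.0 and -1.0 are expected to show null in the plain log columns, while log1p of 0.0 is a valid 0.0.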
-
sqrt, cbrt, and pow in Spark Scala: Square Roots, Cube Roots, and Powers in a DataFrame
The sqrt, cbrt, and pow functions compute square roots, cube roots, and arbitrary powers of numeric columns. They return Double regardless of input type and behave like Java's Math.sqrt, Math.cbrt, and Math.pow — including how they handle negative inputs and special values like NaN.
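A quick sketch of how negative inputs play out, assuming a local SparkSession and an illustrative column x:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session for the example.
val spark = SparkSession.builder().master("local[*]").appName("roots").getOrCreate()
import spark.implicits._

// Hypothetical values: one positive, one negative.
val nums = Seq(27.0, -8.0).toDF("x")

val roots = nums.select(
  col("x"),
  sqrt(col("x")).as("sqrt_x"),        // NaN for negative input, like Math.sqrt
  cbrt(col("x")).as("cbrt_x"),        // defined for negatives: cbrt(-8) is about -2
  pow(col("x"), 2.0).as("x_squared")  // column raised to a constant power
)

roots.show()
```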
-
ceil, floor, and rint in Spark Scala: Rounding to Integers in a DataFrame
The ceil, floor, and rint functions round a numeric column to an integer. ceil rounds up toward positive infinity, floor rounds down toward negative infinity, and rint rounds to the nearest integer using banker's rounding for exact halves. ceil and floor also accept a scale argument to round to a specific number of decimal places.
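To make the three directions concrete, here is a minimal sketch on exact halves, assuming a local SparkSession; the column name is illustrative and the scale variants of ceil and floor are left out:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session for the example.
val spark = SparkSession.builder().master("local[*]").appName("rounding").getOrCreate()
import spark.implicits._

// Hypothetical values, all exact halves.
val halves = Seq(2.5, 3.5, -2.5).toDF("x")

val rounded = halves.select(
  col("x"),
  ceil(col("x")).as("ceil_x"),    // toward positive infinity: 3, 4, -2
  floor(col("x")).as("floor_x"),  // toward negative infinity: 2, 3, -3
  rint(col("x")).as("rint_x")     // ties go to the even neighbour: 2.0, 4.0, -2.0
)

rounded.show()
```

Note that for a Double column, ceil and floor come back as longs while rint stays a Double.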
-
round and bround in Spark Scala: Rounding Numeric Columns in a DataFrame
The round and bround functions round numeric columns to a given number of decimal places. They differ in how they handle exact halves: round rounds half away from zero (the most common convention), while bround uses banker's rounding, which rounds half to the nearest even number to reduce bias in large aggregations.
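The difference only shows up on exact halves, so a small sketch with nothing but halves makes it visible (assuming a local SparkSession; the column name is illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session for the example.
val spark = SparkSession.builder().master("local[*]").appName("round-vs-bround").getOrCreate()
import spark.implicits._

// Hypothetical values, all exact halves.
val halves = Seq(0.5, 1.5, 2.5, -2.5).toDF("x")

val compared = halves.select(
  col("x"),
  round(col("x"), 0).as("round_x"),   // half away from zero: 1.0, 2.0, 3.0, -3.0
  bround(col("x"), 0).as("bround_x")  // half to even:        0.0, 2.0, 2.0, -2.0
)

compared.show()
```

Summing the two result columns shows the bias the teaser mentions: round drifts away from zero on ties, bround splits them between the even neighbours.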
-
abs in Spark Scala: Compute the Absolute Value of a Numeric Column in a DataFrame
The abs function returns the absolute value of a numeric column — the value with its sign stripped. It works on any numeric type (integers, longs, doubles, decimals) and is most often used when you care about the magnitude of a number but not its direction.
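A tiny sketch, assuming a local SparkSession and a hypothetical delta column of signed changes:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session for the example.
val spark = SparkSession.builder().master("local[*]").appName("abs").getOrCreate()
import spark.implicits._

// Hypothetical signed changes.
val deltas = Seq(-3, 0, 7).toDF("delta")

// Keep the signed value and add its magnitude alongside it.
val magnitudes = deltas.withColumn("magnitude", abs(col("delta")))

magnitudes.show()
```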
-
Time Windows in Spark Scala: window, session_window, and window_time for DataFrame Aggregations
When you need to aggregate event-time data — page views per 5 minutes, average price every minute, user activity per session — Spark provides window, session_window, and window_time. These functions bucket timestamps into intervals so you can group and aggregate them with the regular DataFrame API. They work on both batch and streaming DataFrames; the examples below all use batch.
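As a rough sketch of the simplest case, a tumbling 5-minute window over a batch DataFrame, assuming a local SparkSession; the events and column names are invented for illustration:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session for the example.
val spark = SparkSession.builder().master("local[*]").appName("windows").getOrCreate()
import spark.implicits._

// Hypothetical page-view events: two in 10:00-10:05, one in 10:05-10:10.
val events = Seq(
  (Timestamp.valueOf("2024-01-01 10:01:00"), 1),
  (Timestamp.valueOf("2024-01-01 10:03:00"), 1),
  (Timestamp.valueOf("2024-01-01 10:07:00"), 1)
).toDF("event_time", "views")

// window() yields a struct column with start and end fields to group on.
val perWindow = events
  .groupBy(window(col("event_time"), "5 minutes"))
  .agg(sum("views").as("views"))
  .orderBy(col("window.start"))

perWindow.show(truncate = false)
```

session_window and window_time plug into the same groupBy pattern but are not shown here.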
-
make_interval, make_dt_interval, and make_ym_interval in Spark Scala: Build Intervals in a DataFrame
The make_interval family builds interval values from numeric columns. Use make_interval for general-purpose intervals that mix years, months, days, and time-of-day components; use make_ym_interval when you need a strict year-month interval (e.g., subscription terms); and use make_dt_interval when you need a strict day-time interval (e.g., job durations). All three are Spark SQL functions called through expr().
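A minimal sketch of the year-month variant, assuming a local SparkSession and Spark 3.2 or later; the subscription columns are hypothetical, and make_interval and make_dt_interval are left out for brevity:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a throwaway local session for the example.
val spark = SparkSession.builder().master("local[*]").appName("intervals").getOrCreate()
import spark.implicits._

// Hypothetical subscription: a start date and a term in months.
val subs = Seq(("2024-01-15", 3)).toDF("start", "term_months")
  .withColumn("start_date", to_date(col("start")))

// make_ym_interval(years, months) is called through expr() and added to a date.
val withRenewal = subs.withColumn(
  "renewal_date",
  expr("start_date + make_ym_interval(0, term_months)")
)

withRenewal.show()
```

With a start of 2024-01-15 and a 3-month term, the renewal date comes out as 2024-04-15.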