Spark Scala Elt
elt returns the n-th value from a list of column expressions, where n is a 1-based index. It's handy when you have an integer column that points at one of several sibling columns (or literal values) and you want to materialize the chosen value into a single result column.
elt is a Spark SQL function. It isn't available directly in the org.apache.spark.sql.functions object, so you call it through expr():
def elt(n, input1, input2, ...): Column — via expr()
The first argument is the 1-based index. The remaining arguments are the candidate values — they can be column references, literals, or any mix of the two. If n is out of range or null, elt returns null (assuming spark.sql.ansi.enabled is false, which is the default — when ANSI mode is on, an out-of-range index throws ArrayIndexOutOfBoundsException instead).
Here's a basic example using elt to pick a region label from three sibling columns based on an index:
val df = Seq(
(1, "North America", "Europe", "Asia"),
(2, "North America", "Europe", "Asia"),
(3, "North America", "Europe", "Asia"),
(1, "North America", "Europe", "Asia"),
).toDF("region_idx", "region_1", "region_2", "region_3")
val df2 = df
.withColumn("region", expr("elt(region_idx, region_1, region_2, region_3)"))
df2.show(false)
// +----------+-------------+--------+--------+-------------+
// |region_idx|region_1 |region_2|region_3|region |
// +----------+-------------+--------+--------+-------------+
// |1 |North America|Europe |Asia |North America|
// |2 |North America|Europe |Asia |Europe |
// |3 |North America|Europe |Asia |Asia |
// |1 |North America|Europe |Asia |North America|
// +----------+-------------+--------+--------+-------------+
Mapping integer codes to string labels
Because the candidate values can be string literals, elt is also useful as a compact lookup for small, fixed enums — for example, mapping a numeric tier code to a tier name:
val df = Seq(
("Alice", 1),
("Bob", 2),
("Carol", 3),
("Dave", 4),
("Eve", 0),
).toDF("name", "tier")
val df2 = df
.withColumn("tier_label", expr("elt(tier, 'Bronze', 'Silver', 'Gold')"))
df2.show(false)
// +-----+----+----------+
// |name |tier|tier_label|
// +-----+----+----------+
// |Alice|1 |Bronze |
// |Bob |2 |Silver |
// |Carol|3 |Gold |
// |Dave |4 |null |
// |Eve |0 |null |
// +-----+----+----------+
Note that Dave's tier of 4 and Eve's tier of 0 both fall outside the 1-based range of the three candidates, so elt returns null for those rows. For larger or more dynamic lookups, a join against a lookup DataFrame is usually a better fit — elt shines when the candidate list is short and known at query time.
Null and out-of-range index handling
When the index is null or points past the end of the candidate list, elt returns null:
val df = Seq(
("order-1", Some(1)),
("order-2", Some(2)),
("order-3", None),
("order-4", Some(5)),
).toDF("order_id", "status_idx")
val df2 = df
.withColumn("status", expr("elt(status_idx, 'pending', 'shipped', 'delivered')"))
df2.show(false)
// +--------+----------+-------+
// |order_id|status_idx|status |
// +--------+----------+-------+
// |order-1 |1 |pending|
// |order-2 |2 |shipped|
// |order-3 |null |null |
// |order-4 |5 |null |
// +--------+----------+-------+
If you want a fallback value instead of null for out-of-range indices, wrap the result in coalesce with a default literal.
Related functions
For looking up a value by position from an array column rather than a list of separate columns, see element_at. For conditional column selection where each row picks a value based on a predicate rather than an index, when / otherwise is the more natural fit.