Job Board
Consulting

Spark Scala find_in_set

find_in_set returns the 1-based position of a string within a comma-delimited list stored in another column. It returns 0 if the string isn't found and null if either input is null.

find_in_set

find_in_set(str, strArray) - Returns the index (1-based) of str in the comma-delimited strArray

find_in_set is a SQL function — use it via expr() in Spark Scala. It takes two arguments: the string to search for and a column containing comma-separated values. It returns an integer: the 1-based position if found, or 0 if not.

Here's an example that checks which position each role occupies in a comma-delimited roles column:

val df = Seq(
  ("Alice",   "admin,editor,viewer"),
  ("Bob",     "editor,viewer"),
  ("Carol",   "admin,viewer"),
  ("Dave",    "viewer"),
  ("Eve",     "admin,editor"),
).toDF("name", "roles")

val df2 = df
  .withColumn("admin_pos",  expr("find_in_set('admin', roles)"))
  .withColumn("editor_pos", expr("find_in_set('editor', roles)"))
  .withColumn("viewer_pos", expr("find_in_set('viewer', roles)"))

df2.show(false)
// +-----+-------------------+---------+----------+----------+
// |name |roles              |admin_pos|editor_pos|viewer_pos|
// +-----+-------------------+---------+----------+----------+
// |Alice|admin,editor,viewer|1        |2         |3         |
// |Bob  |editor,viewer      |0        |1         |2         |
// |Carol|admin,viewer       |1        |0         |2         |
// |Dave |viewer             |0        |0         |1         |
// |Eve  |admin,editor       |1        |2         |0         |
// +-----+-------------------+---------+----------+----------+

Alice has all three roles — admin at position 1, editor at 2, viewer at 3. Bob has no admin role, so admin_pos returns 0. The return value is a position, not a boolean, but you can use it as one: any non-zero value means the string was found.

To filter rows where a specific value exists in the list, check for a non-zero result:

scala df.filter(expr("find_in_set('admin', roles)") > 0)

Edge cases: commas, empty strings, and nulls

find_in_set has specific behavior worth knowing about:

  • If the search string contains a comma, the result is always 0 — it's treated as not found because commas are the delimiter.
  • An empty string can match an empty element in the list (two consecutive commas create an empty element).
  • Null in either argument returns null.
val df = Seq(
  ("admin",        "admin,editor,viewer"),
  ("manager",      "admin,editor,viewer"),
  ("admin,editor", "admin,editor,viewer"),
  ("",             "admin,,viewer"),
  (null,           "admin,editor,viewer"),
).toDF("search", "roles")

val df2 = df
  .withColumn("result", expr("find_in_set(search, roles)"))

df2.show(false)
// +------------+-------------------+------+
// |search      |roles              |result|
// +------------+-------------------+------+
// |admin       |admin,editor,viewer|1     |
// |manager     |admin,editor,viewer|0     |
// |admin,editor|admin,editor,viewer|0     |
// |            |admin,,viewer      |2     |
// |null        |admin,editor,viewer|null  |
// +------------+-------------------+------+

"admin,editor" returns 0 because the search string itself contains a comma. The empty string "" matches the empty element between the two consecutive commas in "admin,,viewer" at position 2. The null row returns null as expected.

For checking whether a value exists in a fixed list of literals (not a comma-delimited string column), see isin. For splitting a comma-delimited string into an array column, see split. For finding the position of a substring within a string (rather than a value in a delimited list), see instr and locate.

Example Details

Created: 2026-04-10 10:05:25 PM

Last Updated: 2026-04-10 10:05:25 PM