Spark Scala like, ilike, and rlike
like, ilike, and rlike are Column methods that match a string column against a pattern: like uses SQL LIKE wildcards, ilike does the same case-insensitively, and rlike uses a Java regular expression. All three return a boolean column.
These are methods on Column, not standalone functions from org.apache.spark.sql.functions. You call them directly on a column expression:
col("name").like("%pattern%")
col("name").ilike("%pattern%")
col("name").rlike("regex.*pattern")
like
def like(literal: String): Column
like returns true if the string column matches the given SQL LIKE pattern. Two wildcards are available:
% matches any sequence of zero or more characters
_ matches exactly one character
The match is case-sensitive. Here's an example using email addresses and short names to show common LIKE patterns:
import org.apache.spark.sql.functions.col
import spark.implicits._ // needed for toDF; assumes a SparkSession named spark

val df = Seq(
  ("alice@example.com", "alice"),
  ("bob@company.org", "bob"),
  ("carol@example.net", "carol"),
  ("dave@mail.example.com", "dave"),
  ("eve@other.io", "eve"),
).toDF("email", "name")

val df2 = df
  .withColumn("ends_example_com", col("email").like("%@example.com"))
  .withColumn("starts_with_b", col("email").like("b%"))
  .withColumn("second_char_o", col("name").like("_o%"))

df2.show(false)
// +---------------------+-----+----------------+-------------+-------------+
// |email                |name |ends_example_com|starts_with_b|second_char_o|
// +---------------------+-----+----------------+-------------+-------------+
// |alice@example.com    |alice|true            |false        |false        |
// |bob@company.org      |bob  |false           |true         |true         |
// |carol@example.net    |carol|false           |false        |false        |
// |dave@mail.example.com|dave |false           |false        |false        |
// |eve@other.io         |eve  |false           |false        |false        |
// +---------------------+-----+----------------+-------------+-------------+
ends_example_com uses %@example.com to match any email ending in that domain — note that dave@mail.example.com doesn't match because the suffix must be exactly @example.com. second_char_o uses _o% to match names where the second character is o, which is why bob matches but carol doesn't.
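To make the wildcard rules concrete, here's a minimal sketch in plain Scala that models LIKE matching by translating a pattern into a Java regex. The LikeModel object and its likeToRegex and likeMatches helpers are hypothetical names made up for illustration. This is a rough model, not Spark's implementation, and it deliberately ignores escape characters. It only mirrors the two rules above: % becomes .*, _ becomes ., and the whole string has to match.

```scala
import java.util.regex.Pattern

// Hypothetical helper for illustration only; not part of Spark.
object LikeModel {
  // Translate a LIKE pattern into a regex: % -> .*, _ -> ., everything else literal.
  // Escape characters are deliberately not handled in this sketch.
  def likeToRegex(pattern: String): String =
    pattern.toList.map {
      case '%' => ".*"
      case '_' => "."
      case c   => Pattern.quote(c.toString)
    }.mkString

  // LIKE has to match the entire string, so use matches() rather than find().
  def likeMatches(value: String, pattern: String): Boolean =
    Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(value).matches()
}
```

With this model, LikeModel.likeMatches("bob", "_o%") is true while LikeModel.likeMatches("carol", "_o%") is false, which lines up with the second_char_o column in the table.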
ilike
The ilike method was added in Spark 3.3.0 and is defined as:
def ilike(literal: String): Column
ilike behaves exactly like like but ignores case when matching. Here's an example that highlights the difference — the same pattern applied with like and ilike to city names stored with inconsistent casing:
val df = Seq(
  "New York",
  "new york",
  "NEW YORK",
  "Los Angeles",
  "los angeles",
  "Chicago",
).toDF("city")

val df2 = df
  .withColumn("like_match", col("city").like("new york%"))
  .withColumn("ilike_match", col("city").ilike("new york%"))

df2.show(false)
// +-----------+----------+-----------+
// |city       |like_match|ilike_match|
// +-----------+----------+-----------+
// |New York   |false     |true       |
// |new york   |true      |true       |
// |NEW YORK   |false     |true       |
// |Los Angeles|false     |false      |
// |los angeles|false     |false      |
// |Chicago    |false     |false      |
// +-----------+----------+-----------+
like only matches "new york" (exact lowercase). ilike matches all three capitalizations. Use ilike whenever the data might have inconsistent casing and you don't want to call lower first.
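On Spark versions before 3.3.0, where ilike isn't available, one common workaround is to lowercase the column and compare it against a lowercased pattern. The sketch below assumes a local SparkSession and uses a hypothetical helper name, likeIgnoreCase; lower is the standard function from org.apache.spark.sql.functions. Treat it as an approximation of ilike rather than a drop-in replacement.

```scala
import java.util.Locale

import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lower}

object IlikeFallback {
  // Hypothetical helper: approximates ilike by lowercasing both sides.
  // The % and _ wildcards are unaffected by lowercasing the pattern.
  def likeIgnoreCase(c: Column, pattern: String): Column =
    lower(c).like(pattern.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    // Assumes Spark is on the classpath; local[1] is just for a quick try-out.
    val spark = SparkSession.builder().master("local[1]").appName("ilike-fallback").getOrCreate()
    import spark.implicits._

    val df = Seq("New York", "new york", "NEW YORK", "Chicago").toDF("city")
    df.withColumn("match", likeIgnoreCase(col("city"), "New York%")).show(false)

    spark.stop()
  }
}
```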
rlike
def rlike(literal: String): Column
rlike returns true if the string column contains a match for a Java regular expression. The pattern is searched for anywhere in the string, so anchor it with ^ and $ when the whole value must match. The full Java regex syntax is available: quantifiers, character classes, anchors, alternation. Here's an example that validates phone numbers against a pattern that accepts common US formats:
val df = Seq(
  ("Alice", "555-123-4567"),
  ("Bob", "555.987.6543"),
  ("Carol", "(555) 246-8101"),
  ("Dave", "not-a-number"),
  ("Eve", "5551234"),
).toDF("name", "phone")

val df2 = df
  .withColumn("is_phone", col("phone").rlike("""^\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}$"""))

df2.show(false)
// +-----+--------------+--------+
// |name |phone         |is_phone|
// +-----+--------------+--------+
// |Alice|555-123-4567  |true    |
// |Bob  |555.987.6543  |true    |
// |Carol|(555) 246-8101|true    |
// |Dave |not-a-number  |false   |
// |Eve  |5551234       |false   |
// +-----+--------------+--------+
The pattern ^\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}$ matches an optional opening paren, three digits, an optional closing paren, a separator (dash, dot, or space), three more digits, another separator, and four final digits. Alice, Bob, and Carol all match. Dave and Eve don't have the right structure.
Use triple-quoted strings ("""...""") for regex patterns to avoid escaping backslashes — \d is cleaner than "\\d".
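The ^ and $ anchors in the phone pattern matter because rlike looks for the pattern anywhere in the string rather than requiring the whole value to match. Here's a plain-Scala sketch that uses java.util.regex directly to model that search behavior. The RlikeModel object and its rlikeModel helper are hypothetical names for illustration, not Spark API.

```scala
import java.util.regex.Pattern

// Hypothetical model of rlike's search semantics; not Spark code.
object RlikeModel {
  // True if the regex is found anywhere in the value (unanchored search).
  def rlikeModel(value: String, regex: String): Boolean =
    Pattern.compile(regex).matcher(value).find()

  // The phone pattern from the example, with and without anchors.
  val anchored: String   = """^\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}$"""
  val unanchored: String = """\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}"""
}
```

In this model, a value such as "call 555-123-4567 now" is rejected by the anchored pattern but accepted by the unanchored one, so leave the anchors off only when a partial match is what you want.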
Null handling
All three methods return null when the input column is null. If you need null to count as false, wrap the result with coalesce: coalesce(col("email").like("%@example.com"), lit(false)).
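Here's a short sketch of how that plays out in filters, assuming a local SparkSession and made-up sample data. The nullAsFalse helper is a hypothetical name that just wraps the coalesce call above. A null match result leaves the row out of both a filter and its negation, while the coalesce version lets the negated filter keep it.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, lit}

object LikeNulls {
  // Hypothetical helper: turn a null match result into false.
  def nullAsFalse(matchResult: Column): Column = coalesce(matchResult, lit(false))

  def main(args: Array[String]): Unit = {
    // Assumes Spark is on the classpath; local[1] is just for a quick try-out.
    val spark = SparkSession.builder().master("local[1]").appName("like-nulls").getOrCreate()
    import spark.implicits._

    // Made-up sample data; None becomes a SQL null in the email column.
    val df = Seq(
      ("alice", Some("alice@example.com")),
      ("eve", Some("eve@other.io")),
      ("mallory", None)
    ).toDF("name", "email")

    val isExample = col("email").like("%@example.com")

    df.filter(isExample).show(false)               // keeps alice only
    df.filter(!isExample).show(false)              // keeps eve only: mallory's null is not true
    df.filter(!nullAsFalse(isExample)).show(false) // keeps eve and mallory

    spark.stop()
  }
}
```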
Related functions
For checking whether a string contains a literal substring (no wildcards), see contains, startsWith, and endsWith. For extracting or replacing substrings matched by a regex, see regexp_extract and regexp_replace.