Spark Scala contains, startsWith, and endsWith
contains, startsWith, and endsWith are Column methods that check whether a string column contains a substring, begins with a prefix, or ends with a suffix. They each return a boolean column — true, false, or null for null input.
These are methods on Column, not standalone functions from org.apache.spark.sql.functions. You call them directly on a column expression:
col("name").contains("value")
col("name").startsWith("prefix")
col("name").endsWith("suffix")
contains
def contains(other: Any): Column
contains returns true if the string column includes the given substring anywhere within it, false otherwise. Here's an example that tags product descriptions by the materials or properties they mention:
// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
import org.apache.spark.sql.functions.col
import spark.implicits._

val df = Seq(
  ("T-shirt", "100% organic cotton, machine washable"),
  ("Running Shoes", "lightweight mesh upper, foam midsole"),
  ("Backpack", "waterproof nylon with padded laptop sleeve"),
  ("Coffee Mug", "ceramic, dishwasher safe, 12oz"),
  ("Yoga Mat", "non-slip surface, eco-friendly foam"),
).toDF("product", "description")

val df2 = df
  .withColumn("has_foam", col("description").contains("foam"))
  .withColumn("has_waterproof", col("description").contains("waterproof"))
  .withColumn("has_machine_wash", col("description").contains("machine washable"))

df2.show(false)
// +-------------+------------------------------------------+--------+--------------+----------------+
// |product      |description                               |has_foam|has_waterproof|has_machine_wash|
// +-------------+------------------------------------------+--------+--------------+----------------+
// |T-shirt      |100% organic cotton, machine washable     |false   |false         |true            |
// |Running Shoes|lightweight mesh upper, foam midsole      |true    |false         |false           |
// |Backpack     |waterproof nylon with padded laptop sleeve|false   |true          |false           |
// |Coffee Mug   |ceramic, dishwasher safe, 12oz            |false   |false         |false           |
// |Yoga Mat     |non-slip surface, eco-friendly foam       |true    |false         |false           |
// +-------------+------------------------------------------+--------+--------------+----------------+
The search is case-sensitive. contains("foam") matches "foam midsole" and "eco-friendly foam" but would not match "Foam" or "FOAM". For case-insensitive matching, lowercase the column with lower first and search for a lowercase substring: lower(col("description")).contains("foam").
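To make the difference concrete, here is a minimal sketch that runs both the case-sensitive and the lowercased check side by side. The sample rows and the column names has_foam_exact and has_foam_any_case are invented for illustration, and the local SparkSession setup is an assumption (in spark-shell, getOrCreate() simply reuses the existing session).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("contains-case-insensitive")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data with mixed casing.
val df = Seq("foam midsole", "Foam padding", "MEMORY FOAM", "solid rubber")
  .toDF("description")

// has_foam_exact is case-sensitive and only matches lowercase "foam".
// has_foam_any_case lowercases the column first, so any casing matches.
val result = df
  .withColumn("has_foam_exact", col("description").contains("foam"))
  .withColumn("has_foam_any_case", lower(col("description")).contains("foam"))

result.show(false)
```

Only "foam midsole" passes the exact check, while all three foam rows pass the lowercased one. Remember to write the search term itself in lowercase, since lower is only applied to the column.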
startsWith
def startsWith(literal: String): Column
def startsWith(other: Column): Column
startsWith returns true if the string column begins with the given prefix. There are two signatures — one taking a string literal and one taking another Column (useful when the prefix itself comes from a column in the same DataFrame).
Here's an example using URL scheme detection:
val df = Seq(
  "https://sparkingscala.com/examples/",
  "https://sparkingscala.com/tutorials/",
  "http://old.example.com/page",
  "ftp://files.example.com/data.csv",
  "https://sparkingscala.com/",
).toDF("url")

val df2 = df
  .withColumn("is_https", col("url").startsWith("https://"))
  .withColumn("is_example", col("url").endsWith("/examples/"))

df2.show(false)
// +------------------------------------+--------+----------+
// |url                                 |is_https|is_example|
// +------------------------------------+--------+----------+
// |https://sparkingscala.com/examples/ |true    |true      |
// |https://sparkingscala.com/tutorials/|true    |false     |
// |http://old.example.com/page         |false   |false     |
// |ftp://files.example.com/data.csv    |false   |false     |
// |https://sparkingscala.com/          |true    |false     |
// +------------------------------------+--------+----------+
endsWith
def endsWith(literal: String): Column
def endsWith(other: Column): Column
endsWith mirrors startsWith — it returns true if the string column ends with the given suffix. The example above already shows both together. Like startsWith, it has a string-literal and a Column overload.
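The examples so far only use the string-literal overloads. As a rough sketch of the Column overloads, the snippet below compares each URL against a prefix and suffix stored in the same row. The columns expected_prefix and expected_suffix, the result columns prefix_ok and suffix_ok, and the sample rows are hypothetical, and the local SparkSession setup is an assumption.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("startswith-column-overload")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: each row carries its own expected prefix and suffix.
val df = Seq(
  ("https://sparkingscala.com/examples/", "https://", "/examples/"),
  ("http://old.example.com/page", "https://", "/page"),
  ("ftp://files.example.com/data.csv", "ftp://", ".json")
).toDF("url", "expected_prefix", "expected_suffix")

// Passing a Column instead of a String compares row by row.
val result = df
  .withColumn("prefix_ok", col("url").startsWith(col("expected_prefix")))
  .withColumn("suffix_ok", col("url").endsWith(col("expected_suffix")))

result.show(false)
```

The first row matches on both checks, the second only on the suffix, and the third only on the prefix.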
Null handling
All three methods return null when the input column is null:
val df = Seq(
  Some("support@sparkingscala.com"),
  Some("hello@example.org"),
  Some("not-an-email"),
  None,
).toDF("email")

val df2 = df
  .withColumn("has_at", col("email").contains("@"))
  .withColumn("starts_with_support", col("email").startsWith("support"))
  .withColumn("ends_with_com", col("email").endsWith(".com"))

df2.show(false)
// +-------------------------+------+-------------------+-------------+
// |email                    |has_at|starts_with_support|ends_with_com|
// +-------------------------+------+-------------------+-------------+
// |support@sparkingscala.com|true  |true               |true         |
// |hello@example.org        |true  |false              |false        |
// |not-an-email             |false |false              |false        |
// |null                     |null  |null               |null         |
// +-------------------------+------+-------------------+-------------+
If null rows should be treated as false rather than null, wrap the result with coalesce: coalesce(col("email").contains("@"), lit(false)).
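Here is a small sketch of that coalesce pattern next to the nullable result, plus a look at how filter treats the null row. The column names has_at_nullable and has_at, the values withAt and withoutAt, and the sample rows are made up for illustration, and the local SparkSession setup is an assumption.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("contains-null-handling")
  .getOrCreate()
import spark.implicits._

// Hypothetical data with one null email.
val df = Seq(
  Some("support@sparkingscala.com"),
  Some("not-an-email"),
  None
).toDF("email")

// has_at_nullable keeps null for the null row; has_at replaces it with false.
val result = df
  .withColumn("has_at_nullable", col("email").contains("@"))
  .withColumn("has_at", coalesce(col("email").contains("@"), lit(false)))

result.show(false)

// filter keeps only rows where the predicate is true, so the null row
// is dropped by both the check and its negation.
val withAt = df.filter(col("email").contains("@")).count()
val withoutAt = df.filter(!col("email").contains("@")).count()
```

Each filter returns one row even though the DataFrame has three, because the null row satisfies neither predicate. If you need that row to show up on the "no match" side, negate the coalesced column instead.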
Related functions
For finding the position of a substring rather than a boolean check, see instr and locate. For boolean matching against a regular expression, see the Column method rlike; for regex-based substitution, see regexp_replace. For splitting a string on a delimiter, see split.
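As a quick comparison, the sketch below puts contains next to instr and locate on the same data, with the Column method rlike included to show a boolean regex check. The sample rows and result column names are hypothetical, and the local SparkSession setup is an assumption.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, instr, locate}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("contains-related-functions")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq("eco-friendly foam", "ceramic, 12oz").toDF("description")

// contains answers "is it there?", instr and locate answer "where?",
// and rlike answers "does it match this regular expression?".
// Note the argument order: instr takes the column first, locate the substring first.
val result = df
  .withColumn("contains_foam", col("description").contains("foam"))
  .withColumn("instr_foam", instr(col("description"), "foam"))
  .withColumn("locate_foam", locate("foam", col("description")))
  .withColumn("has_digit", col("description").rlike("[0-9]"))

result.show(false)
```

instr and locate return a 1-based position and 0 when the substring is absent, so "foam" is reported at position 14 in the first row and 0 in the second.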