Spark Scala contains, startsWith, and endsWith

contains, startsWith, and endsWith are Column methods that check whether a string column contains a substring, begins with a prefix, or ends with a suffix. Each returns a boolean column whose values are true, false, or null when the input is null.

These are methods on Column, not standalone functions from org.apache.spark.sql.functions. You call them directly on a column expression:

col("name").contains("value")
col("name").startsWith("prefix")
col("name").endsWith("suffix")

contains

def contains(other: Any): Column

contains returns true if the string column includes the given substring anywhere within it, false otherwise. Here's an example that tags product descriptions by the materials or properties they mention:

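// Outside spark-shell these imports are needed (assumes a SparkSession named spark is in scope):
import org.apache.spark.sql.functions.col
import spark.implicits._

val df = Seq(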
  ("T-shirt",       "100% organic cotton, machine washable"),
  ("Running Shoes", "lightweight mesh upper, foam midsole"),
  ("Backpack",      "waterproof nylon with padded laptop sleeve"),
  ("Coffee Mug",    "ceramic, dishwasher safe, 12oz"),
  ("Yoga Mat",      "non-slip surface, eco-friendly foam"),
).toDF("product", "description")

val df2 = df
  .withColumn("has_foam",         col("description").contains("foam"))
  .withColumn("has_waterproof",   col("description").contains("waterproof"))
  .withColumn("has_machine_wash", col("description").contains("machine washable"))

df2.show(false)
// +-------------+------------------------------------------+--------+--------------+----------------+
// |product      |description                               |has_foam|has_waterproof|has_machine_wash|
// +-------------+------------------------------------------+--------+--------------+----------------+
// |T-shirt      |100% organic cotton, machine washable     |false   |false         |true            |
// |Running Shoes|lightweight mesh upper, foam midsole      |true    |false         |false           |
// |Backpack     |waterproof nylon with padded laptop sleeve|false   |true          |false           |
// |Coffee Mug   |ceramic, dishwasher safe, 12oz            |false   |false         |false           |
// |Yoga Mat     |non-slip surface, eco-friendly foam       |true    |false         |false           |
// +-------------+------------------------------------------+--------+--------------+----------------+

The search is case-sensitive. contains("foam") matches "foam midsole" and "eco-friendly foam" but would not match "Foam" or "FOAM". Pair with lower if you need case-insensitive matching.
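
As a minimal sketch of the lower approach, the snippet below compares a case-sensitive check with a lowercased one. The local SparkSession setup, the sample rows, and the column names exact_case and any_case are assumptions made for illustration, not part of the original example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

// Assumption: a throwaway local session; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("contains-any-case").getOrCreate()
import spark.implicits._

// Hypothetical sample data with mixed casing.
val products = Seq(
  "Foam roller",
  "eco-friendly foam",
  "FOAM padding",
  "ceramic mug"
).toDF("description")

val tagged = products
  // Case-sensitive: only matches the exact lowercase substring.
  .withColumn("exact_case", col("description").contains("foam"))
  // Lowercase the column first, then search for a lowercase substring.
  .withColumn("any_case", lower(col("description")).contains("foam"))

tagged.show(false)
// +-----------------+----------+--------+
// |description      |exact_case|any_case|
// +-----------------+----------+--------+
// |Foam roller      |false     |true    |
// |eco-friendly foam|true      |true    |
// |FOAM padding     |false     |true    |
// |ceramic mug      |false     |false   |
// +-----------------+----------+--------+
```

The substring passed to contains has to be lowercase as well, since only the column side is normalized here.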

startsWith

def startsWith(literal: String): Column

def startsWith(other: Column): Column

startsWith returns true if the string column begins with the given prefix. There are two signatures — one taking a string literal and one taking another Column (useful when the prefix itself comes from a column in the same DataFrame).

Here's an example that detects the URL scheme with startsWith and checks the trailing path segment with endsWith:

val df = Seq(
  "https://sparkingscala.com/examples/",
  "https://sparkingscala.com/tutorials/",
  "http://old.example.com/page",
  "ftp://files.example.com/data.csv",
  "https://sparkingscala.com/",
).toDF("url")

val df2 = df
  .withColumn("is_https",   col("url").startsWith("https://"))
  .withColumn("is_example", col("url").endsWith("/examples/"))

df2.show(false)
// +------------------------------------+--------+----------+
// |url                                 |is_https|is_example|
// +------------------------------------+--------+----------+
// |https://sparkingscala.com/examples/ |true    |true      |
// |https://sparkingscala.com/tutorials/|true    |false     |
// |http://old.example.com/page         |false   |false     |
// |ftp://files.example.com/data.csv    |false   |false     |
// |https://sparkingscala.com/          |true    |false     |
// +------------------------------------+--------+----------+

endsWith

def endsWith(literal: String): Column

def endsWith(other: Column): Column

endsWith mirrors startsWith — it returns true if the string column ends with the given suffix. The example above already shows both together. Like startsWith, it has a string-literal and a Column overload.
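
The Column overloads are not exercised by the example above, so here is a rough sketch of how they might be used when the prefix and suffix differ per row. The local SparkSession, the sample data, and the column names (filename, expected_prefix, expected_suffix, prefix_ok, suffix_ok) are all hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a throwaway local session; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("column-overloads").getOrCreate()
import spark.implicits._

// Hypothetical data: each row carries the prefix and suffix it is expected to have.
val files = Seq(
  ("report_2024.csv", "report_", ".csv"),
  ("report_2024.txt", "report_", ".csv"),
  ("summary.csv",     "report_", ".csv")
).toDF("filename", "expected_prefix", "expected_suffix")

val checked = files
  // Column overload: the prefix is read from another column of the same row.
  .withColumn("prefix_ok", col("filename").startsWith(col("expected_prefix")))
  // Column overload of endsWith, used the same way.
  .withColumn("suffix_ok", col("filename").endsWith(col("expected_suffix")))

checked.select("filename", "prefix_ok", "suffix_ok").show(false)
// +---------------+---------+---------+
// |filename       |prefix_ok|suffix_ok|
// +---------------+---------+---------+
// |report_2024.csv|true     |true     |
// |report_2024.txt|true     |false    |
// |summary.csv    |false    |true     |
// +---------------+---------+---------+
```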

Null handling

All three methods return null when the input column is null:

val df = Seq(
  Some("support@sparkingscala.com"),
  Some("hello@example.org"),
  Some("not-an-email"),
  None,
).toDF("email")

val df2 = df
  .withColumn("has_at",               col("email").contains("@"))
  .withColumn("starts_with_support",  col("email").startsWith("support"))
  .withColumn("ends_with_com",        col("email").endsWith(".com"))

df2.show(false)
// +-------------------------+------+-------------------+-------------+
// |email                    |has_at|starts_with_support|ends_with_com|
// +-------------------------+------+-------------------+-------------+
// |support@sparkingscala.com|true  |true               |true         |
// |hello@example.org        |true  |false              |false        |
// |not-an-email             |false |false              |false        |
// |null                     |null  |null               |null         |
// +-------------------------+------+-------------------+-------------+

If null rows should be treated as false rather than null, wrap the result with coalesce: coalesce(col("email").contains("@"), lit(false)).
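
A short sketch of the coalesce pattern, shown next to the unwrapped result for comparison. The local SparkSession, the trimmed-down sample data, and the column name has_at_or_false are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit}

// Assumption: a throwaway local session; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("null-to-false").getOrCreate()
import spark.implicits._

val emails = Seq(
  Some("support@sparkingscala.com"),
  Some("not-an-email"),
  None
).toDF("email")

val flagged = emails
  // Plain result: null for the null row.
  .withColumn("has_at", col("email").contains("@"))
  // coalesce returns its first non-null argument, so null becomes false.
  .withColumn("has_at_or_false", coalesce(col("email").contains("@"), lit(false)))

flagged.show(false)

// A filter only keeps rows where the condition is true, so the null row
// is dropped here even without coalesce.
val withAt = emails.filter(col("email").contains("@"))
```

The coalesce wrapper matters mainly when the boolean is stored as a column or negated; in a plain filter, null already behaves like a non-match.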

For finding the position of a substring rather than a boolean check, see instr and locate. For pattern-based matching with regular expressions, see regexp_replace. For splitting a string on a delimiter, see split.

Example Details

Created: 2026-03-21 02:20:20 PM

Last Updated: 2026-03-21 02:20:20 PM