The Reference You Need
Spark Scala Examples
Simple Spark Scala examples to help you quickly complete your data ETL pipelines. Save time digging through the Spark Scala function API and get right to the code you need.
-
String Length Functions in Spark Scala: length, bit_length, and octet_length
Spark provides three functions for measuring string size: length counts characters, octet_length counts bytes, and bit_length counts bits. For ASCII text they all agree, but they diverge once you have Unicode characters — which matters whenever you're validating input lengths or working with encoded data.
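A minimal sketch of the difference, assuming a local SparkSession and made-up sample data (the accented string is just illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{bit_length, col, length, octet_length}

val spark = SparkSession.builder().master("local[*]").appName("string-lengths").getOrCreate()
import spark.implicits._

// "résumé" is 6 characters, but the two accented characters each take 2 bytes in UTF-8
val df = Seq("abc", "résumé").toDF("s")

val measured = df.select(
  col("s"),
  length(col("s")).as("chars"),       // character count
  octet_length(col("s")).as("bytes"), // UTF-8 byte count
  bit_length(col("s")).as("bits")     // byte count * 8
)
measured.show()
// "abc"    -> 3 chars, 3 bytes, 24 bits
// "résumé" -> 6 chars, 8 bytes, 64 bits
```

octet_length is the one to reach for when enforcing storage limits defined in bytes rather than characters.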
-
Split Function: Split Strings into Arrays in Spark Scala DataFrames
split breaks a string column on a delimiter or regular expression pattern and returns an ArrayType column. It's the go-to function whenever you need to turn a delimited string — like a CSV field, a tag list, or a log line — into individual elements you can work with.
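A quick sketch, assuming a local SparkSession and a made-up comma-delimited column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

val spark = SparkSession.builder().master("local[*]").appName("split-example").getOrCreate()
import spark.implicits._

val df = Seq("spark,scala,etl").toDF("tags")

// The second argument is a regular expression, so escape characters like "." or "|"
val parts = df.select(split(col("tags"), ",").as("tag_array"))
parts.show(truncate = false)
// [spark, scala, etl]
```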
-
Substring Functions: substring and substring_index in Spark Scala DataFrames
substring and substring_index are two complementary ways to extract a portion of a string column. Use substring when you know the character position; use substring_index when you want to split on a delimiter.
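A sketch of both approaches side by side, assuming a local SparkSession and a made-up hostname column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, substring, substring_index}

val spark = SparkSession.builder().master("local[*]").appName("substring-example").getOrCreate()
import spark.implicits._

val df = Seq("www.example.com").toDF("host")

val extracted = df.select(
  substring(col("host"), 1, 3).as("prefix"),           // 1-based position, length 3 -> "www"
  substring_index(col("host"), ".", 2).as("left_two"), // text before the 2nd "." -> "www.example"
  substring_index(col("host"), ".", -1).as("tld")      // negative count works from the right -> "com"
)
extracted.show(truncate = false)
```

Note that substring positions are 1-based, not 0-based.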
-
Lpad and Rpad: Pad Strings in Spark Scala DataFrames
The lpad and rpad functions pad a string column to a specified length by adding characters to the left or right side. They're commonly used for zero-padding numbers, aligning text output, and formatting fixed-width fields.
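A minimal sketch, assuming a local SparkSession and made-up sample columns:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lpad, rpad}

val spark = SparkSession.builder().master("local[*]").appName("pad-example").getOrCreate()
import spark.implicits._

val df = Seq(("42", "id")).toDF("num", "label")

val padded = df.select(
  lpad(col("num"), 5, "0").as("zero_padded"),   // "00042"
  rpad(col("label"), 6, ".").as("right_padded") // "id...."
)
padded.show()
```

Both functions truncate strings that are already longer than the target length, which is worth checking before relying on them for fixed-width output.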
-
InitCap: Convert Strings to Title Case in Spark Scala DataFrames
The initcap function converts a string column to title case — capitalizing the first letter of each word and lowercasing the rest. It's useful for normalizing names, addresses, and other text where consistent capitalization matters.
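A quick sketch, assuming a local SparkSession and a made-up name column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, initcap}

val spark = SparkSession.builder().master("local[*]").appName("initcap-example").getOrCreate()
import spark.implicits._

val df = Seq("jOHN q. SMITH").toDF("name")

val titled = df.select(initcap(col("name")).as("clean_name"))
titled.show()
// "John Q. Smith"
```

Words are delimited by whitespace, so hyphenated or apostrophe-containing names (e.g. "mary-anne") only get their first letter capitalized.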
-
Lower and Upper: Convert String Case in Spark Scala DataFrames
The lower and upper functions convert string columns to lowercase and uppercase respectively. They're commonly used to normalize data for case-insensitive comparisons and consistent formatting.
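A minimal sketch showing both functions plus the case-insensitive-comparison use case, assuming a local SparkSession and made-up email columns:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower, upper}

val spark = SparkSession.builder().master("local[*]").appName("case-example").getOrCreate()
import spark.implicits._

val df = Seq(("Alice@Example.COM", "alice@example.com")).toDF("a", "b")

val compared = df.select(
  lower(col("a")).as("lower_a"),
  upper(col("a")).as("upper_a"),
  // normalize both sides before comparing for a case-insensitive match
  (lower(col("a")) === lower(col("b"))).as("same_ignoring_case")
)
compared.show(truncate = false)
```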
-
SBT Assembly Jar Naming in Shell Scripting
When using sbt in Spark Scala projects, it can be useful to access the name of the assembly jar that sbt will create from within your bash or zsh shell scripts. Thankfully, extracting the value is rather straightforward.
-
Building MapType Columns in Spark Scala DataFrames for Enhanced Data Structuring
Using a MapType in Spark Scala DataFrames can be helpful, as it provides a flexible logical structure for solving problems such as machine learning feature engineering, data exploration, serialization, data enrichment, and denormalization. Thankfully, building these map columns from existing data within a DataFrame is very straightforward.
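One common pattern is packing several existing columns into a single map column with the map() function. A sketch, assuming a local SparkSession and made-up feature columns:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, map}

val spark = SparkSession.builder().master("local[*]").appName("maptype-example").getOrCreate()
import spark.implicits._

val df = Seq(("u1", 0.25, 0.75)).toDF("user_id", "clicks_norm", "views_norm")

// map() takes alternating key/value columns; lit() turns the literal key names into columns
val features = df.select(
  col("user_id"),
  map(
    lit("clicks_norm"), col("clicks_norm"),
    lit("views_norm"), col("views_norm")
  ).as("features")
)
features.printSchema() // features: map<string,double>
```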
-
Converting a Map Type to a JSON String in Spark Scala
Using a MapType in Spark Scala DataFrames provides a more flexible logical structure, supporting hierarchical data and arbitrary data attributes; converting that map to a JSON string makes it easy to persist or pass downstream.
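The to_json function handles the conversion directly. A sketch, assuming Spark 2.4 or later (where to_json accepts MapType columns), a local SparkSession, and made-up preference columns:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, map, to_json}

val spark = SparkSession.builder().master("local[*]").appName("map-to-json").getOrCreate()
import spark.implicits._

// build a map column from plain columns, then serialize it to a JSON string
val prefs = Seq(("u1", "dark", "en")).toDF("user_id", "theme", "lang")
  .select(
    col("user_id"),
    map(lit("theme"), col("theme"), lit("lang"), col("lang")).as("prefs")
  )

val asJson = prefs.select(col("user_id"), to_json(col("prefs")).as("prefs_json"))
asJson.show(truncate = false)
// prefs_json: {"theme":"dark","lang":"en"}
```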
-
Spark Scala isin Function Examples
The isin function is defined on a Spark Column and is used to filter rows in a DataFrame or Dataset.
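A minimal sketch, assuming a local SparkSession and a made-up country column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("isin-example").getOrCreate()
import spark.implicits._

val df = Seq(("a", "US"), ("b", "FR"), ("c", "CA")).toDF("id", "country")

// keep rows whose country matches any value in the list
val northAmerica = df.where(col("country").isin("US", "CA"))

// negate with ! for a NOT IN filter
val everywhereElse = df.where(!col("country").isin("US", "CA"))
```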