Helping you Learn Spark Scala.
Find code samples, tutorials, and the latest news at Sparking Scala. We make it easy to solve your data ETL problems and help you go from code to valuable outcomes quickly.
Recent Spark Scala Examples
-
Substring Functions: substring and substring_index in Spark Scala DataFrames
substring and substring_index are two complementary ways to extract a portion of a string column. Use substring when you know the character position; use substring_index when you want to split on a delimiter.
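A minimal sketch of the difference — the SparkSession setup, column name, and sample data here are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{substring, substring_index}

val spark = SparkSession.builder().master("local[*]").appName("substring-demo").getOrCreate()
import spark.implicits._

val df = Seq("2024-01-15").toDF("date_str")
df.select(
  substring($"date_str", 1, 4).as("by_position"),      // characters 1-4 (1-based): "2024"
  substring_index($"date_str", "-", 1).as("by_delim")  // text before the first "-": "2024"
).show()
```

Both expressions return "2024" here, but substring breaks if the position shifts, while substring_index breaks if the delimiter changes — pick based on which part of the format you trust.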
-
Lpad and Rpad: Pad Strings in Spark Scala DataFrames
The lpad and rpad functions pad a string column to a specified length by adding characters to the left or right side. They're commonly used for zero-padding numbers, aligning text output, and formatting fixed-width fields.
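A quick sketch, assuming an existing SparkSession `spark`; the column and sample data are illustrative:

```scala
import org.apache.spark.sql.functions.{lpad, rpad}
import spark.implicits._

Seq("42").toDF("id").select(
  lpad($"id", 5, "0").as("zero_padded"),  // pads on the left: "00042"
  rpad($"id", 5, " ").as("left_aligned")  // pads on the right: "42   "
).show()
```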
-
InitCap: Convert Strings to Title Case in Spark Scala DataFrames
The initcap function converts a string column to title case — capitalizing the first letter of each word and lowercasing the rest. It's useful for normalizing names, addresses, and other text where consistent capitalization matters.
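For example — assuming an existing SparkSession `spark`; the data is illustrative:

```scala
import org.apache.spark.sql.functions.initcap
import spark.implicits._

Seq("jOHN sMITH").toDF("name")
  .select(initcap($"name").as("normalized"))  // capitalizes each word: "John Smith"
  .show()
```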
-
Lower and Upper: Convert String Case in Spark Scala DataFrames
The lower and upper functions convert string columns to lowercase and uppercase respectively. They're commonly used to normalize data for case-insensitive comparisons and consistent formatting.
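A common pattern is normalizing both sides of a comparison. A sketch, assuming an existing SparkSession `spark` with illustrative data:

```scala
import org.apache.spark.sql.functions.lower
import spark.implicits._

val users = Seq("Alice@Example.COM").toDF("email")
// Lowercase the column so the match is case-insensitive.
users.filter(lower($"email") === "alice@example.com").show()
```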
-
SBT Assembly Jar Naming in Shell Scripting
When using sbt in Spark Scala projects, it can be useful to access the name of the assembly jar that sbt will create from within your bash or zsh shell scripts. Thankfully, extracting the value is rather straightforward.
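One approach is to have sbt print the setting and capture the last line of output. This is a sketch, assuming sbt is on the PATH, the sbt-assembly plugin is configured, and the Scala version in the target path matches your build:

```shell
# Ask sbt to print the assembly jar name; --error suppresses log noise.
JAR_NAME=$(sbt --error "print assembly/assemblyJarName" | tail -n 1)

# The scala-2.12 path segment is an assumption; adjust to your scalaVersion.
echo "Deploying target/scala-2.12/${JAR_NAME}"
```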
-
Building MapType Columns in Spark Scala DataFrames for Enhanced Data Structuring
Using a MapType in Spark Scala DataFrames can be helpful, as it provides a flexible logical structure for solving problems such as machine learning feature engineering, data exploration, serialization, data enrichment, and denormalization. Thankfully, building these map columns from existing data within a DataFrame is very straightforward.
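The map function takes alternating key and value columns. A sketch, assuming an existing SparkSession `spark`; the column names and data are illustrative:

```scala
import org.apache.spark.sql.functions.{lit, map}
import spark.implicits._

val df = Seq(("a1", 0.9, 0.1)).toDF("id", "score", "noise")
// Literal keys paired with existing columns become a map<string,double> column.
val withFeatures = df.select(
  $"id",
  map(lit("score"), $"score", lit("noise"), $"noise").as("features")
)
withFeatures.printSchema()
```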
-
Converting a Map Type to a JSON String in Spark Scala
Using a MapType in Spark Scala DataFrames provides a more flexible logical structure for hierarchical data and, of course, for working with arbitrary data attributes. Converting the map to a JSON string makes it easy to export or store.
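The to_json function handles the conversion. A sketch, assuming an existing SparkSession `spark` and illustrative data:

```scala
import org.apache.spark.sql.functions.{lit, map, to_json}
import spark.implicits._

Seq(("a1", 42)).toDF("id", "clicks")
  .select($"id", to_json(map(lit("clicks"), $"clicks")).as("attrs_json"))
  .show(false)  // the map becomes a JSON string such as {"clicks":42}
```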
-
Spark Scala isin Function Examples
The isin function is defined on a Spark Column and is used to filter rows in a DataFrame or Dataset.
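For example — assuming an existing SparkSession `spark`; the column names and data are illustrative:

```scala
import spark.implicits._

val events = Seq(("click", 1), ("view", 2), ("purchase", 3)).toDF("event", "id")
// Keep rows whose event is in the given set.
events.filter($"event".isin("click", "purchase")).show()
// Negate with ! to express "not in":
events.filter(!$"event".isin("view")).show()
```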
-
Hashing Functions, Spark Scala SQL API Function
Hash functions serve many purposes in data engineering. They can be used to check data integrity, detect duplicates, support cryptographic security use cases, balance partition sizes in large data operations, and more.
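Spark's SQL API exposes several of these directly. A sketch, assuming an existing SparkSession `spark` and illustrative data:

```scala
import org.apache.spark.sql.functions.{hash, md5, sha2}
import spark.implicits._

Seq("some payload").toDF("payload").select(
  hash($"payload").as("int_hash"),        // Murmur3 int, handy for bucketing/partitioning
  md5($"payload").as("md5_hex"),          // 32-char hex checksum
  sha2($"payload", 256).as("sha256_hex")  // stronger digest for integrity checks
).show(false)
```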
-
When and Otherwise in Spark Scala -- Examples
The when function in Spark implements conditionals within your DataFrame-based ETL pipelines. It allows you to perform fall-through logic and create new columns with values based upon the condition logic.
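Conditions are evaluated top to bottom, with otherwise as the fall-through default. A sketch, assuming an existing SparkSession `spark` and illustrative data:

```scala
import org.apache.spark.sql.functions.when
import spark.implicits._

Seq(10, 55, 90).toDF("score").withColumn(
  "grade",
  when($"score" >= 80, "high")
    .when($"score" >= 50, "medium")
    .otherwise("low")  // applied when no earlier condition matched
).show()
```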
Recent Spark Scala Tutorials
-
Creating DataFrames in Spark Scala for Testing with toDF
When testing your data engineering ETL pipelines, it can be a real help to quickly create simple DataFrames with the data scenarios you are transforming. Also, when you encounter unexpected problems in production, quickly creating test cases that account for the new situation is highly beneficial. Thankfully, the Spark Scala toDF function, found in the implicits library, can assist with this.
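A sketch, assuming an existing SparkSession `spark`; the columns and the null-date scenario are illustrative:

```scala
import spark.implicits._

// Tuples become rows; the strings passed to toDF name the columns.
val testDf = Seq(
  ("a1", "2024-01-15", 3),
  ("a2", null, 0)  // e.g. reproduce a null-date case seen in production
).toDF("id", "event_date", "count")
testDf.show()
```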
-
Spark Scala Cache Best Practices
Caching a DataFrame tells Spark to keep it in memory (or on disk) after the first time it's computed. This avoids recomputing the same transformations every time you trigger an action. Used well, it can dramatically speed up your pipelines. Used carelessly, it can eat all your memory and make things slower.
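The core pattern — cache once, reuse across actions, and release when done — looks like this. A sketch, assuming an existing SparkSession `spark`; the path and column names are illustrative:

```scala
import spark.implicits._

val cleaned = spark.read.parquet("/data/events")  // illustrative input path
  .filter($"status" === "ok")
  .cache()  // marks the DataFrame for caching; nothing is computed yet

val total = cleaned.count()                   // first action: computes and populates the cache
val byDay = cleaned.groupBy($"day").count()   // reuses the cached result

cleaned.unpersist()  // release the memory once the reuse is over
```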
Latest Spark Scala News
-
Upgrading from Spark 3.x to Spark 4.0: A Practical Guide
Spark 4.0 brings real breaking changes that will likely affect your existing Scala pipelines — ANSI mode on by default, Scala 2.12 dropped, JDK 17 required, and infrastructure changes to shuffle and event logging. This guide walks through each one with before/after context and the config knob to fall back if you need time to migrate.
-
What's New in Spark 4.0 for Scala Developers
Spark 4.0 is the biggest release in years — over 5,100 resolved tickets from 390+ contributors. Here's what matters most if you're writing or maintaining Spark Scala applications.
-
Spark is Like a Sledgehammer
-
Introducing SparkingScala: Your Ultimate Spark Scala Resource
In the evolving landscape of big data engineering and analytics, staying up to date with the latest tools and technologies is a chore. Also, with the growing adoption of PySpark, Spark Scala seems to be taking a back seat in the ecosystem. That's where SparkingScala comes to the rescue! Created by experienced data engineers who have been developing and maintaining Spark Scala applications for years, we aim to be a simple resource for Spark Scala.