Helping you Learn Spark Scala.

Find code samples, tutorials and the latest news at Sparking Scala. We make it easy to solve your data ETL problems and help you go from code to valuable outcomes quickly.

Recent Spark Scala Examples

SBT Assembly Jar Naming in Shell Scripting
When using sbt in Spark Scala projects, it can be useful to access the name of the assembly jar that sbt will create from within your bash or zsh shell scripts. Thankfully, extracting the value is straightforward.
Building MapType Columns in Spark Scala DataFrames for Enhanced Data Structuring
Using a MapType in Spark Scala DataFrames can be helpful because it provides a flexible logical structure for problems such as machine learning feature engineering, data exploration, serialization, data enrichment and denormalization. Thankfully, building MapType columns from existing data within a DataFrame is very straightforward.
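As a quick illustration, here is a minimal sketch that builds a MapType column from two existing columns with the `map` function from `org.apache.spark.sql.functions`. It assumes Spark 3.x and can be pasted into spark-shell or run as a Scala script; the `users` data and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, map}
import org.apache.spark.sql.types.MapType

// Local session for the sketch; in spark-shell the existing session is reused
val spark = SparkSession.builder().master("local[*]").appName("maptype-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one row per user with two integer attributes
val users = Seq(("alice", 34, 170), ("bob", 28, 182)).toDF("name", "age", "height_cm")

// map() takes alternating key and value columns and returns a MapType column.
// Both values are integers, so the result is map<string,int>.
val withFeatures = users.withColumn(
  "features",
  map(lit("age"), col("age"), lit("height_cm"), col("height_cm"))
)
withFeatures.printSchema()

// Individual keys can be read back with column("key") extraction
val ageByName = withFeatures
  .select(col("name"), col("features")("age"))
  .collect()
  .map(r => r.getString(0) -> r.getInt(1))
  .toMap
```

All map values must share one type, so columns of different types would need a cast to a common type before being combined this way.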
Converting a Map Type to a JSON String in Spark Scala
A MapType in Spark Scala DataFrames provides a more flexible logical structure for hierarchical data and, of course, for working with arbitrary data attributes.
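A minimal sketch of the conversion, assuming Spark 3.x, uses `to_json` on a MapType column built from a Scala `Map`. The order data below is hypothetical, and each map holds a single key so the JSON output is easy to predict.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_json}

val spark = SparkSession.builder().master("local[*]").appName("map-to-json-sketch").getOrCreate()
import spark.implicits._

// A Scala Map inside a tuple is encoded as a map<string,string> column
val orders = Seq(
  ("order-1", Map("color" -> "red")),
  ("order-2", Map("size" -> "XL"))
).toDF("id", "attributes")

// to_json serializes the MapType column into a JSON object string
val withJson = orders.withColumn("attributes_json", to_json(col("attributes")))
withJson.show(truncate = false)

val jsonById = withJson
  .select("id", "attributes_json")
  .collect()
  .map(r => r.getString(0) -> r.getString(1))
  .toMap
```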
Spark Scala isin Function Examples
The isin function is defined on a Spark Column and is used to filter rows in a DataFrame or Dataset.
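Here is a minimal sketch of `isin`, assuming Spark 3.x; the customer rows and the `allowedStates` list are hypothetical. Because `isin` takes varargs, a Scala collection is expanded with `: _*`, and `!` negates the condition.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("isin-sketch").getOrCreate()
import spark.implicits._

val customers = Seq(("alice", "NY"), ("bob", "CA"), ("carol", "TX")).toDF("name", "state")

// Hypothetical allow-list; isin takes varargs, so expand the Seq with : _*
val allowedStates = Seq("NY", "TX")

val inAllowed = customers.filter(col("state").isin(allowedStates: _*))
// Negate with ! to keep rows whose value is NOT in the list
val notAllowed = customers.filter(!col("state").isin(allowedStates: _*))

val inAllowedNames = inAllowed.select("name").as[String].collect().sorted
```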
Hashing Functions, Spark Scala SQL API Function
Hash functions serve many purposes in data engineering. They can be used to check data integrity, help with deduplication, support cryptographic use cases for security, improve efficiency when balancing partition sizes in large data operations, and much more.
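The sketch below, assuming Spark 3.x (`xxhash64` is not available in older releases), applies several built-in hash functions to hypothetical records, one of which is duplicated on purpose.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, hash, md5, sha2, xxhash64}

val spark = SparkSession.builder().master("local[*]").appName("hash-sketch").getOrCreate()
import spark.implicits._

// Hypothetical records; the last row duplicates the first
val records = Seq(
  ("alice", "alice@example.com"),
  ("bob", "bob@example.com"),
  ("alice", "alice@example.com")
).toDF("name", "email")

val hashed = records
  // SHA-256 digest as a 64-character hex string (e.g. for integrity checks)
  .withColumn("email_sha256", sha2(col("email"), 256))
  // MD5 digest as a 32-character hex string
  .withColumn("email_md5", md5(col("email")))
  // Murmur3-based 32-bit int hash over several columns
  .withColumn("row_hash", hash(col("name"), col("email")))
  // 64-bit xxHash over several columns, a larger hash space than the 32-bit hash
  .withColumn("row_xxhash64", xxhash64(col("name"), col("email")))

hashed.show(truncate = false)

// Identical rows produce identical hashes, which makes candidate duplicates easy to spot
val distinctRowHashes = hashed.select("row_xxhash64").distinct().count()
```

Equal hashes only flag candidate duplicates. Collisions are possible, so a strict deduplication step would still compare the underlying columns.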
When and Otherwise in Spark Scala -- Examples
The when function in Spark implements conditional logic within your DataFrame-based ETL pipelines. It lets you chain fall-through conditions and create new columns whose values depend on which condition matches.
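A minimal sketch of chained `when` and `otherwise`, assuming Spark 3.x; the score data and grade thresholds are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

val spark = SparkSession.builder().master("local[*]").appName("when-sketch").getOrCreate()
import spark.implicits._

val scores = Seq(("a", 95), ("b", 72), ("c", 40)).toDF("id", "score")

// Conditions are checked in order; the first match wins (fall-through).
// otherwise() supplies the value when nothing matches; without it the result is null.
val graded = scores.withColumn(
  "grade",
  when(col("score") >= 90, "A")
    .when(col("score") >= 70, "B")
    .otherwise("C")
)

val gradeById = graded.select("id", "grade").as[(String, String)].collect().toMap
```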
Coalesce in Spark Scala for Data Cleaning
The coalesce function returns the first non-null value from a list of columns. It's a common technique when you have multiple candidate values and want to select the first available one in priority order.
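Here is a minimal sketch, assuming Spark 3.x, that picks the best available phone number from hypothetical contact data, with a literal as the final fallback.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit}

val spark = SparkSession.builder().master("local[*]").appName("coalesce-sketch").getOrCreate()
import spark.implicits._

// Hypothetical contacts; None becomes null in the DataFrame
val contacts = Seq(
  ("alice", Some("555-0100"), Some("555-0199")),
  ("bob", None, Some("555-0142")),
  ("carol", None, None)
).toDF("name", "mobile", "home")

// Prefer mobile, then home, then a literal fallback
val withPhone = contacts.withColumn(
  "best_phone",
  coalesce(col("mobile"), col("home"), lit("unknown"))
)

val phoneByName = withPhone.select("name", "best_phone").as[(String, String)].collect().toMap
```

Note that `functions.coalesce` (column values) is unrelated to `Dataset.coalesce(numPartitions)`, which reduces the number of partitions.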
Format Numbers in Spark Scala For Humans
The format_number function in Spark Scala is useful when you want to present numerical data in a human-readable, standardized format. It improves the clarity of numeric values by adding commas as thousands separators and controlling the number of decimal places.
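A minimal sketch of `format_number` with two decimal places, assuming Spark 3.x; the sales figures are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, format_number}

val spark = SparkSession.builder().master("local[*]").appName("format-number-sketch").getOrCreate()
import spark.implicits._

val sales = Seq(("north", 1234567.891), ("south", 987.5)).toDF("region", "revenue")

// format_number(col, d) rounds to d decimal places, adds thousands separators,
// and returns a string column
val formatted = sales.withColumn("revenue_display", format_number(col("revenue"), 2))

val displayByRegion = formatted.select("region", "revenue_display").as[(String, String)].collect().toMap
```

Because the result is a string, it is best applied as a final presentation step while the numeric column is kept for any further calculations.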
Data Transformation and Data Extraction with Spark's regexp_replace Function
Regular expression matching and replacement are commonly used tools within data ETL pipelines to transform and clean string data and to extract more structured information from it.
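The sketch below, assuming Spark 3.x, cleans a phone number with `regexp_replace` and pulls an order id out of free text with the related `regexp_extract` function. The input strings and patterns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_extract, regexp_replace}

val spark = SparkSession.builder().master("local[*]").appName("regexp-sketch").getOrCreate()
import spark.implicits._

val raw = Seq(("(555) 010-0123", "Order #A-1001 shipped")).toDF("phone", "note")

val cleaned = raw
  // Strip every non-digit character from the phone number
  .withColumn("phone_digits", regexp_replace(col("phone"), "[^0-9]", ""))
  // Extract the order id via capture group 1
  .withColumn("order_id", regexp_extract(col("note"), "#([A-Z]-\\d+)", 1))

val row = cleaned.select("phone_digits", "order_id").head()
```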
Concatenate Columns in Spark Scala | Guide to Using concat and concat_ws Functions
The concat function in Spark Scala takes multiple columns as input and returns a single concatenated value. If any column in the list is null, the result is null.
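A minimal sketch contrasting `concat` and `concat_ws` on hypothetical name data with a missing middle name, assuming Spark 3.x.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}

val spark = SparkSession.builder().master("local[*]").appName("concat-sketch").getOrCreate()
import spark.implicits._

// Hypothetical people; John has no middle name (null)
val people = Seq(
  ("Jane", Some("Q"), "Public"),
  ("John", None, "Doe")
).toDF("first", "middle", "last")

val joined = people
  // concat returns null as soon as any input is null
  .withColumn("full_concat", concat(col("first"), lit(" "), col("middle"), lit(" "), col("last")))
  // concat_ws takes a separator first and skips null inputs
  .withColumn("full_concat_ws", concat_ws(" ", col("first"), col("middle"), col("last")))

val byFirst = joined
  .select("first", "full_concat", "full_concat_ws")
  .collect()
  .map(r => r.getString(0) -> r)
  .toMap
```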

See more spark scala examples...

Recent Spark Scala Tutorials

Creating DataFrames in Spark Scala for Testing with toDF
When testing your data engineering ETL pipelines, it can be a real help to quickly create simple DataFrames containing the data scenarios you are transforming. Likewise, when you encounter unexpected problems in production, quickly creating test cases that cover the new situation is highly beneficial. Thankfully, the Spark Scala toDF function, made available by importing spark.implicits._, can assist with this.
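A minimal sketch of building a test DataFrame with `toDF`, assuming Spark 3.x. The order rows are hypothetical and mimic a production edge case where an amount is missing; `Option` values become nullable columns.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("todf-sketch").getOrCreate()
// Brings toDF into scope for local Scala collections
import spark.implicits._

// Hypothetical scenario: the second order arrived without an amount
val orders = Seq(
  (1, "2024-01-05", Some(19.99)),
  (2, "2024-01-06", None)
).toDF("order_id", "order_date", "amount")

orders.printSchema()
orders.show()
```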
Spark Scala Cache Best Practices

See more spark scala tutorials...

Latest Spark Scala News

Spark is Like a Sledgehammer
Introducing SparkingScala: Your Ultimate Spark Scala Resource
In the evolving landscape of big data engineering and analytics, staying up to date with the latest tools and technologies is a chore. Also, with the growing adoption of PySpark, Spark Scala seems to be taking a back seat in the ecosystem. That's where SparkingScala comes to the rescue! SparkingScala was created by experienced data engineers who have spent years developing and maintaining Spark Scala applications. Our aim is to provide a simple resource for Spark Scala.

See the latest big data news...