The Reference You Need
Spark Scala Examples
Simple Spark Scala examples to help you quickly complete your data ETL pipelines. Save time digging through the Spark Scala function API and get right to the code you need.
-
Hashing Functions, Spark Scala SQL API Function
Hash functions serve many purposes in data engineering. They can be used to check the integrity of data, deduplicate records, support cryptographic use cases for security, balance partition sizes in large data operations, and more.
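A minimal sketch of the built-in hash helpers from `org.apache.spark.sql.functions`, assuming a local SparkSession and a made-up two-column DataFrame:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// toy data for illustration
val df = Seq(("alice", "NY"), ("bob", "LA")).toDF("name", "city")

df.select(
  $"name",
  hash($"name", $"city").as("row_hash"),   // 32-bit Murmur3 hash over both columns
  sha2($"name", 256).as("name_sha256")     // cryptographic SHA-256 digest
).show(false)
```

`hash` is cheap and well suited to partitioning and dedup checks, while `sha2` is the better choice when collision resistance matters.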
-
When and Otherwise in Spark Scala -- Examples
The when function in Spark implements conditional logic within your DataFrame-based ETL pipelines. It allows you to perform fall-through logic and create new columns with values based upon the condition logic.
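A short sketch of chained `when` / `otherwise` calls, assuming a local SparkSession and an invented `name`/`age` DataFrame:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("alice", 70), ("bob", 34), ("carol", 12)).toDF("name", "age")

// conditions are evaluated top to bottom; otherwise() is the fall-through default
df.withColumn(
  "age_group",
  when($"age" >= 65, "senior")
    .when($"age" >= 18, "adult")
    .otherwise("minor")
).show()
```

Without an `otherwise`, rows that match no condition get null, so including a default is usually the safer choice.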
-
Coalesce in Spark Scala for Data Cleaning
The coalesce function returns the first non-null value from a list of columns. It's a common technique when you have multiple candidate values and want to prioritize the first available one.
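A minimal sketch, assuming a local SparkSession and hypothetical email columns where either value may be null:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(
  ("alice@work.com", null),
  (null, "bob@home.com")
).toDF("work_email", "home_email")

// first non-null wins; lit("unknown") acts as a final fallback
df.withColumn(
  "best_email",
  coalesce($"work_email", $"home_email", lit("unknown"))
).show(false)
```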
-
Format Numbers in Spark Scala For Humans
The format_number function in Spark Scala is useful when you want to present numerical data in a human-readable and standardized format. This helps improve the clarity of numeric values by adding commas as thousands separators and controlling the number of decimal places.
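A quick sketch, assuming a local SparkSession and a made-up `amount` column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(1234567.8912, 42.5).toDF("amount")

// second argument is the number of decimal places to keep
df.select(
  $"amount",
  format_number($"amount", 2).as("pretty_amount")  // e.g. "1,234,567.89"
).show(false)
```

Note that the result is a string column, so apply it at presentation time rather than before further arithmetic.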
-
Data Transformation and Data Extraction with Spark's regexp_replace Function
Regular expression matching and replacement are commonly used tools within data ETL pipelines to transform and clean your string data and extract more structured information from it.
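A brief sketch of `regexp_replace` (with `regexp_extract` alongside for the extraction case), assuming a local SparkSession and an invented order-note column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("Order #1234 shipped", "Order #9876 pending").toDF("note")

df.select(
  regexp_replace($"note", "\\d", "*").as("masked"),        // replace each digit
  regexp_extract($"note", "#(\\d+)", 1).as("order_id")     // pull out capture group 1
).show(false)
```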
-
Concatenate Columns in Spark Scala | Guide to Using concat and concat_ws Functions
The concat function in Spark Scala takes multiple columns as input and returns a concatenated version of all of them. When any column in the list of columns to concatenate is null, the result is null.
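A short sketch contrasting the two functions, assuming a local SparkSession and hypothetical name columns where the last name may be null:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("Jane", "Doe"), ("Solo", null)).toDF("first", "last")

df.select(
  concat($"first", lit(" "), $"last").as("full_name"),  // null if any input is null
  concat_ws(" ", $"first", $"last").as("full_name_ws")  // concat_ws skips nulls
).show(false)
```

`concat_ws` (concat with separator) is generally the safer choice when nullable columns are involved.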
-
Trim Functions: Usage and Examples for trim, ltrim, and rtrim in Spark Scala DataFrames
When doing string manipulations in Spark Scala DataFrames, trim is a frequently used function that quickly cleans up whitespace (or any characters) from the start and end of a string column.
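A minimal sketch of the trim family, assuming a local SparkSession and a made-up `raw` column (the two-argument `trim` requires Spark 2.3+):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("  padded  ", "xxvaluexx").toDF("raw")

df.select(
  trim($"raw").as("trimmed"),           // whitespace from both ends
  ltrim($"raw").as("left_trimmed"),     // leading whitespace only
  rtrim($"raw").as("right_trimmed"),    // trailing whitespace only
  trim($"raw", "x").as("custom_trim")   // strip 'x' characters from both ends
).show(false)
```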
-
Run a Single Test Using SBT in Your Spark Scala Project
It's also really easy to run a single test using sbt, but the syntax is rather convoluted and can be hard to remember. To run a specific test, you specify the test suite and the complete test name as follows:
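A sketch of the sbt invocation, assuming a ScalaTest suite; the suite name `com.example.etl.HashingSpec` and the test name are hypothetical placeholders:

```shell
# run one whole suite
sbt "testOnly com.example.etl.HashingSpec"

# run a single test within the suite; ScalaTest's -z flag matches by substring
sbt "testOnly com.example.etl.HashingSpec -- -z \"hashes a row\""
```

The arguments after `--` are passed to the ScalaTest runner rather than to sbt itself.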
-
Converting a Struct Type to a JSON String in Spark Scala
Using a struct type in Spark Scala DataFrames offers several benefits: type safety, more flexible logical structures, hierarchical data, and of course working with structured data.
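A minimal sketch of converting a struct to a JSON string with `to_json`, assuming a local SparkSession and invented address columns:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("alice", "NY", "10001")).toDF("name", "city", "zip")

// struct() bundles the columns; to_json serializes the struct to a JSON string
df.select(
  $"name",
  to_json(struct($"city", $"zip")).as("address_json")  // e.g. {"city":"NY","zip":"10001"}
).show(false)
```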
-
Random Functions, Rand, Randn and Examples
Generating random values is a common need when creating data ETL pipelines. They are useful for machine learning pipelines, data sampling, and testing that mimics real data (synthetic data). The rand functions fill this need when working with DataFrames.
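A short sketch of `rand` and `randn`, assuming a local SparkSession; the seed value 42 is arbitrary:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()

spark.range(5).select(
  col("id"),
  rand(42).as("uniform"),     // uniform in [0, 1), seeded for reproducibility
  randn(42).as("std_normal")  // standard normal: mean 0, stddev 1
).show(false)
```

Passing a seed makes runs reproducible; calling `rand()` with no seed produces different values on each execution.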