The Reference You Need
Spark Scala Examples
Simple Spark Scala examples to help you quickly complete your data ETL pipelines. Save time digging through the Spark Scala functions API and get right to the code you need...
-
Trim Functions: Usage and Examples for trim, ltrim, and rtrim in Spark Scala DataFrames
When manipulating strings in Spark Scala DataFrames, trim is a frequently used function that removes whitespace (or any other specified characters) from the start and end of a string column.
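As a minimal sketch of the three functions, assuming a local SparkSession and a made-up column named `raw` (the object name, helper name, and sample values are hypothetical, chosen only for illustration):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, ltrim, rtrim, trim}

object TrimExample {
  // Hypothetical helper: builds a tiny DataFrame and applies each trim variant.
  def clean(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq("  spark  ", "--scala--").toDF("raw")
    df.select(
      col("raw"),
      trim(col("raw")).as("trimmed"),              // strips spaces from both ends
      ltrim(col("raw")).as("left_trimmed"),        // strips leading spaces only
      rtrim(col("raw")).as("right_trimmed"),       // strips trailing spaces only
      trim(col("raw"), "-").as("dashes_trimmed")   // strips a custom character from both ends
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("trim-example")
      .getOrCreate()
    clean(spark).show(false)
    spark.stop()
  }
}
```

The two-argument form takes the characters to strip as a plain string, so rows without those characters at their edges pass through unchanged.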
-
Run a Single Test Using SBT in Your Spark Scala Project
It's also really easy to run a single test using sbt, but the syntax is rather convoluted and can be hard to remember. To run a specific test, you specify the test suite and the complete test name as follows:
-
Converting a Struct Type to a JSON String in Spark Scala
Using a struct type in Spark Scala DataFrames offers several benefits, including type safety, more flexible logical structures, support for hierarchical data, and of course a natural way to work with structured data.
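One way to do the conversion named in the title is to combine `struct` with `to_json` from the Spark SQL functions. The sketch below assumes a local SparkSession; the column names, sample row, and helper name are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, to_json}

object StructToJsonExample {
  // Hypothetical helper: packs two columns into a struct, then serializes it.
  def toJsonColumn(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq(("Alice", 30)).toDF("name", "age")
    df.select(
      // struct(...) groups the columns; to_json renders the struct as a JSON string
      to_json(struct(col("name"), col("age"))).as("json")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("struct-to-json-example")
      .getOrCreate()
    toJsonColumn(spark).show(false)
    spark.stop()
  }
}
```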
-
Random Functions, Rand, Randn and Examples
Generating random values is a common need when building data ETL pipelines. Random values are useful for machine learning pipelines, data sampling, and testing with synthetic data that mimics real data. The rand functions fill this need when working with DataFrames.
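A short sketch of both functions, assuming a local SparkSession. The seed value, column names, and helper name are arbitrary choices for illustration; `rand` draws from a uniform distribution on [0.0, 1.0) and `randn` from the standard normal distribution:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, rand, randn}

object RandExample {
  // Hypothetical helper: adds one uniform and one Gaussian column to n generated rows.
  def withRandomColumns(spark: SparkSession, n: Long): DataFrame =
    spark.range(n).select(
      col("id"),
      rand(42L).as("uniform"),    // uniform values in [0.0, 1.0), seeded
      randn(42L).as("gaussian")   // standard normal values, seeded
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rand-example")
      .getOrCreate()
    withRandomColumns(spark, 5L).show(false)
    spark.stop()
  }
}
```

Passing a seed helps when a test needs stable values, although the exact numbers can still depend on how the data is partitioned.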
-
Spark Scala Functions, Spark SQL API
The Spark SQL Functions API is a powerful tool provided by Apache Spark's Scala library. It provides many familiar functions used in data processing, data manipulation and transformations. Anyone who has experience with SQL will quickly understand many of the capabilities and how they work with DataFrames.
-
Array Union, Spark Scala SQL API Function
The array_union function in Spark Scala takes two arrays as input and returns a new array containing the elements of both, with duplicates removed. If either input array is null, the entire result is null.
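The behavior described here, including the null case, can be sketched as follows. This assumes a local SparkSession; the column names `a` and `b`, the sample arrays, and the helper name are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_union, col}

object ArrayUnionExample {
  // Hypothetical helper: one row with two arrays, one row where the second array is null.
  def union(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq[(Seq[Int], Option[Seq[Int]])](
      (Seq(1, 2, 3), Some(Seq(3, 4, 5))),
      (Seq(1, 2), None) // None becomes a null array column value
    ).toDF("a", "b")
    df.select(array_union(col("a"), col("b")).as("merged"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-union-example")
      .getOrCreate()
    union(spark).show(false)
    spark.stop()
  }
}
```

The first row yields the distinct elements of both arrays, while the second row yields null because one input is null.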