The Guide You Need
Spark Scala Tutorials
Straightforward Spark Scala tutorials. Learn best practices, Databricks platform nuances, and the latest in big data trends...
-
Creating DataFrames in Spark Scala for Testing with toDF
When testing your data engineering ETL pipelines, it helps to quickly create simple DataFrames that contain the data scenarios you are transforming. Likewise, when you hit an unexpected problem in production, quickly creating a test case that covers the new situation is highly beneficial. Thankfully, the Spark Scala toDF function, made available through the SparkSession implicits, can assist with this.
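As a minimal sketch of the idea, the snippet below builds a small test DataFrame from a Seq of tuples with toDF. The object name, the `sampleOrders` helper, the column names, and the rows are all hypothetical and chosen only for illustration; the local master setting is likewise an assumption suited to a unit test.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ToDfExample {
  // Hypothetical helper: builds a tiny DataFrame for a test scenario.
  // Column names and rows are made up for illustration.
  def sampleOrders(spark: SparkSession): DataFrame = {
    // toDF on a Seq of tuples comes from the session's implicits.
    import spark.implicits._

    Seq[(Int, String, Option[Double])](
      (1, "widget", Some(9.99)),
      (2, "gadget", None),      // models a missing price (null in the DataFrame)
      (3, "gizmo", Some(0.0))   // models an edge case: zero price
    ).toDF("order_id", "product", "price")
  }

  def main(args: Array[String]): Unit = {
    // A local session is assumed here; a test suite would usually share one.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("toDF-test-sketch")
      .getOrCreate()

    val df = sampleOrders(spark)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

Using Option for a column is one way to get a null into the test data without dropping down to Row objects and an explicit schema.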
-
Spark Scala Cache Best Practices
Caching a DataFrame tells Spark to keep it in memory (or on disk) after the first time it's computed. This avoids recomputing the same transformations every time you trigger an action. Used well, it can dramatically speed up your pipelines. Used carelessly, it can eat all your memory and make things slower.
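The behaviour described above can be sketched roughly as follows. The object name, the `cachedCounts` helper, and the generated data are hypothetical; the point is only to show that cache() is lazy, that the first action populates the cache, and that unpersist() releases it when the DataFrame is no longer needed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CacheExample {
  // Hypothetical helper: runs two actions against one cached DataFrame.
  def cachedCounts(spark: SparkSession): (Long, Long) = {
    // Illustrative data: ids 0 to 999 with a derived column.
    val base = spark.range(0, 1000).withColumn("bucket", col("id") % 10)

    // cache() only marks the DataFrame; nothing is stored yet.
    val cached = base.cache()

    // The first action computes the DataFrame and fills the cache.
    val total = cached.count()

    // Later actions can read the cached data instead of recomputing it.
    val evens = cached.filter(col("id") % 2 === 0).count()

    // Release the cached data once it is no longer needed.
    cached.unpersist()

    (total, evens)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cache-sketch")
      .getOrCreate()

    val (total, evens) = cachedCounts(spark)
    println(s"total=$total evens=$evens")

    spark.stop()
  }
}
```

Caching pays off here only because the same DataFrame feeds more than one action; with a single action, the extra storage would buy nothing.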