Convert a Spark Scala Map to a JSON String
Using a MapType column in a Spark Scala DataFrame gives you more flexible logical structures, a way to represent hierarchical data, and of course a way to work with arbitrary data attributes.
Maps and JSON objects are very similar structures, so it can be useful to convert maps to JSON, specifically when you need to hold 'arbitrary' structures within a column. JSON can often offer much more flexibility for downstream users of your data pipeline.
Map to JSON Column Example
Let's look at a simple example where we have maps of animals and some of their attributes. Notice how the animals all have slightly different attributes. This is one of the benefits of using a Map (and JSON).
In this example we convert the map to JSON using the to_json function:
import org.apache.spark.sql.functions.{col, to_json}
// Needed for .toDF; assumes a SparkSession named `spark` is in scope (as in spark-shell)
import spark.implicits._

// One row per animal: the outer map key is the animal, the inner map holds its attributes
val df = Seq[Map[String, Map[String, String]]](
  Map("Lion" -> Map(
    "Species" -> "Panthera leo",
    "Habitat" -> "Grasslands and savannas",
    "Diet" -> "Carnivore",
    "Average Lifespan" -> "20",
    "Top Speed (km/h)" -> "80"
  )),
  Map("Elephant" -> Map(
    "Species" -> "Loxodonta africana",
    "Habitat" -> "African forests and grasslands",
    "Diet" -> "Herbivore",
    "Average Lifespan" -> "60",
    "Weight (kg)" -> "5000"
  )),
  Map("Dolphin" -> Map(
    "Species" -> "Delphinus delphis",
    "Habitat" -> "Oceans and seas",
    "Diet" -> "Carnivore",
    "Average Lifespan" -> "20",
    "Swimming Speed (km/h)" -> "60"
  ))
).toDF("animal")

// Serialize the map column into a JSON string column
val df2 = df
  .withColumn("json", to_json(col("animal")))

df2.show(false)
// +--------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
// |animal |json |
// +--------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
// |{Lion -> {Diet -> Carnivore, Average Lifespan -> 20, Habitat -> Grasslands and savannas, Top Speed (km/h) -> 80, Species -> Panthera leo}} |{"Lion":{"Diet":"Carnivore","Average Lifespan":"20","Habitat":"Grasslands and savannas","Top Speed (km/h)":"80","Species":"Panthera leo"}} |
// |{Elephant -> {Weight (kg) -> 5000, Diet -> Herbivore, Average Lifespan -> 60, Habitat -> African forests and grasslands, Species -> Loxodonta africana}}|{"Elephant":{"Weight (kg)":"5000","Diet":"Herbivore","Average Lifespan":"60","Habitat":"African forests and grasslands","Species":"Loxodonta africana"}}|
// |{Dolphin -> {Swimming Speed (km/h) -> 60, Diet -> Carnivore, Average Lifespan -> 20, Habitat -> Oceans and seas, Species -> Delphinus delphis}} |{"Dolphin":{"Swimming Speed (km/h)":"60","Diet":"Carnivore","Average Lifespan":"20","Habitat":"Oceans and seas","Species":"Delphinus delphis"}} |
// +--------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
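Because the json column is a plain string, a downstream consumer that wants the map back has to parse it. The following is a minimal sketch of that round trip, assuming the consumer knows the map's key and value types; the local SparkSession, the one-row DataFrame, and the roundTrip name are illustrative and not part of the example above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, to_json}
import org.apache.spark.sql.types.{MapType, StringType}

// Local session for illustration only; a real job usually already has one
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("map-json-round-trip-sketch")
  .getOrCreate()
import spark.implicits._

// A cut-down, hypothetical version of the animal data
val small = Seq(Map("Lion" -> Map("Diet" -> "Carnivore"))).toDF("animal")

// Schema the consumer must supply: a map of string -> (map of string -> string)
val schema = MapType(StringType, MapType(StringType, StringType))

val roundTrip = small
  .withColumn("json", to_json(col("animal")))            // map -> JSON string
  .withColumn("parsed", from_json(col("json"), schema))  // JSON string -> map

roundTrip.printSchema()
roundTrip.show(false)
```

Note that the schema is not carried inside the JSON string, so it has to be agreed with the consumer separately.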
I will typically use the technique of converting maps to JSON when restructuring more complicated objects for downstream consumers. Often the map is created from a grouping of different columns that are already within the data and then converted to JSON for convenience.
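As a rough sketch of that pattern, the map can be assembled from existing columns with the map function and then passed to to_json. The column names (name, species, diet), the sample rows, and the local SparkSession below are assumptions made for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, map, to_json}

// Local session for illustration only; a real job usually already has one
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("columns-to-json-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical flat data: one row per animal, one column per attribute
val flat = Seq(
  ("Lion", "Panthera leo", "Carnivore"),
  ("Elephant", "Loxodonta africana", "Herbivore")
).toDF("name", "species", "diet")

val withJson = flat
  // map() takes alternating key and value columns: key1, value1, key2, value2, ...
  .withColumn("attributes", map(
    lit("Species"), col("species"),
    lit("Diet"), col("diet")
  ))
  // Serialize the assembled map into a JSON string column
  .withColumn("json", to_json(col("attributes")))

withJson.select("name", "json").show(false)
```

All values passed to map must share one type, so in this sketch any numeric columns would need to be cast to strings first.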