Format Numbers Easily in Spark Scala

The format_number function in Spark Scala is useful when you want to present numerical data in a human-readable, standardized format. It improves the clarity of numeric values by adding commas as thousands separators and controlling the number of decimal places.

Format Number Definition

The format_number function first appeared in version 1.5.0 and, as of Spark 3.4.1, it is defined as:

def format_number(x: Column, d: Int): Column

The parameter x is the numeric column to format and d is the number of decimal places to use. The resulting column is a string formatted with commas as thousands separators and the specified number of decimal places.
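Because the result is a string column rather than a numeric one, it can help to confirm the type before using the formatted values downstream. The following is a minimal sketch, assuming a local SparkSession (in spark-shell, getOrCreate simply returns the existing session); the names prices, amount, and isString are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, format_number}
import org.apache.spark.sql.types.StringType

// Assumption: a local session is acceptable for this sketch.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("format-number-type-check")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data, not taken from the article.
val prices = Seq(1234567.891, 42.0).toDF("amount")

// Format with two decimal places.
val formatted = prices.withColumn("formatted", format_number(col("amount"), 2))

// The new column is a string, so the numeric source column is kept alongside it.
formatted.printSchema()
val isString = formatted.schema("formatted").dataType == StringType

formatted.show(false) // formatted holds "1,234,567.89" and "42.00"
```

Since the formatted column is text, sorting or aggregating on it would compare strings rather than numbers, so it is usually safer to keep the original numeric column for calculations and use the formatted one only for display.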

Format Number Example

Let's see format_number in action. Suppose we have a simple DataFrame that contains the boiling points, in degrees Celsius, of several elements from the periodic table:

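// Needed for toDF outside spark-shell; assumes a SparkSession named spark
import spark.implicits._
val df = Seq(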
  ("Actinium", 3200.0),
  ("Chlorine", -34.04),
  ("Nitrogen", -195.79),
  ("Iodine", 184.3),
  ("Praseodymium", 3290.0)
).toDF("element", "boiling_point")

Now let's add a new column called 'formatted' using the format_number function:

import org.apache.spark.sql.functions.{col, format_number}
val df2 = df.withColumn("formatted", format_number(col("boiling_point"), 3))

If we show this DataFrame, for example with df2.show(false), we see:

// +------------+-------------+---------+
// |element     |boiling_point|formatted|
// +------------+-------------+---------+
// |Actinium    |3200.0       |3,200.000|
// |Chlorine    |-34.04       |-34.040  |
// |Nitrogen    |-195.79      |-195.790 |
// |Iodine      |184.3        |184.300  |
// |Praseodymium|3290.0       |3,290.000|
// +------------+-------------+---------+
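The example above uses three decimal places, but the d parameter can be varied to suit the audience. The sketch below tries d = 0 and d = 1 on two of the same boiling points, and also calls the function of the same name through a SQL expression. It assumes a local SparkSession, and the names boiling, compared, no_decimals, one_decimal, and via_sql are chosen only for this illustration. Values are rounded rather than truncated (the Spark documentation describes the rounding mode as HALF_EVEN):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, format_number}

// Assumption: a local session is acceptable for this sketch.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("format-number-decimals")
  .getOrCreate()
import spark.implicits._

// Two rows taken from the article's example data.
val boiling = Seq(
  ("Actinium", 3200.0),
  ("Nitrogen", -195.79)
).toDF("element", "boiling_point")

val compared = boiling
  // d = 0 drops the decimal point entirely: "3,200" and "-196"
  .withColumn("no_decimals", format_number(col("boiling_point"), 0))
  // d = 1 keeps one rounded decimal place: "3,200.0" and "-195.8"
  .withColumn("one_decimal", format_number(col("boiling_point"), 1))
  // The same function called through a SQL expression string
  .withColumn("via_sql", expr("format_number(boiling_point, 1)"))

compared.show(false)
```

The SQL expression form can be convenient when the formatting rule comes from configuration or is written by someone more comfortable with SQL than with the Column API.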

format_number is a straightforward Spark Scala function that makes numerical values much easier for people to read and understand. This is especially important when presenting data to stakeholders or analysts.

Example Details

Created: 2023-08-11 11:51:00 AM

Last Updated: 2023-08-11 11:51:00 AM