Formatting numbers

Notation and tables

Roland Krause

MADS6

Tuesday, 10 December 2024

Material

Formatting numbers

Three types of numbers

  1. Exact numbers

  2. Measured numbers

  • Numbers with an associated error

  3. Computed numbers

  • Numbers resulting from a (potentially complex) calculation on a measured quantity, or from a simulation.

Note

In data science, we frequently work with values whose measurement details are poorly documented. That is not an invitation to ignore their uncertainty.

Exact numbers

Counted objects

  • 3 bikes
  • 7 people
  • 1015 items sold

Note

In text, we would write “three bikes and seven people”.

Exact relation

1 km = 1000 m

Speed of light

299 792 458 m/s (a definition)

Measured numbers

Data with natural errors

  • Multiple measurements will return differing results

  • Measurements by different people will return different results.

  • Precision is naturally limited

Why we care

In data science we typically do not measure data ourselves.

The value we print in a report nevertheless conveys a degree of certainty.

Significant digits

Types of zeros

Leading zeros
| ||||||
0.000000230453400
          |    ||
Captive zero   Trailing zeros

Significant zeros

  1. All non-zero digits are significant.
  2. Zeros between non-zero digits (captive zeros) are significant.
  3. Leading zeros are never significant.
  4. Trailing zeros after the decimal point are significant if they have been measured.

Examples

Value        Digits  Explanation
1642 m       4       All non-zero
10.303 ml    5       Captive zeros
67.0 g       3       Trailing measured zero
0.00053503   5       All leading zeros irrelevant
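
The rules above can be sketched in a few lines of base R. The helper count_sig_digits() is a hypothetical name, not part of any package. It assumes the number is given as text in plain decimal notation (so that trailing zeros survive) and that every trailing zero after the decimal point was actually measured.

```r
# Hypothetical helper: count significant digits of numbers written as text.
# Assumptions: plain decimal notation (no exponent) and all trailing zeros
# after the decimal point were measured.
count_sig_digits <- function(x) {
  stopifnot(is.character(x))
  digits <- gsub("[^0-9.]", "", x)               # drop sign, units, spaces
  digits <- sub("^[0.]+", "", digits)            # leading zeros never count
  digits <- gsub(".", "", digits, fixed = TRUE)  # the decimal point is not a digit
  nchar(digits)                                  # everything left is significant
}

count_sig_digits(c("1642 m", "10.303 ml", "67.0 g", "0.00053503"))
# 4 5 3 5
```

Whole numbers with trailing zeros, such as "1200", are ambiguous under these rules; the sketch simply counts them as written.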

Your turn

How many significant digits?

  • 1200001
  • 12.000
  • 0.0000000001
  • 0.0010000001

Solution

Value          Digits  Explanation
1200001 m      7       Mostly captive zeros
12.000 g       5       If measured
0.0000000001   1       All leading zeros irrelevant
0.0010000001   8       Leading zeros irrelevant, captive zeros count

How to display numbers

Many numbers in data science are not in our “nice” range from 0 to 1000.

Example numbers

my_val <- (c(0.1, 0.00000000123, 13/150660001, 761231, -3.243) )
my_val
[1]  1.00000e-01  1.23000e-09  8.62870e-08  7.61231e+05
[5] -3.24300e+00

Numbers printed to the console can be deceiving

print(1)
[1] 1
print(1.00001)
[1] 1.00001
print(c(1, 1.00001))
[1] 1.00000 1.00001

But you can’t say Base R is doing a bad job…
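
A minimal sketch of what happens here, assuming R's default of seven significant digits (getOption("digits")): a vector is printed with one common format, and differences beyond the digit limit are silently dropped.

```r
# Assumes the default options(digits = 7).
x <- c(1, 1.00001)
format(x)               # "1.00000" "1.00001": one format for the whole vector
format(x[1])            # "1": on its own, the trailing zeros disappear

y <- 1.000000001
format(y)               # "1": the difference lies beyond 7 significant digits
format(y, digits = 10)  # "1.000000001"
```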

Common notations of numbers

Decimal notation

sprintf("%f", my_val)
[1] "0.100000"      "0.000000"      "0.000000"     
[4] "761231.000000" "-3.243000"    

Scientific notation

This should be typeset as \(1.9 \times 10^{-4}\), but we usually see 1.9e-4.

$1.9 \times 10^{-4}$
sprintf("%e", my_val)
[1] "1.000000e-01"  "1.230000e-09"  "8.628700e-08" 
[4] "7.612310e+05"  "-3.243000e+00"

Engineering notation

Exponents are a multiple of three.

library(pillar)  # provides num()
num(my_val, notation = "eng")
<pillar_num(eng)[5]>
[1] 100   e-3   1.23e-9  86.3 e-9 761.  e+3  -3.24e+0

This uses num() from the pillar package.
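
The rule that exponents are a multiple of three can also be sketched in base R. The helper to_eng() below is hypothetical and not part of pillar; rounding the mantissa to three significant digits is an assumption made for the example.

```r
my_val <- c(0.1, 0.00000000123, 13/150660001, 761231, -3.243)

# Hypothetical helper: engineering notation with a given number of
# significant digits in the mantissa.
to_eng <- function(x, digits = 3) {
  # Largest multiple of three that does not exceed the decimal exponent;
  # zero is handled separately because log10(0) is -Inf.
  exponent <- ifelse(x == 0, 0, 3 * floor(log10(abs(x)) / 3))
  mantissa <- signif(x / 10^exponent, digits)
  sprintf("%ge%+d", mantissa, as.integer(exponent))
}

to_eng(my_val)
# "100e-3" "1.23e-9" "86.3e-9" "761e+3" "-3.24e+0"
```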

Formatting with the C-style sprintf()

Format specifiers

  • %i - Integer values
  • %f - Fixed decimal notation
  • %e - Scientific notation (%E for a capital E)
  • %g - Best of both worlds: scientific notation if the exponent is < -4 or ≥ the precision (6 by default), decimal otherwise (%G for a capital E)

Many other options for padding, currency symbols

Examples

sprintf("%f", my_val)
[1] "0.100000"      "0.000000"      "0.000000"     
[4] "761231.000000" "-3.243000"    
sprintf("%e", my_val)
[1] "1.000000e-01"  "1.230000e-09"  "8.628700e-08" 
[4] "7.612310e+05"  "-3.243000e+00"
sprintf("%G", my_val)
[1] "0.1"        "1.23E-09"   "8.6287E-08" "761231"    
[5] "-3.243"    
sprintf("%i", my_val)
Error in sprintf("%i", my_val): invalid format '%i'; use format %f, %e, %g or %a for numeric objects
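
A few more base R calls illustrate width, precision, and the point where %g switches notation. The input values are chosen for illustration only.

```r
sprintf("%8.2f", 3.14159)  # "    3.14": width 8, two decimals
sprintf("%+.1e", 761231)   # "+7.6e+05": forced sign, one decimal
sprintf("%05d", 42L)       # "00042": zero-padded integer
sprintf("%i", 761231L)     # "761231": %i needs integer (or whole-number) input

# %g switches to scientific notation when the exponent is < -4
# or >= the precision (6 by default).
sprintf("%g", c(0.0001, 0.00001, 123456, 1234567))
# "0.0001" "1e-05" "123456" "1.23457e+06"
```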

Python

Modern Python uses a similar, printf-inspired format specification in str.format() and f-strings. The older printf-style % operator still works but is no longer the recommended approach.

Check the documentation.

Rounding to significant digits

Display of numbers carries a meaning

Compare how these numbers are written: eight, 12.0, 1.76e-7, four, 3.0

Generally: Round to three significant digits

signif(my_val, 3) 
[1]  1.00e-01  1.23e-09  8.63e-08  7.61e+05 -3.24e+00
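
signif() rounds the value, but printing still drops trailing zeros, so a measured 67.0 would appear as 67. A minimal sketch that keeps them, using a hypothetical helper fmt_signif() built on sprintf(); it assumes finite, non-zero input and fixed notation.

```r
# Hypothetical helper: round to `digits` significant digits and keep
# trailing zeros in fixed notation (finite, non-zero input assumed).
fmt_signif <- function(x, digits = 3) {
  x <- signif(x, digits)
  # Decimals needed so that `digits` significant digits are shown.
  decimals <- pmax(digits - 1 - floor(log10(abs(x))), 0)
  sprintf(paste0("%.", decimals, "f"), x)
}

fmt_signif(c(67, 12, 3.14159, 0.00053503))
# "67.0" "12.0" "3.14" "0.000535"
```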

Rounding measured data

When adding numbers, round the sum to the position of the least precise digit

[1] 11.961
[1] 12

The least precise value (4.5) has a single decimal place, so the sum cannot be more precise than that.
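
The code behind this output is not shown. A minimal sketch that reproduces it, assuming three hypothetical addends of which only 4.5 is taken from the text:

```r
# Hypothetical measurements; only 4.5 appears in the text above.
x <- c(4.5, 3.21, 4.251)

total <- sum(x)
total                   # 11.961
round(total, 1)         # 12 (R drops the trailing zero when printing)
sprintf("%.1f", total)  # "12.0": keeps the measured precision visible
```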

Tables

Data in tables

Basic conventions

  • Text is left-aligned.
  • Numbers are aligned at the decimal point.

The console print does this nicely, but the result is not publication quality.

# A tibble: 344 × 4
   species island    bill_length_mm bill_depth_mm
   <fct>   <fct>              <dbl>         <dbl>
 1 Adelie  Dream               32.1          15.5
 2 Adelie  Dream               33.1          16.1
 3 Adelie  Torgersen           33.5          19  
 4 Adelie  Dream               34            17.1
 5 Adelie  Torgersen           34.1          18.1
 6 Adelie  Torgersen           34.4          18.4
 7 Adelie  Biscoe              34.5          18.1
 8 Adelie  Torgersen           34.6          21.1
 9 Adelie  Torgersen           34.6          17.2
10 Adelie  Biscoe              35            17.9
# ℹ 334 more rows
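
The print above drops trailing zeros (19, 34, 35). A minimal base R sketch that fixes the number of decimals so that all values share the same shape; the vector is a short excerpt of the bill_depth_mm column shown above.

```r
# First four bill_depth_mm values from the print above.
bill_depth <- c(15.5, 16.1, 19, 17.1)

# nsmall = 1 forces at least one decimal place for every value.
format(bill_depth, nsmall = 1)
# "15.5" "16.1" "19.0" "17.1"
```

In gt, fmt_number() with its decimals argument plays a similar role for table columns.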

Playing with gt

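library(dplyr)           # select(), sample_n()
library(gt)              # gt() and the table helpers below
library(palmerpenguins)  # assumed source of the penguins data

penguins |>
  select(species, island, contains("bill")) |>
  sample_n(10) |>
  gt()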
species    island     bill_length_mm  bill_depth_mm
Gentoo     Biscoe               44.5           14.3
Adelie     Torgersen            38.6           21.2
Gentoo     Biscoe               45.3           13.7
Chinstrap  Dream                52.8           20.0
Adelie     Torgersen            37.3           20.5
Chinstrap  Dream                43.2           16.6
Gentoo     Biscoe               47.5           14.2
Gentoo     Biscoe               52.2           17.1
Chinstrap  Dream                50.8           19.0
Gentoo     Biscoe               46.1           13.2

Playing with gt

penguins |>
  select(species, island, contains("bill")) |>
  sample_n(10) |>
  gt() |>
  cols_align(
    align = "left", 
    columns = c(species, island)) |>
  cols_label(
    species = "Species",
    island = "Island",
    bill_length_mm =  "Bill length (mm)"  ,
    bill_depth_mm = "Bill depth (mm)"
  )   
Species    Island     Bill length (mm)  Bill depth (mm)
Adelie     Dream                  35.6             17.5
Gentoo     Biscoe                 55.9             17.0
Gentoo     Biscoe                 43.2             14.5
Adelie     Torgersen              37.2             19.4
Adelie     Torgersen              34.6             17.2
Adelie     Dream                  36.0             18.5
Chinstrap  Dream                  52.2             18.8
Adelie     Dream                  32.1             15.5
Adelie     Dream                  37.2             18.1
Adelie     Biscoe                 38.1             17.0

Playing with gt

penguins |>
  select(species, island, contains("bill")) |>
  sample_n(10) |>
  gt() |>
  cols_align(
    align = "left", 
    columns = c(species, island)) |>
  cols_label(
    species = "Species",
    island = "Island",
    bill_length_mm =  "Length"  ,
    bill_depth_mm = "Depth"
  ) |>
  tab_spanner("Bill dimensions (mm)", contains("bill"))
                      Bill dimensions (mm)
Species    Island     Length  Depth
Chinstrap  Dream        52.0   18.1
Adelie     Torgersen    35.9   16.6
Adelie     Torgersen    39.7   18.4
Adelie     Torgersen    42.8   18.5
Adelie     Biscoe       37.6   17.0
Adelie     Dream        40.2   17.1
Gentoo     Biscoe       45.5   15.0
Gentoo     Biscoe       46.1   13.2
Chinstrap  Dream        58.0   17.8
Chinstrap  Dream        46.5   17.9

Playing with gt

penguins |>
  select(species, island, contains("bill")) |>
  sample_n(10) |>
  gt() |>
  cols_align(
    align = "left", 
    columns = c(species, island)) |>
  cols_label(
    species = "Species",
    island = "Island",
    bill_length_mm =  "Length"  ,
    bill_depth_mm = "Depth"
  ) |>
  tab_spanner("Bill dimensions (mm)", contains("bill")) |> 
  tab_options(column_labels.background.color = "#00A4E1") 
                      Bill dimensions (mm)
Species    Island     Length  Depth
Chinstrap  Dream        49.6   18.2
Adelie     Dream        37.6   19.3
Gentoo     Biscoe       45.4   14.6
Gentoo     Biscoe       51.3   14.2
Gentoo     Biscoe       43.8   13.9
Chinstrap  Dream        52.8   20.0
Adelie     Dream        40.9   18.9
Adelie     Torgersen      NA     NA
Adelie     Torgersen    41.1   17.6
Adelie     Torgersen    35.9   16.6

Playing with gt

penguins |>
  select(species, island, contains("bill")) |>
  sample_n(10) |>
  gt() |>
  cols_align(align = "left", columns = c(species, island)) |>
  cols_label(
    species = "Species",
    island = "Island",
    bill_length_mm =  "Length"  ,
    bill_depth_mm = "Depth"
  ) |>
  tab_spanner("Bill dimensions (mm)", contains("bill")) |> 
  tab_options(column_labels.background.color = "#00A4E1") |> 
  tab_style(
    style = cell_text(size = pct(120)),
    locations = cells_body()) |> 
  tab_style(
    style = cell_text(weight = "bold"),
    locations = list(cells_column_labels(),
                     cells_column_spanners()))
                      Bill dimensions (mm)
Species    Island     Length  Depth
Chinstrap  Dream        51.3   19.9
Gentoo     Biscoe       45.4   14.6
Adelie     Torgersen    36.2   17.2
Adelie     Biscoe       43.2   19.0
Gentoo     Biscoe       46.5   13.5
Adelie     Biscoe       41.1   18.2
Gentoo     Biscoe       45.5   13.9
Adelie     Biscoe       35.5   16.2
Adelie     Dream        36.0   17.1
Chinstrap  Dream        50.2   18.8

Before we stop

We looked into …

  • Some basic principles
  • Reviewed the material
  • Cleaned up a chart
  • Formatted numbers