
Amounts, small multiples, heatmaps
MADS6
Tuesday, 26 November 2024
Slides
Exploration
Aims of exploratory data analysis
Finding the right visualisation for amounts
Trends vs amounts
Small multiples with facets
Exploratory data analysis in R for Data Science (Wickham)
Chapter 6 in Fundamentals of Data Visualization by Claus O. Wilke







library(tidyverse)

brazil_url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-04-06/brazil_loss.csv"

# Read the data, drop the entity and code columns, keep year as character
brazil_loss <- read_csv(brazil_url,
                        col_select = -c(entity, code),
                        col_types = list(year = col_character()))

# Reshape to long format: one row per year and reason
pivot_longer(brazil_loss,
             cols = commercial_crops:small_scale_clearing,
             names_to = "reasons",
             values_to = "area_ha") -> brazil_loss_long_complete

# Keep 2003 only; the outer parentheses print the assigned result
(brazil_loss_long_complete |>
    filter(year == 2003) -> brazil_loss_long)

# A tibble: 11 × 3
   year  reasons                         area_ha
   <chr> <chr>                             <dbl>
 1 2003  commercial_crops                 550000
 2 2003  flooding_due_to_dams                  0
 3 2003  natural_disturbances              35000
 4 2003  pasture                         2761000
 5 2003  selective_logging                149000
 6 2003  fire                              44000
 7 2003  mining                                0
 8 2003  other_infrastructure               9000
 9 2003  roads                             35000
10 2003  tree_plantations_including_palm   26000
11 2003  small_scale_clearing             358000

Technical column names
In exploratory analysis, bare column names, or even the function calls applied to them, often end up as axis and legend titles, as in this example.
For a final, presented product, consider changing them to a more readable form.
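A readable form can often be derived from the technical names themselves. The following is a minimal sketch, assuming snake_case names like those in `brazil_loss`; the helper `readable_label()` is a hypothetical name introduced here for illustration and is not part of the original slides.

```r
library(stringr)

# Hypothetical helper (not from the slides): turn snake_case names
# into sentence-case labels for a presented plot.
readable_label <- function(x) {
  x |>
    str_replace_all("_", " ") |>
    str_to_sentence()
}

readable_label(c("commercial_crops", "small_scale_clearing"))
```

Because ggplot2 scales accept a function for `labels`, a helper like this could be passed as `scale_y_discrete(labels = readable_label)`; axis and legend titles would still be set by hand with `labs()`.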
brazil_loss_long |>
  ggplot(aes(
    y =
      # Collapsing all but the five most common levels
      fct_lump_n(reasons, n = 5, w = area_ha) |>
      # Sorting by area
      fct_infreq(w = area_ha) |>
      # Reverse the sorting
      fct_rev(),
    x = area_ha,
    fill = reasons
  )) +
  geom_col() +
  labs(y = NULL) +
  theme(legend.position = "bottom")

Legend with all reasons

brazil_loss_long |>
  mutate(reasons_fct =
           # Collapsing all but the five most common levels
           fct_lump_n(reasons, n = 5, w = area_ha) |>
           # Sorting by area
           fct_infreq(w = area_ha) |>
           # Reverse the sorting
           fct_rev()) |>
  ggplot(aes(
    y = reasons_fct,
    x = area_ha,
    fill = reasons_fct
  )) +
  geom_col() +
  labs(y = NULL) +
  theme(legend.position = "bottom")

Combining amounts and trends
Note
Alpha (transparency) is a simple tool to emphasize particular plot elements while staying within the same color scheme.
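As a rough illustration of this note, the sketch below keeps one fill color and varies only alpha. The three values are copied from the 2003 tibble shown earlier; the choice of "pasture" as the emphasized element, the color "darkgreen", and the names `loss_2003` and `p_alpha` are assumptions made for this example.

```r
library(ggplot2)

# 2003 values for three reasons, copied from the tibble shown earlier
loss_2003 <- data.frame(
  reasons = c("pasture", "commercial_crops", "small_scale_clearing"),
  area_ha = c(2761000, 550000, 358000)
)

# Assumption: "pasture" is the element to emphasize
p_alpha <- ggplot(loss_2003,
                  aes(x = area_ha, y = reasons,
                      alpha = reasons == "pasture")) +
  # One fill color for all bars; only the transparency differs
  geom_col(fill = "darkgreen") +
  scale_alpha_manual(values = c(`TRUE` = 1, `FALSE` = 0.4),
                     guide = "none") +
  labs(y = NULL)

p_alpha
```

Mapping a logical expression to `alpha` yields a two-level discrete scale, so the named values in `scale_alpha_manual()` decide which bars are faded.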

brazil_loss |>
  # Order the years by pasture area, largest at the top
  ggplot(aes(y = fct_rev(fct_infreq(year, w = pasture)), x = pasture)) +
  geom_point(color = "red", size = 3)
Not ideal with years (ordinal category, technically numeric)
| year | pasture |
|---|---|
| 2001 | 1520000 |
| 2002 | 2568000 |
| 2003 | 2761000 |
| 2004 | 2564000 |
| 2005 | 2665000 |
| 2006 | 1861000 |
| 2007 | 1577000 |
| 2008 | 1345000 |
| 2009 | 847000 |
| 2010 | 616000 |
| 2011 | 738000 |
| 2012 | 546000 |
| 2013 | 695000 |
Not fantastic.
Meh.
Ack!




brazil_loss_long_complete |>
  ggplot(aes(x = year, y = area_ha,
             fill = fct_lump(reasons, n = 5, w = area_ha))) +
  geom_col() +
  labs(title = "The five main reasons for deforestation in Brazil",
       subtitle = "Area in hectares per year",
       fill = NULL,
       x = NULL,
       y = NULL) +
  # One panel per (lumped) reason
  facet_wrap(vars(fct_lump(reasons, n = 5, w = area_ha)), nrow = 3) +
  theme(legend.position = "none") +
  # Spread the year labels over two rows to avoid overlap
  guides(x = guide_axis(n.dodge = 2)) +
  scale_y_continuous(labels = scales::comma)
Dodged labels, square root transformation, and viridis color scheme
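The code for this variant is not preserved here, so the following is only a sketch of how the three ingredients could be combined, not the original slide code. It uses the 2003 values from the tibble shown earlier; the names `loss_2003` and `p_sqrt` are assumptions made for this example.

```r
library(ggplot2)

# 2003 values copied from the tibble shown earlier in these slides
loss_2003 <- data.frame(
  reasons = c("commercial_crops", "flooding_due_to_dams",
              "natural_disturbances", "pasture", "selective_logging",
              "fire", "mining", "other_infrastructure", "roads",
              "tree_plantations_including_palm", "small_scale_clearing"),
  area_ha = c(550000, 0, 35000, 2761000, 149000, 44000, 0,
              9000, 35000, 26000, 358000)
)

p_sqrt <- ggplot(loss_2003, aes(x = reasons, y = area_ha, fill = reasons)) +
  geom_col() +
  # Square root scale: compresses large values and is still defined at zero
  scale_y_sqrt(labels = scales::comma) +
  # Discrete viridis palette
  scale_fill_viridis_d() +
  # Spread the long axis labels over two rows
  guides(x = guide_axis(n.dodge = 2)) +
  theme(legend.position = "none") +
  labs(x = NULL, y = NULL)

p_sqrt
```

A square root scale is used here rather than a logarithmic one because two of the reasons have an area of 0, which a log scale cannot display.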
Adding suppressed labels and log-transformation