
Amounts, small multiples, heatmaps
MADS6
2026-03-11
Slides
Exploration
Aims of exploratory data analysis
Finding the right visualisation for amounts
Trends vs amounts
Small multiples with facets
Exploratory data analysis in R for Data Science (Wickham)
Chapter 6 in Fundamentals of Data Visualization by Claus O. Wilke







library(tidyverse)

brazil_url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-04-06/brazil_loss.csv"
# Drop the entity and code columns; read year as text so it acts as a category
brazil_loss <- read_csv(brazil_url,
  col_select = -c(entity, code),
  col_types = list(year = col_character()))
# Reshape to one row per year and reason
pivot_longer(brazil_loss,
  cols = commercial_crops:small_scale_clearing,
  names_to = "reasons",
  values_to = "area_ha") -> brazil_loss_long_complete
# Keep 2003 only; the outer parentheses print the assigned result
(brazil_loss_long_complete |>
  filter(year == 2003) -> brazil_loss_long)
# A tibble: 11 × 3
year reasons area_ha
<chr> <chr> <dbl>
1 2003 commercial_crops 550000
2 2003 flooding_due_to_dams 0
3 2003 natural_disturbances 35000
4 2003 pasture 2761000
5 2003 selective_logging 149000
6 2003 fire 44000
7 2003 mining 0
8 2003 other_infrastructure 9000
9 2003 roads 35000
10 2003 tree_plantations_including_palm 26000
11 2003 small_scale_clearing 358000

Technical column names
In exploratory analysis, it is fine for bare column names, or even the function calls used inside aes(), to appear as axis and legend titles, as in this example.
For a final, presented product, consider changing them to a more readable form.
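As a minimal sketch of what "a more readable form" could look like, the example below relabels the factor levels and sets an explicit axis title. The three rows are copied from the 2003 tibble above; the helper `readable()` and the axis title wording are illustrative assumptions, not part of the original slides.

```r
library(tidyverse)

# Three of the eleven 2003 rows, copied from the tibble printed above
loss_2003 <- tibble(
  reasons = c("pasture", "commercial_crops", "small_scale_clearing"),
  area_ha = c(2761000, 550000, 358000)
)

# Hypothetical helper: turn snake_case names into sentence-style labels
readable <- function(x) str_to_sentence(str_replace_all(x, "_", " "))

p_labels <- loss_2003 |>
  # Relabel the levels first, then order them by area (smallest first)
  mutate(reasons = fct_reorder(fct_relabel(reasons, readable), area_ha)) |>
  ggplot(aes(x = area_ha, y = reasons)) +
  geom_col() +
  # Thousands separators instead of scientific notation
  scale_x_continuous(labels = scales::comma) +
  # An explicit title replaces the bare column name
  labs(x = "Forest loss (hectares)", y = NULL)

p_labels
```

Relabelling inside `mutate()` keeps the plotting code short and means the readable names are reused by every scale and legend that refers to the column.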
brazil_loss_long |>
  ggplot(aes(
    y =
      # Collapsing all but the five most common levels
      fct_lump_n(reasons, n = 5, w = area_ha) |>
      # Sorting by area
      fct_infreq(w = area_ha) |>
      # Reverse the sorting
      fct_rev(),
    x = area_ha,
    fill = reasons
  )) +
  geom_col() +
  labs(y = NULL) +
  theme(legend.position = "bottom")
Legend with all reasons
brazil_loss_long |>
  mutate(reasons_fct =
    # Collapsing all but the five most common levels
    fct_lump_n(reasons, n = 5, w = area_ha) |>
    # Sorting by area
    fct_infreq(w = area_ha) |>
    # Reverse the sorting
    fct_rev() ) |>
  ggplot(aes(
    y = reasons_fct,
    x = area_ha,
    fill = reasons_fct
  )) +
  geom_col() +
  labs(y = NULL) +
  theme(legend.position = "bottom")
Combining amounts and trends
Note
Alpha (transparency) is a simple tool for emphasizing particular plot elements while staying within the same color scheme.
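A minimal sketch of the alpha idea, assuming one bar is to be emphasized: alpha is mapped to a logical condition and the two transparency levels are set manually. The five rows are copied from the 2003 tibble above; the choice of "pasture" as the emphasized bar, the fill color, and the alpha values are arbitrary assumptions for illustration.

```r
library(tidyverse)

# Five of the 2003 rows, copied from the tibble printed earlier
loss_2003 <- tibble(
  reasons = c("pasture", "commercial_crops", "small_scale_clearing",
              "selective_logging", "fire"),
  area_ha = c(2761000, 550000, 358000, 149000, 44000)
)

p_alpha <- loss_2003 |>
  ggplot(aes(
    x = area_ha,
    y = fct_reorder(reasons, area_ha),
    # TRUE only for the bar to emphasize (an arbitrary choice here)
    alpha = reasons == "pasture"
  )) +
  # One fill color for all bars; only the transparency differs
  geom_col(fill = "darkgreen") +
  scale_alpha_manual(values = c(`FALSE` = 0.4, `TRUE` = 1), guide = "none") +
  labs(x = "Area (ha)", y = NULL)

p_alpha
```

Because every bar shares one hue, the emphasis comes from transparency alone, which is what keeps the plot within a single color scheme.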


Not ideal with years (ordinal category, technically numeric)
| year | pasture |
|---|---|
| 2001 | 1520000 |
| 2002 | 2568000 |
| 2003 | 2761000 |
| 2004 | 2564000 |
| 2005 | 2665000 |
| 2006 | 1861000 |
| 2007 | 1577000 |
| 2008 | 1345000 |
| 2009 | 847000 |
| 2010 | 616000 |
| 2011 | 738000 |
| 2012 | 546000 |
| 2013 | 695000 |
Not fantastic.
Meh.
Ack!
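The original figures for this slide are not reproduced here, so the following is only one possible reading of "not ideal with years": sorting bars by height works for unordered categories, but it scrambles a category that has a natural order. The sketch rebuilds the table above as a tibble (the object names are assumptions) and contrasts the two orderings.

```r
library(tidyverse)

# Values copied from the year/pasture table above
pasture_by_year <- tibble(
  year = as.character(2001:2013),
  pasture = c(1520000, 2568000, 2761000, 2564000, 2665000, 1861000, 1577000,
              1345000, 847000, 616000, 738000, 546000, 695000)
)

# Sorting by bar height: the years end up out of sequence
sorted_years <- fct_reorder(pasture_by_year$year, pasture_by_year$pasture)

p_sorted <- ggplot(pasture_by_year, aes(x = sorted_years, y = pasture)) +
  geom_col() +
  scale_y_continuous(labels = scales::comma) +
  labs(x = NULL, y = "Pasture (ha)")

# Chronological order: the rise and decline over time stays visible
p_chrono <- ggplot(pasture_by_year, aes(x = year, y = pasture)) +
  geom_col() +
  scale_y_continuous(labels = scales::comma) +
  labs(x = NULL, y = "Pasture (ha)")

p_sorted
p_chrono
```

In the sorted version the x axis starts at 2012 and ends at 2003, so the trend over time can no longer be read from left to right.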




brazil_loss_long_complete |>
  ggplot(aes(x = year, y = area_ha,
    fill = fct_lump(reasons, n = 5, w = area_ha))) +
  geom_col() +
  labs(title = "The five main reasons for deforestation in Brazil",
    subtitle = "Area in hectares per year",
    fill = NULL,
    x = NULL,
    y = NULL) +
  # One panel per (lumped) reason
  facet_wrap(vars(fct_lump(reasons, n = 5, w = area_ha)), nrow = 3) +
  theme(legend.position = "none") +
  # Stagger the year labels over two rows to avoid overlap
  guides(x = guide_axis(n.dodge = 2)) +
  scale_y_continuous(labels = scales::comma)
Dodged labels, square root transformation, and viridis color scheme
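The code behind this slide is not part of the extracted text, so the following is a sketch of how the three ingredients could be combined, not the author's plot. It uses the eleven 2003 rows from the tibble above; `n.dodge = 3` and the axis title are assumptions.

```r
library(tidyverse)

# All eleven 2003 rows, copied from the tibble printed earlier
loss_2003 <- tibble(
  reasons = c("commercial_crops", "flooding_due_to_dams",
              "natural_disturbances", "pasture", "selective_logging",
              "fire", "mining", "other_infrastructure", "roads",
              "tree_plantations_including_palm", "small_scale_clearing"),
  area_ha = c(550000, 0, 35000, 2761000, 149000, 44000, 0, 9000, 35000,
              26000, 358000)
)

p_sqrt <- loss_2003 |>
  ggplot(aes(x = fct_reorder(reasons, area_ha, .desc = TRUE),
             y = area_ha,
             fill = reasons)) +
  geom_col() +
  # Square root scale: compresses large values, and zero stays at zero
  scale_y_sqrt(labels = scales::comma) +
  # Discrete viridis palette; the legend would repeat the axis labels
  scale_fill_viridis_d(guide = "none") +
  # Spread the long category labels over three rows
  guides(x = guide_axis(n.dodge = 3)) +
  labs(x = NULL, y = "Area (ha, square root scale)")

p_sqrt
```

A square root scale is a convenient choice here because the data contain zeros, which it handles without any special treatment.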
Adding suppressed labels and log-transformation
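Again as a hedged sketch, not the original slide code: "suppressed labels" is read here as `guide_axis(check.overlap = TRUE)`, which drops axis labels that would overlap, and the log transformation as `scale_y_log10()`. Points are used instead of bars, an assumption made because a log scale has no zero and so gives bars no natural baseline. The values are the pasture series from the table earlier in this section.

```r
library(tidyverse)

# Pasture series copied from the year/pasture table (all values positive)
pasture_by_year <- tibble(
  year = as.character(2001:2013),
  pasture = c(1520000, 2568000, 2761000, 2564000, 2665000, 1861000, 1577000,
              1345000, 847000, 616000, 738000, 546000, 695000)
)

p_log <- pasture_by_year |>
  ggplot(aes(x = year, y = pasture)) +
  geom_point(size = 3) +
  # Log10 scale: equal distances correspond to equal ratios
  scale_y_log10(labels = scales::comma) +
  # Drop labels that would overlap instead of dodging them
  guides(x = guide_axis(check.overlap = TRUE)) +
  labs(x = NULL, y = "Pasture (ha, log scale)")

p_log
```

Whether any labels are actually dropped depends on the width of the plotting device; on a wide plot all thirteen years fit and none are suppressed.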