Trends

Line charts

Roland Krause

MADS6

Wednesday, 25 March 2026

Session aims

Learning objectives

How to draw trends with smoothing
Direct labeling for lines

Good plotting practice

Slowly moving towards production.

Material

Chapter "Statistical summaries" in ggplot2: Elegant Graphics for Data Analysis (3e)

Fundamentals of data visualisation

Consult the books for reference

We can only cover a fraction in class!

Brazil rain forest

Previously considered Brazil loss data as categorical

library(tidyverse)

brazil_url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-04-06/brazil_loss.csv"
brazil_loss <- read_csv(brazil_url, 
                        col_select = -c(entity, code),
                        col_types = list(year = col_character()))

pivot_longer(brazil_loss,
           cols = commercial_crops:small_scale_clearing,
           names_to = "reasons",
           values_to = "area_ha") -> brazil_loss_long_complete

(brazil_loss_long_complete |> 
    filter(year == 2003)  -> brazil_loss_long)

# A tibble: 11 × 3
   year  reasons                         area_ha
   <chr> <chr>                             <dbl>
 1 2003  commercial_crops                 550000
 2 2003  flooding_due_to_dams                  0
 3 2003  natural_disturbances              35000
 4 2003  pasture                         2761000
 5 2003  selective_logging                149000
 6 2003  fire                              44000
 7 2003  mining                                0
 8 2003  other_infrastructure               9000
 9 2003  roads                             35000
10 2003  tree_plantations_including_palm   26000
11 2003  small_scale_clearing             358000

Caution

Is this really a time series?

Small multiple plot with years as category

brazil_loss_long_complete |>
  ggplot(aes(x = year, y = area_ha, 
             fill = fct_lump(reasons, n = 5, w = area_ha))) +
  geom_col() +
  labs(title = "The five main reasons for deforestation in Brazil",
       subtitle = "Area in hectares per year",
       fill = NULL,
       x = NULL,
       y = NULL) +
  facet_wrap(vars(fct_lump(reasons, n = 5, w = area_ha)), nrow = 3) +
  theme(legend.position = "none")  +
  guides(x = guide_axis(n.dodge = 2)) +
  scale_y_continuous(labels = scales::comma) +
  theme(plot.background = element_rect(colour = "red", linewidth = 2))

Time data should be lines

brazil_loss_long_complete |>
  ggplot(aes(x = as.factor(year), y = area_ha, colour = reasons)) +
  geom_line(aes(group = reasons)) +
  labs(title = "Reasons for deforestation in Brazil in the early 2000s",
       subtitle = "Area in hectares per year",
       x = NULL,
       y = NULL) +
  facet_wrap(vars(reasons), scales = "free_y") +
  # year is stored as character, so the breaks are given as character too
  scale_x_discrete(
    breaks = as.character(seq(from = 2000, to = 2015, by = 5))) +
  scale_y_continuous(labels = scales::comma) +
  # base_size is an argument of a complete theme, not of theme()
  theme_grey(base_size = 6) +
  theme(legend.position = "none")

Axis labels become a problem in small-multiple plots.
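One possible way to ease the crowded x axis, sketched here on invented toy data rather than on the Brazil data itself: store the year as an integer so that ggplot2 uses a continuous scale and only a few breaks are drawn. The data frame `toy` and its values are made up for illustration only.

```r
library(ggplot2)

# Toy data, invented for illustration: years stored as character,
# as they are after read_csv(col_types = list(year = col_character()))
toy <- data.frame(
  year = as.character(rep(2001:2013, times = 2)),
  reasons = rep(c("a", "b"), each = 13),
  area_ha = c(seq(100, 1300, by = 100), seq(1300, 100, by = -100))
)

# An integer year gives a continuous x axis with few, readable breaks
toy$year <- as.integer(toy$year)

p_year <- ggplot(toy, aes(x = year, y = area_ha, colour = reasons)) +
  geom_line() +
  facet_wrap(vars(reasons), scales = "free_y") +
  scale_x_continuous(breaks = seq(2000, 2015, by = 5)) +
  theme(legend.position = "none")
p_year
```

With a continuous scale no explicit `group` aesthetic is needed, because the `colour` mapping already defines one line per reason.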

Basic line chart

library(gapminder)
library(scales)

# filtering self-join for countries with 
# more than 50M people in 1952
gapminder |> 
  semi_join(gapminder |> 
              filter(year == 1952, pop > 50000000) |> 
              select(country),
            join_by(country)) -> 
  gapminder_large

gapminder_large |>
  ggplot(aes(x = as.factor(year), y = pop, colour = country, group = country)) +
  geom_line() +
  labs(title = "Population growth", 
       y = NULL, x = NULL) +
  theme(plot.background = element_rect(colour = "red", linewidth = 2)) +
  scale_y_continuous(labels = label_number(suffix = "M", scale = 1e-6)) ->
  def
def

Confusing for audience

Extra effort to read the legend.

Direct labeling using `ggrepel`

library(ggrepel)
gapminder_large |> 
  # for creating labels
  mutate(country_label = if_else(year == max(year), country, NA_character_ )) |> 
  ggplot(aes(x = year, y= pop, colour = country, group = country)) +
  geom_line() +
  geom_text_repel(aes(label = country_label) , nudge_x = 0.35, size = 4) +
  theme(legend.position = "none") +
    scale_y_continuous(labels = label_number(suffix = "M", scale = 1e-6)) +
  # space for plot, not ideal solution
  coord_cartesian(xlim = c(1952, 2016)) +
  labs(title = "Population growth",
    x = NULL,
      y = NULL) ->
  ggrep

ggrep

ggrepel works, but the result is not ideal.
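A possible refinement, given as a sketch rather than a recommended solution: instead of widening the plot with `coord_cartesian()`, the labels can get their own data frame and the x scale can be padded on the right with `expansion()`. The object names `large`, `label_data` and `p_room` are made up for this example, and the padding values are arbitrary.

```r
library(ggplot2)
library(dplyr)
library(ggrepel)
library(gapminder)

# Countries with more than 50M people in 1952, as on the earlier slides
large <- gapminder |>
  group_by(country) |>
  filter(any(year == 1952 & pop > 50e6)) |>
  ungroup()

# One row per country: the last observation, used only for the labels
label_data <- large |>
  filter(year == max(year))

p_room <- ggplot(large, aes(x = year, y = pop, colour = country)) +
  geom_line() +
  geom_text_repel(data = label_data, aes(label = country),
                  hjust = 0, nudge_x = 1, direction = "y", size = 4) +
  # 5 % padding on the left, 25 % on the right to leave room for the labels
  scale_x_continuous(expand = expansion(mult = c(0.05, 0.25))) +
  scale_y_continuous(labels = scales::label_number(suffix = "M", scale = 1e-6)) +
  theme(legend.position = "none") +
  labs(title = "Population growth", x = NULL, y = NULL)
p_room
```

Passing a separate label data frame avoids the `NA` labels that the `if_else()` approach creates for every year except the last.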

Using a secondary axis

# filtering self-join for countries with 
# more than 50M people in 1952

gapminder_label <-
  semi_join(gapminder,
            gapminder |>
              filter(year == 1952,
                     pop > 50000000),
            join_by(country)) |>
  filter(year == max(year))

gapminder_large |>
  semi_join(gapminder_label, join_by(country)) |>
  ggplot(aes(
    x = year,
    y = pop,
    colour = country,
    group = country
  )) +
  geom_line() +
  theme(legend.position = "none") +
  scale_y_continuous(
    labels = label_number(suffix = "M", scale = 1e-6),
    sec.axis = dup_axis(
      breaks = gapminder_label$pop,
      labels = gapminder_label$country,
      name = NULL
    ),
    trans = "log10"
  ) +
  labs(title = "Population growth",
       x = NULL,
       y = NULL) +  
 coord_cartesian(xlim = c(1955, 2004)) ->
  sea

sea

Direct labeling with `geomtextpath`

library(geomtextpath)

gapminder |>
  semi_join(gapminder_label, join_by(country)) |>
  ggplot(aes(
    x = year,
    y = pop,
    colour = country,
    group = country
  )) +
  geom_textpath(aes(label = country), 
                # label at 90% of line
                hjust= 0.9) + 
    scale_y_continuous(labels = label_number(suffix = "M", scale = 1e-6)) +
  theme(legend.position = "none") +
  labs(title = "Population growth",
    x = NULL,
      y = NULL) -> 
  gtp 

gtp

Summary

Time series

Basics

Unemployment in the US (number of unemployed, in thousands)

ggplot(economics) +
  aes(x = date, y = unemploy) +
  geom_point() +
  geom_smooth()

Connecting times with lines

Subsampling `economics`

ggplot(economics  |> 
         slice_sample(n = 5)) +
  aes(x = date, y = unemploy) +
  geom_point() +
  geom_line()
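A small sketch of why the subsampled plot still reads from left to right: `slice_sample()` returns rows in random order, but `geom_line()` connects observations in order of x, whereas `geom_path()` follows the row order. The seed and the object names `econ_sample`, `p_line` and `p_path` are arbitrary choices for this example.

```r
library(ggplot2)
library(dplyr)

set.seed(1)  # arbitrary seed so the sample is reproducible
econ_sample <- economics |>
  slice_sample(n = 5)

# The sampled rows are usually not in chronological order
econ_sample$date

# geom_line() connects the points in order of x (here: date)
p_line <- ggplot(econ_sample, aes(x = date, y = unemploy)) +
  geom_point() +
  geom_line()

# geom_path() connects the points in row order and may jump back in time
p_path <- ggplot(econ_sample, aes(x = date, y = unemploy)) +
  geom_point() +
  geom_path()

p_line
p_path
```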

LOESS smoothing

ggplot(economics) +
  aes(x = date, y = unemploy) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

Locally estimated scatterplot smoothing

Default in `geom_smooth()` for fewer than 1,000 observations; computationally expensive for large data
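How closely a LOESS curve follows the data is controlled by `span`, the fraction of points used for each local fit. The following sketch compares the default (0.75) with a much smaller value; the value 0.1 and the object names are picked for illustration only.

```r
library(ggplot2)

p_span <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_point(size = 0.5, alpha = 0.5) +
  # default span of 0.75: very smooth, shows only the long-term trend
  geom_smooth(method = "loess", formula = y ~ x, span = 0.75, se = FALSE) +
  # small span: each local fit uses 10 % of the points, so the curve is wigglier
  geom_smooth(method = "loess", formula = y ~ x, span = 0.1, se = FALSE,
              colour = "#BC0511")
p_span

# A comparable fit outside ggplot2, using stats::loess() directly
econ_df <- as.data.frame(economics)
econ_df$date_num <- as.numeric(econ_df$date)  # days since 1970-01-01
loess_fit <- loess(unemploy ~ date_num, data = econ_df, span = 0.1)
head(fitted(loess_fit))
```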

Using gam smoothing

ggplot(economics) +
  aes(x = date, y = unemploy) +
  geom_point() +
  geom_smooth(method = "gam", colour = "#BC0511")

Alternative

Generalized additive models with integrated smoothness estimation
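For reference, a comparable model can be fitted with `mgcv::gam()` directly, which gives access to the fitted object. This is a sketch under assumptions: the smooth term `s(date_num, bs = "cs")` and `method = "REML"` are chosen here and need not match exactly what `geom_smooth()` does internally; `econ`, `date_num` and `unemploy_gam` are names made up for the example.

```r
library(ggplot2)
library(mgcv)

econ <- as.data.frame(economics)
econ$date_num <- as.numeric(econ$date)  # days since 1970-01-01

# Penalised cubic regression spline of unemployment over time
gam_fit <- gam(unemploy ~ s(date_num, bs = "cs"), data = econ, method = "REML")

# Fitted values on the original dates
econ$unemploy_gam <- as.numeric(predict(gam_fit))

ggplot(econ, aes(x = date)) +
  geom_point(aes(y = unemploy), size = 0.5, alpha = 0.5) +
  geom_line(aes(y = unemploy_gam), colour = "#BC0511")
```

`summary(gam_fit)` then reports the effective degrees of freedom of the smooth, which indicates how wiggly the estimated trend is.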

Moving average

economics |> 
  # Centred rolling average over a window of 30 monthly observations
  mutate(unemploy_ma = as.numeric(stats::filter(unemploy, rep(1/30, 30), sides = 2))) |> 
  relocate(unemploy_ma) |> 
  ggplot() +
  geom_point(aes(x = date, y = unemploy), colour = "#DC021B", size = 0.5, alpha = 0.7) +
  geom_line(aes(x = date, y = unemploy_ma), colour = "#00a4e1")
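To see what `stats::filter()` does with equal weights, a tiny worked example on the numbers 1 to 10 may help. The 3-point window is chosen only to keep the arithmetic easy to check.

```r
x <- 1:10

# Centred 3-point moving average (sides = 2): each value is the mean of
# itself and its two neighbours; the ends lack a full window and become NA
ma3 <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))
ma3
# NA 2 3 4 5 6 7 8 9 NA

# Trailing version (sides = 1): mean of the current and the two previous values
ma3_trailing <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 1))
ma3_trailing
# NA NA 2 3 4 5 6 7 8 9
```

The same applies to the 30-point window: the first and last observations have no full window, so the smoothed line is shorter than the series.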

Thank you for your attention!

gapminder_label <-
  semi_join(gapminder,
            gapminder |>
              filter(year == 1952,
                     pop > 90000000),
            join_by(country)) |>
  filter(year == max(year))

gapminder_large |>
  semi_join(gapminder_label, join_by(country)) |>
  ggplot(aes(
    x = year,
    y = pop,
    colour = country,
    group = country
  )) +
  geom_line(linewidth = 7, show.legend = FALSE) +
  scale_y_continuous(trans = "log10"  ) +
  labs(title = NULL,
    x = NULL,
       y = NULL ) +  
  theme(legend.position = "none") +
  coord_cartesian(xlim = c(1955, 2004)) + 
  theme_void() 
ggsave("../img/logo_trends.png")
