Data wrangling

Exercises for Day 2

Author

Tyler McInnes

Published

November 27, 2025

We will work with the flights data for these exercises. They are designed to help you recall how each function is written and to test how well you can combine functions logically. The exercises increase in complexity as you go.

1 Recap the basic dplyr verbs

Remind yourself what the flights data looks like (column names, dimensions, a general summary of the flights object).
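The usual inspection calls can be sketched as follows. This is a minimal sketch: it assumes flights is the table from the nycflights13 package, which the exercise text does not state, and to stay self-contained it uses a small made-up stand-in called toy_flights (a hypothetical name) with a few of the same column names.

```r
library(dplyr)

# Made-up stand-in for flights: illustrative values only, not real data.
toy_flights <- tibble(
  month     = c(1L, 2L, 2L, 7L),
  carrier   = c("UA", "AA", "DL", "UA"),
  flight    = c(101L, 202L, 303L, 404L),
  dep_delay = c(-2, 15, NA, 70),
  distance  = c(1400, 950, 2100, 760)
)

names(toy_flights)    # column names
dim(toy_flights)      # number of rows, then number of columns
glimpse(toy_flights)  # one line per column: type and first values
summary(toy_flights)  # per-column summary, including NA counts
```

The same four calls should work unchanged on the real flights object.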

1.1 How many flights departed in January or February?

1.2 List the flights that departed with UA or AA and flew more than 1000 miles.

Return only the first 10 flights, showing the carrier, flight number, and distance flown.
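One way to combine a membership test, a numeric condition, and a column selection is sketched below. It runs on a made-up toy_flights table (hypothetical values, not the real data) and assumes the columns are named carrier, flight, and distance as in nycflights13.

```r
library(dplyr)

# Made-up stand-in for flights: illustrative values only.
toy_flights <- tibble(
  carrier  = c("UA", "AA", "DL", "UA", "AA"),
  flight   = c(101L, 202L, 303L, 404L, 505L),
  distance = c(1400, 950, 2100, 760, 2475)
)

long_ua_aa <- toy_flights |>
  filter(carrier %in% c("UA", "AA"), distance > 1000) |>  # both conditions must hold
  select(carrier, flight, distance) |>                     # keep only these columns
  slice_head(n = 10)                                       # at most the first 10 rows

long_ua_aa
```

Separating conditions with a comma inside filter() combines them with AND, while %in% handles the "UA or AA" part without writing two comparisons joined by |.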

1.3 Identify whether any flights departed on time but arrived more than 30 minutes late.

1.4 Sort flights by carrier, and then within carrier sort by departure delay.

1.5 Calculate a new variable called gain, defined as arrival delay minus departure delay.

1.6 Create a delay_category variable

Use these categories: “on time” (delay ≤ 0), “minor” (0 < delay ≤ 30), “moderate” (30 < delay ≤ 60), “severe” (delay > 60).
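A possible way to build such a variable is case_when(), sketched here on made-up delays. Two assumptions are mine rather than the exercise's: the delay in question is dep_delay, and a delay of exactly 30 or 60 minutes falls in the lower category.

```r
library(dplyr)

# Made-up delays in minutes, chosen to hit every boundary, plus one missing value.
toy_delays <- tibble(dep_delay = c(-5, 0, 12, 30, 45, 60, 90, NA))

categorised <- toy_delays |>
  mutate(
    delay_category = case_when(
      is.na(dep_delay) ~ NA_character_,  # keep a missing delay missing
      dep_delay <= 0   ~ "on time",
      dep_delay <= 30  ~ "minor",        # reached only when dep_delay > 0
      dep_delay <= 60  ~ "moderate",     # reached only when dep_delay > 30
      TRUE             ~ "severe"        # everything above 60
    )
  )

categorised
```

case_when() uses the first condition that is TRUE, so each branch only needs an upper bound. The NA check comes first because otherwise the final catch-all would label missing delays as "severe".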

2 Complex exercises

2.1 Assuming you wanted to depart on time, which carrier would you recommend?

2.2 Which destinations were served by the most carriers?
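Counting distinct values within a group is one way to approach this kind of question. The sketch below uses made-up data and assumes the columns are called dest and carrier; n_carriers is a name introduced here for illustration.

```r
library(dplyr)

# Made-up stand-in for flights: illustrative values only.
toy_flights <- tibble(
  dest    = c("ORD", "ORD", "ORD", "LAX", "LAX", "MIA"),
  carrier = c("UA",  "AA",  "UA",  "DL",  "DL",  "AA")
)

carriers_per_dest <- toy_flights |>
  group_by(dest) |>
  summarise(n_carriers = n_distinct(carrier)) |>  # unique carriers, not rows
  arrange(desc(n_carriers))                       # most-served destinations first

carriers_per_dest
```

Using n() here would count flights rather than carriers, which is why n_distinct() is the better fit.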

2.3 Find the total distance flown by each carrier.

2.4 What were the fastest flights (highest air speed)?

Create a speed variable (distance/air_time * 60) and arrange accordingly.
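The formula above can be sketched with mutate() and arrange(). The data here is made up, and the sketch assumes distance is in miles and air_time is in minutes, so that multiplying by 60 gives miles per hour.

```r
library(dplyr)

# Made-up stand-in for flights: illustrative values only.
toy_flights <- tibble(
  flight   = c(101L, 202L, 303L),
  distance = c(1000, 500, 2400),  # assumed miles
  air_time = c(120, 75, 300)      # assumed minutes
)

fastest <- toy_flights |>
  mutate(speed = distance / air_time * 60) |>  # miles per minute -> miles per hour
  arrange(desc(speed))                         # fastest first

fastest
```

On the real data, rows with a missing air_time will have a missing speed, and arrange() places those at the end.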

2.5 Create delay categories, then count flights in each category by carrier

2.6 Plot the number of flights per month.

Hints
  • To create a line chart, use geom_line().
  • To count the total number of rows in a group, use n().

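library(dplyr)
library(ggplot2)

# Assumes flights is already loaded (for example from the nycflights13 package).
flights |>
  group_by(month) |>
  summarise(n = n()) |>                                   # flights per month
  ggplot(aes(month, n)) +
  geom_line() +
  scale_x_continuous(breaks = 1:12, expand = c(0, 0))     # one tick per month, no padding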