From raw data to presentable figures

Concluding practical

Author

Tyler McInnes

Published

November 28, 2025

Overview

In the concluding practical you will use publicly available data and complete a small analysis. You will import, explore, transform, and visualise this dataset. Your work must include well-written, well-documented code that follows the principles we have discussed during the workshop.

There is no prescribed outcome or set questions for you to answer in this practical. You will choose a dataset, carry out exploratory analysis to familiarise yourself with the available data, and decide what hypothesis to test or trends to present. You will produce one or more publication-quality figures to present your observations.

This practical aims to simulate a realistic experience for a data scientist. The decisions you make about how to interact with the data, what to show, and how best to present it are at least as important as the code you write.

Details

Setup

Create a new Quarto document (.qmd), give it an appropriate title and add it to the git repository.

All code and observations should be included within this .qmd file. Your code should be reproducible (i.e., a novice R user should be able to reproduce your work with only your Quarto file to work from).
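One way to support reproducibility is to load every package in a single chunk near the top of the document and to record the software versions used. The chunk below is a minimal sketch; it assumes the tidyverse is the only package you need, so adjust the list to match your own analysis.

```r
# Load every package the analysis needs in one place, near the top of the file.
# tidyverse is an assumption here; list whichever packages your analysis uses.
library(tidyverse)

# Fix the random seed if any step involves randomness (e.g., sampling or jitter).
# The value itself is arbitrary.
set.seed(2025)

# Record the R version and package versions used to produce the results.
sessionInfo()
```

A reader who hits an error can compare their own sessionInfo() output against yours to check for version differences.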

Visit the tidytuesday GitHub repository and choose a dataset to work with. Choose a dataset that is important to you personally (e.g., the data relates to a TV show you like, a cause you believe in, or something you want to learn more about). Note your initial observations and thoughts.

To browse the data in tidytuesday, first use the README page to get a brief description of all the projects. Clicking on a project in the Data column will take you directly to that project’s data. There are two ways to access the data. We recommend Option 2: Read directly from GitHub, which uses the readr::read_csv function and a URL to import the data from GitHub.
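As an illustration, importing a dataset with readr::read_csv and a URL might look like the sketch below. The URL is an assumption based on the tidytuesday repository layout and the parfumo project; copy the exact line from your chosen project's page rather than relying on this one. The object name parfumo is also just an illustrative choice.

```r
library(readr)

# ASSUMPTION: this URL follows the tidytuesday repository layout for the
# parfumo project. Replace it with the URL given on your project's page.
data_url <- paste0(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
  "main/data/2024/2024-12-10/parfumo_data_clean.csv"
)

# read_csv() accepts a URL in place of a local file path.
parfumo <- read_csv(data_url)

# Quick checks that the import worked as expected.
dim(parfumo)    # number of rows and columns
spec(parfumo)   # column types guessed by readr
```

Checking the guessed column types straight after import is a cheap way to catch dates or numbers that were read in as text.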

The expected end-product of this practical is a publication-quality image presenting one or more aspects of the data. At this point you should clearly define your aim or hypothesis, including what data you will present and how you might present it.

It is reasonable to expect that your final product will differ to some degree from your initial plan. During the course of the analysis you may find that your initial aim was unachievable or unreasonable, and it is perfectly acceptable to produce a final product that is very different from your original plan. This practical encourages you to accurately document your data analysis process.

Git

Make sure you commit your changes to the local repository often. We advise committing each time you complete a code chunk and its accompanying text. Don’t forget to push the final code.

Exploratory analysis

Carry out an exploratory analysis of the data. Remember that the datasets in tidytuesday are real-world data, which will likely include groupings, outliers, errors, or other unexpected findings. You must document your thought process, code, observations, and any figures you use to interrogate your dataset.
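A first pass at exploration often covers structure, missing values, value ranges, and the distribution of key variables. The sketch below uses the airquality dataset that ships with R as a stand-in for your chosen data; the object names are illustrative, and the native pipe (|>) assumes R 4.1 or later.

```r
library(dplyr)
library(ggplot2)

# airquality ships with R and stands in for your chosen dataset.
air <- as_tibble(airquality)

# Structure: column names, types, and the first few values.
glimpse(air)

# Count the missing values in every column.
missing_counts <- air |>
  summarise(across(everything(), ~ sum(is.na(.x))))
missing_counts

# Range and quartiles for every column; a quick way to spot implausible values.
summary(air)

# Distribution of a single variable, to look for outliers or groupings.
# na.rm = TRUE drops the missing values without printing a warning.
ggplot(air, aes(x = Ozone)) +
  geom_histogram(bins = 30, na.rm = TRUE)
```

Write down what each output tells you (for example, which columns have missing values and whether that affects your aim) alongside the code that produced it.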

As an example, remember that in the day 3 workflow with the parfumo data we looked at two hypotheses (both looking at a plausible correlation between two variables). The outcome of our exploratory analyses informed future choices about our analysis. It is expected that you will explore your data in a similar way.
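Checking a plausible correlation between two variables usually means plotting them against each other and then quantifying the relationship. The following is a sketch of that pattern, not the day 3 code itself; it uses the mtcars dataset that ships with R, with wt and mpg standing in for whichever pair of variables your hypothesis concerns.

```r
library(ggplot2)

# Look at the relationship first: a scatter plot shows shape, spread, and
# outliers that a single correlation coefficient would hide.
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Pearson correlation with a confidence interval and p-value.
wt_mpg_test <- cor.test(mtcars$wt, mtcars$mpg)
wt_mpg_test$estimate   # correlation coefficient
wt_mpg_test$conf.int   # 95% confidence interval
```

If the scatter plot shows a curved relationship or a few extreme points, consider whether a rank-based method such as cor.test(..., method = "spearman") is more appropriate, and note your reasoning.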

Data transformation and visualisation

Include a minimum of two draft figures. This may be the same data visualised differently, or may be different aspects of the data that you are considering for visualisation. These figures should be minimal examples (e.g., default axis labels, no additional aesthetic mapping outside of what is required to show the data).
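Two draft figures showing the same data in different ways could be as simple as the sketch below. It uses the mtcars dataset that ships with R as a stand-in, keeps the default axis labels, and maps only the aesthetics needed to show the data. The object names are illustrative.

```r
library(ggplot2)

# Draft 1: distribution of fuel economy within each cylinder group.
draft_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Draft 2: the same data shown as individual points.
# height = 0 jitters horizontally only, so the mpg values are not altered.
draft_points <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(width = 0.1, height = 0)

draft_box
draft_points
```

Comparing drafts side by side makes it easier to justify which plot type you carry forward to the final visualisation.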

You may need to carry out data wrangling to transform your data for the final visualisation. This could include deriving new variables, re-ordering variables, filtering or arranging rows. It is also possible that you are not required to perform any data wrangling in order to produce your final visualisation. If that is the case you must instead create a minimal object to store the data you are visualising using the data wrangling skills and practices covered in the workshop. A minimal object will include only the data you are visualising and this data should be in a format that maximises readability and interpretability.
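A minimal object can be built with a short pipeline that derives, filters, selects, and arranges. The sketch below uses the mtcars dataset that ships with R as a stand-in; the column choices and the name cars_plot_data are illustrative, and the native pipe (|>) assumes R 4.1 or later.

```r
library(dplyr)

# mtcars stands in for your dataset; the columns chosen here are illustrative.
cars_plot_data <- mtcars |>
  tibble::rownames_to_column("model") |>   # keep car names as a real column
  as_tibble() |>
  mutate(
    cylinders = factor(cyl),               # treat cylinder count as categorical
    weight_kg = wt * 1000 * 0.4536         # wt is recorded in units of 1000 lb
  ) |>
  filter(!is.na(mpg)) |>                   # drop rows that cannot be plotted
  select(model, cylinders, weight_kg, mpg) |>
  arrange(desc(mpg))                       # most economical cars first

cars_plot_data
```

The result contains only the columns that will appear in the figure, with descriptive names and units, so a reader can inspect exactly what is being plotted.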

Final visualisation

Produce a publication-quality visualisation of some aspect of the dataset. The figure should ‘stand alone’, meaning that all the information required to interpret the plot is included in the visualisation itself. The type of plot should make sense for the data you have chosen, and the figure should be visually appealing.
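As a rough guide to what ‘stand alone’ can mean in practice, the sketch below adds a title, readable axis labels with units, a legend title, and a data source caption to a basic plot. It uses the mtcars dataset that ships with R as a stand-in; the theme choices and the file name final_figure.png are assumptions, not requirements.

```r
library(ggplot2)

final_plot <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2.5, alpha = 0.8) +
  labs(
    title = "Heavier cars travel fewer miles per gallon",
    subtitle = "32 car models from the 1974 Motor Trend road tests",
    x = "Weight (1,000 lb)",
    y = "Fuel economy (miles per US gallon)",
    colour = "Cylinders",
    caption = "Data: mtcars dataset bundled with R"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")

final_plot

# Save at a fixed size and resolution so the figure is reproduced exactly.
# The file name is a hypothetical example.
ggsave("final_figure.png", final_plot, width = 7, height = 5, dpi = 300)
```

Setting width, height, and dpi explicitly means the saved figure does not depend on the size of your plotting window.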