3  Data Visualisation

In this part, you will learn about ggplot2, one of the most common packages for data visualisation in R. This was my entry point into R: during my PhD, I wanted to customise plots in ways that were simply not possible in SPSS. ggplot2 is part of the tidyverse, a collection of packages for wrangling, visualising, and analysing data. Each task is possible using functions built into R or other packages, but the tidyverse packages work together like a little ecosystem, and I like that the function names are verbs, which makes your code very readable.

ggplot2 works through a layering system, meaning you build plots by adding and customising each component. This provides great flexibility to produce publication-quality plots that you can insert into manuscripts and presentations. We will work through different plot types such as histograms, scatterplots, and violin-boxplots to demonstrate key functionality. You will then have the skills and awareness to look into the components you need for your own research.
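To give a feel for what layering means in practice, here is a minimal sketch. It uses the mtcars dataset that comes built into R rather than the chapter's data, and the object name p is just an illustrative choice. Each step adds one component to the plot using +.

```r
library(ggplot2)

# Layer 1: the data and the aesthetic mapping (which variable goes on which axis)
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Layer 2: a geom, which draws each observation as a point
p <- p + geom_point()

# Layer 3: axis labels
p <- p + labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Layer 4: a theme to control the overall look of the plot
p <- p + theme_classic()

# Typing the object name draws the plot
p
```

Building the plot one step at a time like this is only for demonstration. You would normally chain the layers together in a single expression, but it shows that you can inspect the plot after each addition.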

3.1 Intended Learning Outcomes

By the end of this part, you will be able to:

  • Read data files into R/RStudio.

  • Understand the ggplot2 layering system of creating plots.

  • Create and edit histograms to visualise the frequency of observations collated into bins.

  • Create and edit barplots to visualise the frequency of different categories.

  • Create and edit a scatterplot to visualise the relationship between two continuous variables.

  • Create and edit a violin-boxplot to visualise the density of data points in a continuous outcome.

  • Save plots as an image to use in reports or presentations.
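As a rough preview of how the first and last outcomes fit together, the sketch below reads a data file, creates a histogram, and saves it as an image. So that it runs on its own, it first writes a small made-up file to a temporary folder. The file names, the score column, and the object names are all hypothetical and are not the files used in this chapter.

```r
library(readr)
library(ggplot2)

# Write a small made-up data file so there is something to read in
csv_path <- file.path(tempdir(), "example_data.csv")
write_csv(data.frame(score = c(2, 3, 3, 4, 4, 4, 5, 5, 6)), csv_path)

# Read the data file into R
example_data <- read_csv(csv_path, show_col_types = FALSE)

# Create a histogram, counting observations in bins that are 1 unit wide
hist_plot <- ggplot(example_data, aes(x = score)) +
  geom_histogram(binwidth = 1)

# Save the plot as a PNG image, setting the size in inches
png_path <- file.path(tempdir(), "example_histogram.png")
ggsave(filename = png_path, plot = hist_plot,
       width = 6, height = 4, units = "in", dpi = 300)
```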

3.2 Key chapters

3.3 Target Dataset

For most of this chapter, we are using open data from Lopez et al. (2023). The abstract of their article is:

Imagine a bowl of soup that never emptied, no matter how many spoonfuls you ate—when and how would you know to stop eating? Satiation can play a role in regulating eating behavior, but research suggests visual cues may be just as important. In a seminal study by Wansink et al. (2005), researchers used self-refilling bowls to assess how visual cues of portion size would influence intake. The study found that participants who unknowingly ate from self-refilling bowls ate more soup than did participants eating from normal (not self-refilling) bowls. Despite consuming 73% more soup, however, participants in the self-refilling condition did not believe they had consumed more soup, nor did they perceive themselves as more satiated than did participants eating from normal bowls. Given recent concerns regarding the validity of research from the Wansink lab, we conducted a preregistered direct replication study of Wansink et al. (2005) with a more highly powered sample (N = 464 vs. 54 in the original study). We found that most results replicated, albeit with half the effect size (d = 0.45 instead of 0.84), with participants in the self-refilling bowl condition eating significantly more soup than those in the control condition. Like the original study, participants in the self-refilling condition did not believe they had consumed any more soup than participants in the control condition. These results suggest that eating can be strongly controlled by visual cues, which can even override satiation.

In summary, they replicated an (in)famous experiment that won the Ig Nobel Prize. Participants took part in an intricate setup (seriously, go and look at the diagrams in the article) where they ate soup from bowls on a table. In the control group, participants could eat as much soup as they wanted and could ask for a top-up from the researchers. In the experimental group, the soup bowls automatically topped up through a series of hidden tubes under the table. The idea is that the control group gets an accurate visual cue because the amount of soup in the bowl goes down, whereas the experimental group gets an inaccurate visual cue because the amount of soup in the bowl never seems to go down. So, the inaccurate visual cue would interfere with natural signs of getting full and lead to people eating more.

In the original article, participants in the experimental group ate more soup than participants in the control group, but the main author was involved in a series of research misconduct cases. Lopez et al. (2023) wanted to see if the result would replicate in an independent study, and they predicted they would find the same pattern of results.

In the dataset, we have the following variables:

| Variable | Description |
|---|---|
| ParticipantID | Participant ID number. |
| Sex | Participant sex. |
| Age | Participant age in years. |
| Ethnicity | Participant ethnicity. |
| OzEstimate | Estimated soup consumption in ounces (oz). |
| CalEstimate | Estimated soup consumption in calories (kcal). |
| M_postsoup | Actual soup consumption in ounces (oz). |
| F_CaloriesConsumed | Actual soup consumption in calories (kcal). |
| Condition | Condition labelled numerically as 0 (Control) and 1 (Experimental). |

We will use this data to demonstrate how to create different types of plots in ggplot2.
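Because Condition is stored as the numbers 0 and 1, R will treat it as a continuous variable unless you tell it otherwise. The sketch below shows one way you might convert it into a labelled factor before plotting. The values in toy_data are made up purely for illustration and are not the Lopez et al. (2023) data. Only the column names and the 0/1 coding come from the table above, and the object names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Made-up values for illustration only; NOT the real Lopez et al. (2023) data
toy_data <- data.frame(
  ParticipantID = 1:6,
  M_postsoup = c(6.5, 9.0, 7.2, 11.4, 13.1, 10.3),
  Condition = c(0, 0, 0, 1, 1, 1)
)

# Convert the numeric codes into a factor with readable labels
toy_data <- toy_data %>%
  mutate(Condition = factor(Condition,
                            levels = c(0, 1),
                            labels = c("Control", "Experimental")))

# The labels now appear on the x-axis instead of 0 and 1
soup_plot <- ggplot(toy_data, aes(x = Condition, y = M_postsoup)) +
  geom_point() +
  labs(x = "Condition", y = "Actual soup consumption (oz)")

soup_plot
```

Keeping the labels in the data rather than editing them on the plot means every plot and summary you create afterwards uses the same readable group names.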