Group task

Overview

Now that you have learnt how to create different types of plots in R using the ggplot2 package, we have a group task for you to complete so you can apply what you have learnt.

There is a choice of three data sets, each of which is suited to a different type of plot we have covered:

  1. Farias et al. (2019) – Atheists’ and Christians’ motivations to hike a pilgrimage trail. Best suited to creating a violin-boxplot.

  2. Dawtry et al. (2015) – Perceptions of income inequality, household income, and support for wealth redistribution. Best suited to creating a scatterplot.

  3. Glasgow City Council – Road accident severity, speed limit, and day of the week. Best suited to creating a bar plot.

Your task

Using what you have learnt about ggplot2 and the principles of data visualisation, choose one of these data sets and create:

  1. One “good” plot to transparently communicate the findings.

  2. One “bad” plot that is intentionally misleading or communicates the findings poorly.

Below, we have provided some scaffolding to help you complete the task, along with a codebook for each data set so you know what each column represents. Once you have finished, save each plot as an image and upload it to our Padlet board.

Once you are done, look at the other submissions and add a heart to your favourite “good” and “bad” plots.

Load packages

If you are running this for the first time, make sure the tidyverse package is installed, then load it with library() to get all the functions you need.

# If you are curious, the chunk options for this code hide messages and warnings when you knit, to tidy things up
library(tidyverse)
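Note that library() only loads a package that is already installed. A minimal sketch, assuming you have internet access and permission to install packages, that installs tidyverse only when it is missing and then loads it (the CRAN mirror URL is one common choice, not a requirement):

```r
# Install tidyverse only if it is not already available (needs internet access)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse", repos = "https://cloud.r-project.org")
}

# Load the core tidyverse packages, including ggplot2, readr, dplyr, and tidyr
library(tidyverse)
```

You only need to install once per computer, but you need to run library() in every new R session.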

Farias et al. (2019)

Codebook

For this data set, there are two options depending on whether you want to visualise one type of motivation (Farias_2019.csv) or plot all the types of motivation side by side (Farias_2019_long.csv).

Farias_2019.csv

  • Participant - ID for each participant

  • Age - Age in years

  • Religion - Self-reported religious group as Christian or Atheist

  • Nature_close - Mean score across sub-scale items, each rated from 1 (not at all important) to 6 (very important), for closeness to nature

  • Sen_seek - Mean score across sub-scale items, each rated from 1 (not at all important) to 6 (very important), for sensation seeking

  • Community - Mean score across sub-scale items, each rated from 1 (not at all important) to 6 (very important), for being a part of a community

  • Rel_growth - Mean score across sub-scale items, each rated from 1 (not at all important) to 6 (very important), for religious growth

  • Spirit_growth - Mean score across sub-scale items, each rated from 1 (not at all important) to 6 (very important), for spiritual seeking or finding the meaning of life

  • Seek_life - Mean score across sub-scale items, each rated from 1 (not at all important) to 6 (very important), for searching for life direction

Farias_2019_long.csv

  • Participant - ID for each participant

  • Age - Age in years

  • Religion - Self-reported religious group as Christian or Atheist

  • Motivation_subscale - Long version of the data set above, where each participant has one row per type of motivation: Nature_close, Sen_seek, Community, Rel_growth, Spirit_growth, and Seek_life

  • Average_rating - The mean score for the motivation sub-scale named in that row, with items rated from 1 (not at all important) to 6 (very important)
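To see how the two files relate, the sketch below reshapes a wide table into the long layout with tidyr's pivot_longer(). It assumes the long file was made this way, which we have not checked; the two participants and their ratings in `farias_wide_example` are made up purely to illustrate the reshaping.

```r
library(tidyverse)

# Two hypothetical participants in the wide layout of Farias_2019.csv;
# the values are invented for illustration, NOT taken from the study
farias_wide_example <- tibble(
  Participant   = c(1, 2),
  Age           = c(34, 51),
  Religion      = c("Christian", "Atheist"),
  Nature_close  = c(4.5, 5.0),
  Sen_seek      = c(3.0, 2.5),
  Community     = c(4.0, 3.5),
  Rel_growth    = c(5.5, 1.0),
  Spirit_growth = c(5.0, 2.0),
  Seek_life     = c(4.0, 3.0)
)

# Stack the six motivation columns into two: which sub-scale, and its rating
farias_long_example <- farias_wide_example %>%
  pivot_longer(
    cols      = Nature_close:Seek_life,
    names_to  = "Motivation_subscale",
    values_to = "Average_rating"
  )

farias_long_example
```

Each participant now has six rows, one per motivation, which is the shape ggplot2 needs to map Motivation_subscale to an axis or facet.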

Creating the plot

Load the data set

# If you want to focus on plotting one type of motivation
farias_data <- read_csv("Data/Farias_2019.csv")

# If you want to plot all the types of motivation side by side 
farias_long_data <- read_csv("Data/Farias_2019_long.csv")

Create the plot

# Enter code here to create the plot 
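One way to layer a violin-boxplot is sketched below. It runs on simulated data rather than the real file, so `example_data` and its values are made up; to use it, swap in `farias_data` and the motivation column you choose (Rel_growth is only an example), or `farias_long_data` with Motivation_subscale on an axis. Whether the result counts as "good" is for your group to judge.

```r
library(tidyverse)

# Simulated stand-in for farias_data: 40 made-up participants, NOT the study data
set.seed(42)
example_data <- tibble(
  Religion   = rep(c("Atheist", "Christian"), each = 20),
  Rel_growth = round(runif(40, min = 1, max = 6), 1)
)

# The violin shows the shape of each distribution; a narrow boxplot on top
# shows the median and quartiles (outliers hidden because the violin shows them)
violin_box <- ggplot(example_data, aes(x = Religion, y = Rel_growth, fill = Religion)) +
  geom_violin(alpha = 0.5) +
  geom_boxplot(width = 0.2, fill = "white", outlier.shape = NA) +
  # Show the full 1 to 6 response scale rather than only the observed range
  scale_y_continuous(limits = c(1, 6), breaks = 1:6) +
  labs(x = "Religious group",
       y = "Importance of religious growth (1-6)") +
  guides(fill = "none") +
  theme_classic()

violin_box
```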

Saving the plot

# Enter code here to save your plot as an image
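ggplot2's ggsave() writes a plot to an image file, picking the file type from the extension. A minimal sketch, using the built-in mtcars data as a placeholder plot; the file name is hypothetical, and it is written to a temporary folder here so the example runs anywhere. For the task, you would pass your own plot object and a name such as "farias_good_plot.png" to save it in your working directory.

```r
library(tidyverse)

# Placeholder plot built from R's built-in mtcars data; replace with your own plot object
my_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Hypothetical file name; the .png extension tells ggsave() to write a PNG
plot_file <- file.path(tempdir(), "farias_good_plot.png")

# Set the size explicitly so the saved image does not depend on your screen size
ggsave(filename = plot_file, plot = my_plot,
       width = 15, height = 10, units = "cm", dpi = 300)
```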

Dawtry et al. (2015)

Codebook

  • household_income - Participant’s own household income in dollars ($).

  • population_inequality_gini_index - Gini index of population inequality based on participants’ estimates of the bottom and top income quintiles. Higher scores mean greater estimated inequality, with 0 meaning perfect equality and 100 meaning perfect inequality.

  • population_mean_income - Estimated mean household income across the whole US population in dollars ($).

  • fairness_and_satisfaction - Mean of two items, rated from 1 (extremely fair) to 9 (extremely unfair), asking participants how fair they find the income distribution across the US and how satisfied they are with it.

  • support_for_redistribution - Mean of four items, rated from 1 (strongly disagree) to 6 (strongly agree), measuring attitudes towards wealth redistribution in the US, such as heavier taxes on the rich.

Creating the plot

Load the data set

# load the data from Dawtry et al. (2015) 
dawtry_data <- read_csv("Data/Dawtry_2015.csv")

Create the plot

# Enter code here to create the plot 
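A scatterplot with a line of best fit could be built along the lines sketched below. The data are simulated (`example_dawtry` is made up and has no real relationship between the variables), and the choice of household_income against support_for_redistribution is only an example; swap in `dawtry_data` and whichever pair of columns you want to show.

```r
library(tidyverse)

# Simulated stand-in for dawtry_data: 100 made-up participants, NOT the study data
set.seed(2015)
example_dawtry <- tibble(
  household_income           = round(runif(100, min = 10000, max = 150000)),
  support_for_redistribution = round(runif(100, min = 1, max = 6), 2)
)

scatter <- ggplot(example_dawtry,
                  aes(x = household_income, y = support_for_redistribution)) +
  # Semi-transparent points so overlapping participants remain visible
  geom_point(alpha = 0.6) +
  # Straight line of best fit with its 95% confidence band
  geom_smooth(method = "lm", formula = y ~ x) +
  # Format the income axis as dollars
  scale_x_continuous(labels = scales::label_dollar()) +
  labs(x = "Household income (US$)",
       y = "Support for redistribution (1-6)") +
  theme_classic()

scatter
```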

Saving the plot

# Enter code here to save your plot as an image

Road accident severity

Codebook

  • ACCIDENT_INDEX - 13-digit code for each unique accident.

  • ACCIDENT_SEVERITY - A category for whether the accident was slight, serious, or fatal.

  • SPEED_LIMIT - The speed limit of the road where the accident occurred, e.g., 30, 40, 50, 70.

  • DAY_OF_WEEK - The day of the week when the accident happened, e.g., Monday, Tuesday etc.
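When read in with read_csv(), DAY_OF_WEEK and ACCIDENT_SEVERITY are text, so ggplot2 would order the bars alphabetically (Friday first, Fatal before Slight). The sketch below converts them to factors with a meaningful order. The rows in `example_accidents` are hypothetical, and the exact spellings and capitalisation ("Slight", "Serious", "Fatal", full day names) are assumptions: check the real values with distinct(accident_severity, ACCIDENT_SEVERITY) first, because any value not listed in `levels` becomes NA.

```r
library(tidyverse)

# A few hypothetical rows in the same layout as road_accidents.csv
example_accidents <- tibble(
  ACCIDENT_SEVERITY = c("Slight", "Fatal", "Serious", "Slight"),
  DAY_OF_WEEK       = c("Sunday", "Monday", "Friday", "Monday")
)

day_order <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# Convert text to ordered categories so bars follow the week and increasing severity
example_accidents <- example_accidents %>%
  mutate(
    DAY_OF_WEEK       = factor(DAY_OF_WEEK, levels = day_order),
    ACCIDENT_SEVERITY = factor(ACCIDENT_SEVERITY,
                               levels = c("Slight", "Serious", "Fatal"))
  )

levels(example_accidents$DAY_OF_WEEK)
```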

Creating the plot

Load the data set

# load the road accident severity data
accident_severity <- read_csv("Data/road_accidents.csv")

Create the plot

# Enter code here to create the plot 
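Because each row is one accident, geom_bar() can count the rows for you, with no summarising step needed first. The sketch below uses simulated accidents (`example_accidents` and its proportions are made up, NOT council data) to show counts by speed limit, split by severity. It assumes speed limits are in mph, as on UK roads; swap in `accident_severity` and the columns you want to compare.

```r
library(tidyverse)

# Simulated stand-in for accident_severity: 300 made-up accidents
set.seed(1)
example_accidents <- tibble(
  SPEED_LIMIT       = sample(c(20, 30, 40, 70), 300, replace = TRUE),
  ACCIDENT_SEVERITY = factor(
    sample(c("Slight", "Serious", "Fatal"), 300, replace = TRUE,
           prob = c(0.80, 0.17, 0.03)),
    levels = c("Slight", "Serious", "Fatal")
  )
)

# factor() treats speed limit as a category, so there are no empty gaps between 40 and 70;
# position = "dodge" puts the severity bars side by side instead of stacking them
bar_plot <- ggplot(example_accidents,
                   aes(x = factor(SPEED_LIMIT), fill = ACCIDENT_SEVERITY)) +
  geom_bar(position = "dodge") +
  labs(x = "Speed limit (mph)", y = "Number of accidents",
       fill = "Accident severity") +
  theme_classic()

bar_plot
```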

Saving the plot

# Enter code here to save your plot as an image