Now that you have learnt how to create different types of plots in R using the ggplot2 package, we have a group task for you to complete so you can apply what you have learnt.
You can choose from three data sets, each of which is suited to a different type of plot we have covered:
Farias et al. (2019) – Atheists’ and Christians’ motivations to hike a pilgrimage trail. Best suited to creating a violin-boxplot.
Dawtry et al. (2015) – Perceptions of income inequality, household income, and support for wealth redistribution. Best suited to creating a scatterplot.
Glasgow City Council – Road accident severity, speed limit, and day of the week. Best suited to creating a bar plot.
Using what you have learnt about ggplot2 and the principles of data visualisation, we would like you to pick one of these data sets and create:
One “good” plot that transparently communicates the findings.
One “bad” plot that is intentionally misleading or communicates the findings poorly.
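A truncated axis is one common way a plot can mislead. The minimal sketch below assumes ggplot2 is installed. It uses R’s built-in PlantGrowth data as a stand-in, not one of the task data sets, and draws the same bar chart of group means twice. In the first version the bars start at zero. In the second, the y-axis is zoomed with `coord_cartesian()` so that small differences look large. The object names `good_bars` and `bad_bars` are only illustrative.

```r
library(ggplot2)

# Group means from R's built-in PlantGrowth data (a stand-in for your own data)
means <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)

base_plot <- ggplot(means, aes(x = group, y = weight)) +
  geom_col() +
  labs(x = "Group", y = "Mean dried weight")

# "Good": bars start at zero, so bar length is proportional to the mean
good_bars <- base_plot

# "Bad": zooming the y-axis to start at 4.5 makes small differences look huge
bad_bars <- base_plot + coord_cartesian(ylim = c(4.5, 5.6))
```

Zooming with `coord_cartesian()` keeps every bar but cuts off its base. Setting `limits` in `scale_y_continuous()` would behave differently: it would drop bars that extend outside the limits.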
Below, we provide some scaffolding for the task, including a code book for each data set so you know what each column represents. Once you have finished, save each plot as an image and upload it to our Padlet board.
Once you are done, look at the other submissions and add a heart to your favourite “good” and “bad” plots.
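ggplot2’s `ggsave()` can save a plot as an image. The minimal sketch below stands in R’s built-in mtcars data for your own data. It writes to a temporary folder so that it runs anywhere. The file name `good_plot.png` and the 6 × 4 inch size are only illustrative choices.

```r
library(ggplot2)

# A stand-in plot using the built-in mtcars data; swap in your own data and aesthetics
example_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Save to a temporary folder here; in your own script a path such as "good_plot.png" works
out_file <- file.path(tempdir(), "good_plot.png")
ggsave(out_file, plot = example_plot, width = 6, height = 4, dpi = 300)
```

If you leave out `plot =`, `ggsave()` saves the last plot displayed. Naming the plot explicitly avoids saving the wrong one.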
If you are running this for the first time, make sure you load the tidyverse package to get all the functions you need.
# If you are curious, the chunk settings stop messages and warnings from appearing when you knit, to tidy things up
library(tidyverse)
For the Farias et al. (2019) data set, there are two options depending on whether you want to visualise one type of motivation (Farias_2019.csv) or plot all the types of motivation side by side (Farias_2019_long.csv).
Farias_2019.csv
Participant - ID for each participant
Age - Age in years
Religion - Self-reported religious group as Christian or Atheist
Nature_close - Mean importance of closeness to nature, averaged across sub-scale items rated from 1 (not at all important) to 6 (very important)
Sen_seek - Mean importance of sensation seeking, averaged across sub-scale items rated from 1 (not at all important) to 6 (very important)
Community - Mean importance of being a part of a community, averaged across sub-scale items rated from 1 (not at all important) to 6 (very important)
Rel_growth - Mean importance of religious growth, averaged across sub-scale items rated from 1 (not at all important) to 6 (very important)
Spirit_growth - Mean importance of spiritual seeking or finding the meaning of life, averaged across sub-scale items rated from 1 (not at all important) to 6 (very important)
Seek_life - Mean importance of searching for life direction, averaged across sub-scale items rated from 1 (not at all important) to 6 (very important)
Farias_2019_long.csv
Participant - ID for each participant
Age - Age in years
Religion - Self-reported religious group as Christian or Atheist
Motivation_subscale - The type of motivation in that row. This is the long version of the data set above, so each participant has one row per motivation: Nature_close, Sen_seek, Community, Rel_growth, Spirit_growth, and Seek_life
Average_rating - The mean score for that motivation, averaged across sub-scale items rated from 1 (not at all important) to 6 (very important)
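The long file holds the same information as the wide one, reshaped so that each participant has one row per motivation. The sketch below shows that reshaping with tidyr’s `pivot_longer()`. It uses two rows of made-up values in a hypothetical data frame called `wide`; these are not data from Farias et al. (2019). The aim is to show where `Motivation_subscale` and `Average_rating` come from.

```r
library(tidyr)

# Hypothetical toy rows in the wide layout of Farias_2019.csv (made-up values, not study data)
wide <- data.frame(
  Participant = c(1, 2),
  Age = c(25, 40),
  Religion = c("Christian", "Atheist"),
  Nature_close = c(4.5, 5.0),
  Sen_seek = c(3.0, 2.5),
  Community = c(4.0, 3.5),
  Rel_growth = c(5.5, 1.0),
  Spirit_growth = c(5.0, 3.0),
  Seek_life = c(4.0, 3.5)
)

# Stack the six motivation columns into one name column and one value column
long <- pivot_longer(
  wide,
  cols = Nature_close:Seek_life,
  names_to = "Motivation_subscale",
  values_to = "Average_rating"
)
```

Each participant now has six rows. This layout is what lets you map `Motivation_subscale` to the x-axis and compare all the motivations in one plot.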
# If you want to focus on plotting one type of motivation
farias_data <- read_csv("Data/Farias_2019.csv")
# If you want to plot all the types of motivation side by side
farias_long_data <- read_csv("Data/Farias_2019_long.csv")
# Enter code here to create the plot
# Enter code here to save your plot as an image
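Here is a minimal violin-boxplot sketch for a single motivation. It uses `Nature_close` purely as an example column. The hypothetical data frame `toy_farias` holds made-up values standing in for `farias_data`; it is not study data. A narrow white boxplot sits on top of each violin. The y-axis shows the whole 1 to 6 rating scale so that differences are not exaggerated.

```r
library(ggplot2)

# Hypothetical stand-in for farias_data: made-up values, not from Farias et al. (2019)
toy_farias <- data.frame(
  Religion = rep(c("Atheist", "Christian"), each = 5),
  Nature_close = c(4.2, 5.0, 3.8, 4.6, 5.4, 4.0, 4.8, 5.2, 3.6, 4.4)
)

violin_box <- ggplot(toy_farias, aes(x = Religion, y = Nature_close, fill = Religion)) +
  geom_violin(alpha = 0.5) +                    # shape of each group's distribution
  geom_boxplot(width = 0.2, fill = "white") +   # median and quartiles drawn on top
  scale_y_continuous(breaks = 1:6) +
  coord_cartesian(ylim = c(1, 6)) +             # show the full 1 to 6 rating scale
  labs(x = "Religious group", y = "Importance of closeness to nature (1 to 6)") +
  theme_minimal() +
  theme(legend.position = "none")               # fill repeats the x-axis, so drop the legend
```

For the long file, one option is to map `x = Motivation_subscale` and `fill = Religion`. The boxplots then need `position = position_dodge(width = 0.9)` to line up with the dodged violins.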
Dawtry_2015.csv
household_income - Participant’s own household income in dollars ($).
population_inequality_gini_index - Gini index of population inequality based on the participant’s estimates for the bottom and top income quintiles. Higher scores mean greater estimated inequality, with 0 meaning perfect equality and 100 meaning perfect inequality.
population_mean_income - Participant’s estimated mean household income across the whole US population in dollars ($).
fairness_and_satisfaction - Mean of two items, rated from 1 (extremely fair) to 9 (extremely unfair), asking participants how fair they find the income distribution across the US and how satisfied they are with it.
support_for_redistribution - Mean of four items, rated from 1 (strongly disagree) to 6 (strongly agree), measuring attitudes towards wealth redistribution in the US, such as heavier taxes on the rich.
# load the data from Dawtry et al. (2015)
dawtry_data <- read_csv("Data/Dawtry_2015.csv")
# Enter code here to create the plot
# Enter code here to save your plot as an image
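Here is a minimal scatterplot sketch. It plots `household_income` against `support_for_redistribution`, which is only one of several possible pairings. The data come from eight rows of made-up values in a hypothetical data frame called `toy_dawtry`; they are not from Dawtry et al. (2015). The sketch adds a straight-line trend with `geom_smooth(method = "lm")` and formats the x-axis as dollars with the scales package, which is installed alongside ggplot2. It also keeps the y-axis on the full 1 to 6 response scale.

```r
library(ggplot2)

# Hypothetical stand-in for dawtry_data (made-up values, not from Dawtry et al.)
toy_dawtry <- data.frame(
  household_income = c(20000, 35000, 50000, 65000, 80000, 120000, 150000, 200000),
  support_for_redistribution = c(5.0, 4.5, 4.75, 3.5, 4.0, 3.0, 2.75, 2.5)
)

scatter <- ggplot(toy_dawtry, aes(x = household_income, y = support_for_redistribution)) +
  geom_point(alpha = 0.6) +                                 # semi-transparent to show overlap
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +  # linear trend with confidence band
  scale_x_continuous(labels = scales::label_dollar()) +     # e.g. 50000 shown as $50,000
  scale_y_continuous(breaks = 1:6) +
  coord_cartesian(ylim = c(1, 6)) +                         # full 1 to 6 agreement scale
  labs(x = "Household income (US$)", y = "Support for redistribution (1 to 6)") +
  theme_minimal()
```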
road_accidents.csv
ACCIDENT_INDEX - 13-digit code for each unique accident.
ACCIDENT_SEVERITY - A category for whether the accident was slight, serious, or fatal.
SPEED_LIMIT - The speed limit of the road where the accident occurred, e.g., 30, 40, 50, 70.
DAY_OF_WEEK - The day of the week when the accident happened, e.g., Monday, Tuesday etc.
# load the road accident severity data
accident_severity <- read_csv("Data/road_accidents.csv")
# Enter code here to create the plot
# Enter code here to save your plot as an image
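Here is a minimal bar plot sketch: accident counts by day of the week, split by severity. The hypothetical data frame `toy_accidents` holds six made-up rows, not Glasgow City Council data. The severity labels "Slight", "Serious", and "Fatal" are assumptions, so match them to the values in your file. Days are converted to a factor so that they appear in calendar order rather than alphabetical order.

```r
library(ggplot2)

# Hypothetical stand-in for accident_severity (made-up rows, not council data)
toy_accidents <- data.frame(
  DAY_OF_WEEK = c("Monday", "Monday", "Friday", "Friday", "Friday", "Sunday"),
  ACCIDENT_SEVERITY = c("Slight", "Serious", "Slight", "Slight", "Fatal", "Slight")
)

# Put days in calendar order rather than alphabetical order
day_order <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
toy_accidents$DAY_OF_WEEK <- factor(toy_accidents$DAY_OF_WEEK, levels = day_order)
toy_accidents$ACCIDENT_SEVERITY <- factor(toy_accidents$ACCIDENT_SEVERITY,
                                          levels = c("Slight", "Serious", "Fatal"))

bar <- ggplot(toy_accidents, aes(x = DAY_OF_WEEK, fill = ACCIDENT_SEVERITY)) +
  geom_bar(position = "dodge") +     # geom_bar counts the rows in each group
  scale_x_discrete(drop = FALSE) +   # keep days with no accidents on the axis
  labs(x = "Day of the week", y = "Number of accidents", fill = "Severity") +
  theme_minimal()
```

To bring in speed limit as well, one option is to add `facet_wrap(~ SPEED_LIMIT)` so that each speed limit gets its own panel.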