Overview of reproducible documents.
papaja walkthrough.
Work on your own project.
Preparation
Have you downloaded the workshop materials?
Are tidyverse, papaja, and pwr installed?
Do you have a LaTeX system installed?
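One way to answer both preparation questions from the R console is sketched below. It is a minimal check that only reports what is missing and leaves installation commented out. Two assumptions apply here. First, finding `pdflatex` on the PATH is treated as a rough sign that a LaTeX system is available. Second, TinyTeX (via the tinytex package) is shown as one possible LaTeX option, not as the workshop's required setup.

```r
required_pkgs <- c("tidyverse", "papaja", "pwr")

# requireNamespace() quietly returns FALSE when a package is not installed
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]
missing_pkgs
# install.packages(missing_pkgs)   # uncomment to install anything missing

# Rough check for a LaTeX system: is pdflatex on the PATH?
has_latex <- nzchar(Sys.which("pdflatex"))
has_latex
# If not, one option: install.packages("tinytex"); tinytex::install_tinytex()
```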
Computational reproducibility
31% reproducibility success rate in one journal (Hardwicke et al., 2018)
58% reproducibility success rate in registered reports (Obels et al., 2020)
Reporting errors
Connection between code and output
Reproducible manuscripts
Potential to avoid these errors as you combine code, results, and prose in one document
When there are errors, there are reproducible errors
It took me several years and projects to adopt this workflow:
Copying results from SPSS output
Copying results from R output
Using R Markdown to create reproducible results
Using papaja to write reproducible manuscripts
papaja (Aust & Barth, 2022) adds an APA-style manuscript template to the options available when you create a new R Markdown document
In this tutorial, we will walk through:
YAML options
Citations/references via a .bib file
Inline code
Tables / figures
We have created a mock example using papaja and simulated data to show its capabilities
Building on Hoffman and Elmi (2021): What is the effect of teaching debugging skills on students’ data wrangling ability?
Randomly allocate students to an error-full or error-free lecture (IV) and measure performance on a data skills assignment (DV)
Name and affiliation for each author, but only one corresponding author
Option to include contributorship roles, such as CRediT.
---
author:
  - name          : "James Bartlett"
    affiliation   : "1"
    corresponding : yes    # Define only one corresponding author
    address       : "62 Hillhead Street, Glasgow"
    email         : "james.bartlett@glasgow.ac.uk"
    role:
      - "Conceptualization"
      - "Writing - Original Draft Preparation"
      - "Writing - Review & Editing"
---
I recommend Zotero as a reference manager: https://www.zotero.org/
Create a collection for your project
Export the collection
Choose BibTeX as the format and click OK
Save the .bib file in your document's working directory
---
bibliography : ["references.bib", "r-references.bib"]
---
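The second file in that list, r-references.bib, does not come from Zotero. Below is a minimal sketch, assuming papaja is installed. papaja's `r_refs()` writes BibTeX entries for R and the packages attached in the current session. papaja's `cite_r()` can then turn that file into an inline sentence citing them. The file name matches the YAML above, and the single `library()` call is illustrative: you would attach whatever packages your analysis uses first.

```r
# Attach the packages you want cited so r_refs() picks them up
library(papaja)

# Write BibTeX entries for R and the attached packages
r_refs(file = "r-references.bib")
```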
Once you have a .bib file, you can easily change the reference style by selecting a different Citation Style Language (CSL) file
There are over 10,000 styles in the Zotero style repository; download one and save it with the .csl file extension: https://www.zotero.org/styles
E.g., APA 7th edition
---
csl : apa7.csl
---
E.g., Vancouver
---
csl : vancouver.csl
---
Depending on the journal submission guidelines, you can change different features like:
Floating figures/tables in-text or at the end
Being kind to your reviewer and adding line numbers
Masking the manuscript and omitting author information
---
floatsintext : yes # Figures and tables floating or at the end?
linenumbers : yes # Add line numbers?
draft : no # Add draft watermark on every page?
mask : no # Hide author details for blind submission?
---
---
output : papaja::apa6_pdf
---
---
output : papaja::apa6_word
---
Using an effect size of d = 0.38, we aimed to recruit 149 participants per group for an independent samples t-test (\(\alpha\) = 0.05, power = 0.9).
Behind the scenes…
Using an effect size of d = `r small_telescopes`, we aimed to recruit `r sample_size` participants per group for an independent samples t-test ($\alpha$ = `r alpha`, power = `r power`).
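The inline code only works if `small_telescopes`, `sample_size`, `alpha`, and `power` already exist from an earlier code chunk. The sketch below shows one way they could be computed. It uses base R's `stats::power.t.test()`, which is a swap for the `pwr::pwr.t.test()` function the workshop asks you to install. `n_original` is a hypothetical placeholder for the per-group sample size of the study you build on. The "small telescopes" effect size is taken as the effect the original study had 33% power to detect.

```r
# Hypothetical per-group sample size of the original study; replace with the real value
n_original <- 50
alpha <- 0.05
power <- 0.9

# Small telescopes: effect size the original study had 33% power to detect.
# With sd = 1, delta is on the scale of Cohen's d.
small_telescopes <- power.t.test(n = n_original, sd = 1, sig.level = alpha,
                                 power = 0.33, type = "two.sample")$delta

# Per-group sample size for the new study, rounded up to whole participants
sample_size <- ceiling(power.t.test(delta = small_telescopes, sd = 1,
                                    sig.level = alpha, power = power,
                                    type = "two.sample")$n)
```

For tidier inline output, you might wrap the effect size in `round(small_telescopes, 2)` or papaja's `printnum()` inside the inline code.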
papaja supports adding external images via knitr: http://frederikaust.com/papaja_man/reporting.html#figures
You can also display reproducible graphs created in your code chunks
Behind the scenes…
library(tidyverse) # loads ggplot2 and the %>% pipe

mock_data %>%
  ggplot(aes(x = Group, y = DV, fill = Group)) +
  geom_violin() +
  # remove the median line with fatten = NULL
  geom_boxplot(width = .2,
               fatten = NULL, colour = "black") +
  stat_summary(fun = "mean", geom = "point") +
  stat_summary(fun.data = "mean_se",
               geom = "errorbar",
               width = .1) +
  scale_fill_viridis_d(option = "D", begin = 0.3, end = 0.6) +
  theme_classic() +
  theme(legend.position = "none") +
  labs(x = "Lecture Group",
       y = "Data skills test score (%)")
In the code chunk options, you can set a figure caption (fig.cap) and control the figure size (fig.width, fig.height)
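Figure options can be set in each chunk's header, for example a `fig.cap` caption for that figure. They can also be set once for the whole manuscript. The sketch below shows the second approach with `knitr::opts_chunk$set()`, typically placed in the setup chunk. The specific values are illustrative choices, not papaja defaults.

```r
# Defaults applied to every later code chunk in the manuscript
knitr::opts_chunk$set(
  echo = FALSE,         # show results, not code, in the rendered manuscript
  fig.width = 6,        # figure width in inches
  fig.height = 4,       # figure height in inches
  fig.align = "center"  # centre figures on the page
)
```

Per-figure settings in a chunk header override these defaults for that chunk only.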
papaja has helper functions for creating APA-style tables (although they do not render well in HTML output): http://frederikaust.com/papaja_man/reporting.html#tables
Group | Mean | SD | Min | Max |
---|---|---|---|---|
Error-Free | 49.94 | 11.03 | 14.91 | 75.80 |
Error-Full | 54.88 | 10.42 | 24.13 | 92.22 |
Note. Test scores could range from 0-100%
Behind the scenes…
library(tidyverse) # dplyr verbs and the %>% pipe
library(papaja)    # printnum() and apa_table()

# Calculate descriptives
mock_descriptives <- mock_data %>%
  group_by(Group) %>%
  summarise(Mean = mean(DV),
            SD = sd(DV),
            Min = min(DV),
            Max = max(DV))
# papaja function to round and save as character
descriptives <- printnum(mock_descriptives)
# papaja function to create an APA table
apa_table(descriptives,
          caption = "Descriptive statistics of...",
          note = "Test scores could range from 0-100%")
papaja has helper functions for formatting statistical results in APA style: http://frederikaust.com/papaja_man/reporting.html#statistical-models-and-tests
“Consistent with our hypothesis, a Welch t-test shows that participants in the error-full group produced significantly higher data skills assignment scores than those in the error-free group, \(\Delta M = -4.94\), 95% CI \([-7.49, -2.40]\), \(t(272.23) = -3.82\), \(p < .001\).”
Behind the scenes…
“Consistent with our hypothesis, a Welch t-test shows that participants in the error-full group produced significantly higher data skills assignment scores than those in the error-free group, `r apa_ttest`.”
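One way an object like `apa_ttest` could be produced is sketched below, assuming papaja is installed. The data are hypothetical simulated values in the same shape as the workshop's `mock_data` (columns `Group` and `DV`), so the printed numbers will differ from the sentence above. `apa_print()` returns a list of APA-formatted pieces, and its `full_result` element combines the estimate, confidence interval, test statistic, and p value.

```r
library(papaja)

# Hypothetical simulated data in the same shape as mock_data (Group, DV)
set.seed(2022)
mock_data <- data.frame(
  Group = rep(c("Error-Free", "Error-Full"), each = 149),
  DV = c(rnorm(149, mean = 50, sd = 11), rnorm(149, mean = 55, sd = 10))
)

# t.test() runs a Welch t-test by default (var.equal = FALSE)
mock_ttest <- t.test(DV ~ Group, data = mock_data)

# Full APA-formatted result string for use in inline code
apa_ttest <- apa_print(mock_ttest)$full_result
```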
Saving objects
If you have code which takes a long time to run, you can save model objects:
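A minimal sketch using base R's `saveRDS()`. The model and the `models/` folder are hypothetical placeholders for your own slow-running analysis.

```r
# Hypothetical stand-in for a slow model (e.g., a large mixed-effects or Bayesian model)
model_fit <- lm(mpg ~ wt, data = mtcars)

# Create a folder for saved objects if it does not exist yet
dir.create("models", showWarnings = FALSE)

# saveRDS() stores a single R object in a compressed .rds file
saveRDS(model_fit, file = file.path("models", "model_fit.rds"))
```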
Loading objects
You can then load in the objects quickly within a code chunk:
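A minimal sketch with base R's `readRDS()`, written as a simple cache. If the saved file exists it is read back in. Otherwise the model is fitted once and saved for next time. The path and the `lm()` model are hypothetical placeholders. knitr's `cache = TRUE` chunk option is an alternative that caches whole chunks automatically.

```r
model_path <- file.path("models", "model_fit.rds")  # hypothetical path

if (file.exists(model_path)) {
  # Fast path: read the saved object back in
  model_fit <- readRDS(model_path)
} else {
  # Slow path (first run only): fit the model, then save it for next time
  model_fit <- lm(mpg ~ wt, data = mtcars)
  dir.create("models", showWarnings = FALSE)
  saveRDS(model_fit, file = model_path)
}
```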
Slides and the folder containing the mock example are available on GitHub: https://github.com/BartlettJE/papaja_demo
Full example from James’ recent publication: https://osf.io/gm4jr/
papaja manual (section 7 includes published manuscripts using papaja): http://frederikaust.com/papaja_man/