25 Data Visualisation

In this research skills chapter, we focus on the role of data visualisation and formatting. We will explore why data visualisation is useful for both you and your reader, and why it is important to consider which types of visualisation communicate your message transparently and effectively. We will also cover some key APA formatting points for presenting tables and figures.

25.1 The role of data visualisation and formatting

First, data visualisation is important for you as the researcher. Exploratory data analysis should be the first step in your data analysis toolkit to see what your data look like, check for potential problems, and identify initial patterns.

To demonstrate why this is important, the Datasaurus Dozen shows how summary statistics (e.g., mean and standard deviation) can be identical but result from dramatically different data patterns. For example, scatterplots in the shape of a star, a cross, and yes, even a dinosaur can share the same means, standard deviations, and correlation. Yanai and Lercher (2020) found that students would miss an image of a gorilla recreated as data points when they were focused on rushing to apply a statistical test. Therefore, exploratory data analysis is important in the initial steps of data analysis to check what patterns are present and whether the data properties are consistent with what you expect.
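To make this concrete, here is a minimal Python sketch of the same idea. It does not use the real dataset: the two samples are simulated, and the `rescale` helper and the target mean of 50 and standard deviation of 10 are assumptions chosen purely for illustration.

```python
import random
import statistics


def rescale(values, target_mean, target_sd):
    """Linearly rescale values so they have the target mean and sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [target_mean + target_sd * (v - mean) / sd for v in values]


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Two made-up samples with very different shapes.
one_cluster = [rng.gauss(0, 1) for _ in range(200)]
two_clusters = ([rng.gauss(-3, 0.5) for _ in range(100)]
                + [rng.gauss(3, 0.5) for _ in range(100)])

# Force both samples onto the same mean (50) and standard deviation (10).
sample_a = rescale(one_cluster, 50, 10)
sample_b = rescale(two_clusters, 50, 10)

for name, sample in [("one cluster", sample_a), ("two clusters", sample_b)]:
    # Count how many observations actually sit close to the mean.
    near_mean = sum(45 <= v <= 55 for v in sample)
    print(f"{name}: mean = {statistics.mean(sample):.2f}, "
          f"SD = {statistics.stdev(sample):.2f}, "
          f"values within 5 of the mean = {near_mean}")
```

Both samples report the same mean and standard deviation, yet in the two-cluster sample very few observations sit anywhere near the mean. You would only spot that by looking at the data rather than the summary statistics.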

Second, data visualisation is important for your reader. Visualisations can communicate your message more effectively than a written summary or a wall of numbers, but it is crucial to balance efficiency with transparency. It can be easy to confuse or mislead people with poorly designed visualisations.

25.1.1 Data visualisation principles

Effective (or misleading) data visualisation is a whole field of study, so we will just outline a few key principles to keep in mind when designing your plots, to help you communicate your data as effectively and honestly as possible. If this is something that interests you, we recommend chapter one (looking at data) from Data Visualisation by Healy (2018) and the comprehensive review article The Science of Visual Data Communication: What Works by Franconeri et al. (2021) for further reading.

25.1.1.1 Visual illusions

Your visual system is pretty powerful and allows you to rapidly search for patterns in visual information. But for the same reasons that make graphs effective at communicating information, design features can also play tricks on the visual system and create illusions.

Within the bar bias

Bar plots are extremely common to see in journal articles and the media as they are easy to interpret. They are designed to communicate frequencies or percentages of observations where the top of each bar corresponds to the frequency of each variable or level you are plotting. Bar plots visualise frequencies well but they should not be used to summarise averages of continuous data, such as the mean of different groups. As the height of the bar only corresponds with the value of the mean, it hides other information like the distribution of observations behind the mean, and perception research shows people mistakenly think values within the bars are more likely than values outside the bars.

For example, the bars in A below look identical when you just plot the means, but if you superimpose the data points in B, it shows a very different underlying pattern. This shows why it is important to consider what type of plot would effectively communicate the data you are working with. For continuous outcomes, it is best to use something like a violin-boxplot.

Demonstrating within the bar bias for continuous outcomes.

Y-axis truncation

One of the most powerful and most common illusions that can mislead people is truncated or non-zero axes, where (typically) the y-axis is shortened to zoom in on a smaller range of values. Franconeri et al. (2021) discuss studies that show people overestimate differences between two groups even when you tell them the truncation is present, and even if you get people to manually enter the values from each bar. For example, the bar plot below shows the same difference across the full 0 to 100 scale (A), then truncated between 45 and 60 to highlight the difference (B).

This is another area where it takes time and experience to recognise whether y-axis truncation is misleading or not. Although we are not trying to turn this into a bar plot witch hunt, as a general rule of thumb, it is usually not a good idea to truncate the y-axis of a bar plot as they are meant to display frequencies with a logical zero point. On the other hand, it is acceptable to truncate line plots as they are designed to show changes across time.
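To put rough numbers on why truncation exaggerates differences in a bar plot, the short Python sketch below compares how tall two bars are drawn under each axis range. The group values of 50 and 55 and the `drawn_height` helper are assumptions for illustration. Only the two axis ranges (0 to 100, and 45 to 60) come from the example above.

```python
def drawn_height(value, axis_min, axis_max):
    """Fraction of the plotting area a bar fills for a given y-axis range."""
    return (value - axis_min) / (axis_max - axis_min)


# Hypothetical group values; only the axis ranges come from the text.
group_1, group_2 = 50, 55

axes = {"full axis (0 to 100)": (0, 100), "truncated axis (45 to 60)": (45, 60)}
for label, (axis_min, axis_max) in axes.items():
    h1 = drawn_height(group_1, axis_min, axis_max)
    h2 = drawn_height(group_2, axis_min, axis_max)
    print(f"{label}: bar 2 is drawn {h2 / h1:.1f} times as tall as bar 1")
```

With these made-up values, the second group is 10% larger than the first, and the full axis draws it 1.1 times as tall. On the truncated axis, the same difference is drawn as a bar twice as tall.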

Colour-vision impairments

One important design feature is how you will distinguish between different elements of your graph. Colour can be used to effectively code different groups or conditions, but many analysts do not think carefully about colour combinations. Colour-vision impairments affect a significant number of people, so it is important to consider whether someone who is colour blind could distinguish between groups/conditions and understand the message you are trying to communicate. In the plot below, on the top (A) is a scatterplot using green and purple, which can look identical for some types of colour blindness. On the bottom (B) is the same scatterplot using a colour-blind-friendly palette of greens.

Differences in colour palettes to help colour blindness.
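One rough way to check a colour pair before plotting is to compare how light or dark the two colours are, since colours that differ clearly in lightness tend to stay distinguishable even when their hues are hard to tell apart. The Python sketch below uses the relative luminance and contrast ratio formulas from the WCAG 2.x accessibility guidelines. It is only a proxy and does not simulate any particular type of colour blindness. The hex codes are hypothetical examples, not the colours used in the plot above.

```python
def relative_luminance(hex_colour):
    """Relative luminance (0 = black, 1 = white) from the WCAG 2.x formula."""
    hex_colour = hex_colour.lstrip("#")
    channels = [int(hex_colour[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Convert each sRGB channel to linear light before weighting.
    r, g, b = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
               for c in channels]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(colour_1, colour_2):
    """WCAG contrast ratio, from 1 (same luminance) up to 21 (black vs white)."""
    lum_1 = relative_luminance(colour_1)
    lum_2 = relative_luminance(colour_2)
    lighter, darker = max(lum_1, lum_2), min(lum_1, lum_2)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical hex codes chosen only to illustrate the comparison.
green_vs_purple = contrast_ratio("#2E8B57", "#8A4FA0")
light_vs_dark_green = contrast_ratio("#C7E9C0", "#00441B")

print(f"green vs purple: {green_vs_purple:.2f}")
print(f"light green vs dark green: {light_vs_dark_green:.2f}")
```

In this example, the green and purple are similar in lightness, so they rely almost entirely on hue to be told apart, whereas the light and dark greens differ strongly in lightness. For a real report, it is still worth previewing your plot with a colour blindness simulator rather than relying on this number alone.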

25.1.1.2 Highlight comparisons of interest

If you create plots with multiple variables, you will have control over which variable you place on the x-axis and which you place on the legend. It is important to think about which comparison you want your readers to make. Comparing features is a serial process which takes time and working memory, so your readers’ eyes must move between the different components and consider which are higher or lower as they move around the graph. This means you should make it easier for your readers to make the key comparisons by using connectivity and proximity.

In the graph below, there are two ways of presenting the same data. In plot A, condition is on the x-axis while language is a grouping variable. In plot B, these are flipped with language on the x-axis and condition as a grouping variable. When creating this plot, you would need to consider whether you want to draw people’s attention to the comparison between language groups or between the word/non-word conditions.

If you wanted to emphasise the difference between conditions, then plot A forces people to shift their attention back and forth between non-word and word conditions across the whole plot. Compare this to plot B where the two conditions are placed side by side. In this version, it is much easier to compare the two conditions as they are proximal to each other. If you wanted to emphasise language, then the opposite would apply, as plot A places the two language groups next to each other.

Controlling variable order to highlight comparisons of interest.

25.1.1.3 Guide viewers to your conceptual message

Finally, it is important to respect associations between visualisation designs and data types. When interpreting plots, people rely on schemas to interpret the information they are presented. These associations are relatively universal, like top vs bottom for position (closer to the top means a greater value) and light vs dark for luminance (darker colours on a light background mean a greater value). Similarly, plot types are designed to work with certain combinations of data. For example, a bar plot uses categorical variables for the bars, and the bar height shows frequencies or your outcome. When you go against these schemas, it can be deeply confusing for your reader.

This is another area where subject knowledge is important as some disciplines have their own conventions which can change over time. For example, in EEG research (electroencephalography, where brain activity is measured with electrodes stuck to the scalp) it was conventional to plot amplitude with negative values at the top and positive values at the bottom (plot A below). This can look a little odd to those unfamiliar with EEG data and breaks the conventional understanding that top means higher numbers. Over time though, this convention has changed and more studies report amplitude with positive values at the top (plot B below). This shows how conventions change over time and it is important to keep your audience in mind to make your data visualisation as accessible and intuitive as possible.

Thinking about your conceptual message for how the data should be plotted and honouring schemas.

25.1.2 Formatting tables

After working through some data visualisation principles, it is time to highlight key APA formatting details for any tables and figures you want to include in your report.

Tables are designed to efficiently communicate a large amount of data when it would take too long to outline in the main text. So, the first key consideration of a table is whether you need one at all. If you are only reporting a few numbers or you include them all in the main text anyway, you do not need a table.

If you do have enough information to communicate, then an APA formatted table looks like the figure below and includes the following features:

It should be numbered sequentially for the order you place it in the report (e.g., Table 1 comes before Table 2).
It includes an informative title in italics to explain to the reader what information it contains.
The row and column headers are informative, so the reader can understand the table in isolation.
There are no vertical lines and you limit borders to those needed for clarity.
If you need to provide further information, you can include a note below the table, such as if you need to define abbreviations. This is not always applicable.

For full details, the APA style website has a great page on the key features of an APA formatted table.
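As a rough illustration of how these features fit together, the Python sketch below lays out a small plain-text table with a number, a title, informative headers, horizontal rules only, and an optional note. The `apa_table` function and all the values in the example are hypothetical, and plain text cannot show the italic title, so treat this as a layout sketch rather than a publication-ready table.

```python
def apa_table(number, title, headers, rows, note=None):
    """Return a plain-text table following the APA features listed above."""
    cells = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(headers))]
    # Horizontal rules only: no vertical lines between columns.
    rule = "-" * (sum(widths) + 2 * (len(widths) - 1))

    def fmt(row):
        return "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    lines = [f"Table {number}", title, rule, fmt(cells[0]), rule]
    lines += [fmt(row) for row in cells[1:]]
    lines.append(rule)
    if note:
        lines.append(f"Note. {note}")
    return "\n".join(lines)


# Made-up values purely for illustration.
print(apa_table(
    number=1,
    title="Mean Response Time (ms) by Condition",  # italic in a real report
    headers=["Condition", "M", "SD"],
    rows=[["Word", 512.3, 48.1], ["Non-word", 587.9, 55.4]],
    note="M = mean; SD = standard deviation.",
))
```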

25.1.3 Formatting figures

In contrast to tables, figures are always useful to help communicate your findings and supplement any statistics you report. We recommend including a figure for each main analysis or component of the analysis.

Note that they are called figures in APA style, not plots or graphs. Figure is a more general term as it could contain a plot, a screenshot, or a drawing depending on what you need to communicate to your reader. Essentially, it could be anything other than a table, but most of the time in a psychology report you will be communicating a plot.

An APA formatted figure looks like the one below and includes the following features:

Figures should be numbered sequentially for the order you place them in the report (e.g., Figure 1 comes before Figure 2), and they are numbered separately from tables (i.e., you can have both a Table 1 and a Figure 1).
It includes an informative title in italics to explain to the reader what information it contains.
You should edit the axis labels so that readers can understand the figure in isolation.
If there are specific features like error bars, you should define what they represent in the title or note.
If you need to provide supplementary information, you can include a note below the figure, such as if you need to define abbreviations, but this is not always applicable.

For full details, the APA style website has a great page on the key features of an APA formatted figure.

Top tip

If you use Word to write your assignments, figure placement can be a nightmare and figures can become separated from their titles. So, if you insert a table with one column and two rows, you can add the figure number and title in the first row and the figure image in the second row, then make the borders transparent. This means the title and figure will always be connected.