Four things to prioritise when teaching students data literacy

Successful data science education requires a holistic approach that involves changing how we teach data science and reinforcing lessons throughout the course, writes Ellen Bell

26 Mar 2024

Created in partnership with University of East Anglia


Developing data literacy skills requires practice and repeated exposure to different datasets and analytical methodologies. Initiatives to increase students’ practical experience of data analysis and interpretation are not hard to implement, but if they are not well thought out, they can be hampered by a perception that the topic is dry, which can lead to poor engagement and subject avoidance.

To optimise student engagement inclusively, improve perceptions and, ultimately, maximise success in data science education, there are four priorities to consider:

  • To ensure that we teach students to use appropriate tools
  • To build a strong foundational understanding of data interpretation
  • To provide flexible learning resources
  • To consistently reinforce the message.

By optimising these four aspects of data science education, we will enhance students’ appreciation of what they are doing and why, and enable them to transfer these skills between modules and contexts.

Tools of the trade: transitioning away from point-and-click platforms for data analysis

Historically, practical application of data analysis and statistics theory in the biosciences has been taught with user-friendly point-and-click software. The argument for this approach is that a simple user interface allows students to focus on analysis workflow and result interpretation without getting stuck in implementation. The decision to use this type of software is intuitively sensible; however, the point-and-click interface means that students can click through instructions with little thought for the justification or interpretation of their analyses.

An alternative to point-and-click software is to use a coding language, such as R. A longstanding argument against teaching the fundamentals of data analysis and statistics theory in this way has been that both the coding language and the statistical concepts come with steep learning curves. Learning both at once could overwhelm students who are already intimidated.

However, this need not be the case. Introducing students to coding in R might be difficult in the short term, but it adds a problem-solving element that many students come to enjoy. Writing code also means that students are more likely to slow down, think about and engage with each analytical step as they perform it.

Because of this, students engage at a more meaningful level and begin to question the interpretation of their results, leading to deeper learning of statistical concepts and a more robust understanding.

Applied understanding: prioritising data visualisation over statistical theory

Statements such as “p < 0.05, which is significant” are relatively common in the work of students who are new to inferential statistics. At face value, such a statement might be true, but it shows little engagement with the data. The direction and strength of the signal are frequently ignored, and its importance at the subject-specific level (in this case, its biological relevance) rarely comes through in students’ writing.
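
To make the contrast concrete, here is a minimal sketch of a fuller report that states the direction and size of a difference alongside its p-value. The teaching described in this article uses R; this language-neutral illustration uses only Python’s standard library. The seedling-height data, the `describe_difference` name, the choice of Cohen’s d as the effect-size measure and the use of a permutation test are all assumptions for illustration, not the course’s actual materials.

```python
import random
import statistics


def describe_difference(group_a, group_b, n_perm=10_000, seed=1):
    """Summarise how group_b differs from group_a: direction, size and a p-value.

    Illustrative sketch only: Cohen's d and a permutation test are assumed choices.
    """
    n_a, n_b = len(group_a), len(group_b)

    # Direction and raw size: difference in means, in the data's own units.
    diff = statistics.mean(group_b) - statistics.mean(group_a)

    # Standardised size: Cohen's d using the pooled sample standard deviation.
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    cohens_d = diff / pooled_var ** 0.5

    # Two-sided permutation test: shuffle the group labels many times and count
    # how often a shuffled difference is at least as extreme as the observed one.
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = statistics.mean(pooled[n_a:]) - statistics.mean(pooled[:n_a])
        if abs(shuffled) >= abs(diff):
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)

    if diff == 0:
        summary = f"The group means were identical (permutation p = {p_value:.3f})"
    else:
        direction = "higher" if diff > 0 else "lower"
        summary = (f"Group B was {direction} than group A by {abs(diff):.2f} units "
                   f"(Cohen's d = {cohens_d:.2f}, permutation p = {p_value:.3f})")

    return {"difference": diff, "cohens_d": cohens_d,
            "p_value": p_value, "summary": summary}


if __name__ == "__main__":
    # Hypothetical seedling heights (cm) for a control and a fertiliser treatment.
    control = [4.1, 3.8, 4.5, 4.0, 3.9, 4.3]
    fertilised = [4.9, 5.2, 4.6, 5.0, 5.4, 4.8]
    print(describe_difference(control, fertilised)["summary"])
```

Read together, the three numbers say which way the effect goes, how large it is relative to the variation in the data, and how surprising a difference that large would be if the group labels made no difference. A bare “p < 0.05” conveys only the last of these, and none of them alone says whether the difference matters biologically.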

We have addressed this p-value obsession by placing a heavy initial focus on data visualisation. While students are getting to grips with R, their first analytical tasks centre on how to present their data. Students learn how to select the most appropriate plot type for the hypotheses, data types and sample sizes they are working with. We then hold class discussions on how these plots might best be interpreted, guiding students to think first about what the data are showing and then about what that means in the context of the dataset. By basing early interpretation only on what students can see in their plots, we shift the emphasis away from what can otherwise become a blind reliance on p-values and lay a solid foundation on which students can build a stronger understanding of inferential statistics.
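
As a rough illustration of the reasoning involved in matching a plot to the data, the sketch below encodes a few common rules of thumb as a Python function. The rules themselves, the hypothetical `suggest_plot` name and the threshold of ten observations per group are assumptions for illustration, not the module’s actual guidance, and the teaching described here is done in R.

```python
VALID_TYPES = {"continuous", "categorical"}


def suggest_plot(response, predictor=None, n_per_group=None, small_n=10):
    """Suggest a plot type from variable types and sample size (rules of thumb).

    response, predictor: "continuous" or "categorical" (predictor may be None).
    n_per_group: observations per group, or in total if there is no predictor.
    small_n: hypothetical threshold below which every data point is shown.
    """
    if response not in VALID_TYPES or (predictor is not None and predictor not in VALID_TYPES):
        raise ValueError("variable types must be 'continuous' or 'categorical'")

    small = n_per_group is not None and n_per_group < small_n

    if predictor is None:
        if response == "continuous":
            # One numeric variable: look at its distribution.
            return "dot plot" if small else "histogram"
        return "bar chart of counts"

    if response == "continuous" and predictor == "categorical":
        # Comparing groups: with few points, show them all rather than summarising.
        return "strip plot of raw points" if small else "box plot with raw points overlaid"
    if response == "continuous" and predictor == "continuous":
        return "scatter plot"
    if response == "categorical" and predictor == "categorical":
        return "grouped bar chart of counts"
    # Categorical response with a continuous predictor.
    return "box plot of the predictor for each response category"


if __name__ == "__main__":
    print(suggest_plot("continuous", "categorical", n_per_group=6))
    print(suggest_plot("continuous", "continuous", n_per_group=50))
```

Favouring raw points when samples are small reflects the point made above: students should base their first interpretations on what they can actually see in the data.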

Resource flexibility: optimising the learning resources for all students

As already mentioned, the main criticism of teaching data literacy and analytical skills using R (or other coding languages) is the steep learning curve. That curve can, however, be made less steep by using flexible learning resources that allow students to work where and when they want. Our first-year undergraduates all have access to a workbook, currently published with Bookdown and hosted on GitHub.

To remove barriers and maximise accessibility, students can easily change the workbook’s appearance (font size and style, as well as background tone), and can either download it or view it in a browser.

The workbook is structured in chapters that correspond to teaching weeks, and each week all students are expected to complete one chapter (each should take about an hour). As the chapters progress, the amount of instruction on both what to do and how to do it in R decreases, so students are incrementally required to write or edit more of their R scripts themselves.

A second flexible resource we use is Posit Cloud (a platform that hosts RStudio and other data science tools in a cloud environment). Students begin the unit by setting up a free account, which means they avoid errors in downloading and installing RStudio on their own devices. In Posit Cloud, the file system is the same for every workspace, and students can access their work anywhere, provided they have a web browser and an internet connection.

Predictably, students frequently run into errors and need help troubleshooting their scripts. Help is always available in timetabled sessions, but to maximise flexibility and inclusivity, students can also use the workspace-sharing feature to share their projects with teaching staff remotely outside those sessions.

Reinforcement: supporting staff to create a consistent approach to teaching data literacy

We often observe that students struggle to carry over and apply knowledge of concepts or techniques between modules and years of study. In STEMM subjects, many modules throughout a course require students to apply their analytical and interpretive skills to datasets. Different disciplines can have their own specialised approaches to this, which means that most modules with data analysis or interpretive elements must run some teaching and support sessions for this content. To prevent contradictions or confusion, it is vital to agree on a set of core values and approaches, so that data analysis and interpretation are taught consistently across modules and session leads.

As with any broad-scale update to content, this requires a champion and support for staff’s continuing professional development. We have approached this through staff “coding retreats”, which help colleagues become more confident using R and more familiar with how to teach first-year analysis and interpretation.

In addition, we support individual module organisers to modify existing taught content so that it aligns with students’ prior learning and data literacy skills while still meeting the discipline-specific learning outcomes. This approach means that, over time, all aspects of data science within the overall curriculum will align across modules and years of study, which will reinforce taught analytical concepts and approaches and produce graduates with stronger overall data literacy.

Successful data science education requires a holistic approach that involves changing how we teach data science and reinforcing lessons throughout the course. Through our efforts, we have observed increased engagement with analytical methodologies and more meaningful interpretations of plots and statistical outputs from our students.

Ellen Bell is a lecturer in ecology and bioinformatics at the University of East Anglia.
