## Facilitating API lookup for novices learning data wrangling using thumbnail graphics

1 University of Glasgow, UK; 2 American University in Cairo, Egypt; 3 University of Helsinki, Finland

* Corresponding author: l.sundin.1@research.gla.ac.uk

Received July 2021; Revised September 2021; Early access December 2021

With the rising demand for data science skills, programmatic data wrangling has become a crucial barrier for learners. In this paper, we discuss the centrality of API (application programming interface) lookup to data wrangling, and how an ontology-structured command menu could facilitate it. We design thumbnail graphics as visual alternatives to textual explanations of data wrangling operations and use a survey to validate their quality. We furthermore predict that thumbnail graphics make the menu more navigable, improving lookup efficiency and performance. Our predictions are tested using Slice N Dice, an online data wrangling tutorial platform that collects learner activity. It includes both non-programmatic and programmatic data wrangling exercises. Participants from a multi-institutional sample (n = 200) were randomly assigned the tutorial either with or without thumbnail graphics. Our results show that thumbnail graphics reduce the need for clarifications, thereby assisting API lookup for novices learning data wrangling. We further present some negative results regarding performance gain and follow up with a discussion on why the differences are subtle and how they can be improved. Finally, we complement our statistical results with a qualitative study in which participants gave positive feedback on the design and helpfulness of the thumbnail graphics.

Figure 1. Kelleher & Ichinco's Collection and Organization of Information for Learning (COIL) model [20,15]
The platform is split into three parts. Part 1 introduces the user to an ontology of data wrangling operations. Part 2 introduces programming. Part 3 contains 18 programmatic data wrangling exercises
The sidebar menu and a Part 1 operation card under the two conditions
A snippet from the sidebar menu. Calculate is one of five top-level categories, while the next level is split by data structure (e.g. dataframes). In TG, each leaf node has a thumbnail graphic
Three examples of thumbnail graphics, using color in different ways to convey operation semantics
Part 1 contains a series of exercises in which the user selects operations from the menu and drags them to the corresponding subgoals
Part 3 involves programming exercises. The user is guided by a list of subgoals, each of which has associated hints. The sidebar menu serves as a menu for looking up API documentation (shown above). In reality, the menu, subgoals and documentation are all tabs within the same sidebar panel
Participants were asked to rate their experience with Excel, Python, and R on a five-point Likert scale (1 = Not at all, 5 = Advanced). The distribution among people who started and completed each part is illustrated; it provides no visual evidence that more experienced participants are more likely to persevere
The distributions in the number of tooltip events per person, grouped by condition, in Part 1 (left) and Part 3 (right). Dashed lines indicate medians. In both cases, the TG group uses the tooltip significantly less often
Total number of menu clicks per participant in Part 3, grouped by condition. Dashed lines indicate medians
Total reading times of operation cards per person, grouped by condition. Dashed lines indicate medians. The TG group is quicker on average, but the difference is non-significant
Total time on task per person for Part 1 (left) and Part 3 (right), grouped by condition. Dashed lines indicate medians. The median differences are negligible in both parts
Number of incorrect attempts per person for Part 1 (left) and 3 (right), grouped by condition. Dashed lines indicate medians. In both parts, the TG group makes fewer incorrect attempts, but the difference is not significant
Responses to the evaluation survey item asking participants how helpful they found the thumbnail graphics and tooltips (1 = Not at all, 5 = Very much) in Part 1 (N = 187) and Part 3 (N = 115). The survey was given after both Part 1 (left) and Part 3 (right)
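
The menu structure described in the captions above (top-level categories such as Calculate, a second level split by data structure, and leaf operations that carry a thumbnail graphic only in the TG condition) can be pictured as a small tree. The following is a minimal sketch under stated assumptions, not the Slice N Dice implementation: the `MenuNode` class, the `build_menu` helper, the leaf operation names `mean` and `sum`, and the thumbnail file paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class MenuNode:
    """One entry in an ontology-structured command menu (hypothetical model)."""
    label: str
    thumbnail: Optional[str] = None  # path to a thumbnail graphic; leaves only
    children: List["MenuNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: MenuNode) -> Iterator[MenuNode]:
    """Yield every leaf (operation) below `node`, depth first."""
    if node.is_leaf():
        yield node
    for child in node.children:
        yield from leaves(child)


def build_menu(with_thumbnails: bool) -> MenuNode:
    """Build a tiny example menu; only the TG condition attaches thumbnails."""

    def op(name: str) -> MenuNode:
        # Assumed file layout; the real platform's assets are not known here.
        thumb = f"thumbnails/{name}.png" if with_thumbnails else None
        return MenuNode(name, thumbnail=thumb)

    # "Calculate" and "dataframes" come from the caption; the leaf
    # operation names are placeholders.
    dataframes = MenuNode("dataframes", children=[op("mean"), op("sum")])
    calculate = MenuNode("Calculate", children=[dataframes])
    return MenuNode("menu", children=[calculate])


if __name__ == "__main__":
    for leaf in leaves(build_menu(with_thumbnails=True)):
        print(leaf.label, leaf.thumbnail)
```

Keeping the two conditions behind a single flag means both groups navigate an identical hierarchy, so any difference in lookup behaviour can be attributed to the thumbnails rather than to the menu layout.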
