Addressing confirmation bias in middle school data science education

  • More research is needed on middle school students' engagement in the statistical problem-solving process, particularly its beginning steps: formulate a question and make a plan to collect data/consider the data. Further, the increased availability of large-scale, electronically accessible data sets is an untapped area of study. This interpretive study examined middle school students' understanding of the statistical concepts involved in making a plan to collect data to answer a statistical question within a social issue context using data available on the internet. Student artifacts, researcher notes, and audio and video recordings from 20 seventh-grade students working in nine groups in two gifted education pull-out classes at a suburban middle school were used to answer the study research questions. Data were analyzed using a priori codes from previously developed frameworks and an inductive approach to find themes.

    Three themes that emerged from the data were related to confirmation bias. Some middle school students held preconceptions about the social issues they chose to study, and these preconceptions biased their statistical questions. This in turn influenced the sources of data students used to answer their questions. Confirmation bias is a serious issue that is exacerbated by the seemingly endless sources of data available electronically. We argue that this type of bias should be addressed early in students' educational experiences. Based on the findings from this study, we offer recommendations for future research and implications for statistics and data science education.

  • Table 1.  Groups' Question and Study Design Development

    Group (Size) | Question Versions | Initial Question | Final Question | Study Design for Final Question
    1 (2) | 3 | How does overpopulation affect quality of life? | How does overpopulation affect mortality rate? | Most and least populated country in each continent in 2016 from World Bank, Statistics Times, and Worldometers
    2 (2) | 10 | What affects poverty? | [Do] race and ethnicity of people affect the average yearly income they make? | Race/ethnicity and highest and lowest income categories in 2016 from U.S. Census
    3 (2) | 8 | How many people are living in poverty? | Is the rate of children being born into poverty decreasing in the United States of America? | Child poverty in U.S. in 2000–2016 from KidsCount
    4 (2) | 4 | How many people are eating based on MyPlate regulations in the United States of America per day? | What percentage of students get a free or reduced lunches [sic] per county (in Ohio)? | Six largest counties in Ohio in 2017 from CDC website
    5 (3) | 3 | How many cats and dogs are abused each year in the USA? | What percent of the animal abuse cases in New York City are neglected? | Animal abuse cases reported in 2017–2018 from NYC Open Data
    6 (2) | 6 | What are the ages of people in poverty? | What percent of people in the U.S. are in poverty per state? | Poverty per state in 2016 from KidsCount
    7 (2) | 11 | How are animals affected by climate change? | How has average precipitation within the U.S.A. changed over the course of 17 years? (2000-2017)? | Average precipitation in 2000–2017 from National Centers for Environmental Information (NCEI)
    8 (3) | 6 | What is the current poverty rate in the United States? | What is the income to poverty ratio for people of different age groups? | Income to poverty ratios by age group in 2016 from U.S. Census
    9 (2) | 4 | What was the most dangerous modern war? | What is the total number of civilian deaths in Afghanistan that were [sic] killed by ISAF from 2010 to 2013? | Subset of CIVCAS database provided in Science Magazine
    Table 2.  Pertinent Data Codes and Themes

    Type | Code/Theme
    In vivo | "reliable data"
    Emergent | Sampling from the extremes; Preconceptions
    Table 3.  Students' Meanings of the Term Reliable Data Source

    Reliable data come from... | Count (Groups)
    Government organization or from the teacher | 4 (1, 6, 7, 9)
    Varied/multiple sources | 4 (2, 3, 7, 9)
    No explanation given | 1 (8)
    Note: Groups 7 and 9 gave different definitions at different times.
