# American Institute of Mathematical Sciences

doi: 10.3934/fods.2021007
Online First

Online First articles are published articles within a journal that have not yet been assigned to a formal issue. This means they do not yet have a volume number, issue number, or page numbers assigned to them, however, they can still be found and cited using their DOI (Digital Object Identifier). Online First publication benefits the research community by making new scientific discoveries known as quickly as possible.

Readers can access Online First articles via the “Online First” tab for the selected journal.

## The (homological) persistence of gerrymandering

 1 Department of Mathematics, Tufts University, Medford, MA 02155, USA 2 Department of Mathematics, Florida State University, Tallahassee, FL 32306, USA 3 Department of Mathematics and Statistics, UNC Greensboro, Greensboro, NC 27402, USA

* Corresponding author: Tom Needham

Received  August 2020 Revised  February 2021 Early access March 2021

Fund Project: The first and third author were supported by NSF Convergence Accelerator Grant No. OIA-1937095

We apply persistent homology, the dominant tool from the field of topological data analysis, to study electoral redistricting. We begin by combining geographic and electoral data from a districting plan to produce a persistence diagram. Then, to see beyond a particular plan and understand the possibilities afforded by the choices made in redistricting, we build methods to visualize and analyze large ensembles of alternative plans. Our detailed case studies use zero-dimensional homology (persistent components) of filtered graphs constructed from voting data to analyze redistricting in Pennsylvania and North Carolina. We find that, across large ensembles of partitions, the features cluster in the persistence diagrams in a way that corresponds strongly to geographic location, so that we can construct an average diagram for an ensemble, with each point identified with a geographical region. Using this localization lets us produce zonings of each state at Congressional, state Senate, and state House scales, show the regional non-uniformity of election shifts, and identify attributes of partitions that tend to correspond to partisan advantage.

The methods here are set up to be broadly applicable to the use of TDA on large ensembles of data. Many studies will benefit from interpretable summaries of large sets of samples or simulations, and the work here on localization and zoning will readily generalize to other partition problems, which are abundant in scientific applications. For the mathematically and politically rich problem of redistricting in particular, TDA provides a powerful and elegant summarization tool whose findings will be useful for practitioners.

Citation: Moon Duchin, Tom Needham, Thomas Weighill. The (homological) persistence of gerrymandering. Foundations of Data Science, doi: 10.3934/fods.2021007
##### References:

show all references

##### References:
This paper studies the aggregation of votes from geographic units to districts, which is intractable to study by complete enumeration of possibilities. We develop persistent homology techniques to summarize the districting problem. This figure shows a precinct map of Pennsylvania with key cities identified. The precincts are colored by voting results from the 2016 Presidential election (blue for Democratic, red for Republican). Current districts in Pennsylvania are shown with the same coloring. Note that the Congressional districts are balanced to within one-person population deviation (by 2010 Census count)
Abstract graph representations of three 18-district plans for Pennsylvania
The main method: turning a districting plan into a persistence diagram. The persistence diagram captures both the adjacency information and the vote shares for the districts in a plan
Pairing each plan in a districting ensemble with PRES16 vote data gives an ensemble of persistence diagrams. These plots show overlays of the diagram ensembles for $k = 18, 50, 203$ respectively. In the second row, we add the Fréchet mean $\mathcal{F}$ for each ensemble, shown in red, representing an "average" persistence diagram for the ensemble
Overlaid persistence diagrams with Fréchet mean (in red) for each of the NC ensembles (with PRES16 vote data)
Overlaid Fréchet means for various elections in PA. In all three levels of redistricting, the geographic localization is strong enough to label the Pittsburgh feature (circled)
Comparison for the weighted ensembles of Senate plans, with Democratic-favoring in blue and Republican-favoring in red. The simple weighting adjustment to the Markov chain has successfully created a difference of 3-4 seats between the ensemble means. Despite the significant change in the partisan outcome, we see only small movement in the Fréchet means
Geographical localization of Fréchet features in Pennsylvania Congressional and Senate plans $(k = 18, k = 50)$ with respect to PRES16 voting. The Senate features fairly clearly capture the medium-sized cities of Pennsylvania
We isolate the Fréchet means for two successive Presidential elections in PA. The mean for PRES12 has a third off-diagonal feature at Scranton, while PRES16 does not. A uniform partisan swing would show each red point displaced by $(.03, .03)$ relative to its paired green point, so the diagram summarizes the non-uniformity of the electoral shift
Comparison of point plots for the successive Fréchet features in the PA Senate ensembles that are biased for Democratic and Republican safe seats, respectively. Note the contrast between the rigidity of the Erie feature (similar placement for both ensembles) and the manipulability of the Harrisburg feature, though both anchor competitive districts
Maps of North Carolina. Top: results from the 2016 Presidential election (blue for Democratic, red for Republican). Bottom: North Carolina has a significant rural Black population, unlike Pennsylvania. Greenville and Wilson are the largest towns in the Northeast, but they are not the hubs of Black population
Geographical localization of Fréchet features in North Carolina Congressional and Senate plans $(k = 13, k = 50)$ with respect to PRES16 voting
Overlaid Fréchet means for various elections in NC. In all three levels of redistricting, the geographic localization is strong enough to label the second and third features, shown circled, as Charlotte (left) and Asheville (right)
Comparing two enacted Congressional plans for North Carolina with the ensemble mean, against the PRES16 vote pattern in all three cases
Comparison of point plots for the successive Fréchet features in the NC Senate ensembles that are biased for Democratic and Republican safe seats, respectively
A small change to districts can cause a large change in persistence. In this example (closely modeled on the enacted NC Congressional plan from 2012) we have changed the assignment of only one geographic unit (a precinct of 3, 300 people), and it removes a highly persistent off-diagonal feature
Maps of enacted plans before and after perturbation by 1000 flip steps; district adjacencies changed in both cases (e.g., blue/lavender for Congress, and purple/purple for state Senate districts in the Southeast). Nonetheless we see only a small change in persistence, which is typical for repetitions of this experiment
Effect of iterating small geographic perturbations (single-unit flip steps) on persistence. The red line indicates the average bottleneck distance over the ${100 \choose 2}$ plans in a ReCom ensemble $\mathcal{E} = \{P^{(1)}, \dots, P^{(100)}\}$, shown as a baseline for the distance between approximately independent plans. For each $i = 1, \dots, 100$, we run 1000 flip steps from $P^{(i)}$ and track the bottleneck distance to $P^{(i)}$. For $k = 18$, the persistence diagrams are seen to be stable. The same is true for most 50-district plans, but not all
Histograms of statistics of the dual graphs for the ensemble of 1000 18-district plans for Pennsylvania. (Density is the ratio of the number of edges to the number of possible edges, ${18 \choose 2} = 153$.)
Geographical localization of eleven Fréchet features in Pennsylvania House plans $(k = 203)$ with respect to PRES16 voting. These now pinpoint small cities, all the way down to Hermitage (population 16, 220)
. Point plots and heat maps for the successive Fréchet features in the PA Senate ensembles that are biased for Democratic and Republican safe seats, respectively. Note the contrast between the rigidity of the Erie feature (similar placement for both ensembles) and the manipulability of the Harrisburg feature, though both anchor competitive districts">Figure 21.  Expanding on Figure 10. Point plots and heat maps for the successive Fréchet features in the PA Senate ensembles that are biased for Democratic and Republican safe seats, respectively. Note the contrast between the rigidity of the Erie feature (similar placement for both ensembles) and the manipulability of the Harrisburg feature, though both anchor competitive districts
Geographical localization of ten Fréchet features in North Carolina Senate plans $(k = 50)$ with respect to PRES16 voting
Comparison of point plots and heat maps for the successive Fréchet features in the NC Senate ensembles that are biased for Democratic and Republican safe seats, respectively
Location and average number of districts in connected components of Democratic-won districts for Pennsylvania Congressional plans ($k = 18$) with respect to PRES16 voting
Location and average number of districts in connected components of Democratic-won districts for Pennsylvania state Senate plans ($k = 50$) with respect to PRES16 voting
Location and average number of districts in connected components of Democratic-won districts for Pennsylvania state House plans ($k = 203$) with respect to PRES16 voting
Location and average number of districts in connected components of Democratic-won districts for North Carolina Congressional plans ($k = 13$) with respect to PRES16 voting
Location and average number of districts in connected components of Democratic-won districts for North Carolina state Senate plans ($k = 50$) with respect to PRES16 voting
Location and average number of districts in connected components of Democratic-won districts for North Carolina state House plans ($k = 120$) with respect to PRES16 voting
Distribution of pairwise bottleneck distances in the Pennsylvania ReCom ensembles. The distances help contextualize the size of the displacement by Flip steps
Effect of iterating ReCom steps (merging and resplitting pairs of adjacent districts) on bottleneck distance. We choose 100 starting plans $P^{(i)}$ from our Pennsylvania ensembles. For each of $i = 1, \dots, 100$, we run 1000 ReCom steps (with no subsampling) from $P^{(i)}$ and track the bottleneck distance to $P^{(i)}$ on the $y$-axis
Overall Republican vote shares (with respect to two-party vote) for a range of recent statewide elections in Pennsylvania
 PRES12 PRES16 SEN10 SEN12 SEN16 GOV10 ATG12 ATG16 R % 47.29 50.35 51.05 45.44 50.72 54.52 42.58 48.57
 PRES12 PRES16 SEN10 SEN12 SEN16 GOV10 ATG12 ATG16 R % 47.29 50.35 51.05 45.44 50.72 54.52 42.58 48.57
Overall Republican vote shares (with respect to two-party vote) for a range of recent statewide elections in North Carolina
 PRES12 PRES16 SEN10 SEN14 SEN16 GOV12 GOV16 R % 51.08 51.98 56.02 49.17 53.02 55.87 49.95
 PRES12 PRES16 SEN10 SEN14 SEN16 GOV12 GOV16 R % 51.08 51.98 56.02 49.17 53.02 55.87 49.95
 [1] J. Alberto Conejero, Marko Kostić, Pedro J. Miana, Marina Murillo-Arcila. Distributionally chaotic families of operators on Fréchet spaces. Communications on Pure & Applied Analysis, 2016, 15 (5) : 1915-1939. doi: 10.3934/cpaa.2016022 [2] J. Leonel Rocha, Sandra M. Aleixo. An extension of Gompertzian growth dynamics: Weibull and Fréchet models. Mathematical Biosciences & Engineering, 2013, 10 (2) : 379-398. doi: 10.3934/mbe.2013.10.379 [3] George Siopsis. Quantum topological data analysis with continuous variables. Foundations of Data Science, 2019, 1 (4) : 419-431. doi: 10.3934/fods.2019017 [4] Tyrus Berry, Timothy Sauer. Consistent manifold representation for topological data analysis. Foundations of Data Science, 2019, 1 (1) : 1-38. doi: 10.3934/fods.2019001 [5] Éder Rítis Aragão Costa. An extension of the concept of exponential dichotomy in Fréchet spaces which is stable under perturbation. Communications on Pure & Applied Analysis, 2019, 18 (2) : 845-868. doi: 10.3934/cpaa.2019041 [6] Olli-Pekka Tossavainen, Daniel B. Work. Markov Chain Monte Carlo based inverse modeling of traffic flows using GPS data. Networks & Heterogeneous Media, 2013, 8 (3) : 803-824. doi: 10.3934/nhm.2013.8.803 [7] Samuel N. Cohen, Lukasz Szpruch. On Markovian solutions to Markov Chain BSDEs. Numerical Algebra, Control & Optimization, 2012, 2 (2) : 257-269. doi: 10.3934/naco.2012.2.257 [8] Vladislav Kruglov, Dmitry Malyshev, Olga Pochinka. Topological classification of $Ω$-stable flows on surfaces by means of effectively distinguishable multigraphs. Discrete & Continuous Dynamical Systems, 2018, 38 (9) : 4305-4327. doi: 10.3934/dcds.2018188 [9] Ajay Jasra, Kody J. H. Law, Yaxian Xu. Markov chain simulation for multilevel Monte Carlo. Foundations of Data Science, 2021, 3 (1) : 27-47. doi: 10.3934/fods.2021004 [10] Baolan Yuan, Wanjun Zhang, Yubo Yuan. A Max-Min clustering method for $k$-means algorithm of data clustering. Journal of Industrial & Management Optimization, 2012, 8 (3) : 565-575. doi: 10.3934/jimo.2012.8.565 [11] Wolfgang Krieger, Kengo Matsumoto. Markov-Dyck shifts, neutral periodic points and topological conjugacy. Discrete & Continuous Dynamical Systems, 2019, 39 (1) : 1-18. doi: 10.3934/dcds.2019001 [12] L. Cioletti, E. Silva, M. Stadlbauer. Thermodynamic formalism for topological Markov chains on standard Borel spaces. Discrete & Continuous Dynamical Systems, 2019, 39 (11) : 6277-6298. doi: 10.3934/dcds.2019274 [13] Marcelo Sobottka. Right-permutative cellular automata on topological Markov chains. Discrete & Continuous Dynamical Systems, 2008, 20 (4) : 1095-1109. doi: 10.3934/dcds.2008.20.1095 [14] Mark F. Demers, Christopher J. Ianzano, Philip Mayer, Peter Morfe, Elizabeth C. Yoo. Limiting distributions for countable state topological Markov chains with holes. Discrete & Continuous Dynamical Systems, 2017, 37 (1) : 105-130. doi: 10.3934/dcds.2017005 [15] Jingzhi Tie, Qing Zhang. An optimal mean-reversion trading rule under a Markov chain model. Mathematical Control & Related Fields, 2016, 6 (3) : 467-488. doi: 10.3934/mcrf.2016012 [16] Ralf Banisch, Carsten Hartmann. A sparse Markov chain approximation of LQ-type stochastic control problems. Mathematical Control & Related Fields, 2016, 6 (3) : 363-389. doi: 10.3934/mcrf.2016007 [17] Kun Fan, Yang Shen, Tak Kuen Siu, Rongming Wang. On a Markov chain approximation method for option pricing with regime switching. Journal of Industrial & Management Optimization, 2016, 12 (2) : 529-541. doi: 10.3934/jimo.2016.12.529 [18] Sho Matsumoto, Jonathan Novak. A moment method for invariant ensembles. Electronic Research Announcements, 2018, 25: 60-71. doi: 10.3934/era.2018.25.007 [19] Mario Roy, Mariusz Urbański. Multifractal analysis for conformal graph directed Markov systems. Discrete & Continuous Dynamical Systems, 2009, 25 (2) : 627-650. doi: 10.3934/dcds.2009.25.627 [20] Kengo Matsumoto. Continuous orbit equivalence of topological Markov shifts and KMS states on Cuntz–Krieger algebras. Discrete & Continuous Dynamical Systems, 2020, 40 (10) : 5897-5909. doi: 10.3934/dcds.2020251

Impact Factor:

## Tools

Article outline

Figures and Tables