# American Institute of Mathematical Sciences

doi: 10.3934/fods.2021007

## The (homological) persistence of gerrymandering

 1 Department of Mathematics, Tufts University, Medford, MA 02155, USA 2 Department of Mathematics, Florida State University, Tallahassee, FL 32306, USA 3 Department of Mathematics and Statistics, UNC Greensboro, Greensboro, NC 27402, USA

* Corresponding author: Tom Needham

Received  August 2020 Revised  February 2021 Published  March 2021

Fund Project: The first and third author were supported by NSF Convergence Accelerator Grant No. OIA-1937095

We apply persistent homology, the dominant tool from the field of topological data analysis, to study electoral redistricting. We begin by combining geographic and electoral data from a districting plan to produce a persistence diagram. Then, to see beyond a particular plan and understand the possibilities afforded by the choices made in redistricting, we build methods to visualize and analyze large ensembles of alternative plans. Our detailed case studies use zero-dimensional homology (persistent components) of filtered graphs constructed from voting data to analyze redistricting in Pennsylvania and North Carolina. We find that, across large ensembles of partitions, the features cluster in the persistence diagrams in a way that corresponds strongly to geographic location, so that we can construct an average diagram for an ensemble, with each point identified with a geographical region. Using this localization lets us produce zonings of each state at Congressional, state Senate, and state House scales, show the regional non-uniformity of election shifts, and identify attributes of partitions that tend to correspond to partisan advantage.

The methods here are set up to be broadly applicable to the use of TDA on large ensembles of data. Many studies will benefit from interpretable summaries of large sets of samples or simulations, and the work here on localization and zoning will readily generalize to other partition problems, which are abundant in scientific applications. For the mathematically and politically rich problem of redistricting in particular, TDA provides a powerful and elegant summarization tool whose findings will be useful for practitioners.

Citation: Moon Duchin, Tom Needham, Thomas Weighill. The (homological) persistence of gerrymandering. Foundations of Data Science, doi: 10.3934/fods.2021007
##### References:

show all references

##### References:
This paper studies the aggregation of votes from geographic units to districts, which is intractable to study by complete enumeration of possibilities. We develop persistent homology techniques to summarize the districting problem. This figure shows a precinct map of Pennsylvania with key cities identified. The precincts are colored by voting results from the 2016 Presidential election (blue for Democratic, red for Republican). Current districts in Pennsylvania are shown with the same coloring. Note that the Congressional districts are balanced to within one-person population deviation (by 2010 Census count)
Abstract graph representations of three 18-district plans for Pennsylvania
The main method: turning a districting plan into a persistence diagram. The persistence diagram captures both the adjacency information and the vote shares for the districts in a plan
Pairing each plan in a districting ensemble with PRES16 vote data gives an ensemble of persistence diagrams. These plots show overlays of the diagram ensembles for $k = 18, 50, 203$ respectively. In the second row, we add the Fréchet mean $\mathcal{F}$ for each ensemble, shown in red, representing an "average" persistence diagram for the ensemble
Overlaid persistence diagrams with Fréchet mean (in red) for each of the NC ensembles (with PRES16 vote data)
Overlaid Fréchet means for various elections in PA. In all three levels of redistricting, the geographic localization is strong enough to label the Pittsburgh feature (circled)
Comparison for the weighted ensembles of Senate plans, with Democratic-favoring in blue and Republican-favoring in red. The simple weighting adjustment to the Markov chain has successfully created a difference of 3-4 seats between the ensemble means. Despite the significant change in the partisan outcome, we see only small movement in the Fréchet means
Geographical localization of Fréchet features in Pennsylvania Congressional and Senate plans $(k = 18, k = 50)$ with respect to PRES16 voting. The Senate features fairly clearly capture the medium-sized cities of Pennsylvania
We isolate the Fréchet means for two successive Presidential elections in PA. The mean for PRES12 has a third off-diagonal feature at Scranton, while PRES16 does not. A uniform partisan swing would show each red point displaced by $(.03, .03)$ relative to its paired green point, so the diagram summarizes the non-uniformity of the electoral shift
Comparison of point plots for the successive Fréchet features in the PA Senate ensembles that are biased for Democratic and Republican safe seats, respectively. Note the contrast between the rigidity of the Erie feature (similar placement for both ensembles) and the manipulability of the Harrisburg feature, though both anchor competitive districts
Maps of North Carolina. Top: results from the 2016 Presidential election (blue for Democratic, red for Republican). Bottom: North Carolina has a significant rural Black population, unlike Pennsylvania. Greenville and Wilson are the largest towns in the Northeast, but they are not the hubs of Black population
Geographical localization of Fréchet features in North Carolina Congressional and Senate plans $(k = 13, k = 50)$ with respect to PRES16 voting
Overlaid Fréchet means for various elections in NC. In all three levels of redistricting, the geographic localization is strong enough to label the second and third features, shown circled, as Charlotte (left) and Asheville (right)
Comparing two enacted Congressional plans for North Carolina with the ensemble mean, against the PRES16 vote pattern in all three cases
Comparison of point plots for the successive Fréchet features in the NC Senate ensembles that are biased for Democratic and Republican safe seats, respectively
A small change to districts can cause a large change in persistence. In this example (closely modeled on the enacted NC Congressional plan from 2012) we have changed the assignment of only one geographic unit (a precinct of 3, 300 people), and it removes a highly persistent off-diagonal feature
Maps of enacted plans before and after perturbation by 1000 flip steps; district adjacencies changed in both cases (e.g., blue/lavender for Congress, and purple/purple for state Senate districts in the Southeast). Nonetheless we see only a small change in persistence, which is typical for repetitions of this experiment
Effect of iterating small geographic perturbations (single-unit flip steps) on persistence. The red line indicates the average bottleneck distance over the ${100 \choose 2}$ plans in a ReCom ensemble $\mathcal{E} = \{P^{(1)}, \dots, P^{(100)}\}$, shown as a baseline for the distance between approximately independent plans. For each $i = 1, \dots, 100$, we run 1000 flip steps from $P^{(i)}$ and track the bottleneck distance to $P^{(i)}$. For $k = 18$, the persistence diagrams are seen to be stable. The same is true for most 50-district plans, but not all
Histograms of statistics of the dual graphs for the ensemble of 1000 18-district plans for Pennsylvania. (Density is the ratio of the number of edges to the number of possible edges, ${18 \choose 2} = 153$.)
Geographical localization of eleven Fréchet features in Pennsylvania House plans $(k = 203)$ with respect to PRES16 voting. These now pinpoint small cities, all the way down to Hermitage (population 16, 220)
. Point plots and heat maps for the successive Fréchet features in the PA Senate ensembles that are biased for Democratic and Republican safe seats, respectively. Note the contrast between the rigidity of the Erie feature (similar placement for both ensembles) and the manipulability of the Harrisburg feature, though both anchor competitive districts">Figure 21.  Expanding on Figure 10. Point plots and heat maps for the successive Fréchet features in the PA Senate ensembles that are biased for Democratic and Republican safe seats, respectively. Note the contrast between the rigidity of the Erie feature (similar placement for both ensembles) and the manipulability of the Harrisburg feature, though both anchor competitive districts
Geographical localization of ten Fréchet features in North Carolina Senate plans $(k = 50)$ with respect to PRES16 voting
Comparison of point plots and heat maps for the successive Fréchet features in the NC Senate ensembles that are biased for Democratic and Republican safe seats, respectively
Location and average number of districts in connected components of Democratic-won districts for Pennsylvania Congressional plans ($k = 18$) with respect to PRES16 voting
Location and average number of districts in connected components of Democratic-won districts for Pennsylvania state Senate plans ($k = 50$) with respect to PRES16 voting
Location and average number of districts in connected components of Democratic-won districts for Pennsylvania state House plans ($k = 203$) with respect to PRES16 voting
Location and average number of districts in connected components of Democratic-won districts for North Carolina Congressional plans ($k = 13$) with respect to PRES16 voting
Location and average number of districts in connected components of Democratic-won districts for North Carolina state Senate plans ($k = 50$) with respect to PRES16 voting
Location and average number of districts in connected components of Democratic-won districts for North Carolina state House plans ($k = 120$) with respect to PRES16 voting
Distribution of pairwise bottleneck distances in the Pennsylvania ReCom ensembles. The distances help contextualize the size of the displacement by Flip steps
Effect of iterating ReCom steps (merging and resplitting pairs of adjacent districts) on bottleneck distance. We choose 100 starting plans $P^{(i)}$ from our Pennsylvania ensembles. For each of $i = 1, \dots, 100$, we run 1000 ReCom steps (with no subsampling) from $P^{(i)}$ and track the bottleneck distance to $P^{(i)}$ on the $y$-axis
Overall Republican vote shares (with respect to two-party vote) for a range of recent statewide elections in Pennsylvania
 PRES12 PRES16 SEN10 SEN12 SEN16 GOV10 ATG12 ATG16 R % 47.29 50.35 51.05 45.44 50.72 54.52 42.58 48.57
 PRES12 PRES16 SEN10 SEN12 SEN16 GOV10 ATG12 ATG16 R % 47.29 50.35 51.05 45.44 50.72 54.52 42.58 48.57
Overall Republican vote shares (with respect to two-party vote) for a range of recent statewide elections in North Carolina
 PRES12 PRES16 SEN10 SEN14 SEN16 GOV12 GOV16 R % 51.08 51.98 56.02 49.17 53.02 55.87 49.95
 PRES12 PRES16 SEN10 SEN14 SEN16 GOV12 GOV16 R % 51.08 51.98 56.02 49.17 53.02 55.87 49.95
 [1] Ajay Jasra, Kody J. H. Law, Yaxian Xu. Markov chain simulation for multilevel Monte Carlo. Foundations of Data Science, 2021, 3 (1) : 27-47. doi: 10.3934/fods.2021004 [2] Cheng-Kai Hu, Fung-Bao Liu, Hong-Ming Chen, Cheng-Feng Hu. Network data envelopment analysis with fuzzy non-discretionary factors. Journal of Industrial & Management Optimization, 2021, 17 (4) : 1795-1807. doi: 10.3934/jimo.2020046 [3] Xiaoyi Zhou, Tong Ye, Tony T. Lee. Designing and analysis of a Wi-Fi data offloading strategy catering for the preference of mobile users. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021038 [4] Eric Babson and Dmitry N. Kozlov. Topological obstructions to graph colorings. Electronic Research Announcements, 2003, 9: 61-68. [5] Antonio Rieser. A topological approach to spectral clustering. Foundations of Data Science, 2021, 3 (1) : 49-66. doi: 10.3934/fods.2021005 [6] Roberto Civino, Riccardo Longo. Formal security proof for a scheme on a topological network. Advances in Mathematics of Communications, 2021  doi: 10.3934/amc.2021009 [7] Ralf Hielscher, Michael Quellmalz. Reconstructing a function on the sphere from its means along vertical slices. Inverse Problems & Imaging, 2016, 10 (3) : 711-739. doi: 10.3934/ipi.2016018 [8] Juliang Zhang, Jian Chen. Information sharing in a make-to-stock supply chain. Journal of Industrial & Management Optimization, 2014, 10 (4) : 1169-1189. doi: 10.3934/jimo.2014.10.1169 [9] Davide La Torre, Simone Marsiglio, Franklin Mendivil, Fabio Privileggi. Public debt dynamics under ambiguity by means of iterated function systems on density functions. Discrete & Continuous Dynamical Systems - B, 2021  doi: 10.3934/dcdsb.2021070 [10] Liqiang Jin, Yanqing Liu, Yanyan Yin, Kok Lay Teo, Fei Liu. Design of probabilistic $l_2-l_\infty$ filter for uncertain Markov jump systems with partial information of the transition probabilities. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021070 [11] Chenchen Wu, Wei Lv, Yujie Wang, Dachuan Xu. Approximation Algorithm for Spherical $k$-Means Problem with Penalty. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021067 [12] Paul E. Anderson, Timothy P. Chartier, Amy N. Langville, Kathryn E. Pedings-Behling. The rankability of weighted data from pairwise comparisons. Foundations of Data Science, 2021, 3 (1) : 1-26. doi: 10.3934/fods.2021002 [13] Kehan Si, Zhenda Xu, Ka Fai Cedric Yiu, Xun Li. Open-loop solvability for mean-field stochastic linear quadratic optimal control problems of Markov regime-switching system. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021074 [14] Liqin Qian, Xiwang Cao. Character sums over a non-chain ring and their applications. Advances in Mathematics of Communications, 2021  doi: 10.3934/amc.2020134 [15] Min Li, Jiahua Zhang, Yifan Xu, Wei Wang. Effects of disruption risk on a supply chain with a risk-averse retailer. Journal of Industrial & Management Optimization, 2021  doi: 10.3934/jimo.2021024 [16] Jan Prüss, Laurent Pujo-Menjouet, G.F. Webb, Rico Zacher. Analysis of a model for the dynamics of prions. Discrete & Continuous Dynamical Systems - B, 2006, 6 (1) : 225-235. doi: 10.3934/dcdsb.2006.6.225 [17] Habib Ammari, Josselin Garnier, Vincent Jugnon. Detection, reconstruction, and characterization algorithms from noisy data in multistatic wave imaging. Discrete & Continuous Dynamical Systems - S, 2015, 8 (3) : 389-417. doi: 10.3934/dcdss.2015.8.389 [18] Woocheol Choi, Youngwoo Koh. On the splitting method for the nonlinear Schrödinger equation with initial data in $H^1$. Discrete & Continuous Dynamical Systems, 2021, 41 (8) : 3837-3867. doi: 10.3934/dcds.2021019 [19] Sohana Jahan. Discriminant analysis of regularized multidimensional scaling. Numerical Algebra, Control & Optimization, 2021, 11 (2) : 255-267. doi: 10.3934/naco.2020024 [20] Wen-Bin Yang, Yan-Ling Li, Jianhua Wu, Hai-Xia Li. Dynamics of a food chain model with ratio-dependent and modified Leslie-Gower functional responses. Discrete & Continuous Dynamical Systems - B, 2015, 20 (7) : 2269-2290. doi: 10.3934/dcdsb.2015.20.2269

Impact Factor: