PRES12 | PRES16 | SEN10 | SEN12 | SEN16 | GOV10 | ATG12 | ATG16 | |
R % | 47.29 | 50.35 | 51.05 | 45.44 | 50.72 | 54.52 | 42.58 | 48.57 |
We apply persistent homology, the dominant tool from the field of topological data analysis, to study electoral redistricting. We begin by combining geographic and electoral data from a districting plan to produce a persistence diagram. Then, to see beyond a particular plan and understand the possibilities afforded by the choices made in redistricting, we build methods to visualize and analyze large ensembles of alternative plans. Our detailed case studies use zero-dimensional homology (persistent components) of filtered graphs constructed from voting data to analyze redistricting in Pennsylvania and North Carolina. We find that, across large ensembles of partitions, the features cluster in the persistence diagrams in a way that corresponds strongly to geographic location, so that we can construct an average diagram for an ensemble, with each point identified with a geographical region. Using this localization lets us produce zonings of each state at Congressional, state Senate, and state House scales, show the regional non-uniformity of election shifts, and identify attributes of partitions that tend to correspond to partisan advantage.
The methods here are set up to be broadly applicable to the use of TDA on large ensembles of data. Many studies will benefit from interpretable summaries of large sets of samples or simulations, and the work here on localization and zoning will readily generalize to other partition problems, which are abundant in scientific applications. For the mathematically and politically rich problem of redistricting in particular, TDA provides a powerful and elegant summarization tool whose findings will be useful for practitioners.
Citation: |
Figure 1. This paper studies the aggregation of votes from geographic units to districts, which is intractable to study by complete enumeration of possibilities. We develop persistent homology techniques to summarize the districting problem. This figure shows a precinct map of Pennsylvania with key cities identified. The precincts are colored by voting results from the 2016 Presidential election (blue for Democratic, red for Republican). Current districts in Pennsylvania are shown with the same coloring. Note that the Congressional districts are balanced to within one-person population deviation (by 2010 Census count)
Figure 4. Pairing each plan in a districting ensemble with PRES16 vote data gives an ensemble of persistence diagrams. These plots show overlays of the diagram ensembles for $ k = 18, 50, 203 $ respectively. In the second row, we add the Fréchet mean $ \mathcal{F} $ for each ensemble, shown in red, representing an "average" persistence diagram for the ensemble
Figure 7. Comparison for the weighted ensembles of Senate plans, with Democratic-favoring in blue and Republican-favoring in red. The simple weighting adjustment to the Markov chain has successfully created a difference of 3-4 seats between the ensemble means. Despite the significant change in the partisan outcome, we see only small movement in the Fréchet means
Figure 9. We isolate the Fréchet means for two successive Presidential elections in PA. The mean for PRES12 has a third off-diagonal feature at Scranton, while PRES16 does not. A uniform partisan swing would show each red point displaced by $ (.03, .03) $ relative to its paired green point, so the diagram summarizes the non-uniformity of the electoral shift
Figure 10. Comparison of point plots for the successive Fréchet features in the PA Senate ensembles that are biased for Democratic and Republican safe seats, respectively. Note the contrast between the rigidity of the Erie feature (similar placement for both ensembles) and the manipulability of the Harrisburg feature, though both anchor competitive districts
Figure 11. Maps of North Carolina. Top: results from the 2016 Presidential election (blue for Democratic, red for Republican). Bottom: North Carolina has a significant rural Black population, unlike Pennsylvania. Greenville and Wilson are the largest towns in the Northeast, but they are not the hubs of Black population
Figure 16. A small change to districts can cause a large change in persistence. In this example (closely modeled on the enacted NC Congressional plan from 2012) we have changed the assignment of only one geographic unit (a precinct of 3, 300 people), and it removes a highly persistent off-diagonal feature
Figure 17. Maps of enacted plans before and after perturbation by 1000 flip steps; district adjacencies changed in both cases (e.g., blue/lavender for Congress, and purple/purple for state Senate districts in the Southeast). Nonetheless we see only a small change in persistence, which is typical for repetitions of this experiment
Figure 18. Effect of iterating small geographic perturbations (single-unit flip steps) on persistence. The red line indicates the average bottleneck distance over the $ {100 \choose 2} $ plans in a ReCom ensemble $ \mathcal{E} = \{P^{(1)}, \dots, P^{(100)}\} $, shown as a baseline for the distance between approximately independent plans. For each $ i = 1, \dots, 100 $, we run 1000 flip steps from $ P^{(i)} $ and track the bottleneck distance to $ P^{(i)} $. For $ k = 18 $, the persistence diagrams are seen to be stable. The same is true for most 50-district plans, but not all
Figure 21. Expanding on Figure 10. Point plots and heat maps for the successive Fréchet features in the PA Senate ensembles that are biased for Democratic and Republican safe seats, respectively. Note the contrast between the rigidity of the Erie feature (similar placement for both ensembles) and the manipulability of the Harrisburg feature, though both anchor competitive districts
Figure 31. Effect of iterating ReCom steps (merging and resplitting pairs of adjacent districts) on bottleneck distance. We choose 100 starting plans $ P^{(i)} $ from our Pennsylvania ensembles. For each of $ i = 1, \dots, 100 $, we run 1000 ReCom steps (with no subsampling) from $ P^{(i)} $ and track the bottleneck distance to $ P^{(i)} $ on the $ y $-axis
Table 1. Overall Republican vote shares (with respect to two-party vote) for a range of recent statewide elections in Pennsylvania
PRES12 | PRES16 | SEN10 | SEN12 | SEN16 | GOV10 | ATG12 | ATG16 | |
R % | 47.29 | 50.35 | 51.05 | 45.44 | 50.72 | 54.52 | 42.58 | 48.57 |
Table 2. Overall Republican vote shares (with respect to two-party vote) for a range of recent statewide elections in North Carolina
PRES12 | PRES16 | SEN10 | SEN14 | SEN16 | GOV12 | GOV16 | |
R % | 51.08 | 51.98 | 56.02 | 49.17 | 53.02 | 55.87 | 49.95 |
[1] | T. Abrishami, N. Guillen, P. Rule, Z. Schutzman, J. Solomon, T. Weighill and S. Wu, Geometry of graph partitions via optimal transport, SIAM J. Sci. Comput., 42 (2020), A3340-A3366. doi: 10.1137/19M1295258. |
[2] | H. Adams, T. Emerson, M. Kirby, R. Neville and C. Peterson, et al., Persistence images: A stable vector representation of persistent homology, J. Mach. Learn. Res., 18 (2017), 35pp. |
[3] | P. K. Agarwal, K. Fox, A. Nath, A. Sidiropoulos and Y. Wang, Computing the Gromov-Hausdorff distance for metric trees, ACM Trans. Algorithms, 14 (2018), 20pp. doi: 10.1145/3185466. |
[4] | P. Bajardi, M. Delfino, A. Panisson, G. Petri and M. Tizzoni, Unveiling patterns of international communities in a global city using mobile phone data, EPJ Data Science, 4 (2015). doi: 10.1140/epjds/s13688-015-0041-5. |
[5] | A. Banman and L. Ziegelmeier, Mind the gap: {A} study in global development through persistent homology, in Research in Computational Topology, Assoc. Women Math. Ser., 13, Springer, Cham, 2018,125-144. doi: 10.1007/978-3-319-89593-2_8. |
[6] | P. Bendich, J. S. Marron, E. Miller, A. Pieloch and S. Skwerer, Persistent homology analysis of brain artery trees, Ann. Appl. Stat., 10 (2016), 198-218. doi: 10.1214/15-AOAS886. |
[7] | R. Brüel-Gabrielsson, B. J. Nelson, A. Dwaraknath, P. Skraba, L. J. Guibas and G. Carlsson, A topology layer for machine learning, preprint, arXiv: 1905.12200. |
[8] | P. Bubenik, Statistical topological data analysis using persistence landscapes, J. Mach. Learn. Res., 16 (2015), 77-102. |
[9] | P. Bubenik, J. Scott and D. Stanley, An algebraic Wasserstein distance for generalized persistence modules, preprint, arXiv: 1809.09654. |
[10] | S. Cannon, M. Duchin, D. Randall and P. Rule, A reversible recombination chain for graph partitions, work in progress. |
[11] | G. Carlsson, Topological pattern recognition for point cloud data, Acta Numer., 23 (2014), 289-368. doi: 10.1017/S0962492914000051. |
[12] | G. Carlsson, V. de Silva and D. Morozov, Zigzag persistent homology and real-valued functions, in Proceedings of the Twenty-Fifth Annual Symposium on Computational Geometry, ACM, 2009,247-256. doi: 10.1145/1542362.1542408. |
[13] | G. Carlsson and F. Mémoli, Characterization, stability and convergence of hierarchical clustering methods, J. Mach. Learn. Res., 11 (2010), 1425-1470. |
[14] | M. Carrière, F. Chazal, Y. Ike, T. Lacombe, M. Royer and Y. Umeda, PersLay: A neural network layer for persistence diagrams and new graph topological signatures, preprint, arXiv: 1904.09378. |
[15] | F. Chazal, D. Cohen-Steiner, M. Glisse, L. J. Guibas and S. Y. Oudot, Proximity of persistence modules and their diagrams, in Proceedings of the Twenty-Fifth Annual Symposium on Computational Geometry, ACM, 2009,237-246. doi: 10.1145/1542362.1542407. |
[16] | F. Chazal, D. Cohen-Steiner, L. J. Guibas, F. Mémoli and S. Y. Oudot, Gromov-Hausdorff stable signatures for shapes using persistence, in Computer Graphics Forum, 28, Wiley Online Library, 2009, 1393-1403. doi: 10.1111/j.1467-8659.2009.01516.x. |
[17] | F. Chazal, B. T. Fasy, F. Lecci, A. Rinaldo and L. Wasserman, Stochastic convergence of persistence landscapes and silhouettes, in Computational Geometry (SoCG'14), ACM, New York, 2014,474-483. doi: 10.1145/2582112.2582128. |
[18] | F. Chazal, M. Glisse, C. Labruère and B. Michel, Convergence rates for persistence diagram estimation in topological data analysis, J. Mach. Learn. Res., 16 (2015), 3603-3635. |
[19] | F. Chazal and B. Michel, An introduction to topological data analysis: Fundamental and practical aspects for data scientists, preprint, arXiv: 1710.04019. |
[20] | M. Chikina, A. Frieze and W. Pegden, Assessing significance in a Markov chain without mixing, Proc. Natl. Acad. Sci. USA, 114 (2017), 2860-2864. doi: 10.1073/pnas.1617540114. |
[21] | S. Chowdhury, B. Dai and F. Mémoli, The importance of forgetting: Limiting memory improves recovery of topological characteristics from neural data, PLoS One, 13 (2018). doi: 10.1371/journal.pone.0202561. |
[22] | D. Cohen-Steiner, H. Edelsbrunner and J. Harer, Stability of persistence diagrams, Discrete Comput. Geom., 37 (2007), 103-120. doi: 10.1007/s00454-006-1276-5. |
[23] | L. Crawford, A. Monod, A. X. Chen, S. Mukherjee and R. Rabadán, Predicting clinical outcomes in glioblastoma: An application of topological and functional data analysis, J. Amer. Statist. Assoc., 115 (2020), 1139-1150. doi: 10.1080/01621459.2019.1671198. |
[24] | J. Curry, The fiber of the persistence map for functions on the interval, J. Appl. Comput. Topol., 2 (2018), 301-321. doi: 10.1007/s41468-019-00024-z. |
[25] | Y. Dabaghian, F. Mémoli, L. Frank and G. Carlsson, A topological paradigm for hippocampal spatial map formation using persistent homology, PLoS Computational Biology, 8 (2012). doi: 10.1371/journal.pcbi.1002581. |
[26] | D. DeFord and M. Duchin, Redistricting reform in Virginia: Districting criteria in context, MGGG, 2019. Available from: https://mggg.org/va-criteria.pdf. |
[27] | D. DeFord, M. Duchin and J. Solomon, Comparison of districting plans for the Virginia house of delegates, MGGG, 2018. Available from: https://mggg.org/VA-report.pdf. |
[28] | D. DeFord, M. Duchin and J. Solomon, Recombination: A family of Markov chains for redistricting., preprint, arXiv: 1911.05725. |
[29] | M. Duchin, Gerrymandering metrics: How to measure? What's the baseline?, Bull. Amer. Acad. Arts Sci., 71 (2018), 54-58. |
[30] | M. Duchin, Outlier analysis for Pennsylvania congressional redistricting, report, 2018. Available from: https://mggg.org/uploads/md-report.pdf. |
[31] | M. Duchin and O. Walch, Political Geometry, in press, Birkhäuser. |
[32] | H. Edelsbrunner and J. L. Harer, Computational Topology. An Introduction, American Mathematical Society, Providence, RI, 2010. doi: 10.1090/mbk/069. |
[33] | K. Emmett, B. Schweinhart and P. Rabadan, Multiscale topology of chromatin folding, in Proceedings of the 9th EAI International Conference on Bio-Inspired Information and Communications Technologies, ACM, 2016,177-180. doi: 10.4108/eai.3-12-2015.2262453. |
[34] | M. Feng and M. A. Porter, Persistent homology of geospatial data: A case study with voting, SIAM Rev., 63 (2021), 67--99. doi: 10.1137/19M1241519. |
[35] | M. Feng and M. A. Porter, Spatial applications of topological data analysis: Cities, snowflakes, random structures, and spiders spinning under the influence, Phys. Rev. Research, 2 (2020). doi: 10.1103/PhysRevResearch.2.033426. |
[36] | P. Gabriel, Unzerlegbare Darstellungen. I, Manuscripta Math., 6 (1972), 71-103. doi: 10.1007/BF01298413. |
[37] | B. Grofman and G. King, The future of partisan symmetry as a judicial test for partisan gerrymandering after LULAC v. Perry, Election Law J., 6 (2007), 2-35. doi: 10.1089/elj.2006.6002. |
[38] | L. Guth, A. Nieh and T. Weighill, Three applications of entropy to gerrymandering, preprint, arXiv: 2010.14972. |
[39] | G. Herschlag, H. S. Kang, J. Luo, C. Vaughn Graves, S. Bangia, R. Ravier and J. C. Mattingly, Quantifying gerrymandering in North Carolina, Stat. Public Policy, 7 (2020), 30-38. doi: 10.1080/2330443X.2020.1796400. |
[40] | Y. Hiraoka, T. Shirai and K. D. Trinh, Limit theorems for persistence diagrams, Ann. Appl. Probab., 28 (2018), 2740-2780. doi: 10.1214/17-AAP1371. |
[41] | C. D. Hofer, R. Kwitt and M. Niethammer, Learning representations of persistence barcodes, J. Mach. Learn. Res., 20 (2019), 45pp. |
[42] | M. Lesnick, The theory of the interleaving distance on multidimensional persistence modules, Found. Comput. Math., 15 (2015), 613-650. doi: 10.1007/s10208-015-9255-y. |
[43] | J. Levitt, A citizen's guide to redistricting, SSRN, 2008,139pp. doi: 10.2139/ssrn.1647221. |
[44] | A. Lytchak, Open map theorem for metric spaces, St. Petersburg Math. J., 17 (2006), 477-491. doi: 10.1090/S1061-0022-06-00916-2. |
[45] | C. Maria, J.-D. Boissonnat, M. Glisse and M. Yvinec, The Gudhi library: Simplicial complexes and persistent homology, in International Congress on Mathematical Software, Lecture Notes in Computer Science, 8592, Springer, Berlin, Heidelberg, 2014,167-174. doi: 10.1007/978-3-662-44199-2_28. |
[46] | V. Maroulas, F. Nasrin and C. Oballe, A Bayesian framework for persistent homology, SIAM J. Math. Data Sci., 2 (2020), 48-74. doi: 10.1137/19M1268719. |
[47] | J. Mattingly, Report on Redistricting: Drawing the Line., Common Cause v. Rucho, 318 F.Supp.3d 777, Ex.3002, 871 (2018). |
[48] | M. D. McDonald and R. E. Best, Unfair partisan gerrymanders in politics and law: A diagnostic applied to six cases, Election Law J., 14 (2015), 312-330. doi: 10.1089/elj.2015.0358. |
[49] | MGGG, MAUP: The geospatial toolkit for redistricting data., Available from: https://github.com/mggg/maup. |
[50] | MGGG, MGGG-states: Open collection of precincts shapefiles for U.S. states., Available from: https://github.com/mggg-states. |
[51] | Y. Mileyko, S. Mukherjee and J. Harer, Probability measures on the space of persistence diagrams, Inverse Problems, 27 (2011), 22pp. doi: 10.1088/0266-5611/27/12/124007. |
[52] | D. Morozov, K. Beketayev and G. Weber, interleaving distance between merge trees., Discrete Comput. Geom., 49 (2013), 22-45. |
[53] | M. Nicolau, A. J. Levine and G. Carlsson, Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival, PNAS, 108 (2011), 7265-7270. doi: 10.1073/pnas.1102826108. |
[54] | W. Pegden, Pennsylvania's Congressional Districting is an Outlier: Expert Report, League of Women Voters, et. al v. the Commonwealth of Pennsylvania, et. al. 159 MM, 2017. |
[55] | J. Rodden and T. Weighill, Political geography: A case study of districting in Pennsylvania., preprint, arXiv: 2010.14608. |
[56] | N. O. Stephanopoulos and E. M. McGhee, Partisan gerrymandering and the efficiency gap., The University of Chicago Law Review, (2015), 831-900. |
[57] | B. J. Stolz, H. A. Harrington and M. A. Porter, The topological "shape" of Brexit, SSRN, (2016), 9pp. doi: 10.2139/ssrn.2843662. |
[58] | K. Turner, Y. Mileyko, S. Mukherjee and J. Harer, Fréchet means for distributions of persistence diagrams, Discrete Comput. Geom., 52 (2014), 44-70. doi: 10.1007/s00454-014-9604-7. |
[59] | Voting Rights Data Institute, GerryChain, GitHub Repository, 2018. |
[60] | A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete Comput. Geom., 33 (2005), 249-274. doi: 10.1007/s00454-004-1146-y. |
This paper studies the aggregation of votes from geographic units to districts, which is intractable to study by complete enumeration of possibilities. We develop persistent homology techniques to summarize the districting problem. This figure shows a precinct map of Pennsylvania with key cities identified. The precincts are colored by voting results from the 2016 Presidential election (blue for Democratic, red for Republican). Current districts in Pennsylvania are shown with the same coloring. Note that the Congressional districts are balanced to within one-person population deviation (by 2010 Census count)
Abstract graph representations of three 18-district plans for Pennsylvania
The main method: turning a districting plan into a persistence diagram. The persistence diagram captures both the adjacency information and the vote shares for the districts in a plan
Pairing each plan in a districting ensemble with PRES16 vote data gives an ensemble of persistence diagrams. These plots show overlays of the diagram ensembles for
Overlaid persistence diagrams with Fréchet mean (in red) for each of the NC ensembles (with PRES16 vote data)
Overlaid Fréchet means for various elections in PA. In all three levels of redistricting, the geographic localization is strong enough to label the Pittsburgh feature (circled)
Comparison for the weighted ensembles of Senate plans, with Democratic-favoring in blue and Republican-favoring in red. The simple weighting adjustment to the Markov chain has successfully created a difference of 3-4 seats between the ensemble means. Despite the significant change in the partisan outcome, we see only small movement in the Fréchet means
Geographical localization of Fréchet features in Pennsylvania Congressional and Senate plans
We isolate the Fréchet means for two successive Presidential elections in PA. The mean for PRES12 has a third off-diagonal feature at Scranton, while PRES16 does not. A uniform partisan swing would show each red point displaced by
Comparison of point plots for the successive Fréchet features in the PA Senate ensembles that are biased for Democratic and Republican safe seats, respectively. Note the contrast between the rigidity of the Erie feature (similar placement for both ensembles) and the manipulability of the Harrisburg feature, though both anchor competitive districts
Maps of North Carolina. Top: results from the 2016 Presidential election (blue for Democratic, red for Republican). Bottom: North Carolina has a significant rural Black population, unlike Pennsylvania. Greenville and Wilson are the largest towns in the Northeast, but they are not the hubs of Black population
Geographical localization of Fréchet features in North Carolina Congressional and Senate plans
Overlaid Fréchet means for various elections in NC. In all three levels of redistricting, the geographic localization is strong enough to label the second and third features, shown circled, as Charlotte (left) and Asheville (right)
Comparing two enacted Congressional plans for North Carolina with the ensemble mean, against the PRES16 vote pattern in all three cases
Comparison of point plots for the successive Fréchet features in the NC Senate ensembles that are biased for Democratic and Republican safe seats, respectively
A small change to districts can cause a large change in persistence. In this example (closely modeled on the enacted NC Congressional plan from 2012) we have changed the assignment of only one geographic unit (a precinct of 3, 300 people), and it removes a highly persistent off-diagonal feature
Maps of enacted plans before and after perturbation by 1000 flip steps; district adjacencies changed in both cases (e.g., blue/lavender for Congress, and purple/purple for state Senate districts in the Southeast). Nonetheless we see only a small change in persistence, which is typical for repetitions of this experiment
Effect of iterating small geographic perturbations (single-unit flip steps) on persistence. The red line indicates the average bottleneck distance over the
Histograms of statistics of the dual graphs for the ensemble of 1000 18-district plans for Pennsylvania. (Density is the ratio of the number of edges to the number of possible edges,
Geographical localization of eleven Fréchet features in Pennsylvania House plans
Expanding on Figure 10. Point plots and heat maps for the successive Fréchet features in the PA Senate ensembles that are biased for Democratic and Republican safe seats, respectively. Note the contrast between the rigidity of the Erie feature (similar placement for both ensembles) and the manipulability of the Harrisburg feature, though both anchor competitive districts
Geographical localization of ten Fréchet features in North Carolina Senate plans
Comparison of point plots and heat maps for the successive Fréchet features in the NC Senate ensembles that are biased for Democratic and Republican safe seats, respectively
Location and average number of districts in connected components of Democratic-won districts for Pennsylvania Congressional plans (
Location and average number of districts in connected components of Democratic-won districts for Pennsylvania state Senate plans (
Location and average number of districts in connected components of Democratic-won districts for Pennsylvania state House plans (
Location and average number of districts in connected components of Democratic-won districts for North Carolina Congressional plans (
Location and average number of districts in connected components of Democratic-won districts for North Carolina state Senate plans (
Location and average number of districts in connected components of Democratic-won districts for North Carolina state House plans (
Distribution of pairwise bottleneck distances in the Pennsylvania ReCom ensembles. The distances help contextualize the size of the displacement by Flip steps
Effect of iterating ReCom steps (merging and resplitting pairs of adjacent districts) on bottleneck distance. We choose 100 starting plans