We use persistent cohomology and circular coordinates to investigate three datasets related to infectious diseases. We show that all three datasets exhibit circular coordinates that carry information about the data itself. For one of the datasets we are able to recover time post infection from the circular coordinate itself. For the other datasets this information was not available, but in one of them we are able to relate the circular coordinate to red blood cell counts and weight changes in the subjects.
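The step that turns a cocycle into a circular coordinate, in the spirit of [2], can be sketched as follows. This is a minimal illustration and not the pipeline used for the datasets here: it assumes the persistent cocycle has already been computed and lifted to integer values on the edges of a graph, and the function name `circular_coordinate` and the toy 12-vertex cycle are hypothetical. The sketch solves a least-squares problem for a vertex function f such that the cocycle plus the coboundary of f has minimal norm, and then reads off f modulo 1.

```python
import numpy as np

def circular_coordinate(n_vertices, edges, cocycle):
    """Least-squares smoothing of an integer 1-cocycle (hypothetical helper).

    edges   : list of oriented edges (i, j)
    cocycle : integer value of the cocycle on each edge
    Returns a value in [0, 1) for every vertex.
    """
    edges = np.asarray(edges)
    rows = np.arange(len(edges))
    # Coboundary matrix: (delta f)(i -> j) = f(j) - f(i)
    delta = np.zeros((len(edges), n_vertices))
    delta[rows, edges[:, 0]] = -1.0
    delta[rows, edges[:, 1]] = 1.0
    # Minimise || cocycle + delta f ||, i.e. solve delta f ~ -cocycle
    f, *_ = np.linalg.lstsq(delta, -np.asarray(cocycle, dtype=float), rcond=None)
    # The circle-valued coordinate is f modulo the integers
    return np.mod(f, 1.0)

# Toy data (assumption): a cycle graph on 12 vertices, with a cocycle
# that is 1 on the single edge closing the loop and 0 elsewhere.
n = 12
edges = [(k, (k + 1) % n) for k in range(n)]
alpha = [0] * (n - 1) + [1]
theta = circular_coordinate(n, edges, alpha)
```

On this toy cycle the coordinate advances by the same amount along every edge, so the vertices are spread evenly around the circle. On real data the edges would come from the Vietoris-Rips complex at a scale inside the long bar of the barcode.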
Figure 1. Topological analysis on the energy levels and malaria data from [1]. The barcode graph to the left has one bar noticeably longer than the other bars. The corresponding circle-valued coordinate is plotted with the hsv colormap to the right. Most of the data splits into two relatively constant regions, with some points in a transition between them.
Figure 2. Further exploration of the dataset from [1]: the circular coordinate value discriminates quite clearly between low/high red blood cell counts (left) and between weight loss/weight gain (right).
Figure 3. Topological analysis on RNA Expressions. The barcode graph to the left shows a very clear long bar, indicating a highly persistent cocycle encoding a circular coordinate inherent to the data. The corresponding circular coordinate is plotted with the hsv colormap to the right. The coordinate map is nearly constant on the bulk of the points and then sweeps through a range of values as we go around the cycle. This indicates that the cycle we see in the PCA plot and in the Mapper graph in [5] is present in the full 109-dimensional data space.
Figure 4. Further exploration: to the left, we plot the circular coordinate from Figure 3b against the variable Days Post Infection from the original dataset. The circular coordinate increases with time post infection and marks out a region stretching from approximately day 5 to day 15. To the right, we plot body temperature on the PCA plot. The points where the coordinate function varies in Figure 3b are precisely the points with deviations in body temperature.
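One possible way to put a number on the relationship between a circular coordinate and a linear variable such as Days Post Infection is a circular-linear correlation coefficient. The sketch below is an illustration under stated assumptions and is not the analysis behind the figure: the function name `circular_linear_correlation` and the synthetic data are hypothetical, and the coordinate is assumed to take values in [0, 1).

```python
import numpy as np

def circular_linear_correlation(theta, x):
    """Circular-linear correlation between theta in [0, 1) and a linear variable x.

    Combines the ordinary correlations of x with cos and sin of the angle.
    Returns a value between 0 (no association) and 1.
    """
    angle = 2.0 * np.pi * np.asarray(theta, dtype=float)
    x = np.asarray(x, dtype=float)
    c, s = np.cos(angle), np.sin(angle)
    r_xc = np.corrcoef(x, c)[0, 1]
    r_xs = np.corrcoef(x, s)[0, 1]
    r_cs = np.corrcoef(c, s)[0, 1]
    r2 = (r_xc**2 + r_xs**2 - 2.0 * r_xc * r_xs * r_cs) / (1.0 - r_cs**2)
    # Guard against tiny negative values caused by rounding
    return float(np.sqrt(max(r2, 0.0)))

# Synthetic data (assumption): 50 evenly spaced coordinate values and a
# linear variable that depends on the angle through its cosine.
theta = np.arange(50) / 50.0
x = np.cos(2.0 * np.pi * theta)
r = circular_linear_correlation(theta, x)
```

This coefficient only detects dependence of the form a·cos + b·sin, so a coordinate that varies over a short window of days, as in the figure, may still be better judged from the scatter plot itself.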
Figure 6. Topological analysis on Hepatitis C RNA Expressions, after dimensionality reduction to 2 dimensions. In the 2-dimensional projection there is a loud and clear topological signal, visible as the long bar in the left figure. It produces a circular coordinate function that we depict in the right figure.
Figure 7. The Hepatitis C RNA Expressions data was collected at four time stamps for each experimental subject: pre-infection, early acute, late acute and late follow-up. We plot these four time stamps against the circular coordinate in the left-hand figure, and use the four time stamps for coloring the projected data in the right-hand figure. Even though the projected data has a very clear circular coordinate function, it does not seem to capture the time variable.
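A comparison of a circular coordinate against a categorical variable such as the four time stamps can be summarised per group by a circular mean and a mean resultant length. The following is a minimal sketch, assuming coordinate values in [0, 1); the function name `circular_summary_by_group` and the toy labels are hypothetical and no values from the dataset are used.

```python
import numpy as np

def circular_summary_by_group(theta, labels):
    """Circular mean and mean resultant length of theta within each label.

    theta  : circular coordinate values in [0, 1)
    labels : one categorical label per point
    Returns {label: (circular_mean in [0, 1), resultant_length in [0, 1])}.
    """
    theta = np.asarray(theta, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for group in np.unique(labels):
        # Average the points as unit complex numbers on the circle
        z = np.exp(2j * np.pi * theta[labels == group]).mean()
        mean_angle = np.mod(np.angle(z) / (2.0 * np.pi), 1.0)
        summary[str(group)] = (float(mean_angle), float(np.abs(z)))
    return summary

# Toy data (assumption): two groups concentrated at different angles.
toy_theta = [0.1, 0.1, 0.6, 0.6]
toy_labels = ["early", "early", "late", "late"]
toy_summary = circular_summary_by_group(toy_theta, toy_labels)
```

Groups whose circular means coincide, or whose resultant lengths are close to zero, would be consistent with a coordinate that does not separate the time stamps.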
[1] K. Cumnock, A. S. Gupta, M. Lissner, V. Chevee, N. M. Davis and D. S. Schneider, Host energy source is important for disease tolerance to malaria, Current Biology, 28 (2018), 1635-1642. doi: 10.1016/j.cub.2018.04.009.
[2] V. de Silva, D. Morozov and M. Vejdemo-Johansson, Persistent cohomology and circular coordinates, Discrete Comput. Geom., 45 (2011), 737-759. doi: 10.1007/s00454-011-9344-x.
[3] F. Pedregosa, G. Varoquaux, A. Gramfort, et al., Scikit-learn: Machine learning in Python, J. Mach. Learn. Res., 12 (2011), 2825-2830.
[4] B. R. Rosenberg, M. Depla, C. A. Freije, D. Gaucher, S. Mazouz, et al., Longitudinal transcriptomic characterization of the immune response to acute hepatitis C virus infection in patients with spontaneous viral clearance, PLoS Pathogens, 14 (2018). doi: 10.1371/journal.ppat.1007290.
[5] B. Y. Torres, J. H. M. Oliveira, A. T. Tate, P. Rath, K. Cumnock and D. S. Schneider, Tracking resilience to infections by mapping disease space, PLoS Biology, 14 (2016). doi: 10.1371/journal.pbio.1002436.
[6] M. Vejdemo-Johansson and A. Leshchenko, Certified mapper: Repeated testing for acyclicity and obstructions to the nerve lemma, in Topological Data Analysis, Abel Symposia, 15, Springer, Cham, 2020, 491-515. doi: 10.1007/978-3-030-43408-3_19.
Figure 5. Topological analysis on Hepatitis C RNA Expressions, in the full 56 300 dimensions. The barcode has one bar significantly longer than all the other bars, indicating a possible circular structure. It produces a circular coordinate function that we depict in the right figure.