
# Bayesian topological signal processing

Topological data analysis encompasses a broad set of techniques that investigate the shape of data. One of the predominant tools in topological data analysis is persistent homology, which is used to create topological summaries of data called persistence diagrams. Persistent homology offers a novel method for signal analysis. Herein, we aid interpretation of the sublevel set persistence diagrams of signals by 1) showing the effect of frequency and instantaneous amplitude on the persistence diagrams for a family of deterministic signals, and 2) providing a general equation for the probability density of persistence diagrams of random signals via a pushforward measure. We also provide a topologically motivated, efficiently computable statistical descriptor analogous to the power spectral density for signals based on a generalized Bayesian framework for persistence diagrams. This Bayesian descriptor is shown to be competitive with power spectral densities and continuous wavelet transforms at distinguishing signals with different dynamics in a classification problem with autoregressive signals.

Mathematics Subject Classification: Primary: 55N31, 68T07.


Figure 1.  (a) shows the sublevel sets $C_{-0.5}$, $C_{0}$, $C_{0.25}$, and $C_1$ for a damped cosine $e^{-2t}\cos(8\pi t)$. (b) shows the persistence diagram of the sublevel set filtration. The points in (b) are colored to match the connected components to which their birth coordinates correspond. The transition from $C_0$ to $C_{0.25}$ depicts the Elder rule: in $C_0$ there are light blue and purple connected components, which merge in $C_{0.25}$. A similar merging happens in the transition from $C_{0.25}$ to $C_{0.5}$. Since the purple component has a later birth value, it disappears into the light blue component, which persists until it merges into the green component by the same line of reasoning.
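The Elder rule in this caption can be illustrated with a minimal sketch, assuming the signal is sampled on a uniform grid and that only 0-dimensional sublevel set persistence is needed. The function names `find` and `sublevel_persistence`, the sampling grid, and the choice to drop zero-persistence pairs are assumptions made for illustration, not the authors' implementation. Samples are added in order of increasing value; when two components meet, the one with the later birth dies and the older one survives.

```python
import numpy as np

def find(parent, i):
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def sublevel_persistence(values):
    """Return sorted (birth, death) pairs of a sampled 1-D signal.

    The component born at the global minimum never dies and is
    reported with death = inf (an illustrative convention).
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    parent = [-1] * n          # -1 marks samples not yet in the sublevel set
    birth = np.empty(n)        # birth value stored at each component root
    pairs = []
    for i in np.argsort(values, kind="stable"):
        i = int(i)
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                a, b = find(parent, i), find(parent, j)
                if a == b:
                    continue
                # Elder rule: the component with the later birth dies.
                old, young = (a, b) if birth[a] <= birth[b] else (b, a)
                if birth[young] < values[i]:   # skip zero-persistence pairs
                    pairs.append((float(birth[young]), float(values[i])))
                parent[young] = old
    root = find(parent, 0)
    pairs.append((float(birth[root]), float("inf")))
    return sorted(pairs)

# Damped cosine from the caption, sampled on an assumed grid over [0, 1].
t = np.linspace(0.0, 1.0, 1001)
damped = np.exp(-2 * t) * np.cos(8 * np.pi * t)
for b, d in sublevel_persistence(damped):
    print(f"birth={b:.3f}  death={d:.3f}")
```

On a finite window the right endpoint can act as a shallow local minimum, so a very low-persistence pair may appear in addition to the features discussed in the caption.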

Figure 2.  This figure illustrates sources of uncertainty in persistence diagrams. Shown above are signals with additive noise (a) $\mathcal{N}(0,0.01)$ and (b) $\mathcal{N}(0,0.1)$, along with their persistence diagrams. The persistence diagram for the true underlying signal is shown in red. Noise introduces spurious features and also shifts the true features.

Figure 3.  Top: We consider three signals. The blue signal (Signal 1) and the red signal (Signal 2) are modeled by $a_{\beta}(t)\cos(8\pi t)$ where $a_{\beta}(t) = 5e^{-{\beta}t}$ with ${\beta} = 1,4$ in Signals 1 and 2 respectively. The green signal (Signal 3) is then added to each case and the amplitudes are translated to have global minima equal to zero. Bottom: The associated persistence diagrams are plotted using the method described in Section 2.2. We observe that as ${\beta}$ increases, the high-frequency oscillations are less affected by the low-frequency signal and converge faster towards the uniform shape of the green signal. This leads to a decrease in the variance of the persistence coordinates in the red diagram.

Figure 4.  (a) The damped cosine $e^{-2t}\cos(8\pi t)$ with additive noise $\mathcal{N}(0,0.01)$ and (b) its persistence diagram. (c) shows an uninformative prior intensity with a single component at $(1,1)$ with covariance matrix $10I$. Using the model from Equation (7) with the prior in (c) and the observed diagram in (b) results in the posterior intensity shown in (d). To account for spurious points, which we suspected to have low persistence in this example, we placed components of $\lambda_{S}$ at $(0.5,0.1), (1,0.1),(0.75,0.1)$ and $(1.75,0.1)$.

Figure 5.  This figure demonstrates the effect of greater low-frequency power on the persistence diagram of a signal. Panels (a) and (c) show two signals, each the sum of a low-frequency and a high-frequency oscillator. The power of the low-frequency signal is greater in (a) than in (c). To ensure that the persistence diagrams in (b) and (d) lie in $\mathbb{W}$, the aggregate signals in both (a) and (c) have been translated so that their absolute minima are at zero. Notice in (b) that elements of the persistence diagram show greater spread along the Birth axis than in (d). This results in greater birth variance of the corresponding posterior intensity. Also notice the isolated high-persistence mode in (b), which is not present in (d). These phenomena arise because the low-frequency signal scatters the higher-frequency peaks along the Amplitude axis.

Figure 6.  This plot depicts the relationship between the cardinality of persistence diagrams and the frequency of the dominant oscillation for one-second autoregressive signals across various damping factors. For each included frequency and damping factor, we simulated thirty signals (each had a component fixed at zero to give the $1/f$ PSD commonly seen in EEG), computed their persistence diagrams, and recorded their average cardinality. We see a strong positive correlation between this average cardinality and the frequency of the dominant oscillation (i.e., PSD Peak Frequency), consistent with the idealized deterministic sinusoid case.
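The idealized deterministic case mentioned in this caption can be checked with a small sketch. It relies on the fact that every point of a 0-dimensional sublevel set persistence diagram of a 1-D signal is born at a local minimum, so counting local minima gives the cardinality up to the treatment of the one component that never dies. The helper name `count_local_minima`, the sampling grid, and the pure cosine test signal are assumptions for illustration; the autoregressive simulation described in the caption is not reproduced here.

```python
import numpy as np

def count_local_minima(values):
    """Count local minima of a sampled signal, endpoints included.

    A flat run is counted once, at its left edge. This is a
    simplification: a flat shelf on a descending slope would also
    be counted, which does not occur for the cosines used below.
    """
    values = np.asarray(values, dtype=float)
    left = np.r_[np.inf, values[:-1]]    # left neighbour (inf at the start)
    right = np.r_[values[1:], np.inf]    # right neighbour (inf at the end)
    return int(np.sum((values < left) & (values <= right)))

# One-second window on an assumed sampling grid.
t = np.linspace(0.0, 1.0, 2001)
for f in (2, 4, 8, 16):
    x = np.cos(2 * np.pi * f * t)
    print(f"frequency={f:2d} Hz  local minima={count_local_minima(x)}")
```

For integer frequencies on a one-second window, the count in this sketch equals the frequency, which matches the positive relationship between cardinality and dominant frequency described in the caption.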

Figure 7.  The peak frequency $f$ for $\mathcal{A}_{f}^{\beta}$ plotted against the average birth variance for its persistence diagrams. Colors depict the damping factor $\beta$.

Figure 8.  The average (log) power spectral densities along with examples of signals and persistence diagrams from each class for damping factors of (top) 4 and (bottom) 32.

Table 2.  Parameter values for the autoregressive model determined by fitting to real EEG. Missing values (-) indicate that the optimal AR model order did not include a corresponding frequency component.

| Signal | Signal Length | $f_1$ | $f_2$ | $f_3$ | $f_4$ | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\beta_4$ |
|---|---|---|---|---|---|---|---|---|---|
| Signal 1 | 1 Second | 0 | 5.87 | 18.59 | - | 344.80 | 5.37 | 16.6 | - |
| Signal 1 | 5 Seconds | 0 | 6.00 | 14.4 | 20.85 | 24.98 | 10.54 | 31.64 | 26.97 |
| Signal 2 | 1 Second | 0 | 10.70 | - | - | 202.78 | 7.41 | - | - |
| Signal 2 | 5 Seconds | 0 | 10.16 | 23.02 | - | 17.24 | 4.06 | 20.37 | - |

Table 1.  Precisions and recalls for each feature and classifier. Results are reported as mean $\pm$ standard error across each class.

| Classifier | Bayesian Precision | Bayesian Recall | PSD Precision | PSD Recall | CWT Precision | CWT Recall |
|---|---|---|---|---|---|---|
| LR | $0.84 \pm 0.06$ | $0.85 \pm 0.07$ | $0.90 \pm 0.04$ | $0.90 \pm 0.04$ | $0.91 \pm 0.03$ | $0.90 \pm 0.04$ |
| SVM - Lin. | $0.92 \pm 0.05$ | $0.91 \pm 0.04$ | $0.91 \pm 0.03$ | $0.90 \pm 0.05$ | $0.91 \pm 0.04$ | $0.91 \pm 0.03$ |
| MLP | $0.89 \pm 0.05$ | $0.88 \pm 0.04$ | $0.90 \pm 0.02$ | $0.89 \pm 0.02$ | $0.92 \pm 0.03$ | $0.93 \pm 0.02$ |
