An Incremental Approach to Online Dynamic Mode Decomposition for Time-Varying Systems with Applications to EEG Data Modeling

Dynamic Mode Decomposition (DMD) is a data-driven technique to identify low-dimensional linear time-invariant dynamics underlying high-dimensional data. For systems in which the underlying low-dimensional dynamics are time-varying, a time-invariant approximation computed through standard DMD techniques may not be appropriate. We focus on DMD techniques for such time-varying systems and develop incremental algorithms for systems without and with exogenous control inputs. We consider two classes of algorithms that rely on (i) a discount factor on previous observations, and (ii) a sliding window of observations. Our algorithms leverage existing techniques for incremental singular value decomposition, allow us to determine an appropriately reduced model at each time, and are applicable even if the data matrix is singular. We apply the developed algorithms for autonomous systems to Electroencephalographic (EEG) data and demonstrate their effectiveness in terms of reconstruction and prediction. Our algorithms for non-autonomous systems are illustrated using randomly generated linear time-varying systems.


Introduction
The emergence of low-cost sensors and their widespread deployment has led to an unprecedented amount of data. Extraction of actionable information from this plethora of data remains a challenge. It is often the case that a low-dimensional dynamical system governs high-dimensional spatiotemporal sensory data. Dynamic Mode Decomposition (DMD) has emerged as a popular data-driven technique to efficiently compute such low-dimensional dynamics [17, 27, 30]. One of the attractive features of the DMD approach is that it requires minimal assumptions on the data. Furthermore, its strong connections with the Koopman operator [15, 20-22, 26] make it further appealing and theoretically grounded. The efficacy and simplicity of DMD have inspired its application in a wide range of areas, from fluid dynamics [17, 27] to video processing [10, 29] to epidemiology [25] to neuroscience [3].
For high-dimensional data that is generated by underlying low-dimensional time-varying dynamics, the standard DMD approach may be applicable only locally in time, and appropriate time-varying DMD operators can be computed by discounting old data or considering a sliding window of observations [32]. Even in scenarios where the underlying dynamics is time-invariant and nonlinear, such time-varying linear approximations have been shown to be very effective [7]. In this paper, we develop incremental methods for the computation of the DMD for time-varying systems and demonstrate the utility of the developed algorithms using Electroencephalographic (EEG) data.
Incremental approaches to DMD refer to methods that allow for efficiently computing the DMD when data is provided sequentially instead of as a batch. The sequential arrival might be due to the inherent nature of the application, or it may be exploited for computational efficiency even if all the data is accessible. One such technique is streaming DMD [12, 13], which first computes the projection of new data onto the current DMD basis and compares the norm of the difference between the projection and the new data with a threshold. If the difference is larger than the threshold, then it appends an additional element to the DMD basis. Incremental total DMD, introduced in [18], applies incremental Singular Value Decomposition (SVD) [1, 2, 23] for incremental computation of dynamic modes and subsequent identification of dominant modes.
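The basis-extension test used in streaming DMD can be sketched as follows. This is only the append step, not the full streaming DMD update; the function name and interface are hypothetical:

```python
import numpy as np

def maybe_extend_basis(Q, x, tol):
    """Illustrative sketch of the streaming-DMD basis test: project the
    new snapshot x onto the current orthonormal basis Q; if the residual
    norm exceeds tol, append the normalized residual as a new basis
    element. (Function name and interface are hypothetical.)"""
    residual = x - Q @ (Q.conj().T @ x)      # component outside span(Q)
    norm = np.linalg.norm(residual)
    if norm > tol:
        Q = np.column_stack([Q, residual / norm])
    return Q
```

The basis thus grows only when a snapshot carries a sufficiently large component outside the current subspace.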
Within the context of time-varying systems, online DMD [32] incrementally computes the DMD operator using the Sherman-Morrison identity [28]. The authors of [32] consider two strategies in their DMD computation to account for time-varying systems: (i) discounting the old data, and (ii) using a sliding window of observations. They also consider DMD for systems with control input [16, 24]. In particular, they extend the DMD with control approach proposed in [24] to develop an online DMD with control algorithm for time-varying systems.
In this paper, we build upon the work in [32] and develop incremental-SVD-based techniques for the computation of the online DMD for time-varying systems. The proposed approach endows the online DMD approach with the additional capability to access the singular values of the data matrix at each time, and hence allows it to determine a reduced-order model and improve its future prediction accuracy. Furthermore, for scenarios where the Sherman-Morrison identity cannot be applied, e.g., if the data matrix remains singular even after a sufficiently long time, access to the singular values allows for efficient computation of an appropriate pseudo-inverse. The major contributions of this work are fourfold: (i) we leverage incremental SVD techniques to develop two algorithms for the computation of the online DMD for time-varying systems that rely on discounting old data and on a sliding window of data, respectively; (ii) we extend both these algorithms to develop incremental-SVD-based algorithms for the computation of the online DMD with control input for time-varying systems; (iii) we apply our algorithms for the case without control input to an EEG dataset and show their efficacy in reconstructing and predicting the error-related potential, a slow cortical potential seen in the EEG signal that is elicited by an unexpected outcome; (iv) we apply our algorithms for the case with control input to randomly generated linear time-varying dynamical systems and demonstrate their efficacy.
The remainder of the paper is organized as follows. In Section 2, we present some background on the DMD for time-invariant and time-varying systems. In Section 3, we present some background on incremental SVD algorithms. In Section 4, we leverage the incremental SVD algorithms to develop novel incremental DMD algorithms. We apply these algorithms to EEG data in Section 5.

Dynamic Mode Decomposition
Consider the following discrete-time system:

x_{k+1} = f(x_k), (1)

where x_k ∈ R^n is a high-dimensional state vector (n ≫ 1) sampled at t_k = kΔt, k ∈ {1, . . ., (m+1)}, and f is an unknown map which describes the evolution of the state vector between two subsequent sampling times. Suppose that the evolution of the high-dimensional state x is governed by some underlying low-dimensional dynamics. Then, the DMD computes a data-driven linear approximation to the system (1) as follows [17, 30].
Consider a collection of (m+1) sequential measurements arranged in the following two datasets:

X = [x_1 x_2 · · · x_m], Y = [x_2 x_3 · · · x_{m+1}] =: [y_1 y_2 · · · y_m]. (2)

Let the projection of X onto its leading r singular vectors be X ≈ Ū_x Σ̄_x V̄_x^*, where r is the targeted dimension of the underlying low-dimensional dynamical system, Ū_x ∈ C^{n×r}, V̄_x ∈ C^{m×r}, and Σ̄_x = diag{σ_1, . . ., σ_r} ∈ C^{r×r}. Then, the DMD framework approximates the dynamics (1) by

x_{k+1} ≈ Ā x_k,

where Ā ∈ R^{n×n} is called the DMD operator and is given by

Ā = Y V̄_x Σ̄_x^{-1} Ū_x^*.

Let the projections of the state x and the DMD operator Ā onto the space spanned by the leading r singular vectors of X be x̃_k = Ū_x^* x_k and Ã = Ū_x^* Ā Ū_x, respectively. Then, the approximation to the underlying r-dimensional dynamics is

x̃_{k+1} = Ã x̃_k. (3)

Let Λ be the diagonal matrix of the eigenvalues of Ã and W be the matrix of the associated eigenvectors such that

Ã W = W Λ. (4)
Then, two types of DMD modes can be identified using the eigen-decomposition in (4): the projected DMD modes Φ̄ = Ū_x W and the exact DMD modes Φ = Y V̄_x Σ̄_x^{-1} W [30]. Either set of DMD modes, together with the associated eigenvalues, describes the evolution of the high-dimensional system using the low-dimensional dynamics (3).
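The computations above can be sketched in a few lines of NumPy. This is a minimal batch-DMD sketch following the equations in this section; variable names are ours, and the truncation rank r is assumed given:

```python
import numpy as np

def dmd(X, Y, r):
    """Minimal exact-DMD sketch: fit Y ≈ A X and return the reduced
    operator, its eigenvalues, and the exact DMD modes.
    X, Y are n x m snapshot matrices with Y[:, j] = x_{j+1}."""
    # Rank-r truncated SVD of the input snapshot matrix X.
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    Ur, Sr_inv, Vr = U[:, :r], np.diag(1.0 / s[:r]), Vh[:r, :].conj().T
    # Reduced operator A_tilde = U* A U projected onto the leading modes.
    A_tilde = Ur.conj().T @ Y @ Vr @ Sr_inv
    # Eigen-decomposition gives the DMD eigenvalues and exact modes.
    lam, W = np.linalg.eig(A_tilde)
    Phi = Y @ Vr @ Sr_inv @ W
    return A_tilde, lam, Phi
```

For data generated exactly by a linear map, the DMD eigenvalues coincide with the eigenvalues of that map.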

Dynamic Mode Decomposition with Control
Consider the following non-autonomous discrete-time system:

x_{k+1} = f_c(x_k, γ_k), (5)

where x_k ∈ R^n is a high-dimensional state vector (n ≫ 1) sampled at t_k = kΔt, k ∈ {1, . . ., (m+1)}, f_c is an unknown map which describes the evolution of the state vector between two subsequent sampling times, and γ_k ∈ R^l is the exogenous input. Then, Dynamic Mode Decomposition with Control (DMDc) computes a data-driven linear approximation to the system (5) as follows [24].
Consider a collection of m sequential measurements x_i ∈ R^n and the associated exogenous inputs γ_i ∈ R^l, i ∈ {1, . . ., m}. DMDc approximates the non-autonomous dynamics underlying these measurements by

x_{k+1} ≈ A x_k + B γ_k = G [x_k^T γ_k^T]^T,

where A ∈ R^{n×n} is called the DMD operator, B ∈ R^{n×l} is called the input matrix, and G = [A B] ∈ R^{n×(n+l)}. In particular, the DMDc algorithm constructs the matrices

X = [x_1 · · · x_{m-1}], Y = [x_2 · · · x_m], Γ = [γ_1 · · · γ_{m-1}], X_c = [X^T Γ^T]^T ∈ C^{(n+l)×(m-1)},

and lets X_c = U_c Σ_c V_c^* be its singular value decomposition, where U_c ∈ C^{(n+l)×(n+l)}, Σ_c ∈ C^{(n+l)×(m-1)}, and V_c ∈ C^{(m-1)×(m-1)}. The DMDc algorithm estimates G by

G = Y V_c Σ_c^+ U_c^*.

Finally, U_c^* can be written in the partitioned form U_c^* = [U_{c,x}^* U_{c,γ}^*], with U_{c,x} ∈ C^{n×(n+l)} and U_{c,γ} ∈ C^{l×(n+l)}. Then, the DMDc algorithm computes the matrices A_c and B_c given by

A_c = Y V_c Σ_c^+ U_{c,x}^*, B_c = Y V_c Σ_c^+ U_{c,γ}^*.

Similar to the case of the DMD, the DMDc algorithm also enables identification of a low-dimensional non-autonomous system underlying the high-dimensional measurements. Assume that the matrix X_c can be approximated by its projection onto the leading p singular vectors by X_c ≈ Ū_c Σ̄_c V̄_c^*, where Ū_c ∈ C^{(n+l)×p}, Σ̄_c ∈ C^{p×p}, and V̄_c ∈ C^{(m-1)×p}, and the output matrix Y can be approximated by its projection onto its leading r ≤ p singular vectors by Y ≈ Ū Σ̄ V̄^*, where Ū ∈ C^{n×r}, Σ̄ ∈ C^{r×r}, and V̄ ∈ C^{(m-1)×r}. Then, a reduced-order model can be represented as follows:

x̃_{k+1} = Ã x̃_k + B̃ γ_k, with x̃_k = Ū^* x_k,

where the lower-dimensional matrices Ã ∈ R^{r×r} and B̃ ∈ R^{r×l} can be calculated by

Ã = Ū^* Y V̄_c Σ̄_c^{-1} Ū_{c,x}^* Ū, B̃ = Ū^* Y V̄_c Σ̄_c^{-1} Ū_{c,γ}^*.
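The DMDc procedure above can be sketched as follows. This is a hedged sketch, not the authors' implementation; in particular, taking the reduced basis from the output snapshots Y is an assumption consistent with [24], and all names are ours:

```python
import numpy as np

def dmdc(X, Gamma, Y, p, r):
    """DMDc sketch: fit Y ≈ A X + B Gamma, where X (n x m) holds states,
    Gamma (l x m) the inputs, and Y the shifted states. p and r are the
    truncation ranks for the stacked matrix [X; Gamma] and for the
    output snapshots, respectively."""
    n = X.shape[0]
    Omega = np.vstack([X, Gamma])                  # stacked data, (n+l) x m
    Uc, sc, Vch = np.linalg.svd(Omega, full_matrices=False)
    Uc, Sc_inv, Vc = Uc[:, :p], np.diag(1.0 / sc[:p]), Vch[:p, :].conj().T
    G = Y @ Vc @ Sc_inv @ Uc.conj().T              # G = [A  B]
    A, B = G[:, :n], G[:, n:]
    # Reduced-order operators projected onto the leading modes of Y.
    Ur = np.linalg.svd(Y, full_matrices=False)[0][:, :r]
    return A, B, Ur.conj().T @ A @ Ur, Ur.conj().T @ B
```

When the stacked data matrix has full row rank and p = n + l, the fit recovers the true (A, B) of an exactly linear system.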

Problem Formulation
In this paper, we study incremental algorithms for the time-varying DMD and DMDc problems defined below.

Time-varying DMD
Consider the time-varying discrete-time system of the form

x_{k+1} = f(x_k, t_k), (6)

where x_k ∈ R^n is a high-dimensional state vector (n ≫ 1) sampled at t_k = kΔt, k ∈ {1, . . ., (m+1)}, and f is an unknown time-varying map which describes the evolution of the state vector between two subsequent sampling times. Similar to the standard DMD setup, we assume that the high-dimensional dynamics (6) is generated by low-dimensional time-varying dynamics.
Suppose that, at sampling time t_{k+1}, we have access to a collection of (w+1) sequential measurements {x_{k-w+1}, . . ., x_{k+1}}. Our objective is to design computationally efficient techniques to identify the time-varying DMD operator A_k that approximates the system (6) by the linear time-varying system

x_{k+1} ≈ A_k x_k, (7)

such that the cost function

J(A_k) = Σ_{j=k-w+1}^{k} ρ^{k-j} ||x_{j+1} − A_k x_j||_2^2 (8)

is minimized, where ρ ∈ (0, 1] is a discounting factor.

Time-varying DMDc
Consider the following variant of the system in (6):

x_{k+1} = f_c(x_k, γ_k, t_k), (9)

where f_c is an unknown time-varying map which describes the evolution of the state vector between two subsequent sampling times, and γ_k ∈ R^l is the exogenous input. Assume that, at sampling time t_{k+1}, two collections of sequential measurements are available, one for the system states {x_{k-w+1}, x_{k-w+2}, . . ., x_{k+1}}, and the other for the exogenous inputs {γ_{k-w+1}, γ_{k-w+2}, . . ., γ_{k+1}}. Our objective is to design computational techniques to identify the time-varying DMDc operator A^c_k and input matrix B^c_k that approximate the system (9) by the linear time-varying system

x_{k+1} ≈ A^c_k x_k + B^c_k γ_k, (10)

such that the cost function

J(A^c_k, B^c_k) = Σ_{j=k-w+1}^{k} ρ^{k-j} ||x_{j+1} − A^c_k x_j − B^c_k γ_j||_2^2 (11)

is minimized, where ρ ∈ (0, 1] is a discounting factor. Defining G_k = [A^c_k B^c_k] and z_j = [x_j^T γ_j^T]^T, the linear model (10) can be written as

x_{k+1} ≈ G_k z_k, (12)

and the cost function (11) can be equivalently written as

J(G_k) = Σ_{j=k-w+1}^{k} ρ^{k-j} ||x_{j+1} − G_k z_j||_2^2. (13)

Along with the efficient computation of the time-varying DMD and DMDc operators, the proposed computational techniques should also enable efficient computation of (i) the associated DMD eigenvalues and modes, and (ii) the linear time-varying systems that approximate the low-dimensional dynamics underlying (6) and (9), respectively.
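The discounted objectives above reduce to plain least-squares problems once the discount weights are absorbed into the data columns. The sketch below (names and interface are ours, not the paper's) solves the resulting problem with a pseudo-inverse:

```python
import numpy as np

def weighted_dmdc_fit(X, Gamma, Y, rho):
    """Sketch of the minimizer of a discounted cost of the form (13):
    scale the j-th data column by rho**((m-1-j)/2) so the weighted sum
    of squared residuals becomes a plain Frobenius-norm least-squares
    problem, then solve for G = [A  B] with a pseudo-inverse."""
    m = X.shape[1]
    wts = np.sqrt(rho ** np.arange(m - 1, -1, -1))   # sqrt of discount weights
    Z = np.vstack([X, Gamma]) * wts                  # stacked, column-scaled data
    G = (Y * wts) @ np.linalg.pinv(Z)
    n = X.shape[0]
    return G[:, :n], G[:, n:]                        # (A_k^c, B_k^c)
```

For exactly linear data the discounted fit returns the true operators for any ρ ∈ (0, 1], since every residual can be driven to zero.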

Block Computation of Time-varying DMD and DMDc
We refer to a computation that requires all the data up to the sampling time t_k to compute the desired solution at time t_k as a block computation. In contrast, an incremental computation uses only the data at time t_k and the solution at time t_{k-1} to compute the desired solution.
In this section, we describe block computation techniques for DMD and DMDc. We will use these block computations to derive the incremental computations later in the paper.
Lemma 1. Consider a sequence of (w+1) measurements {x_{k-w+1}, . . ., x_{k+1}} at sampling time t_{k+1} that is arranged in the following two matrices

X_k = [ρ^{(w-1)/2} x_{k-w+1} · · · ρ^{1/2} x_{k-1} x_k], Y_k = [ρ^{(w-1)/2} y_{k-w+1} · · · ρ^{1/2} y_{k-1} y_k], (14)

where x_j ∈ R^n, for each j ∈ {k−w+1, · · ·, k+1}, and y_j = x_{j+1}, so that the cost function (8) equals ||Y_k − A_k X_k||_F^2. Then, the following statements hold for the cost function (8):

(i) if Ker X_k^* is trivial, then the unique minimizer of (8) is

A_k = Y_k V_{x_k} Σ_{x_k}^{-1} U_{x_k}^*, (15)

where U_{x_k} ∈ C^{n×n} and V_{x_k} ∈ C^{w×n} have orthonormal columns and, together with the diagonal matrix Σ_{x_k}, are defined by the reduced SVD X_k = U_{x_k} Σ_{x_k} V_{x_k}^*;

(ii) if Ker X_k^* is non-trivial, then there exist infinitely many minimizers of (8), and the unique minimizer with the smallest induced two-norm is

A_k = Y_k V̄_{x_k} Σ̄_{x_k}^{-1} Ū_{x_k}^*, (16)

where Ū_{x_k} ∈ C^{n×r}, Σ̄_{x_k} = diag{σ_1, . . ., σ_r} ∈ C^{r×r}, V̄_{x_k} ∈ C^{w×r}, and r = n − dim(Ker X_k^*), are defined by the reduced SVD X_k ≈ Ū_{x_k} Σ̄_{x_k} V̄_{x_k}^*.

The proof of Lemma 1 is provided in Appendix A. We now consider the block computation of the time-varying DMDc in the following lemma.

Lemma 2. Consider two sequences of (w+1) measurements {x_{k-w+1}, . . ., x_{k+1}} and {γ_{k-w+1}, · · ·, γ_{k+1}} at sampling time t_{k+1} that are arranged in the matrix Y_k of Lemma 1 and the matrix

X^c_k = [ρ^{(w-1)/2} z_{k-w+1} · · · z_k], z_j = [x_j^T γ_j^T]^T ∈ R^{n+l}. (17)

Then, the following statements hold for the cost function in (13):

(i) if Ker X^{c*}_k is trivial, then the unique minimizer of (11) is

G_k = Y_k V^c_{x_k} (Σ^c_{x_k})^{-1} (U^c_{x_k})^*, (18)

where U^c_{x_k} ∈ C^{(n+l)×(n+l)} and V^c_{x_k} ∈ C^{w×(n+l)} have orthonormal columns and are defined by the reduced SVD X^c_k = U^c_{x_k} Σ^c_{x_k} (V^c_{x_k})^*;

(ii) if Ker X^{c*}_k is non-trivial, then there exist infinitely many minimizers of (11), and the unique minimizer with the smallest induced two-norm is

G_k = Y_k V̄^c_{x_k} (Σ̄^c_{x_k})^{-1} (Ū^c_{x_k})^*, (19)

where Ū^c_{x_k} ∈ C^{(n+l)×r}, Σ̄^c_{x_k} = diag{σ_1, . . ., σ_r} ∈ C^{r×r}, V̄^c_{x_k} ∈ C^{w×r}, and r = (n+l) − dim(Ker X^{c*}_k), are defined by the reduced SVD X^c_k ≈ Ū^c_{x_k} Σ̄^c_{x_k} (V̄^c_{x_k})^*.

Lemma 2 can be proved analogously to Lemma 1 and its proof is omitted. Note that, using the partitioned form (Ū^c_{x_k})^* = [(Ū^c_{a_k})^* (Ū^c_{b_k})^*], with Ū^c_{a_k} ∈ C^{n×r} and Ū^c_{b_k} ∈ C^{l×r}, the matrices Ā^c_k and B̄^c_k associated with (19) can be obtained as follows:

Ā^c_k = Y_k V̄^c_{x_k} (Σ̄^c_{x_k})^{-1} (Ū^c_{a_k})^*, B̄^c_k = Y_k V̄^c_{x_k} (Σ̄^c_{x_k})^{-1} (Ū^c_{b_k})^*. (20)

In this work, we focus on two special cases of the cost functions (8) and (13), namely, the weighted cost functions and the windowed cost functions. The weighted cost functions implement a gradual elimination of the old measurements by assigning a discounting factor ρ ∈ (0, 1) and consider all the available data, i.e., they select w = k. The windowed cost functions consider a sharp cut-off window and use a non-discounted window of recent measurements, i.e., they select ρ = 1 and w < k. In the remainder of the paper, we will refer to the DMD and DMDc operators/matrices obtained using the weighted (resp., windowed) cost functions as the weighted (resp., windowed) DMD and DMDc operators/matrices and denote them by A^ρ_k (resp., A^w_k) and (A^{ρc}_k, B^{ρc}_k) (resp., (A^{wc}_k, B^{wc}_k)), respectively.

Background on Incremental SVD Algorithms
At the heart of the DMD framework is the utilization of the SVD to identify an invariant subspace for the low-dimensional dynamics underlying the high-dimensional data. The techniques proposed in this paper for the computation of the time-varying DMD operator require access to the SVD of certain time-varying data matrices at each time. For efficient computation of these SVDs, we resort to incremental SVD techniques proposed in [1, 2, 23]. In this section, we present these incremental SVD techniques. The presentation below is adapted from [1] to facilitate better exposition in Section 4.

Weighted Incremental SVD Algorithm
Consider a weighted dataset

X_k = [ρ^{(k-1)/2} x_1 ρ^{(k-2)/2} x_2 · · · ρ^{1/2} x_{k-1} x_k], (21)

where 0 ≤ ρ ≤ 1 is a scalar discounting factor and k ≥ n. Given an additional datum x_{k+1}, let

X_{k+1} = [ρ^{1/2} X_k x_{k+1}] (22)

be the incremented dataset. Provided that the SVD of X_k is known, the SVD of X_{k+1} can be obtained using the following proposition.
Proposition 1. Let at sampling time t_k the SVD of the dataset in (21) be defined by X_k = U_{x_k} Σ_{x_k} V_{x_k}^*. Assume that at sampling time t_{k+1}, a new datum x_{k+1} is accessed. Then, the SVD of the dataset in (22), defined by X_{k+1} = U_{x_{k+1}} Σ_{x_{k+1}} V_{x_{k+1}}^*, is given by:

U_{x_{k+1}} = U_{x_k} U_{s_k}, Σ_{x_{k+1}} = Σ_{s_k}, V_{x_{k+1}} = blkdiag(V_{x_k}, 1) V_{s_k}, (23)

where blkdiag(V_{x_k}, 1) denotes the block-diagonal matrix with blocks V_{x_k} and 1, and U_{s_k}, Σ_{s_k}, and V_{s_k} are defined by the following SVD:

S_k := [ρ^{1/2} Σ_{x_k} U_{x_k}^* x_{k+1}] = U_{s_k} Σ_{s_k} V_{s_k}^*. (24)

Proof. The incremented dataset X_{k+1} can be represented in terms of X_k using the additive updating formula

X_{k+1} = [ρ^{1/2} X_k 0] + x_{k+1} e_{k+1}^T,

where e_{k+1} is the (k+1)-th standard basis vector. Since k ≥ n, the columns of U_{x_k} span R^n and x_{k+1} = U_{x_k} U_{x_k}^* x_{k+1}; hence, X_{k+1} in (22) can be written as:

X_{k+1} = U_{x_k} [ρ^{1/2} Σ_{x_k} U_{x_k}^* x_{k+1}] blkdiag(V_{x_k}, 1)^*.

Thus, the SVD of X_{k+1} is given by (23). The above incremental SVD computation requires the computation of the SVD of the matrix S_k ∈ C^{n×(n+1)}. This matrix is smaller than the dataset X_{k+1} and has the so-called broken-arrow structure, which enables efficient computation of its SVD [6, 11, 14].
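The update in Proposition 1 can be sketched as follows; the scaling of the old block is kept as a generic factor c (e.g. c = √ρ for the discounted dataset), and names are ours:

```python
import numpy as np

def incremental_svd_update(U, s, V, x_new, c):
    """Sketch of the weighted incremental SVD step: given
    X_k = U @ diag(s) @ V.conj().T with U square (k >= n, so the new
    datum lies in span(U)), return the SVD of [c * X_k, x_new]."""
    n = U.shape[0]
    # Broken-arrow core matrix S_k = [c * Sigma_k, U* x_new].
    S = np.hstack([c * np.diag(s), (U.conj().T @ x_new).reshape(n, 1)])
    Us, s_new, Vsh = np.linalg.svd(S, full_matrices=False)
    # Fold the small factors back into the previous ones.
    Vpad = np.block([[V, np.zeros((V.shape[0], 1))],
                     [np.zeros((1, V.shape[1])), np.ones((1, 1))]])
    return U @ Us, s_new, Vpad @ Vsh.conj().T
```

Only the small (n × (n+1)) core matrix is decomposed at each step, rather than the full dataset.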

Windowed Incremental SVD Algorithm
Consider a windowed dataset

χ_k = [x_{k-w+1} x_{k-w+2} · · · x_k], (25)

where n is the dimension of the measurements and w is the length of the time-window of the desired measurements. Define q as follows:

q = n, if n < w; q = w, otherwise. (26)
Consider another dataset

χ_{k+1} = [x_{k-w+2} · · · x_k x_{k+1}]. (27)

Then, the incremental SVD computes the SVD of χ_{k+1} = U_{χ_{k+1}} Σ_{χ_{k+1}} V_{χ_{k+1}}^* as follows. First, it eliminates x_{k-w+1} from the dataset to obtain the decremented dataset

χ̃_k = [0 x_{k-w+2} · · · x_k], (28)

in which the oldest measurement is replaced by an n-dimensional zero column, and the associated SVD. Then, it increments χ̃_k with x_{k+1} to obtain χ̃_{k+1} and the associated SVD. The SVD of the dataset χ̃_k can be obtained by the following proposition.
Proposition 2. Let at sampling time t_k the SVD of the dataset in (25) be defined by χ_k = U_{χ_k} Σ_{χ_k} V_{χ_k}^*, where U_{χ_k} ∈ C^{n×q}, Σ_{χ_k} = [diag{σ_1, . . ., σ_q} 0_{q×(w-q)}] ∈ C^{q×w}, and V_{χ_k} ∈ C^{w×w} is a unitary matrix. Then, the SVD of the dataset in (28), defined as χ̃_k = U_{χ̃_k} Σ_{χ̃_k} V_{χ̃_k}^*, is given by:

U_{χ̃_k} = U_{χ_k} U_{ś_k}, Σ_{χ̃_k} = Σ_{ś_k}, V_{χ̃_k} = V_{χ_k} V_{ś_k}, (29)

where U_{ś_k}, Σ_{ś_k}, and V_{ś_k} are defined by the following SVD:

Ś_k := Σ_{χ_k} (V_{χ_k,2})^* V_{χ_k,2} = Σ_{χ_k} (I_w − (v_{χ_k,1})^* v_{χ_k,1}) = U_{ś_k} Σ_{ś_k} V_{ś_k}^*, (30)

v_{χ_k,1} is the first row of V_{χ_k}, and V_{χ_k,2} is the submatrix of V_{χ_k} obtained after removing its first row.
Proof. Consider the updated dataset χ̃_k = [0 x_{k-w+2} · · · x_k], which is the χ_k dataset with its first column replaced by an n-dimensional zero column. Then

χ̃_k = χ_k (I_w − e_1 e_1^T) = U_{χ_k} Σ_{χ_k} V_{χ_k}^* (I_w − e_1 e_1^T) = U_{χ_k} [Σ_{χ_k} (I_w − (v_{χ_k,1})^* v_{χ_k,1})] V_{χ_k}^* = U_{χ_k} Ś_k V_{χ_k}^*, (31)

where we used V_{χ_k}^* e_1 = (v_{χ_k,1})^* and the completeness relation (V_{χ_k,2})^* V_{χ_k,2} = I_w − (v_{χ_k,1})^* v_{χ_k,1} for the unitary matrix V_{χ_k}. Substituting the SVD Ś_k = U_{ś_k} Σ_{ś_k} V_{ś_k}^* into equation (31), the dataset χ̃_k can be expressed as:

χ̃_k = (U_{χ_k} U_{ś_k}) Σ_{ś_k} (V_{χ_k} V_{ś_k})^*. (32)

A noteworthy property of the above decomposition, which we will use in later developments, is that

v_{χ_k,1} V_{ś_k} = 0. (33)

To establish this property, note that the columns of V_{ś_k} lie in the range of Ś_k^* = (I_w − (v_{χ_k,1})^* v_{χ_k,1}) Σ_{χ_k}^*, and v_{χ_k,1} (I_w − (v_{χ_k,1})^* v_{χ_k,1}) = 0 since v_{χ_k,1} (v_{χ_k,1})^* = 1.
In the incremental step, the new datum x_{k+1} is appended to χ̃_k, and the SVD of the resulting matrix χ_{k+1} can be obtained using the procedure presented in Section 3. In particular, let

Ŝ_k := [Σ_{χ̃_k} U_{χ̃_k}^* x_{k+1}] = U_{ŝ_k} Σ_{ŝ_k} V_{ŝ_k}^*,

and, using equation (23), the SVD of χ_{k+1} is

U_{χ_{k+1}} = U_{χ̃_k} U_{ŝ_k}, Σ_{χ_{k+1}} = Σ_{ŝ_k}, V_{χ_{k+1}} = blkdiag(V_{χ̃_k}, 1) V_{ŝ_k}. (34)

The windowed incremental SVD computation requires the computation of the SVD of Ś_k ∈ R^{q×w}. In general, Ś_k may not possess any special structure. However, if n ≫ w, in which case q = w, the above incremental update may be much cheaper than the computation of the SVD of an n × w matrix. Also, in this context, the above procedure helps in terms of storage, as it does not require the large matrix χ_{k+1} to be stored and uses only the first and last columns of this matrix along with the SVD at the previous iteration.
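The downdate-then-update sequence can be sketched as follows, assuming n < w so that q = n and every snapshot lies in span(U); names are ours, and the zero column mirrors the zero-padded construction in the text (it can be dropped afterwards):

```python
import numpy as np

def windowed_svd_step(U, s, V, x_new):
    """Sketch of one windowed SVD step (cf. Proposition 2): given
    chi_k = U @ Sigma @ V.conj().T with Sigma = [diag(s) 0] and V
    unitary, zero out the oldest column (downdate), then append x_new
    (increment via the broken-arrow update)."""
    q, w = len(s), V.shape[0]
    Sigma = np.hstack([np.diag(s), np.zeros((q, w - q))])
    v1 = V[0:1, :]                                  # first row of V
    # Downdate core: S' = Sigma (I - v1* v1); note v1 @ V_s' = 0.
    S_down = Sigma @ (np.eye(w) - v1.conj().T @ v1)
    Us, sd, Vsh = np.linalg.svd(S_down, full_matrices=False)
    U_d, V_d = U @ Us, V @ Vsh.conj().T
    # Increment: broken-arrow update with the new snapshot.
    S_up = np.hstack([np.diag(sd), (U_d.conj().T @ x_new).reshape(q, 1)])
    Uu, s_new, Vuh = np.linalg.svd(S_up, full_matrices=False)
    Vpad = np.block([[V_d, np.zeros((w, 1))],
                     [np.zeros((1, q)), np.ones((1, 1))]])
    return U_d @ Uu, s_new, Vpad @ Vuh.conj().T
```

The two small SVDs replace a from-scratch decomposition of the full n × w window at each step.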

Incremental DMD Algorithms
In this section, we present two algorithms, the weighted incremental DMD and the windowed incremental DMD, for the incremental computation of a time-varying lower-dimensional DMD operator.

Weighted Incremental DMD Algorithm
In this algorithm, the time-varying DMD operator is estimated by assigning a decaying weight to past measurements in order to gradually discount their contribution as newer measurements become available. Assume that at sampling time t_{k+1}, we have access to the following weighted datasets:

X_k = [ρ^{(k-1)/2} x_1 · · · ρ^{1/2} x_{k-1} x_k], Y_k = [ρ^{(k-1)/2} y_1 · · · ρ^{1/2} y_{k-1} y_k], (35)

where y_k = x_{k+1}, for each k ∈ N. Assume that at sampling time t_{k+2}, the datasets in (35) are updated with a pair of measurements (x_{k+1}, y_{k+1}) such that

X_{k+1} = [ρ^{1/2} X_k x_{k+1}], Y_{k+1} = [ρ^{1/2} Y_k y_{k+1}]. (36)

Suppose that at sampling time t_{k+1}, the DMD operator A^ρ_k and the SVD of X_k are known; then, at sampling time t_{k+2}, the DMD operator can be updated with the new pair of measurements (x_{k+1}, y_{k+1}) using the following theorem.
Theorem 1. Let the SVD X_k = U_{x_k} Σ_{x_k} V_{x_k}^* be known, and let the DMD operator A^ρ_k ∈ R^{n×n} minimize the cost function in (8) with Y_k and X_k given in (35). Assume that at sampling time t_{k+2}, a new pair of measurements (x_{k+1}, y_{k+1}) is used to incrementally compute the SVD of X_{k+1} using Proposition 1. Then, the DMD operator A^ρ_{k+1} ∈ R^{n×n} which minimizes the cost function in (8) is

A^ρ_{k+1} = A^ρ_k + (y_{k+1} − A^ρ_k x_{k+1}) v_{s_k} Σ_{s_k}^{-1} U_{s_k}^* U_{x_k}^*, (37)

where v_{s_k} denotes the last row of V_{s_k} from the SVD (24).

Proof. Let the SVD of X_{k+1} = U_{x_{k+1}} Σ_{x_{k+1}} V_{x_{k+1}}^* be calculated by the incremental SVD update (23). Then, the time-varying DMD operator (15) at sampling time t_{k+1} is:

A^ρ_{k+1} = Y_{k+1} V_{x_{k+1}} Σ_{x_{k+1}}^{-1} U_{x_{k+1}}^*. (38)

Substituting the incrementally computed V_{x_{k+1}} from (23) into the DMD operator (38), and using the fact that the rows of blkdiag(V_{x_k}, 1) are orthogonal to each other, yields:

A^ρ_{k+1} = [ρ^{1/2} Y_k V_{x_k} y_{k+1}] V_{s_k} Σ_{s_k}^{-1} U_{s_k}^* U_{x_k}^*. (39)

Adding and subtracting A^ρ_k, and using Y_k V_{x_k} = A^ρ_k U_{x_k} Σ_{x_k} together with S_k V_{s_k} = U_{s_k} Σ_{s_k}, reduces (39) to the update (37). If X_{k+1} is well-approximated by its projection onto its leading r singular vectors, given by X_{k+1} ≈ Ū_{x_{k+1}} Σ̄_{x_{k+1}} V̄_{x_{k+1}}^*, then the DMD operator update in (37) takes the analogous form with the truncated factors (Ū_{s_k}, Σ̄_{s_k}, V̄_{s_k}) in place of the full ones. Moreover, the DMD operator Ā^ρ_{k+1} ∈ R^{n×n} can be projected onto the subspace spanned by the leading r left singular vectors of X_{k+1} to obtain a lower-dimensional DMD operator Ã^ρ_{k+1} ∈ R^{r×r} given by

Ã^ρ_{k+1} = Ū_{x_{k+1}}^* Ā^ρ_{k+1} Ū_{x_{k+1}}.

The update in equation (37) requires only the current DMD operator, the measurement pair (x_{k+1}, y_{k+1}), and the incremental SVD update. In particular, these updates do not require the data matrix to be stored.
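As an independent check of the quantity that the weighted update targets, a naive reference (not the incremental-SVD algorithm above; names are ours) maintains discounted second-moment matrices and recovers the operator with a thresholded pseudo-inverse, so it stays well defined for rank-deficient data:

```python
import numpy as np

class WeightedDMD:
    """Naive reference for the weighted DMD operator: maintain
    discounted second-moment matrices and recover A_k from a
    thresholded pseudo-inverse. The incremental update reaches the
    same operator without forming these matrices."""

    def __init__(self, n, rho, sv_thr=1e-10):
        self.rho, self.sv_thr = rho, sv_thr
        self.P = np.zeros((n, n))      # discounted sum of x x^T
        self.Q = np.zeros((n, n))      # discounted sum of y x^T

    def update(self, x, y):
        self.P = self.rho * self.P + np.outer(x, x)
        self.Q = self.rho * self.Q + np.outer(y, x)
        # Thresholded pseudo-inverse of P, keeping directions whose
        # singular value exceeds sv_thr.
        U, s, Vh = np.linalg.svd(self.P, hermitian=True)
        keep = s > self.sv_thr
        P_pinv = (Vh[keep].conj().T / s[keep]) @ U[:, keep].conj().T
        return self.Q @ P_pinv         # current weighted DMD operator A_k
```

For time-invariant linear data the estimate converges to the true operator once enough directions have been excited.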

Windowed Incremental DMD Algorithm
We now focus on the windowed incremental DMD algorithm, in which a sliding window of the w most recent measurements is used to estimate the time-varying DMD operator. Assume that we have access to the following windowed datasets at sampling time t_{k+1}:

χ_k = [x_{k-w+1} · · · x_k], Υ_k = [y_{k-w+1} · · · y_k]. (46)

Accordingly, at sampling time t_{k+2}, we have access to

χ_{k+1} = [x_{k-w+2} · · · x_{k+1}], Υ_{k+1} = [y_{k-w+2} · · · y_{k+1}]. (47)

Suppose that at sampling time t_{k+1}, the DMD operator A^w_k and the SVD of χ_k are known; then, at sampling time t_{k+2}, the DMD operator can be updated with the new pair of measurements (x_{k+1}, y_{k+1}) using the following theorem.
Theorem 2. Let the SVD χ_k = U_{χ_k} Σ_{χ_k} V_{χ_k}^* be known, and let the DMD operator A^w_k ∈ R^{n×n} minimize the cost function in (8) with Υ_k and χ_k given in (46). Assume that at sampling time t_{k+2}, a new pair of measurements (x_{k+1}, y_{k+1}) is used to incrementally compute the SVD of χ_{k+1} using Proposition 2 and equation (34). Then, the DMD operator A^w_{k+1} ∈ R^{n×n} which minimizes the cost function in (8) is

A^w_{k+1} = A^w_k + (y_{k+1} − A^w_k x_{k+1}) v_{ŝ_k} Σ_{ŝ_k}^{-1} U_{ŝ_k}^* U_{χ̃_k}^*, (48)

where v_{ŝ_k} denotes the last row of V_{ŝ_k} from the SVD of Ŝ_k in (34).

Proof. Let the SVD of χ_{k+1} = U_{χ_{k+1}} Σ_{χ_{k+1}} V_{χ_{k+1}}^* be calculated by the windowed SVD update (34). Then, the time-varying DMD operator (15) at sampling time t_{k+1} is

A^w_{k+1} = Υ_{k+1} V_{χ_{k+1}} Σ_{χ_{k+1}}^{-1} U_{χ_{k+1}}^*. (49)

Substituting the incrementally computed V_{χ_{k+1}} from (34) into the DMD operator (49) yields:

A^w_{k+1} = [Υ_k V_{χ_k} V_{ś_k} y_{k+1}] V_{ŝ_k} Σ_{ŝ_k}^{-1} U_{ŝ_k}^* U_{χ̃_k}^*, (50)

where equation (50) holds because v_{χ_k,1} V_{ś_k} = 0 from equation (33).
Following the same steps as in the proof of Theorem 1, and using Σ_{χ_k} V_{ś_k} = U_{ś_k} Σ_{ś_k} so that Υ_k V_{χ_k} V_{ś_k} = A^w_k U_{χ̃_k} Σ_{χ̃_k}, equation (50) reduces to the update (48). If χ_{k+1} is well-approximated by its projection onto its leading r singular vectors, given by χ_{k+1} ≈ Ū_{χ_{k+1}} Σ̄_{χ_{k+1}} V̄_{χ_{k+1}}^*, then the DMD operator takes the analogous form with the truncated factors in place of the full ones. Moreover, the DMD operator Ā^w_{k+1} ∈ R^{n×n} can be projected onto the subspace spanned by the leading r left singular vectors of χ_{k+1} to obtain a lower-dimensional DMD operator Ã^w_{k+1} ∈ R^{r×r} given by

Ã^w_{k+1} = Ū_{χ_{k+1}}^* Ā^w_{k+1} Ū_{χ_{k+1}}.

Similar to the weighted incremental DMD, the update in equation (48) requires only the current DMD operator, the measurement pair (x_{k+1}, y_{k+1}), and the windowed incremental SVD update.
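For validating the windowed update on small examples, the target operator can also be refit from scratch at every step. This naive reference (names are ours) is what the incremental update avoids recomputing:

```python
import numpy as np
from collections import deque

def windowed_dmd(pairs, w, rcond=1e-10):
    """Naive reference for the windowed DMD operator A_k^w: solve the
    least-squares fit over the w most recent (x, y) pairs with a
    pseudo-inverse at every step."""
    window = deque(maxlen=w)            # sliding window of snapshot pairs
    operators = []
    for x, y in pairs:
        window.append((x, y))
        X = np.column_stack([p[0] for p in window])
        Y = np.column_stack([p[1] for p in window])
        operators.append(Y @ np.linalg.pinv(X, rcond=rcond))
    return operators
```

On a system whose dynamics switch abruptly, the windowed estimate tracks the new operator once the window has slid past the switch.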

Dynamic Mode Decomposition for Error-Related Potentials in EEG data
Error-Related Potentials (ErrPs) are slow cortical potentials seen in the EEG signal of human subjects that are elicited by an unexpected (erroneous) outcome. For example, within the context of human-machine interaction, such signals are observed when the machine takes an unexpected action [4, 5, 9]. In this section, we illustrate the efficacy of the proposed DMD algorithms on EEG recordings taken from the Monitoring Error-Related Potentials database [4, 5]. These EEG recordings were collected from six subjects while they were observing the movement of a cursor between two labeled targets on a screen. The subjects had no control over the motion of the cursor. However, they had a priori knowledge about the intended directions of movement. The cursor movements were set to elicit ErrPs by generating two types of events, namely, correct events and erroneous events. The correct events correspond to cursor movement in the intended direction, while the erroneous events correspond to cursor movement in any unintended direction. EEG signals were recorded at a sampling rate of 512 Hz, using a Biosemi ActiveTwo system with 64 electrodes distributed according to the standard 10/20 international system, as shown in Figure 1. A standard two-step EEG preprocessing for ErrPs, which consists of Common Average Reference (CAR) filtering followed by (1-10) Hz Band-Pass (BP) filtering, is performed on the data. Segments of EEG signals, from all the participants, corresponding to the correct and erroneous events were extracted into two separate groups. Then, the segments within each group were averaged to obtain one signal each for the correct and erroneous events. Finally, the ErrPs are calculated by subtracting the signal for the correct event from the signal for the erroneous event. Figure 2 shows the ERP signals and topographical views corresponding to the correct and erroneous events, and the calculated ErrPs. EEGLAB was used for processing the EEG signals and generating the topographical views for the
related brain activities [8]. Note that, for consistency with the literature, we investigate the averaged ERP signal. However, the confidence regions around the averaged signal shown in Figure 2 reveal that, despite the variability in these signals, the key features are also seen in the signal associated with a single event. Thus, the following analysis can be performed on real-time data from a single subject as well.
As reported in [4, 5], the ErrPs are characterized by a sequence of three peaks/troughs after the onset of the event in the EEG signal measured at channel FCz (see Figure 1). A sample ErrP is shown in Figure 2c, where the first small positive peak is observed at 200 ms, followed by a negative trough at 260 ms, and finally a larger positive peak at 330 ms.
Two observations can be made by comparing the temporal behavior and the topographical views during the correct and erroneous events in Figures 2a and 2b, respectively. First, the ERP signal during the correct event tends to have broader peaks in comparison with the ERP signal during erroneous events, which indicates that it has slower dynamics, i.e., smaller growth or decay rates. Second, by comparing the topographical views during the three characterizing peaks at the FCz channel, it can be noticed that the ERP peaks during erroneous events have higher amplitudes than the peaks during correct events in both the negative and positive directions. Note that the colormap bar values are different for the topographical views for the correct ERP, the erroneous ERP, and the ErrP in Figure 2. As shown in Figure 2, the EEG patterns show coherent activities within the brain, suggesting the possibility of low-dimensional dynamics underlying this high-dimensional data. Furthermore, these patterns change significantly with time and can possibly be better explained using a time-varying underlying model. To investigate these hypotheses, we apply the incremental DMD algorithms to the EEG recordings corresponding to correct and erroneous events. The weighted incremental DMD algorithm is applied with discounting factors ρ ∈ {0.1, 0.2, 0.4, 0.8}, and the windowed incremental DMD algorithm with a time-window of width w = 512 samples. All singular values greater than σ_thr ∈ {0.01, 0.001} are used for computing the reduced-order DMD operator.
An initial window of 512 samples before the event occurrence is used to initialize the DMD models for both algorithms.
We compare the proposed algorithms with the online DMD approach proposed in [32]. To this end, we use the same parameters (discounting factor and width of time-window) in the online DMD algorithms as in the incremental DMD algorithms. Recall that the online DMD algorithms rely on the Sherman-Morrison identity to compute the DMD operator and do not have access to the (time-varying) singular values of the data matrix; hence, the reduction of the DMD operator to the leading singular vectors does not apply to the online DMD algorithms.
In order to compute the time-varying DMD operator, we first compute a standard DMD operator using EEG data from a one-second window (512 samples) just before the correct or erroneous event. Subsequently, the EEG measurements are streamed at a rate of 1 sample/iteration to update the initial DMD operator. At each iteration, both the weighted and windowed versions of the incremental DMD and online DMD algorithms are used to predict 64 future samples of the EEG signal at channel FCz. We compare the performance of these algorithms using the normalized Root Mean Square (RMS) prediction error, denoted by e_nrms(k), at each t_k ∈ {0, · · ·, 0.6}, using the following formula:

e_nrms(k) = (1/ȳ_k) sqrt( (1/64) Σ_{i=k+1}^{k+64} (ŷ_i − y_i)² ),

where ŷ_i and y_i are the estimated and recorded values, respectively, and ȳ_k = max_{k+1≤i≤k+64} y_i. Figure 3 shows the mean of the normalized RMS prediction error computed over all iterations, as well as the associated 95% confidence sets, for the weighted and windowed incremental DMD algorithms and different choices of parameters. The performance of the incremental DMD does not appear to vary much with the weighting factor and the two choices of the threshold on singular values. The performance of the windowed incremental DMD algorithm is similar to that of the weighted incremental DMD. However, the incremental DMD algorithms seem to outperform the online DMD algorithms. This suggests that having a small threshold σ_thr is more beneficial than having no threshold, as in the case of the online DMD algorithms. The evolution of the normalized RMS prediction error for each event is presented in Figure 13 in Appendix B.
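The error metric can be sketched as follows; the exact normalization in the original formula is partially inferred from the garbled extraction, and the function name is ours:

```python
import numpy as np

def normalized_rms(y_hat, y):
    """Normalized RMS prediction error over a prediction horizon: the
    RMS of the error between predicted (y_hat) and recorded (y)
    samples, divided by the peak recorded magnitude over the same
    horizon (normalization partially inferred)."""
    y_hat, y = np.asarray(y_hat, float), np.asarray(y, float)
    return np.sqrt(np.mean((y_hat - y) ** 2)) / np.max(np.abs(y))
```

For example, a constant prediction error of 1 against a recorded signal peaking at 2 gives a normalized RMS of 0.5.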
In order to gain further insight into the influence of different thresholds σ_thr and weighting factors ρ, we investigate how accurately the incremental DMD predicts the signature ERPs at channel FCz. To this end, we updated the initial DMD operator using the incremental DMD algorithms until the first peak at 200 msec. We then used the updated lower-dimensional DMD operator to predict the future ERP until the third peak at 330 msec. The predicted signal and the associated normalized RMS are shown in Figures 4 and 5, respectively. For the erroneous event, the predicted signals from the incremental DMD algorithms capture the trend of the original signal, while for the correct event, most incremental DMD algorithms fail to capture the final dip in the original signal. The only exception is the windowed incremental DMD with σ_thr = 0.001. For σ_thr = 0.01, the number of DMD modes used by the incremental DMD algorithms ranges from 26 to 31 for correct events and 28 to 32 for erroneous events. Similarly, for σ_thr = 0.001, the number of DMD modes used by the incremental DMD algorithms ranges from 26 to 31 for correct events and 28 to 33 for erroneous events. In comparison, the predicted signal from the online DMD algorithms performs poorly. The key reason for this poor performance is that the data matrix obtained after preprocessing is not full rank, even if all the measurements are included. Thus, the Sherman-Morrison update cannot be applied in the standard form. Recall that the online DMD algorithm starts by initializing the DMD operator A_k and the inverse of the covariance matrix P_k = (X_k X_k^T)^{-1}. For initialization of the online DMD algorithms, we adopted the suggestion in [32]: in the event that X_k X_k^T is not full rank, A_k can be initialized by an n × n zero matrix, and P_k can be initialized by P_init = αI, where α is a large positive scalar.
During the correct events, there appears to be an interplay between the threshold σ_thr and the weighting factor ρ. As shown in Figure 5 for correct events, the normalized RMS decreases as the weighting factor decreases, until ρ = 0.2, and increases with further decrease in ρ. For ρ = 0.2, the higher dimension of the reduced system (σ_thr = 0.001) leads to a higher prediction error compared with the lower-dimensional reduced system (σ_thr = 0.01). This suggests that, for the weighting factor that leads to the smallest prediction error, including too many modes is not beneficial for prediction performance, since it leads to over-fitting. A similar interplay is not seen for the erroneous event in Figure 5. However, if we choose a smaller initial window to initialize the DMD model, we observe similar effects, as shown in Figure 14 in Appendix C. Similar effects are observed for the windowed DMD, as shown in Figure 15 in Appendix C.
To further compare the online DMD with the incremental DMD, we took the raw EEG data, i.e., data without any preprocessing (CAR and BP filtering). In this case, the data matrix is well-conditioned and the Sherman-Morrison identity is well defined. The performance of the online DMD and the incremental DMD in terms of predicting the future signal is shown in Figure 6. In this case, both the incremental and online DMD algorithms have similarly poor performance. This suggests that the difference in the performance of these algorithms is primarily due to the ill-conditioned dataset. It also highlights the utility of applying appropriate band-pass filtering to the raw EEG data. Without such filtering, the key activities seem to be lost in the background noise, resulting in poor prediction performance.
Topographical views of the real part of the four dominant DMD modes at 200 msec, obtained from the incremental DMD algorithms for the correct and erroneous events, are shown in Figure 7 and Figure 8, respectively. A comparison between the DMD modes during the erroneous and correct events shows stronger activity in the frontal lobe during the erroneous events, especially around the FCz channel. This is consistent with the topographical views from the EEG recordings shown in Figures 2a and 2b, which show stronger activity in the frontal lobe during an erroneous event. This illustrates that a few principal DMD modes are able to capture the dominant activity in the brain during these experiments.
Figure 9 shows the logarithm of the eigenvalues of the DMD operator at 200 msec associated with different thresholds σ_thr, weighting factors ρ, and time-window widths. For smaller values of σ_thr, the range of eigenvalues in the left half-plane is larger. This is consistent with the fact that a smaller σ_thr implies a larger reduced-order model in which some states converge to zero much faster than others. For the erroneous event, some eigenvalues spread into the right half-plane, which suggests that locally unstable dynamics underlie the evolution of the ERP signal during erroneous trials. This is also consistent with the faster dynamics (sharper peaks) in the EEG signal during the erroneous event (see Figure 2).
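The stability reasoning here uses the standard correspondence between discrete- and continuous-time spectra: a discrete-time DMD eigenvalue μ maps to the continuous-time exponent ω = log(μ)/Δt, and Re(ω) > 0 (equivalently |μ| > 1) flags a locally unstable mode. A small sketch of this mapping, with an assumed sampling interval:

```python
import numpy as np

def dmd_continuous_eigs(A_tilde, dt):
    """Map the eigenvalues mu of a discrete-time DMD operator to the
    continuous-time exponents omega = log(mu) / dt. Modes with
    omega.real > 0 grow in time (right half-plane)."""
    mu = np.linalg.eigvals(A_tilde).astype(complex)
    return np.log(mu) / dt
```

For example, with a 4 msec sampling interval, a mode with |μ| = 1.02 per step maps to Re(ω) = log(1.02)/0.004 ≈ 4.95 s⁻¹, i.e., a mildly growing (right half-plane) mode.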
We now investigate the effectiveness of the incremental DMD algorithms in terms of reconstructing the ERP signals during the correct and erroneous events at the FCz channel. To this end, we reconstruct the signal starting from the onset of the events until the end of the third peak at 330 msec. The reconstructed signals and the normalized RMS error for the reconstruction are shown in Figures 10 and 11, respectively. These results indicate that the incremental DMD models incur a smaller reconstruction error than the online DMD models. This result is counter-intuitive, since the online DMD uses more modes and should therefore yield a smaller reconstruction error. The discrepancy arises because the data matrix never achieves full rank, and the online DMD is not able to overcome the heuristic initialization discussed above. There is no significant difference in the reconstruction error among incremental DMD models with different ρ and σ_thr values, as shown in Figure 10a.
Finally, we summarize the above investigation into the utility of incremental and online DMD in modeling EEG data. Pre-processing of the EEG data, which requires domain-specific knowledge such as the frequency band in which the event of interest is observed, is vital to obtaining sensible DMD-based data-driven models. However, after such pre-processing, the data matrix may become ill-conditioned, and under such scenarios the incremental DMD techniques proposed in this paper appear more promising than the online DMD techniques. There exists a trade-off between the threshold on the singular values used for model reduction and the window-width or the discount factor used in the incremental DMD algorithms. We also observed that the principal DMD modes are consistent with the activity in the brain during the studied experiments. This suggests that the incremental DMD algorithms can compute an efficient basis for describing the evolution of the EEG activity. Finally, we observed that under erroneous events, some DMD eigenvalues moved towards the right half of the complex plane, suggesting that certain events trigger (small-time) unstable dynamics.

Incremental Dynamic Mode Decomposition for Systems with Control Input
In this section, we extend the incremental DMD algorithms to the case of non-autonomous dynamical systems. We first recall the Dynamic Mode Decomposition with Control (DMDc) algorithm proposed in [24], which estimates time-invariant non-autonomous system dynamics underlying high-dimensional data. For scenarios in which the non-autonomous system underlying the measurements is time-varying, the DMDc algorithm has been extended to the online DMDc algorithm [32]. The online DMDc algorithm operates similarly to the online DMD algorithm and yields time-varying estimates of the system and input matrices. Similar to the DMD algorithm, the DMDc algorithm does not allow for incremental computation of the lower-dimensional non-autonomous system underlying high-dimensional measurements. In the following, we extend the DMDc algorithm to the weighted and windowed incremental DMDc algorithms, which enable us to obtain a lower-dimensional time-varying approximation of the underlying dynamics.
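As a concrete reference point for the batch DMDc of [24] recalled above: it solves the least-squares problem Y ≈ A X + B Γ through the pseudoinverse of the augmented data matrix [X; Γ]. A minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def dmdc(X, Y, Gamma):
    """Batch DMDc: least-squares fit of Y = A X + B Gamma via the
    pseudoinverse of the augmented matrix Omega = [X; Gamma]."""
    n = X.shape[0]
    G = Y @ np.linalg.pinv(np.vstack([X, Gamma]))  # G = [A  B]
    return G[:, :n], G[:, n:]
```

When the augmented matrix has full row rank and the data are generated exactly by a linear system, this recovers (A, B) exactly; the incremental variants below track this solution as the data change.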

Weighted Incremental DMDc Algorithm
Assume that at sampling time t_{k+1}, the measurements and the associated exogenous inputs are arranged in the datasets X_k, Y_k, Γ_k, and X^c_k, respectively, where X_k, Y_k ∈ R^{n×k}, Γ_k ∈ R^{l×k}, and X^c_k ∈ R^{(n+l)×k}. At sampling time t_{k+2}, the datasets are updated with a new set of measurements (x_{k+1}, y_{k+1}, γ_{k+1}), which results in the updated datasets X_{k+1}, Y_{k+1}, Γ_{k+1}, and X^c_{k+1}. Suppose that the weighted DMDc operator and input matrix at t_{k+1}, and the SVDs of both X_k and X^c_k, are known. Then, given the new incoming measurements at t_{k+2}, the updates for the weighted DMDc operator and input matrix can be obtained using the following theorem.
Let the SVDs of X^c_k and X_k be known. Assume that the pair (A^ρc_k, B^ρc_k) minimizes the cost function in (11) with X_k and Γ_k defined in (59). Consider that at sampling time t_{k+2}, a new triplet of measurements (x_{k+1}, y_{k+1}, γ_{k+1}) is used to incrementally compute the SVDs of both X^c_{k+1} and X_{k+1} using Proposition 1. Then, the pair (A^ρc_{k+1}, B^ρc_{k+1}) that minimizes the cost function in (11) is given by the update in (62).

Proof. Consider the SVD X^c_k = U^c_{x_k} Σ^c_{x_k} V^{c*}_{x_k}, with V^c_{x_k} ∈ C^{k×(n+l)}, and the SVD X_k = U_{x_k} Σ_{x_k} V^*_{x_k}. The weighted incremental SVD presented in Proposition 1 can be applied to obtain the SVD of X^c_{k+1}; given the SVD of X_k, the updated SVD of X_{k+1} is given by (23). Following the same steps presented in the proof of Theorem 1 with the SVD updates in (61), we obtain the weighted incremental DMDc update in (62).

Assume that X^c_{k+1} can be well-approximated by its projection onto its leading p singular vectors, so that X^c_{k+1} ≈ X̄^c_{k+1} = Ū^c_{x_{k+1}} Σ̄^c_{x_{k+1}} V̄^{c*}_{x_{k+1}}. Then, the update in (62) yields the corresponding updates for Ā^ρc_{k+1} ∈ R^{n×n} and B̄^ρc_{k+1} ∈ R^{n×l}. Assume further that X_{k+1} is well-approximated by its projection onto its leading r singular vectors, so that X_{k+1} ≈ X̄_{k+1} = Ū_{x_{k+1}} Σ̄_{x_{k+1}} V̄^*_{x_{k+1}} with r ≤ p. Then a reduced-order model can be obtained in which the lower-dimensional approximations Ã^ρc_{k+1} ∈ R^{r×r} and B̃^ρc_{k+1} ∈ R^{r×l} are computed by projecting Ā^ρc_{k+1} and B̄^ρc_{k+1} onto the subspace spanned by the columns of Ū_{x_{k+1}}. The update in equation (62) requires only the current DMDc operator Ḡ^ρc_k, the new measurement set (y_{k+1}, x_{k+1}, γ_{k+1}), and the incremental SVD update. In particular, these updates do not require the large data matrices to be stored.
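The solution the weighted incremental update is designed to track can also be checked by brute force: scaling the j-th data column (j = 0 being the oldest) by ρ^{(m−1−j)/2} makes that sample contribute a factor ρ^{m−1−j} to the squared-error cost. A non-incremental sketch under this assumption (the exact weighting convention in cost (11) may differ in normalization; the function name is ours):

```python
import numpy as np

def weighted_dmdc(X, Y, Gamma, rho):
    """Discounted DMDc: column j (0 = oldest) is scaled by
    rho**((m-1-j)/2), so sample j contributes rho**(m-1-j) to the
    squared-error cost. rho = 1 recovers the unweighted fit."""
    m = X.shape[1]
    w = rho ** (np.arange(m - 1, -1, -1) / 2.0)   # per-column weights
    Omega = np.vstack([X, Gamma]) * w             # scale columns
    G = (Y * w) @ np.linalg.pinv(Omega)           # G = [A  B]
    return G[:, :X.shape[0]], G[:, X.shape[0]:]
```

For data generated exactly by a time-invariant system, any ρ ∈ (0, 1] recovers the true (A, B); the discounting only matters when the underlying dynamics drift.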

Windowed Incremental DMDc Algorithm
Assume that at sampling time t_{k+1}, the past w state and control input measurements are arranged in the datasets χ_k, Υ_k, Γ_k, and χ^c_k, where χ_k, Υ_k ∈ R^{n×w}, Γ_k ∈ R^{l×w}, and χ^c_k ∈ R^{(n+l)×w}. At sampling time t_{k+2}, the datasets are updated by adding a new set of measurements (x_{k+1}, y_{k+1}, γ_{k+1}) and removing the oldest set of measurements (x_{k−w+1}, y_{k−w+1}, γ_{k−w+1}). Suppose that the windowed DMDc operator and input matrix at t_{k+1}, and the SVDs of both χ_k and χ^c_k, are known. Then, given the new incoming measurements at t_{k+2}, the updates for the windowed DMDc operator and input matrix can be obtained using the following theorem.
Let the SVDs of χ^c_k and χ_k be known, and let the pair (A^wc_k, B^wc_k) minimize the cost function in (11) with Υ_k and χ_k given in (63). Assume that at sampling time t_{k+2}, a new triplet of measurements (x_{k+1}, y_{k+1}, γ_{k+1}) is used to incrementally compute the SVDs of χ^c_{k+1} and χ_{k+1} using Proposition 2 and equation (34). Then, the pair (A^wc_{k+1}, B^wc_{k+1}) that minimizes the cost function in (11) is given by the update in (68).

Proof. Let q^c be defined analogously to q in (26). Then, the windowed incremental SVD presented in Proposition 2 and equation (34) can be applied to obtain the SVD χ^c_{k+1} = U^c_{χ_{k+1}} Σ^c_{χ_{k+1}} V^{c*}_{χ_{k+1}}, as in (67). Similarly, given the SVD of χ_k, the SVD of χ_{k+1} can be obtained using the updates in (34).
The same steps presented in the proof of Theorem 2 can be followed with the SVD update in (67) to obtain the update in (68), where U^{ca}_{χ_{k+1}} ∈ C^{n×n} and U^{cb}_{χ_{k+1}} ∈ C^{l×n}. The update in (68) then yields the corresponding updates for A^wc_{k+1} ∈ R^{n×n} and B^wc_{k+1} ∈ R^{n×l}. If χ^c_{k+1} can be well-approximated by its projection onto its leading p singular vectors, so that χ̄^c_{k+1} = Ū^c_{χ_{k+1}} Σ̄^c_{χ_{k+1}} V̄^{c*}_{χ_{k+1}}, where Ū^c_{χ_{k+1}} ∈ C^{(n+l)×p}, Σ̄^c_{χ_{k+1}} ∈ C^{p×p}, and V̄^c_{χ_{k+1}} ∈ C^{w×p}, then equation (68) yields the corresponding updates for Ā^wc_{k+1} ∈ R^{n×n} and B̄^wc_{k+1} ∈ R^{n×l}.

For simplicity of exposition, in Sections 6.1 and 6.2 we assumed that the SVDs of the data matrix X_k (χ_k) and the augmented data matrix X^c_k (χ^c_k) are known. However, if the dimension of the input γ_k is small, or if the large dimension of the data matrix leads to storage concerns for two SVDs, it may be beneficial to store only the SVD of X_k (χ_k) and subsequently use incremental SVD updates to first compute the SVD of X_{k+1} (χ_{k+1}) and then the SVD of X^c_{k+1} (χ^c_{k+1}). The former can be accomplished using the incremental SVD updates described in Section 3, while the latter update is described in Appendix D.
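For reference, the windowed DMDc solution can also be obtained by brute force: keep only the last w measurement triplets and refit from scratch. The incremental algorithm above is designed to track this solution while updating the SVD instead of recomputing it; the following sketch (the class name is ours) defines that target solution.

```python
import numpy as np
from collections import deque

class WindowedDMDc:
    """Sliding-window DMDc by direct refit over the last w triplets
    (x, y, gamma). This brute-force version defines the solution the
    windowed incremental algorithm tracks without full refits."""
    def __init__(self, w):
        self.buf = deque(maxlen=w)   # oldest triplet drops out automatically

    def update(self, x, y, gamma):
        self.buf.append((x, y, gamma))
        X = np.column_stack([b[0] for b in self.buf])
        Y = np.column_stack([b[1] for b in self.buf])
        G = np.column_stack([b[2] for b in self.buf])
        AB = Y @ np.linalg.pinv(np.vstack([X, G]))   # [A  B]
        n = x.shape[0]
        return AB[:, :n], AB[:, n:]
```

The direct refit costs an SVD of an (n + l) × w matrix per step, which is exactly the cost the incremental SVD update is meant to avoid.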

Numerical Illustration for the Incremental DMDc Algorithm
In this section, we illustrate the efficiency of the incremental DMDc algorithms. To this end, we generate a time-varying linear dynamical system as follows. We generate random discrete-time linear time-invariant system matrices A ∈ R^{n×n} and B ∈ R^{n×l} using the MATLAB function drss. These system matrices are then used to generate the following time-varying linear system:
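Since drss is a MATLAB function, a rough Python stand-in for this construction is sketched below: it draws a random A and rescales its spectral radius below 1 (this is not drss's actual algorithm), then simulates the time-varying system in (69), x_{k+1} = A_k x_k + B_k γ_k with A_k = (1 + ε sin ωk)A and B_k = (1 + ε sin ωk)B. The function names are ours.

```python
import numpy as np

def random_stable_dss(n, l, rng, radius=0.9):
    """Rough stand-in for MATLAB's drss: a random A rescaled so its
    spectral radius equals `radius` (< 1, hence stable), plus a
    random input matrix B. Not drss's actual construction."""
    A = rng.standard_normal((n, n))
    A *= radius / max(abs(np.linalg.eigvals(A)))
    B = rng.standard_normal((n, l))
    return A, B

def simulate_tv(A, B, eps, omega, m, rng):
    """Simulate m snapshots of x_{k+1} = A_k x_k + B_k gamma_k with
    A_k = (1 + eps*sin(omega*k)) A, B_k = (1 + eps*sin(omega*k)) B,
    and gamma_k i.i.d. standard Gaussian."""
    n, l = B.shape
    x = rng.standard_normal(n)
    X, G = [x], []
    for k in range(m - 1):
        g = rng.standard_normal(l)
        x = (1 + eps * np.sin(omega * k)) * (A @ x + B @ g)
        X.append(x)
        G.append(g)
    return np.column_stack(X), np.column_stack(G)
```

With the parameters used in this section (ω = 1, ε = 0.001, m = 200, n = 20, l = 2), the perturbation is small enough that the system stays stable while still drifting in time.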

Conclusion and Future Direction
In this paper, we developed algorithms for incremental computation of time-varying dynamic mode decomposition for autonomous and non-autonomous systems. In contrast to existing algorithms, these algorithms rely on incremental singular value decomposition to update the singular values of the data matrix and allow for the computation of a reduced-order model at each time step. These algorithms are particularly useful for cases in which the data matrix is singular and incremental matrix-inversion-based algorithms cannot be applied. We applied the proposed algorithms to an EEG dataset associated with error-related potentials and showed their efficacy in predicting the future EEG signal. We also illustrated that the principal DMD modes obtained were consistent with the EEG activity seen in the brain. There are several interesting future research directions for this work. First, the proposed approach can be adapted to apply the incremental updates to a nonlinear mapping of the measurements, in the spirit of the extended dynamic mode decomposition proposed in [31]. Such an extension may further improve the predictive power of the computed models. Second, it would be interesting to conduct human-in-the-loop experiments in which the EEG data is used in real time to compute a DMD-based model of human performance, and the control is designed to improve that performance.

A Proof of Lemma 1
The least-squares cost function in (8) can be written as in (70), with gradient given in (71). Setting ∂J(A_k)/∂A_k = 0 yields the system of normal equations (72), A_k (X_k X^*_k) = Y_k X^*_k. The system in (72) has the unique solution A_k = Y_k X^*_k (X_k X^*_k)^{−1}, given in (73), if and only if (X_k X^*_k) is invertible. It is known [19] that (X_k X^*_k) is invertible if and only if Ker X^*_k is trivial.
Let the SVD of X_k be X_k = U_{x_k} Σ_{x_k} V^*_{x_k}. Since Ker X^*_k is trivial, it follows that (U_{x_k} Σ_{x_k}) is invertible. Substituting the SVD of X_k into (73) and leveraging the invertibility of (U_{x_k} Σ_{x_k}), we obtain A_k = Y_k V_{x_k} Σ^{−1}_{x_k} U^*_{x_k}, as stated in (74). We now consider the case in which Ker X^*_k is a non-trivial subspace. Then, the set of all solutions to the system in (72) can be written as {A_k = Ā_k + Ā_{k0} : Ā^*_k ⊥ Ker X^*_k, Ā^*_{k0} ∈ Ker X^*_k}, where Ā_k satisfies the system in (72) and Ā_{k0} ranges over all solutions of Ā_{k0}(X_k X^*_k) = 0. Let P_{Ker X^*_k} be the orthogonal projection matrix onto Ker X^*_k, and P_{Ran X_k} = I − P_{Ker X^*_k} be the orthogonal projection matrix onto Ran X_k. Then (75) holds; since the last equality in (75) uses ||P_{Ran X_k}|| = 1, it follows that ||Ā_k|| ≤ ||A_k|| for every A_k satisfying the system in (72).
Assume that the matrix X_k can be approximated by its projection onto the leading r singular vectors associated with the r non-zero singular values in the reduced SVD, X_k ≈ Ū_{x_k} Σ̄_{x_k} V̄^*_{x_k}. Then the orthogonal projection matrix onto Ran X_k can be written as P_{Ran X_k} = Ū_{x_k} Ū^*_{x_k}. Consequently, using (74), the unique minimum-norm solution can be written as Ā_k = Y_k V̄_{x_k} Σ̄^{−1}_{x_k} Ū^*_{x_k}. The uniqueness of Ā_k follows from the strict convexity of the norm.
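The minimum-norm solution in the lemma coincides with Y_k X_k^+, where X_k^+ is the Moore-Penrose pseudoinverse, since both truncate the zero singular values. A quick numerical check of this identity on a rank-deficient data matrix (the dimensions are chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# 5-dimensional snapshots confined to a 2-dimensional subspace,
# so X is rank-deficient and (X X*) is singular
Q = np.linalg.qr(rng.standard_normal((5, 2)))[0]
Z = rng.standard_normal((2, 20))
X = Q @ Z
Y = Q @ (rng.standard_normal((2, 2)) @ Z)

# minimum-norm solution from the truncated SVD, as in the lemma
U, s, Vh = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))
A_bar = Y @ Vh[:r, :].conj().T @ np.diag(1.0 / s[:r]) @ U[:, :r].conj().T

# the same operator via the Moore-Penrose pseudoinverse
A_pinv = Y @ np.linalg.pinv(X)
```

Both constructions return the same operator, which is why the incremental algorithms can work directly with the truncated SVD even when the data matrix is singular.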

C ERP Prediction Error for Incremental DMD Algorithm with Different Initial Window Sizes
Figure 14 shows the normalized RMS for the prediction error of the ERP signal during the correct and erroneous events using the weighted incremental DMD with an initial window size of w_0 = 128 samples, and Figure 15 shows the normalized RMS for the same error using the windowed incremental DMD with w ∈ {64, 128, 256}.

D Incremental Singular Value Decomposition for the Data Matrix Augmented with Exogenous Input
In this section, we present an incremental update to compute the SVD of the data matrix augmented with the exogenous input. This update can be beneficial in the case of a low-dimensional control input or limited storage capacity. Assume that at t_k, the state measurements are arranged in a matrix X_k ∈ R^{n×q_m} and the exogenous input measurements in a matrix Γ_k ∈ R^{l×q_m}, where q_m = k for the weighted incremental SVD and q_m = w for the windowed incremental SVD. Both X_k and Γ_k are arranged in an augmented matrix X^c_k ∈ R^{(n+l)×q_m} as in (76), with the weighted and windowed arrangements defined accordingly. Let q_n = n for the weighted incremental SVD and q_n = q for the windowed incremental SVD, where q is given in (26). Then, the SVD of X^c_k can be calculated by updating the SVD of X_k using the following proposition.

Proposition 3. Let at sampling time t_k the SVD X_k = U_k Σ_k V^*_k be known, where U_k ∈ C^{n×q_n}, Σ_k ∈ C^{q_n×q_m}, and V_k ∈ C^{q_m×q_m}. Then, the SVD of the dataset in (76) is given by (77), where U^c_R, Σ^c_R, and V^c_R are given by the SVD in (78).

Proof. The new dataset matrix X^c_k can be represented in terms of X_k using an additive update: X^c_k is obtained by stacking X_k above an l × q_m zero block and adding Z_{n+1} Γ_k, where Z_{n+1} = [0 0 . . . I_l]^T ∈ R^{(n+l)×l} and I_l is the identity matrix of order l. Then, the new dataset in (78) can be rewritten accordingly; defining the matrix R^c and computing its SVD yields the update in (77).

Fig. 1 :
Fig. 1: Topographical view of the EEG channels, with channel FCz, at which the ErrPs are characterized, marked in bold red font.

Fig. 2 :
Fig. 2: The average response to an event at t = 0: the mean ERP (confidence level = 95%) at the FCz channel (left panel) and the topographical view of brain activity across all channels (right panel). The top and middle panels show the patterns during the correct event and the erroneous event, respectively. The bottom panel shows the ErrP obtained by subtracting the signal associated with the correct event from that of the erroneous event. The topographical views are shown at the three characterizing peaks that occur at 200 msec, 260 msec, and 360 msec, respectively.

Fig. 3 :
Fig. 3: The mean of the normalized RMS prediction error computed over all iterations, as well as the associated 95% confidence sets, for (a) correct events and (b) erroneous events.

Fig. 5 :
Fig. 5: Normalized RMS error for the predicted ERP signal at channel FCz for correct events (left panel) and erroneous events (right panel), using (a) incremental DMD, and (b) online DMD.

Fig. 6 :
Fig. 6: The predicted ERP signal at channel FCz based on well-conditioned EEG datasets using the incremental DMD model (left panel) and the online DMD model (right panel) during (a) correct events and (b) erroneous events.

Fig. 7 :
Fig. 7: Topographical views of the real part of the 4 dominant DMD modes during correct events using threshold values of (a) σ_thr = 0.01, and (b) σ_thr = 0.001.

Fig. 8 :
Fig. 8: Topographical views of the real part of the 4 dominant DMD modes during erroneous events using threshold values of (a) σ_thr = 0.01, and (b) σ_thr = 0.001.

Fig. 9 :
Fig. 9: The left panel shows the DMD eigenvalues for σ_thr = 0.01 and the right panel shows the DMD eigenvalues for σ_thr = 0.001 during (a) correct events and (b) erroneous events.

Fig. 11 :
Fig. 11: Normalized RMS error for the reconstructed ERP signal at channel FCz for correct events (left panel) and erroneous events (right panel), using (a) incremental DMD, and (b) online DMD.

Theorem 4 .
Let at sampling time t_{k+1} the SVDs of χ_{k+1} and χ^c_{k+1} be known, with V^c_{χ_k} ∈ C^{w×w}, and define the matrices Ś^c_k and Ŝ^c_k with their associated SVDs.

The time-varying system used in the numerical illustration is x_{k+1} = A_k x_k + B_k γ_k (69), where A_k = (1 + ε sin ωk)A and B_k = (1 + ε sin ωk)B, and γ_k ∈ R^l is an i.i.d. sequence of standard Gaussian random vectors. The system in (69) is used to run a simulation for m ≫ n time steps, to obtain two datasets X, Y ∈ R^{n×(m−1)}, with ω = 1, ε = 0.001, m = 200, n = 20, and l = 2. Initial models are estimated using the DMD and DMDc algorithms on the initial windows X = [x_1, ..., x_40], Y = [y_1, ..., y_40], and Γ = [γ_1, ..., γ_40]. The weighted (ρ = 0.9) and windowed (w = 40 samples) incremental DMD and DMDc algorithms are applied at each iteration on the generated data (x_k, y_k, γ_k), for k ∈ {41, ..., 200}, to update the initial DMD and DMDc models. The two models are also used to predict 10 future state vectors at each iteration. The Frobenius norm of the prediction error for the incremental DMDc (blue line) and the incremental DMD (red line) is shown in Figure 12. The incremental DMDc models have higher prediction accuracy because of their ability to characterize the relationship between the states and the control input, which is vital for any predictive model. The windowed incremental DMDc algorithm appears to have a smaller prediction error than the weighted incremental DMDc algorithm.
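The 10-step prediction-error comparison between the DMD and DMDc models can be sketched with a small helper; the function name and interface are ours, and passing B = None gives the control-free (DMD-style) prediction.

```python
import numpy as np

def multistep_error(A, B, X, Gamma, k, h):
    """Frobenius norm of the h-step prediction error starting from
    snapshot k under the model x_{j+1} = A x_j + B gamma_j.
    Pass B=None for a control-free (DMD-style) model."""
    x = X[:, k].copy()
    errs = []
    for j in range(h):
        x = A @ x + (B @ Gamma[:, k + j] if B is not None else 0)
        errs.append(x - X[:, k + j + 1])
    return np.linalg.norm(np.column_stack(errs))
```

With the exact (A, B) of a time-invariant system, the DMDc-style error is numerically zero, while the control-free model accumulates the unmodeled input effect — the same qualitative gap reported in Figure 12.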

Fig. 12 :
Fig. 12: The Frobenius norm of the prediction error for a future window of 10 samples using (a) weighted incremental DMD (red line) and weighted incremental DMDc (blue line), and (b) windowed incremental DMD (red line) and windowed incremental DMDc (blue line).

Fig. 13 :
Fig. 13: Normalized RMS error for a future window of 64 samples of EEG states at channel FCz using incremental DMD with σ_thr = 0.01 (left panel), incremental DMD with σ_thr = 0.001 (middle panel), and online DMD (right panel) for (a) correct events, and (b) erroneous events.

Fig. 14 :Fig. 15 :
Fig. 14: Normalized RMS error of ERP prediction using weighted incremental DMD with an initial window of 128 samples during (a) correct events, and (b) erroneous events.
Here V^c_R ∈ C^{q_m×q_m}. The matrices Ū^{ca}_{χ_{k+1}} ∈ C^{n×p} and Ū^{cb}_{χ_{k+1}} ∈ C^{l×p} are defined such that Ū^{c*}_{χ_{k+1}} = [Ū^{ca*}_{χ_{k+1}} Ū^{cb*}_{χ_{k+1}}]. Assume that χ_{k+1} is well-approximated by its projection onto its leading r singular vectors, with r ≤ p. Then a reduced-order model can be represented as x̃_{k+1} = Ã^wc_k x̃_k + B̃^wc_k γ_k, where the lower-dimensional approximations Ã^wc_{k+1} ∈ R^{r×r} and B̃^wc_{k+1} ∈ R^{r×l} can be obtained by projecting Ā^wc_{k+1} and B̄^wc_{k+1} onto the subspace spanned by the columns of Ū_{χ_{k+1}}.