Topological reconstruction of sub-cellular motion with Ensemble Kalman velocimetry

Microscopy imaging of plant cells allows elaborate analysis of the sub-cellular motion of organelles. The resulting large video datasets can be analyzed efficiently by automated algorithms. We develop a novel, data-oriented algorithm that tracks organelle movements and reconstructs their trajectories on stacks of image data. Our method proceeds in three steps: (i) identification, (ii) localization, and (iii) linking. It combines topological data analysis and Ensemble Kalman Filtering, and does not assume a specific motion model. Applied to simulated datasets, the method agrees with the ground truth; we also test it successfully on real microscopy data.


1. Introduction. Cell physiology depends on the motion of sub-cellular structures; in particular, cytosolic streaming and organelle motility are critical factors [4,5,11,40,51]. Such intracellular motion is particularly pronounced in plant cells and is known to be essential to many cellular functions, including growth and overall health [33]. Organelle motility in plant cells is driven by motor proteins that move directionally along actin filaments or diffuse in the cell sap, occasionally switching between these modes. Additionally, different motor proteins generate motion patterns with different characteristics, such as speeds, turning angles, and switching frequencies between the different modes of motion. Due to the complexity of these underlying dynamics, an understanding of organelle motility based on first principles remains out of reach; instead, intracellular motion is commonly studied experimentally.
Conventional fluorescence microscopy is one of the most popular techniques employed for the direct observation of intracellular motion [36,38,39,54].

2. Methods.
2.1. Description of datasets. Datasets that capture the motion of organelles through conventional fluorescence microscopy are typically provided in a video format [17,27,45]. Essentially, each video dataset consists of a stack of pixelated images $D = \{(F_n, t_n)\}_{n=1}^N$, where each image $F_n$ is obtained at time $t_n$ during the course of an experiment.
Ignoring imaging artifacts caused by finite frame rate, dead time, or rolling shutter [27], which are insignificant on the time- and space-scales involved in plant microscopy [29,31], we consider images obtained at time levels $\{t_n\}_n$ that start at the experiment's onset and end with its conclusion, denoted $t_1 = 0$ and $t_N = T$, respectively. Further, we consider equidistant intermediate time levels $t_n = (n-1)\Delta t$, where $\Delta t = T/(N-1)$ is the exposure period used for the acquisition of the images.
In turn, each image $F_n$ is an array of intensity values $\{I_n^p\}_{p=1}^P$, where $I_n^p$ denotes the intensity [18,49], recorded at time $t_n$, of a pixel located at a fixed position $x^p \in \mathbb{R}^2$. We assume that the pixel positions $\{x^p\}_{p=1}^P$ are given and reported in physical units in the same coordinate system as the sample under imaging. Ignoring positioning parallel to the optical axis (i.e., ignoring z-depth), which is not captured in conventional fluorescence microscopy [19,27,28], we consider $\mathbb{R}^2$ as a plane perpendicular to the optical axis (for example, without any loss of generality, $\mathbb{R}^2$ can model the focal or xy-plane).
Depending upon the imaging equipment employed in the experiment (e.g., cameras or other light detectors), intensities may be reported in various forms such as photon or electron counts, voltages, currents, ADU (analog-to-digital units), etc. [18,20,22,30,49]. In this study we assume $\{I_n^p\}_{n,p}$ are given in normalized gray-scale values, i.e., $I_n^p$ are measured in arbitrary units (a.u.), with the convention that lower intensities correspond to darker pixels and higher intensities to brighter pixels.
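For concreteness, such a normalization to gray-scale arbitrary units might be sketched as below; the global min-max scheme and the function name are our own illustrative assumptions, not a procedure prescribed by the paper.

```python
import numpy as np

def to_grayscale_au(frames):
    """Normalize raw detector counts (e.g., ADU) to arbitrary units in [0, 1].

    `frames` is a stack of N images of shape (N, H, W); the scaling is
    global so that relative brightness across frames is preserved, with
    lower values corresponding to darker pixels.
    """
    frames = np.asarray(frames, dtype=float)
    lo, hi = frames.min(), frames.max()
    return (frames - lo) / (hi - lo)
```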
To initiate our method, we model each intensity $I_n^p$ as consisting of a background signal $B_n^p$, the signal produced by the organelles in the sample $J_n^p$, and noise $n_n^p$; that is, we model $I_n^p = B_n^p + J_n^p + n_n^p$. To find the locations of organelles, we adopt part of the data preprocessing steps and the Bayesian localization step of [44], briefly summarized in the following. In plant microscopy, the background signal typically changes smoothly across the frames; therefore, we model it as a smooth surface over the entire field of view and remove it by least-squares fitting. Next, we model the organelle signal as a sum of Gaussian intensity peaks, where each peak, labeled by $s$, is produced by a single organelle [43] imaged with maximum intensity $\bar h_n^s > 0$, width $\bar w_n^s > 0$, and center $\bar x_n^s \in \mathbb{R}^2$. We obtain the total number of organelle peaks $\bar S_n$ present at each time level $t_n$ through thresholding, while we obtain the organelle features $\{(\bar x_n^s, \bar h_n^s, \bar w_n^s)\}_{s=1}^{\bar S_n}$ through the maximum a posteriori estimates [7,14] of a Bayesian model that assumes: (i) the noises $\{n_n^p\}_p$ are independent and Gaussian; (ii) the organelles are a priori uniformly positioned over the imaged plane; and (iii) the maximum intensities and widths are a priori distributed over appropriate finite intervals. Ignoring imaging artifacts caused by intra-frame motion, which are insignificant in plant microscopy [29,31], we model each localization $\bar x_n^s \in \mathbb{R}^2$ as the effective position of a single organelle at time $t_n$. In other words, following the localization procedure above, we obtain a collection of space-time positions $\bar{R} \subset \mathbb{R}^2 \times [0, T]$ that reveals the positions of every organelle in the sample only at the experimental time levels $\{t_n\}_n$; see Fig. 1.

Figure 1. The motion of organelles, during an experiment starting at $t_1 = 0$ and ending at $t_N = T$, is identified at discrete times $t_n$ (dots).
For simplicity, space is represented with one dimension, although real datasets are two dimensional. The black dots represent the locations of organelles at different time levels; $\bar{R}$ is the set containing the locations of all black dots.
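The preprocessing and peak-model steps above can be sketched as follows; the polynomial degree, the normalized coordinates, and the function names are our own illustrative choices standing in for the smooth-surface fit and Gaussian peak model adopted from [44].

```python
import numpy as np

def remove_smooth_background(frame, degree=2):
    """Fit a smooth polynomial surface B(x, y) to a frame by least squares
    and subtract it, leaving the organelle signal plus noise.
    Assumes the background varies slowly across the field of view."""
    H, W = frame.shape
    yy, xx = np.mgrid[0:H, 0:W]
    x, y = xx.ravel() / W, yy.ravel() / H          # normalized coordinates
    # design matrix with monomials up to total degree `degree` in x and y
    cols = [x**i * y**j for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    A = np.stack(cols, axis=1)
    coef, *_ = np.linalg.lstsq(A, frame.ravel(), rcond=None)
    background = (A @ coef).reshape(H, W)
    return frame - background

def gaussian_peak(x, center, h, w):
    """Intensity contributed by a single organelle imaged as a Gaussian
    peak with maximum intensity h, width w, and 2D center `center`."""
    return h * np.exp(-np.sum((x - center)**2, axis=-1) / (2 * w**2))
```

A purely polynomial (e.g., tilted-plane) frame is removed exactly, so what survives is the peaked organelle signal plus noise.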
To proceed with the analysis, we model each organelle's effective position as an idealized point and its motion as a 2D trajectory, ignoring motion parallel to the optical axis, which is not captured in a typical dataset. Thus, each organelle, labeled by $a$, corresponds in our formulation to a continuous function $r_a : [0, T] \to \mathbb{R}^2$, where $[0, T]$ represents the time course of the experiment and $\mathbb{R}^2$ represents any plane compatible with the pixel positions $\{x^p\}_p \subset \mathbb{R}^2$. Given a dataset $D$ of raw experimental observations and a collection of organelle space-time localizations $\bar{R}$ identified as described above, our main objective from now on is the computational reconstruction of $\{r_a\}_a$.

2.2. Ensemble Kalman velocimetry. The motion of the entire set of organelles in the experiment can be encoded within a family of displacement fields $f_{t \to t'}(\cdot) : \mathbb{R}^2 \to \mathbb{R}^2$, which we use (see Sect. 2.3 below) to distinguish the space-time positions visited by each organelle. According to our convention, for any organelle, its positions $x, x' \in \mathbb{R}^2$ at times $t, t' \in [0, T]$, respectively, are coupled by the displacement fields through $x' = x + f_{t \to t'}(x)$ and $x = x' + f_{t' \to t}(x')$. At the spatiotemporal scales probed by conventional fluorescence microscopy, organelles follow irreversible dynamics [37]; accordingly, the fields $f_{t \to t'}(\cdot)$ and $f_{t' \to t}(\cdot)$ are generally uncorrelated. Below, we incorporate this lack of correlation by adopting a formulation with distinct forward and backward fields instead of a single field for both temporal directions.
In general, the driving dynamics of organelle motion are unknown; thus the precise form of the fields $\{f_{t \to t'}(\cdot)\}_{t,t'}$ is unknown as well. Next, we describe a method to estimate these fields directly from the raw images in $D$. For our purposes, it is sufficient to compute displacement fields only at successive time levels. In particular, we are only interested in the 1-level forward fields $f_{n,+}(\cdot) : \mathbb{R}^2 \to \mathbb{R}^2$ and 1-level backward fields $f_{n,-}(\cdot) : \mathbb{R}^2 \to \mathbb{R}^2$, defined by $f_{n,+}(\cdot) = f_{t_n \to t_{n+1}}(\cdot)$ for $n = 1, \ldots, N-1$, and $f_{n,-}(\cdot) = f_{t_n \to t_{n-1}}(\cdot)$ for $n = 2, \ldots, N$, respectively.

Figure 2. Here, $\bar x^j$ is the position of an organelle producing the images shown (gray); $\bar f_{n,+}$ and $\bar f_{n,-}$ illustrate the 1-level forward and backward displacements of $\bar x^j$, respectively. For clarity, the images produced by the organelle are shown as multi-peaked and space as 1D.
We compute the displacement fields following a velocimetric approach. We first compute displacements $\{\{\bar f_{n,+}^j\}_{n=1}^{N-1}, \{\bar f_{n,-}^j\}_{n=2}^{N}\}_{j=1}^J \subset \mathbb{R}^2$ at the discrete time levels $t_n$, $n = 1, \ldots, N$, of the images in the dataset $D$ and at arbitrarily selected discrete positions $\{\bar x^j\}_{j=1}^J \subset \mathbb{R}^2$. In particular, given a selected position $\bar x^j$, we compute the displacements $\bar f_{n,+}^j, \bar f_{n,-}^j$ by an image registration method between a sub-region of pixels, centered around $\bar x^j$, in image $F_n$ and the images $F_{n+1}, F_{n-1}$, respectively. Briefly, we consider a transformation $T_{\delta,\theta} : \mathbb{R}^2 \to \mathbb{R}^2$ that translates by $\delta \in \mathbb{R}^2$ and rotates by an angle $\theta \in [0, 2\pi)$. Further, we consider the set $\bar A^j$ gathering all pixels $p$ such that $\|\bar x^j - x^p\|_\infty \le w_{\max}$, where $w_{\max} > 0$ is a parameter controlling the side length of the region under registration, set to a small multiple of the typical organelle size. The image registration then reduces to solving minimization problems of the form

$$\bar f_{n,+}^j = \arg\min_{\delta, \theta} \sum_{p \in \bar A^j} \Big[ I_n^p - I_{n+1}\big(T_{\delta,\theta}(x^p)\big) \Big]^2,$$

where $I_{n+1}(\cdot)$ denotes the (interpolated) intensity of image $F_{n+1}$ at a given position and, with slight abuse of notation, $\bar f_{n,+}^j$ denotes the translation component $\delta$ of the minimizer; the backward displacement $\bar f_{n,-}^j$ is obtained analogously with $F_{n-1}$ in place of $F_{n+1}$. Additionally, to exclude arbitrarily large displacements, we restrict each minimization to displacements $\|\delta\| \le d_{\max}$, where $d_{\max} > 0$ is an upper bound on the longest distance an organelle can travel during one exposure $\Delta t$.
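A minimal sketch of this registration step, under the simplifying assumptions that the rotation angle is fixed to zero and the translation search runs over integer pixel offsets only (the paper's registration also optimizes over rotations and need not be exhaustive):

```python
import numpy as np

def register_patch(frame_a, frame_b, center, w_max, d_max):
    """Estimate the displacement of the patch of frame_a centered at
    `center` (pixel indices) by exhaustive search over integer
    translations with max-norm at most d_max.

    Returns the integer displacement (dy, dx) minimizing the sum of
    squared differences (SSD) between the patch and frame_b."""
    cy, cx = center
    patch = frame_a[cy - w_max:cy + w_max + 1, cx - w_max:cx + w_max + 1]
    best, best_err = (0, 0), np.inf
    for dy in range(-d_max, d_max + 1):
        for dx in range(-d_max, d_max + 1):
            cand = frame_b[cy + dy - w_max:cy + dy + w_max + 1,
                           cx + dx - w_max:cx + dx + w_max + 1]
            err = np.sum((patch - cand)**2)     # SSD registration objective
            if err < best_err:
                best, best_err = (dy, dx), err
    return best
```

For a bright spot that moves by (2, 1) pixels between frames, the search recovers exactly that offset.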
To extend our discrete displacements over the entire $\mathbb{R}^2$ support, obtain globally defined displacement fields, and account for the errors introduced in prediction, we adopt the representations

$$f_{n,+}(\cdot) = \Psi_+\big(f_{n-1,+}(\cdot)\big) + u_{n,+}(\cdot), \qquad f_{n,-}(\cdot) = \Psi_-\big(f_{n+1,-}(\cdot)\big) + u_{n,-}(\cdot),$$

where $\Psi_+(\cdot) : \mathbb{R}^2 \to \mathbb{R}^2$ and $\Psi_-(\cdot) : \mathbb{R}^2 \to \mathbb{R}^2$ describe how the displacement fields change from one frame to its immediate successor and predecessor, respectively. $\Psi_+(\cdot), \Psi_-(\cdot)$ could be the motion equations of the dynamical system, if known, or simply ansatzes based on previous experience. A special case arises when $\Psi_+(\cdot), \Psi_-(\cdot)$ are identity functions, which expresses the belief that the displacement fields preserve the trend of the previous level. The driving noises $\{u_{n,+}(\cdot)\}_{n=1}^{N-1}$ and $\{u_{n,-}(\cdot)\}_{n=2}^{N}$ are independent Gaussian processes with mean zero and covariances that correlate the $x$ or $y$ components of the displacement fields according to a kernel $K(\cdot,\cdot) : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$. To facilitate the computations, we leave the $x$ and $y$ components independent of each other. To ensure smooth fields that do not change rapidly across organelles, we use the squared exponential kernel

$$K(x, x') = \sigma_u^2 \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right),$$

where $\sigma_u^2 > 0$ is a constant and we set $\ell > 0$ approximately equal to the diameter of a single organelle. Here $\sigma_u^2$ represents the credibility of the prediction system: if $\Psi_+(\cdot), \Psi_-(\cdot)$ are believable, then $\sigma_u^2$ should be chosen small; vice versa, if $\Psi_+(\cdot), \Psi_-(\cdot)$ are uninformative, then $\sigma_u^2$ should be relatively large. Due to imperfections in image registration, the displacements $\bar f_{n,+}^j, \bar f_{n,-}^j$ contain errors; to account for such errors, we combine the fields with the discrete displacements by treating the latter as noisy observations of the fields at the selected positions, with observation-noise variance $\sigma_v^2$. An implementation of the above representations, which apply in continuous space, is computationally intractable, leading to a pertinent discretization. Precisely, we apply a grid of fixed positions $\{x_\lambda\}_{\lambda=1}^\Lambda \subset \mathbb{R}^2$ that need not, in general, coincide with $\{\bar x^j\}_j$. Next, let $\phi_{n,+}(\cdot) : \mathbb{R}^2 \to \mathbb{R}$, with $n = 1, \ldots, N-1$, denote the $x$ component of the displacement field $f_{n,+}(\cdot)$.
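On the discretization grid, the squared exponential kernel and a draw of the driving noise can be sketched as below; the function names and the Cholesky-with-jitter sampling scheme are our own implementation choices.

```python
import numpy as np

def squared_exponential(X, sigma_u, ell):
    """Covariance K[i, j] = sigma_u^2 exp(-|x_i - x_j|^2 / (2 ell^2))
    over grid positions X of shape (n, 2); ell is set near the diameter
    of a single organelle so the field varies smoothly across organelles."""
    d2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    return sigma_u**2 * np.exp(-d2 / (2 * ell**2))

def sample_driving_noise(X, sigma_u, ell, rng):
    """Draw one mean-zero Gaussian-process realization u(.) on the grid;
    the x and y components of the field would each get an independent draw."""
    K = squared_exponential(X, sigma_u, ell)
    # small jitter on the diagonal stabilizes the Cholesky factorization
    L = np.linalg.cholesky(K + 1e-9 * np.eye(len(X)))
    return L @ rng.standard_normal(len(X))
```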
We use Ensemble Kalman Filtering (EnKF) to compute the posterior point estimates. Let the notation $[\cdot^j]$ denote the vector containing all elements over the possible values of $j$.
The Kalman gain, denoted $G_n$, is calculated as $G_n = C_n (C_n + R)^{-1}$. It controls the weights with which the predictions $\phi_n^{(e)}$ and the observation $[\bar\phi_{n,+}^j]$ enter our approximation $[\hat\phi_{n,+}^j]$, where the updated estimate $[\hat\phi_{n,+}^j]$ is obtained by updating the sample mean of the predictions with the observation. Indeed, the updated estimate $[\hat\phi_{n,+}^j]$ is a weighted sum of the predictions $\phi_n^{(e)}$ and the observation $[\bar\phi_{n,+}^j]$, with weights depending on $\sigma_u^2$, the credibility of the prediction system, and $\sigma_v^2$, the reliability of the observation. Suppose the observation is more reliable, meaning $\sigma_v^2 < \sigma_u^2$ and $\sigma_v^2 \to 0$; then $\lim_{\sigma_v^2 \to 0} G_n = I$ and $\lim_{\sigma_v^2 \to 0} (I - G_n) = 0$, hence $[\hat\phi_{n,+}^j]$ in Eq. (11) draws less information from the mean of the predictions $m_n$ and more from the observation $[\bar\phi_{n,+}^j]$, so the observation holds a heavier weight in the updated estimate. Conversely, if the prediction process is more trustworthy, or equivalently $\sigma_u^2 < \sigma_v^2$ and $\sigma_u^2 \to 0$, the updated estimate weights the predictions more heavily [2]. At the end of every iteration of the EnKF, the samples in the ensemble are themselves updated, as opposed to the Kalman Filter, where only the mean value is carried to the next iteration. Thus, after all iterations, a sequence of updated estimates $\{[\hat\phi_{n,+}^j]\}_{n=1}^{N-1}$ is obtained. A complete, detailed algorithm of the EnKF for organelle velocimetry is provided in the Appendix.
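The analysis step described above can be sketched as follows. The perturbed-observation variant and the choice of a diagonal observation covariance $R = \sigma_v^2 I$ are assumptions standing in for details deferred to the Appendix.

```python
import numpy as np

def enkf_update(ensemble, obs, C, R):
    """One EnKF analysis step for the stacked displacement-field values.

    ensemble : (E, n) array of predicted fields phi^(e)
    obs      : (n,) observed displacements from image registration
    C        : (n, n) forecast (ensemble) covariance
    R        : (n, n) observation-noise covariance (e.g., sigma_v^2 * I)

    Each member is nudged toward a perturbed copy of the observation by
    the Kalman gain G = C (C + R)^{-1}; the returned ensemble mean is
    the updated point estimate."""
    G = C @ np.linalg.inv(C + R)
    rng = np.random.default_rng(0)
    n = obs.size
    updated = np.empty_like(ensemble)
    for e, member in enumerate(ensemble):
        perturbed_obs = obs + rng.multivariate_normal(np.zeros(n), R)
        updated[e] = member + G @ (perturbed_obs - member)
    return updated, updated.mean(axis=0)
```

Consistent with the limiting argument above, as the observation noise vanishes the gain tends to the identity and the updated mean collapses onto the observation.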
The $y$ components of the forward and backward fields are filtered in a similar way. With the discretized equations given in the Appendix and the approach depicted in Fig. 3, one can perform the EnKF to obtain the improved displacement fields.

2.3. Topological reconstruction. Given a collection of organelle space-time localizations $\bar{R}$ and appropriate displacement fields $\{f_{n,+}(\cdot)\}_{n=1}^{N-1}$ and $\{f_{n,-}(\cdot)\}_{n=2}^{N}$, our goal is to computationally reconstruct $\{r_a\}_a$. Of course, because the reconstruction of $\{r_a\}_a$ in continuous time is impossible without a motion model capable of time interpolation, which is unavailable for plant organelles, we focus on reconstructing trajectories $\{\bar r_a\}_a$ discretized at the time levels contained in $\bar{R}$, i.e., $\bar r_a = \{r_a(t_n)\}_n$. As we show below, for such a discrete reconstruction the computed displacement fields $\{f_{n,+}(\cdot)\}_{n=1}^{N-1}$ and $\{f_{n,-}(\cdot)\}_{n=2}^{N}$ are sufficient. We adopt a linking process similar to that of [44]. Our algorithm (described in depth below) proceeds in three stages; see Fig. 4 for a visual representation. In the first stage, we embed $\bar{R}$ into $\mathbb{R}^2 \times [0, T]$. Then, for any two points $(x, t), (x', t') \in \mathbb{R}^2 \times [0, T]$, we consider the distance

$$d\big((x, t), (x', t')\big) = \|x - x'\| + \alpha \, |t - t'|,$$

where $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^2$ and $\alpha > 0$ is a constant. Since $d(\cdot,\cdot)$ is a distance on $\mathbb{R}^2 \times [0, T]$, our main object of interest $R \subset \mathbb{R}^2 \times [0, T]$ inherits the topological properties of a metric space [13,35,50]. Essentially, $R$ consists of the points in space-time $\mathbb{R}^2 \times [0, T]$ that are visited by the organelles during the experiment. Although $R$ globally captures the motion we are interested in revealing, it leaves individual trajectories indistinguishable. Accordingly, in the second stage, we partition $R$ into components $\{R_a\}_a$ such that each $R_a$ corresponds to a single trajectory $r_a$, i.e., we partition $R = \cup_a R_a$ such that $R_a = \{(r_a(t), t)\}_{t \in [0,T]} \subset \mathbb{R}^2 \times [0, T]$. The partitioning of $R$ can be computationally achieved through the construction of an appropriate topological nerve [12] via the Mapper algorithm [6,46].
Briefly, for any $\tau > 0$ such that $\tau < \Delta t$, we consider the overlapping intervals $\{T_n\}_{n=1}^{N-1}$ defined by $T_n = (t_n - \tau, \, t_{n+1} + \tau)$, which are associated with the time levels $\{t_n\}_n$ of the provided dataset. For any $(x, t) \in R$, we consider the temporal projection $P_R : R \to [0, T]$ defined by $P_R(x, t) = t$. Due to continuity, $\{P_R^{-1}(T_n)\}_{n=1}^{N-1} \subset R$ forms an open covering of $R$. By definition, each pre-image $P_R^{-1}(T_n) \subset R$ contains segments of at least one organelle trajectory; however, due to its inherited topology, each trajectory segment corresponds to a single connected component within $P_R^{-1}(T_n) \subset R$. Consequently, partitioning $R$ into connected components is achieved by first partitioning each $P_R^{-1}(T_n)$ into its connected components $\{S_{m,n}\}_{m=1}^{M_n}$ and subsequently computing the nerve of the entire resulting family of components $\{\{S_{m,n}\}_{m=1}^{M_n}\}_{n=1}^{N-1} \subset R$, which is also an open covering of $R$. Lastly, in the third stage, we readily obtain the discrete trajectories $\bar r_a$ by intersecting $R_a \cap \bar{R}$. To partition each $P_R^{-1}(T_n)$ into its connected components $\{S_{m,n}\}_m$, we note that the displacement fields $\{f_{n,+}(\cdot)\}_{n=1}^{N-1}$ and $\{f_{n,-}(\cdot)\}_{n=2}^{N}$ have already been computed and can be used to topologically characterize trajectory segments or, equivalently, connected components of $P_R^{-1}(T_n)$. Consequently, a computational characterization of $\bar S_{m,n} = S_{m,n} \cap \bar{R}$ can be achieved by agglomerative clustering on $P_R^{-1}(T_n) \cap \bar{R}$, with a linkage whose restriction to $P_R^{-1}(T_n) \cap \bar{R}$ is induced by the displacement fields.
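A toy sketch of this covering-and-clustering idea, under the simplifying assumption of plain single-linkage by Euclidean distance with threshold `eps` in place of the displacement-field-induced linkage:

```python
import numpy as np

def connected_components(points, eps):
    """Partition points (n, d) into components: two points connect when
    their Euclidean distance is at most eps (single-linkage clustering).
    A stand-in for splitting each preimage P^{-1}(T_n) into trajectory
    segments; the paper's linkage instead uses the displacement fields."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        stack, labels[i] = [i], current
        while stack:                     # depth-first flood fill
            j = stack.pop()
            d = np.linalg.norm(points - points[j], axis=1)
            for k in np.flatnonzero((d <= eps) & (labels < 0)):
                labels[k] = current
                stack.append(k)
        current += 1
    return labels

def overlapping_intervals(t_levels, tau):
    """Open cover of [0, T]: T_n = (t_n - tau, t_{n+1} + tau), tau < dt."""
    return [(t_levels[n] - tau, t_levels[n + 1] + tau)
            for n in range(len(t_levels) - 1)]
```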

3. Results.
3.1. Case I: Velocimetry benchmark. The displacement estimation and linking processes are tested on a simulated dataset consisting of 20 organelles in 100 frames of video with a time delay $\Delta t = 1$ s. The trajectories are exhibited in Fig. 5. The position of each organelle in every frame is known and is generated by a diffusion process with drift,

$$dX_t = \begin{pmatrix} v_x \\ v_y \end{pmatrix} dt + \sqrt{2D} \, dW_t,$$

where $W_t$ is a Wiener process, $v_x = 3$ pixel/s, $v_y = 1$ pixel/s, and $D = 1$ pixel$^2$/s. The starting distance between any two adjacent organelles at $t = 0$ s is 10 pixels.
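Such benchmark dynamics can be generated with an Euler-Maruyama scheme; the $\sqrt{2D}$ noise scaling (matching the drift-diffusion form above), the seed, and the function name are our assumptions.

```python
import numpy as np

def simulate_tracks(n_org=20, n_frames=100, dt=1.0,
                    v=(3.0, 1.0), D=1.0, spacing=10.0, seed=0):
    """Euler-Maruyama simulation of the benchmark: drift (v_x, v_y)
    plus diffusion with coefficient D, organelles started `spacing`
    pixels apart. Returns positions of shape (n_frames, n_org, 2)."""
    rng = np.random.default_rng(seed)
    pos = np.zeros((n_frames, n_org, 2))
    pos[0, :, 1] = spacing * np.arange(n_org)    # initial 10-pixel gaps
    for n in range(1, n_frames):
        drift = np.asarray(v) * dt
        noise = np.sqrt(2 * D * dt) * rng.standard_normal((n_org, 2))
        pos[n] = pos[n - 1] + drift + noise
    return pos
```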
Given the locations of all organelles in each frame, we apply the displacement estimation process detailed in Sect. 2.2 to the dataset, then calculate the mean error (in pixels) between the estimated forward (backward) displacement and the true displacement, frame by frame, along the x-axis and y-axis respectively. The results are shown in Fig. 6. In the four histograms, almost all per-frame mean errors are around 0.25 pixels and smaller than one pixel; only a very few are greater than one pixel, and all are smaller than two pixels, so we may regard these as outliers.
Given the displacements, we apply the linking process of Sect. 2.3; the results are shown in Fig. 7. All organelles are correctly connected by 20 trajectories, each spanning exactly from $t = 0$ s to $t = 99$ s, yielding an accuracy rate of 100%.
Next, we perturb the y coordinate of the simulated organelle locations by adding noise following a uniform distribution $U(-\epsilon, \epsilon)$ at every time level, where $\epsilon$ is the largest perturbation that can be added. We investigate the cases where $\epsilon$ varies from 1 pixel to 4 pixels. Applying the displacement estimation and linking processes of our algorithm, we then count the total number of reconstructed trajectories, the number of reconstructed trajectories longer than 10 s, and the numbers of reconstructed trajectories having 100% agreement, at least 90% agreement, and at least 50% agreement with the truth. The results are given in Table 1.

Figure 4 (partial caption): $R$, $T_n$ in Eq. (12), and $P_R^{-1}(T_n)$ are shown as blue segments; (c) shows $R$, $T_n$, $P_R^{-1}(T_n)$, $\bar{R}$, and the reconstructed discrete trajectories. For visualization purposes, space is shown in 1D.
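The perturbation and the agreement scoring can be sketched as below; both function names and the element-wise matching criterion are illustrative assumptions, not the paper's exact scoring procedure.

```python
import numpy as np

def perturb_y(tracks, eps, seed=1):
    """Add uniform U(-eps, eps) noise to the y coordinate of every
    localization, as in the robustness study (eps from 1 to 4 pixels).
    `tracks` has shape (n_frames, n_org, 2)."""
    rng = np.random.default_rng(seed)
    noisy = tracks.copy()
    noisy[..., 1] += rng.uniform(-eps, eps, size=tracks.shape[:-1])
    return noisy

def agreement(reconstructed_ids, truth_ids):
    """Fraction of time levels at which a reconstructed track is matched
    to the correct ground-truth organelle (simple element-wise score)."""
    return np.mean(reconstructed_ids == truth_ids)
```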
As shown in Fig. 8, when we increase $\epsilon$, the trajectories contain larger fluctuations, and any two adjacent trajectories become closer or even intersect; thus, larger $\epsilon$ makes trajectory detection more difficult. From Table 1, when the noise is mild ($\epsilon < 2.5$ pixels), our reconstructed trajectories remain the same, but when $\epsilon$ becomes large ($\epsilon \ge 2.5$ pixels), the accuracy rate decreases. In fact, when $\epsilon = 4$ pixels, no clear patterns remain for the individual trajectories to be detected, and most organelles simply look scattered in the frame when all their positions are overlaid over the time span of the video.

Table 1. Case I: Detection results.

3.2. Case II: Complex dynamics. We now consider a complex video with 20 organelles in each frame and 100 frames in total. Each frame is a 380-by-380-pixel grid. The video has a frame rate of 30 frames per second, which gives $\Delta t = 33.33$ ms. Multiple filaments hide in the background and are not visible in the imagery. Three kinds of motion can occur: an organelle can attach to or detach from a filament, travel along a filament, or move randomly. Moreover, an organelle can go through several of these three motions within a single $\Delta t$.
After importing the video as gray-scale images and filtering out the background from each frame, our method detects peaks iteratively and applies Bayesian identification with the following prior distributions:
• $\tilde n_n^p$ follows the normal distribution $N(0, 1)$,
• $z_n^s$ follows the uniform distribution over the frame,
• $h_n^s$ follows a translated beta distribution with support $(50, 150)$, mode 100, and shape parameter $\alpha = 5$.

For the approximation of the displacement fields using Ensemble Kalman Filtering (see Sect. 2.2), since $\Delta t = 33$ ms is extremely small, the displacement fields do not change rapidly from one frame to the very next; moreover, the displacement fields at one level should partially memorize the trend from the previous level. Thus, lacking more information about the dynamical system, we choose $\Psi(x) = \sqrt{x}$ to imitate a nonlinear decay in the displacement field. We set $\sigma_u = 5$ pixels and $\sigma_v = 2$ pixels, since we want to give more weight to the observations. The forward displacement field of the 17th frame, at $t_n = 0.53$ s, is displayed in Fig. 10(a); the area of pixels [140, 230] on the x-axis times pixels [260, 350] on the y-axis is enlarged in Fig. 10(b). We can easily observe the displacement fields around the organelles. The trajectories are reconstructed in Fig. 11. The left panel shows that all estimated trajectories, in red, concentrate upon the light areas. The right panel shows the trajectories together with the ground-truth trajectories in black. Most of them coincide, except in areas where our method cannot track perfectly because several organelles collide or stick together. Specifically, we pick four sets of trajectory reconstructions, exhibited in Fig. 12; each panel shows our reconstruction compared with one true trajectory, with mean errors of 1.20, 2.24, 2.09, and 1.42 pixels, respectively.
Figure 11. Case II: Trajectory reconstruction results.

3.3. Case III: Real data. Finally, we consider a real gray-scale video with a total of 299 frames, recording the motion of peroxisomes in a plant cell. In plant cells, peroxisomes play a variety of roles, including converting fatty acids to sugar and assisting chloroplasts in photorespiration. The spatial resolution of this video is 0.196 micrometers/pixel and the size of each frame is 79 by 662 pixels. The time period between successive frames is 82 ms, that is, $\Delta t = 82$ ms. Fig. 13(a) shows the first frame of the video; the light spots in the frame are peroxisomes, and their sizes range from 0.5 to 1 micrometer.
The outcomes of our method applied to this video are shown in Figs. 13(b) and 13(c). In Fig. 13(b) we plot the estimated trajectories that persist for at least 10 consecutive frames. We can see that the red trajectories cover almost every highlighted area. In Fig. 13(c), we exhibit all 116 such trajectories in different colors. Mainly two types of trajectories are observed: long trails, when a peroxisome travels along a filament, and short trails, when a peroxisome wiggles in the cell. Our method is able to track the peroxisomes through different types of motion over long time intervals.
4. Discussion and conclusion. In this work, we have developed a novel, data-oriented method for the analysis of experimental measurements. Our method combines Topological Data Analysis (TDA) and advanced filtering techniques. Key features of our approach include the adoption of Mapper with necessary modifications, while using Gaussian processes and the EnKF to facilitate the clustering involved in the computation of the associated nerve.
Unlike earlier tracking approaches, our method can proceed with or, most importantly, without a motion model. Without a motion model, only a reasonable guess is needed, which relaxes a strong requirement in the analysis of biological data, especially data obtained from in vivo microscopy at scales between the cellular and the molecular. In the opposite case, with a motion model, more precise inference can be incorporated into the final estimation. In both cases, we estimate the displacement field from the data. In essence, our approach resembles data-driven clustering; however, our method implicitly assumes a phenomenological motion type that is exclusively informed by the observations. In any case, reconstructed tracks are valid as long as the estimated displacement fields are consistent.

Figure 12 (partial caption): the upper right panel is amplified from the area [40, 190] of Fig. 11.

Compared to the work in [44], the EnKF works under minimal assumptions. Moreover, it can achieve more precise results if additional knowledge of the dynamical system is enforced. The EnKF has the advantage that it can inherit trends from previous time steps and fit nonlinear systems, which can be applied even more effectively when a sophisticated motion model is known in a real-world analysis.
Earlier attempts obtained estimates of displacement fields using a combination of heuristics and conventional Kalman Filtering (KF). Given that the motion patterns are highly nonlinear, conventional KF lacks robustness and breaks down. Here, instead, we develop a principled approach that models the displacement fields through Gaussian processes and applies the EnKF, which we show successfully estimates the desired dynamics under a wide range of motion conditions. Specifically, we have tested our methodology against ground-truth trajectories on realistically generated synthetic data capturing organelle motion, as well as on real experimental data. Of course, further investigation of the robustness of the reconstructed trajectories is problem specific and needs to consider the spatiotemporal scales, which may differ among physical systems.
Despite the key innovations introduced here, our method proceeds by breaking the tracking problem into separate phases, i.e., identification, localization, and linking, following the conventional tracking paradigm [27]. We achieve some coupling between these phases in the velocimetry stage; however, full coupling, in a manner that allows improved inference in any phase to feed into all other phases, remains to be developed. Such an approach has recently been applied successfully to data from confocal microscopy [23,24,53]; however, due to the massive size of wide-field datasets, such as those acquired in plant microscopy, extensive modifications are required. Future research is therefore needed to yield a comprehensive framework in which identification, linking, and velocimetry are unified and treated simultaneously. Due to the high stochasticity in the underlying data, e.g., motion, uncertainty, and detection noise, it is natural to seek such a monolithic approach within the Bayesian context. However, a fully Bayesian approach requires overcoming at least two important barriers: (1) establishing Bayesian foundations for nerves and displacement fields; and (2) non-trivial computational costs, which would increase significantly.
In conclusion, we have proposed a novel algorithm to track organelles in microscopy video. Our intracellular tracking algorithm successfully reconstructs organelle trajectories, as we show with example applications to synthetic and real data. The method optimizes organelle parameters based on the data captured in the images, combines predictions with observations in estimating organelle movement, and links organelles based on topological analysis.
MCB-1715794 and DMS-1821241. The authors thank Steven Abel and Honor Akenuwa from the Chemical & Biomolecular Engineering department at UTK for providing the simulated dataset in Case II, and Andreas Nebenführ from the Biochemistry & Cellular and Molecular Biology department at UTK for sharing the real dataset. The authors thank all three for their useful insight.