On a spatial-temporal decomposition of the optical flow

In this paper we present a spatial-temporal variational decomposition algorithm for computation of the optical flow of a dynamic image sequence. We consider several applications, such as the extraction of temporal motion patterns of different scales and motion detection in dynamic sequences under varying illumination conditions, such as they appear for instance in psychological flickering experiments. For the numerical implementation we are solving an integro-differential equation by a fixed point iteration. For comparison purposes we use the standard time dependent optical flow algorithm, which in contrast to our method, constitutes in solving a spatial-temporal differential equation.


(Communicated by Sung Ha Kang)
Abstract. In this paper we present a decomposition algorithm for computation of the spatial-temporal optical flow of a dynamic image sequence. We consider several applications, such as the extraction of temporal motion features and motion detection in dynamic sequences under varying illumination conditions, such as they appear for instance in psychological flickering experiments. For the numerical implementation we are solving an integrodifferential equation by a fixed point iteration. For comparison purposes we use a standard time dependent optical flow algorithm, which in contrast to our method, constitutes in solving a spatial-temporal differential equation.

1.
Introduction. Analyzing the motion in a dynamic sequence is of interest in many fields of applications, like human computer interaction, medical imaging, psychology, to mention but a few. In this paper we study the extraction of motion in dynamic sequences by means of the optical flow, which is the apparent motion of objects, surfaces, and edges in a dynamic visual scene caused by the relative motion between an observer and the scene. There have been proposed several computational approaches for optical flow computations in the literature. In this paper we emphasize on variational methods. In this research area the first method is due to Horn & Schunck [15]. Like many alternatives and generalizations, the Horn & Schunck method calculates the flow from two consecutive frames. Here, we are calculating the optical flow from all available frames simultaneously. Spatial-temporal optical flow methods were previously studied by Weickert & Schnörr [29,30], [11], [27] and [2], to name but a few. However, in contrast to these paper we emphasize on a decomposition of the optical flow into components, instead of calculating the flow itself.
Variational modeling of patterns in stationary images has been initialized with the seminal book of Y. Meyer [21]. In the context of total variation regularization, reconstructions of patterns was studied first in [25]. Here, we are implementing similar ideas as have been used before for variational image denoising [3,4,5,6,7,13,26] and optical flow decomposition [1,18,31,32,33]. However, a conceptual difference is that we aim for extracting temporal features, while, in all the cited papers the decomposition was for finding spatial components of the flow. We emphasize that the proposed method is one of very few variational optical flow algorithms in a space-time regime and within this class, this algorithm is the only spatial-temporal decomposition algorithm.
The outline of this paper is as follows: In Section 2 we review the optical flow equation. In Section 3 we present analytical examples of the optical flow equation in case of illumination changes. In Section 4 we introduce the new model for spatialtemporal optical flow decomposition. We formulate it as a minimization problem and derive the optimality conditions for a minimizer. In Section 5 we derive a fixed point algorithm for numerical minimization of the energy functional. Finally in Section 6 and Section 7 we present experiments, results and a discussion of them.
For the sake of simplicity of presentation we consider the time interval as the unit interval all along the paper. Differentiation of (1) with respect to t for a fixed x gives Switching from a Lagrangian to an Eulerian description we obtain the optical flow equation (OFE) on Ω: Although the derivation is based on a constant brightness assumption along characteristics, mathematically, (3) even makes sense under varying illumination conditions. However, as we show in simple examples below, standard regularity assumptions on the optical flow are violated when the characteristics degenerate (collapse or originate) or when the illumination changes. Instead of solving (3) usually the relaxed problem is considered, which consists in minimizing the functional subject to appropriate constraints. The following two examples simulate a day to night illumination situation. The optical flow is calculated analytically, and the level lines of f are visualized, which represent the trajectories Ψ of constant brightness. As we show, smoothness of the optical flow is affected by changing illumination and in the first example also by joining of characteristic curves. Example 1. In this example the flow is not even an element of the Bochnerspace L 2 ((0, 1); H −1 (Ω)), meaning that the anti-derivative of u with respect to time is not in L 2 ((0, 1); L 2 (Ω)). Because L 2 ((0, 1); H −1 (Ω)) is a strict superset of L 2 ((0, 1); L 2 (Ω)), the elements have in general less regularity (smoothness). The next Example 2 provides a flow where f models again changing illumination. Here the flow is in L 2 ((0, 1); H −1 (Ω)) but not in L 2 ((0, 1); L 2 (Ω)). We conjecture from the difference of the two examples that the difference in smoothness is caused by the fact that in the first example two characteristics are joining during the evolution.
We consider the 1D optical flow equation, to solve for u in for the specific data f represents a static scenef , which is affected by brightness variations over time, described by g. We are more specific and take: The function f and the level lines are plotted in Figure 1 and the optical flow can be calculated explicitly: indicates a transport of intensities from outside to the center 1/2. We observe that u(1/2, t) and u(x, 1) are singularities of the optical flow. From the definition of u it (7). Level lines of f are parametrized by (Ψ(x, t), t).

Example 2.
This example is similar to Example 1, and simulates again a day to night illumination, with the difference that characteristics of f never join. We consider input data f of the form (6) with for (x, t) ∈Ω := (0, 1/4) × (0, 1). The optical flow is given by Integrating this function over time giveŝ and consequently with This shows that u / ∈ L 2 (Ω) for every β ∈ (0, 1 2 ], butû ∈ L 2 (Ω) for all β ∈ (0, 1). The bottom line of these examples is that illumination changes, such as flickering, may result in singularities of the optical flow and a violation of standard smoothness assumptions of the optical flow (such as u ∈ L 2 ((0, 1); H s (Ω)) for some s > 0). The potential appearance of the singularities motivates us to consider regularization terms for optical flow computations, which allow for singularities over time, such as negative Sobolev-norms. 4. Optical flow decomposition: Basic setup and formalism. In this paper we derive an optical flow model which allows for decomposing the flow field into spatial and temporal components. We consider each frame of the movie {f (·, t) : t ∈ (0, 1)} defined on the two-dimensional spatial domain Ω = (0, 1) 2 .
We assume that the optical flow field is a compound of two flow field components , ∀ x ∈ Ω and t ∈ (0, 1).
Because there appears a series of indices and variables we specify the notation in Table 1. The OFE-equation (3) contains four unknown (real valued) functions j , i, j = 1, 2, and thus is highly under-determined. To overcome the underdeterminacy, the problem is formulated as a constrained optimization problem, to determine, for some fixed α > 0, (9) argmin Instead of solving the constrained optimization problem, we use a soft variant and minimize the unconstrained regularization functional: For the sake of simplicity of presentation, we omit arguments of the functions u (i) j and f , whenever it simplifies the formulas without causing misinterpretations.
In the following we design the regularizers R (i) : • For R (1) we use a common spatial-temporal regularization functional for optical flow regularization (see for instance [30]): is a monotonically increasing, differentiable function satisfying that r → ν(r 2 ) is convex. According to [30] we use with some fixed 0 < 1 and λ > 0. Note, that in [30] the term −1 is not used. However, since it is a constant, it does not influence the optimization. Using this term guarantees that ν(0) = 0. Moreover, we denote by ν the derivative of ν with respect to r. • R (2) is designed to penalize for variations of the second component in time.
Motivated by Y. Meyer's book [21], we introduce a regularization term, which is non-local in time. We have seen in Examples 1 and 2 that u may violate L 2 -smoothness in case of changing illumination conditions. Variations of Meyer's G-norm where used in energy functionals for calculating spatial decompositions of the optical flow [1,17]. It is a challenge to compute the G-norm efficiently, and therefore workarounds have been proposed. For instance [25] proposed as an alternative to the G-norm the following realization for the H −1 -norm: For a generalized function u : (0, 1) → R, they defined Here, we use the following temporal H −1norm for regularization: To see the analogy with the · H −1 -norm from [25] we note that the second primitive of the optical flow component u (2) , as defined in Table (1), satisfies j ( x, 0) = 0, ∀j = 1, 2 and x ∈ Ω.
Then, by integration by parts it follows that and therefore Existence of a minimizer of F, defined in (9), in an infinite dimensional functional analytical setting is a complicated issue. However, for some surrogate model, we can guarantee well-posedness by using the following lemma from [1]: The term is well-defined if |∇f | = 0 and if |∇f | = 0, then we might choose an arbitrary vector in the unit ball. Then A 1/2 0 = 1 |∇f | A 0 (note, for |∇f | = 0 this term is zero), where the root of a matrix is defined via spectral decomposition. Moreover, where, Id ∈ R 2×2 denotes the identity matrix and > 0 is a small regularization parameter. A is uniformly positive definite on Ω. We note that in comparison with [1], we are not using a smoothed version of A, because we already made the After choosing a fixed > 0 we consider minimization of the surrogate model, consisting in minimizing the functional over the Bochner-Space We are proving that the surrogate model attains a minimizer on X. Note that is the space of vector valued, weakly differentiable functions with respect to space and time, and also recall that L 2 ((0, 1); L 2 (Ω; R 2 )) = L 2 ((0, 1)× Ω; R 2 ).
Theorem 1. F s attains a minimizer on X.
Proof. The proof is done by several estimates: n ) be a minimizing sequence of F s , then • For the surrogate model · A is a norm, which is equivalent to the L 2 (Ω; R 2 )norm, and thus ( u n ) is uniformly bounded in L 2 ((0, 1); L 2 (Ω)), and therefore admits a weakly convergent subsequence with limit φ ∈ L 2 ((0, 1); L 2 (Ω)). This subsequence, in turn has a subsequence (which is also denoted by · n , for the sake of simplicity of notation) such that also (∂ t u are weakly convergent in L 2 (Ω × (0, 1); R 2 ), to some µ and ψ, respectively. • Let ζ ∈ L 2 (Ω × (0, 1); R 2 ) be arbitrary, then ( x, t) → 1 t ζ( x, τ )dτ ∈ L 2 (Ω × (0, 1); R 2 ). Using the weak convergence of the subsequence it follows that (22) This means that ( u n ) converges weakly to φ − ψ, and in particular u is uniformly bounded, let us say by D. The last item shows that that ( u n ) converges to φ − ∂ t ψ in a distributional sense.
• The final results follows from the lower semicontinuity of the functionals u → We emphasize that the critical issue in the above proof is the coercivity, which could be enforced by various other modifications of the functional F as well. An alternative possibility is to add a small regularization term for the L 2 -norm of u (1) to the functional F. Since the use of surrogate models has only marginal impact on the numerical reconstructions we ignore their use in the following. 4.1. Energy functional and minimization. We are determining the optimality conditions for minimizers of F introduced in (10). Necessary conditions for a minimizer are that the directional derivatives vanish for all 2-dimensional vector-valued functions h (j) : Ω × (0, 1) → R 2 , j = 1, 2. To formulate these conditions we use the simplifying notation: (E, F) := (E, F)( u (1) , u (2) ), R (i) := R (i) ( u (i) ) and res = ∇f · ( u (1) + u (2) ) + ∂ t f .
Therefore the directional derivative of F in direction h (j) at u = ( u (1) , u (2) ) is given by: where δ ij = 1 for i = j and zero else is the Kronecker symbol. The gradient of the functional F from (10) can be determined by calculating the directional derivatives of E and R (i) , separately.
• The directional derivative of E in direction h (j) is given by res(∇f · h (j) ) d xdt.
• The directional derivative of R (1) at u (1) in direction h (1) is determined as follows: Let us abbreviate for simplicity of presentation , then the directional derivative of R (1) in direction h (1) (which we assume to have compact support in Ω × (0, 1)) at u (1) is given by where integration by parts is used to prove the final identity. • The directional derivative of R (2) is derived as follows: Moreover, it follows by integration by parts of the last line of (26) with respect to t that Now, because of (25) and (24) it follows that the minimizer u (i) , i = 1, 2 has to satisfy for every j = 1, 2, Since equations (24) and (27) hold for all h (2) j , it follows that for every j = 1, 2, Thus the optimality conditions for a minimizer are (28) and (29). The solution of Equation (28) has to be understood in a weak sense. The optimality condition is formal and would be exact if F would be coercive. As we demonstrated above there are several possibilities of surrogate models, which provide coercivity. The surrogate model outlined above leading to this optimality condition would require a Dirichlet boundary condition for u (1) at Ω × {1}. If we would indeed complement F by α 3 u (1) L 2 (Ω×(0,1)) , with small α 3 , then the boundary conditions would in fact be the natural ones.

5.
Numerics. In this section we discuss the numerical minimization of the energy functional F defined in (10). Our approach is based on solving the optimality conditions for the minimizer u (1) , u (2) from (28), (29) with a fixed point iteration. We call the iterates of the fixed point iteration u (i) j ( x, t; k), where k = 1, 2, . . . , K denotes the iteration number and K is the maximal number of iterations. We summarize all the iterates of the components of flow functions u (i) j in a tensor of size M × N × T × K. In this section we use the notation as reported in Table 2.
For every tensor H = (H(r, s, t)) ∈ R M ×N ×T (representing all frames of a complete movie) we define the discrete gradient j (r, s, τ ; k) finite difference approximation of u(·, t) Table 2. Discrete Notation.
Let us notice that r, s, t are discrete indices corresponding to the points in space r∆ x , s∆ y and in time t∆ t of a continuous function H. Moreover, we define the discrete divergence, which is the adjoint of the discrete gradient: Let (H 1 , H 2 , H 3 ) T (r, s, t), then The realization of the fixed point iteration for solving the discretized equations (28) and (29) reads as follows: • Initialization for k = 0: we initialize to 0 we leave out all indices denoting space and time positions.
where ∆ τ is the step size iteration parameter, which has been set to 10 −4 . In order to improve stability of the algorithm and ensure convergence we use a semi implicit iteration scheme as proposed in [30]. Indeed, let us notice that in (32), (33),(34), (35) one of the components of the two flow fields u (1) , u (2) on the right hand side is evaluated for iteration index k + 1. The system (32), (33), (34), (35) can be solved efficiently using the special structure of the matrix equation (see [29,30]). The matrix equation then is a sparse block tridiagonal matrix.This type of matrices allows for efficient estimation of a solution through decomposition and parallelization methods [19]. The iterations are stopped when the Euclidean norm of the relative error drops below the precision tolerance value tol = 0.05 for at least one of the component of u (1) and one of u (2) . The typical number of iterations is much below 100. except for the discretization parameters ∆ x , ∆ y , ∆ t , defined in Section 5. In this work we consider the following four dynamic image sequences: • The first experiment is performed with the video sequence from [20] (available at http://of-eval.sourceforge.net/) which consists of forty-six frames showing a rotating sphere with some overlaid patterns. The analytical results from Appendix A in 1D suggest that the intensity of the u (2) component increases monotonically with increasing rotational frequency over time. We verify this hypothesis numerically, now in higher dimensions. We simulate in particular two, four and eight times the original motion frequency;: In order to do so, we duplicate the sequence periodically, however consider it to be recorded in the same time interval (0, 1). The flow visualized in Figure 6 is the one between the 16th and the 17th frame of every sequence. We study the behavior of the sphere at different motion frequencies with the same parameter setting α (1) = 1, α (2) = 1 4 . The numerical results confirm the 1D observation that for high rotational movement u (2) is dominant (cf. Figures 6) and u (1) is always 20% of u (2) ; in particular u (1) and u (2) cannot be completely separated. . u (2) at different frequencies of rotations: 2, 4 and 8× faster than the original motion frequency. α (1) = 1, α (2) = 1 4 . The intensity of u (2) increases when the frequency of rotation is increased.
• The second experiment concerns the decomposition of the motion in a dynamic image sequence showing a projection of a cube moving over an oscillating background. The movie consists of sixty frames and can be viewed on the web-page http://www.csc.univie.ac.at/index.php?page=visualattention.
The background is oscillating in diagonal direction, from the bottom left to the top right, with a periodicity of four frames. In each frame the oscillation has a rate of 5% of the frame size. The flow visualized in Figure 6 is the one between the 20th and the 21st frame of the sequence. Applying the proposed method with a parameter setting α (1) = 10 3 , α (2) = 10 3 , ∆ τ = 10 −5 , and precision tolerance tol = 0.001, we notice that the background movement appears almost solely in u (2) and the global movement of the cube appears in u (1) . In Figure 6 we represent only flow vectors with magnitude larger than 0.05 and omit in u (2) the part in common with u (1) for better visibility. • In the third experiment the original movie consists of thirty-two frames and can be viewed together with the decomposition result on the web-page http: //www.csc.univie.ac.at/index.php?page=visualattention. The flow is decomposed into two components. The first part shows the movement of a Ferris wheel and people walking. The second part shows blinking lights and the reflection of the wheel. The flow visualized in Figure 6 is the one between the 4th and the 5th frame of the sequence with a parameter setting α (1) = 1, α (2) = 1 4 . In order to improve the visibility we represent only flow vectors with magnitude larger than 0.18 and we omit for u (2) the part in common with u (1) .
• The fourth example is flickering. In a standard flickering experiment, the difference in human attention is investigated by inclusion of blank images in a repetitive image sequence. Although, in general, these blank images are not deliberately recognized, they change the awareness of the test persons. J. K. O'Regan [23] states that "Change blindness is a phenomenon in which a very large change in a picture will not be seen by a viewer, if the change is accompanied by a visual disturbance that prevents attention from going to the change location". They test data from http://nivea.psycho.univ-paris5.fr, was used for our simulations. The proposed optical flow decomposition is able to detect regions, which also humans can recognize, but standard optical flow algorithms fail to: To show this, the input sequence is composed by four frames consisting of Frame 1, a blank image, Frame 2 and again an identical blank image (see Figure 8 (top)). This sequence is then aligned periodically to a movie. We interpret the movie as a linear interpolation between the frames. We test and compare Horn-Schunck, Weickert-Schnörr and the proposed algorithm. We set the smoothness parameter α (1) to a value of one for all the methods. Moreover, for our approach we set α (2) = 1. For Horn-Schunck we visualize the flow field in Figure 7. This flow is the one between the blank frame and the slightly changed frame, which exceeds a threshold of 3.9. The results obtained by applying Weickert-Schnörr and the u (1) field of our approach, respectively, are small in magnitude. Therefore, we do not visualize them. This behavior is coherent with the motivation of the Weickert-Schnörr method to produce an optical flow that is less sensitive to variations over space and time. Finally, we visualize in Figure 8 (down right) the u (2) flow field for the proposed approach. For the visualization we omit all vectors with magnitude lower than 0.18. In order to make transparent the result, we show in Figure 8 (down left) the difference between the two frames of the sequence containing information (see Figure 8 (top)). In this experiment, we notice that the u (1) component is Figure 8. The two frames of the flickering sequence containing information (top), the difference between these two frames (down left), and the u (2) flow field resulting from the proposed approach (down right). As predicted in Section 3 and Appendix A the u (1) component is negligible, instead u (2) detects the change of intensity across the blank sheet. negligible, instead u (2) detects the areas affected by change of intensities (see Figure 8 (down right)).
Additional information. In the following, we show the capacity of our model to extract different information compared to standard optical flow algorithms. The current literature focuses on average angular and endpoint error [8] in order to compare optical flow algorithms. Our model extracts information, that is neglected by standard algorithms. Such difference can be shown through a quantitative comparison of models. For this purpose, we use well-known test sequences from the Middlebury database http://vision.middlebury.edu/flow/, and evaluate the residual of the optical flow constraint.  Table 3. Comparison of squared residuals over space and time E between Weickert-Schnörr and the proposed method.
In order to understand how much information our method is capable to extract from an entire dynamic sequence, we calculate the residual squared over space and time: E( u (1) , u (2) ) as in (10) and compare it with the squared residual over space and time of the Weickert-Schnörr method [29,30]. We use the parameter settings α (1) = 100 (α = α (1) in Weickert-Schnörr) and α (2) = 1 4 , tolerance tol = 0.01, in order to have a good comparison of the two methods. Again the residual is smaller for the proposed method as shown in Table 3. The smaller value of the residual are due to the fact that we are calculating a minimizer from a larger space of optical flow components. However, with this approach we cannot observe oversmoothing of the flow. of the optical flow equation is given by : Let us assume that (log g)(t) − (log g)(0) can be expanded into a Fourier sin-series: n sin(nπt).
In the case of flickering data f , u (2) is pronounced (if n 0 is large u (1) mn ≈ 0) while in the quasi-static case u (1) is dominant. Moreover, we also see that spatial components belonging to Fourier-cos coefficients with large m are more pronounced in the u (2) component, and the spatial and temporal coefficients always appear in both components. In particular this means that a threshold has to be set, to assign them to the first or second module.