ON THE PARAMETER ESTIMATION PROBLEM OF MAGNETIC RESONANCE ADVECTION IMAGING

Abstract. We present a reconstruction method for estimating the pulse-wave velocity in the brain from dynamic MRI data. The method is based on solving an inverse problem involving an advection equation. A space-time discretization is used, and the resulting large-scale inverse problem is solved using an accelerated Landweber-type gradient method incorporating sparsity constraints and utilizing a wavelet embedding. Numerical example problems and a real-world data test show a significant improvement over the results obtained by the previously used method.


1. Introduction. Magnetic Resonance Advection Imaging (MRAI) is a recently developed method to map the pulsatory signal component of dynamic echo planar imaging (EPI) data of the brain, such as acquired in functional and resting state functional Magnetic Resonance Imaging (MRI) experiments [34]. Its underlying mathematical model is specifically based on the wave equation for unidirectional waves, and as such is an advection equation. MRAI derives its information from the local spatiotemporal properties of the dynamic MRI data sets [35] and does not require an external pulse reference signal. It has been shown that MRAI maps depict the location of major arteries as well as some venous structures. In addition, colour direction maps allow for a visualization of the orientation of blood vessels.
It has been suggested that MRAI may potentially serve as a biomarker for the health of the cerebrovascular system. The reason is that MRAI is designed to reflect the spatiotemporal properties of travelling waves, and pulse wave velocities (PWV) are a main indicator for the physical properties of blood vessels. By means of the well-known Moens-Korteweg equation, PWVs are related to vessel diameter, wall thickness, and wall stiffness. In particular wall thickness and stiffness are key parameters that change in vascular disease and with age.
However, although MRAI is designed to reflect the spatiotemporal properties of travelling waves, it has been found that PWVs are difficult to measure with the previously proposed modelling approach [34], which is based on local multiple regression of finite difference estimators of the differential operators in the unidirectional wave equation. It would be desirable to improve the estimation techniques in order to overcome these limitations. It would also be desirable to have an optimized method with which one can estimate the requirements on the data needed to obtain meaningful, i.e., quantitative, values for the pulse wave velocities, or at least representative values that reflect the differing pulse wave velocities throughout the cerebrovascular system.
In [34], a multiple regression approach was used in order to estimate the PWV. In this contribution, we consider the problem from the viewpoint of parameter estimation in PDEs. More precisely, we will define an operator F which maps a velocity v onto data y via F(v) = y, and then solve this equation for v using standard techniques from inverse problem theory, as described for example in [10]. The underlying equation connecting the velocity vector field v(x, y, z) to the dynamic MRI signal ρ(x, y, z, t) will be the advection equation

(1)  ∂ρ/∂t (x, y, z, t) + v(x, y, z) · ∇ρ(x, y, z, t) = 0 .
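To make the model concrete, the following small Python sketch (with purely illustrative parameters) verifies numerically that a rigidly transported profile satisfies a one-dimensional analogue of this advection equation up to discretization error:

```python
import numpy as np

# A profile transported with constant speed v, rho(x, t) = f(x - v*t),
# satisfies the 1D analogue of the advection equation (1):
#   d rho / dt + v * d rho / dx = 0.
# We check this with finite differences; all values are illustrative.
v = 1.5
dx, dt = 0.01, 0.001
x = np.arange(-2.0, 2.0, dx)

f = lambda s: np.exp(-s**2)      # smooth pulse profile
rho0 = f(x)                      # snapshot at t = 0
rho1 = f(x - v * dt)             # snapshot at t = dt

drho_dt = (rho1 - rho0) / dt     # backward difference in time
drho_dx = np.gradient(rho1, dx)  # central difference in space
residual = drho_dt + v * drho_dx

print(np.max(np.abs(residual[1:-1])))   # small, up to discretization error
```

The residual shrinks further as dx and dt are refined, consistent with the first-order time and second-order space differences used.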
The subsequent paper is structured as follows. In the next section, we will give some medical background, introduce the reader to dynamic MRI and give some specifics of our problem. In Section 3 we will define the mathematical model used throughout this paper and in Section 4 we will discretize the model and introduce an inverse problem for the PWV. Section 5 is concerned with the solution of the inverse problem and in Section 6 several numerical experiments with simulated and real-world data are presented. Finally, Section 7 gives some conclusions and an outlook.

2.1. Pulse wave velocity. Cardiovascular pulse waves provide a natural physical perturbation to vascular dynamics, and their effects have been utilized in clinical diagnostics for a long time. For example, the PWV in major arteries can be measured directly and contains information about arterial compliance [18], defined as the ratio of blood volume change to blood pressure change. Arterial compliance is an important determinant of the state of the cerebrovascular system. With respect to the brain, aortic stiffness has been associated with cerebral small-vessel disease in hypertensive patients [14] and cognitive decline [26]. In addition, emerging concepts such as pulse wave encephalopathy would profit from diagnostic imaging methods of the cerebral vasculature [2,13].
The PWV in arteries, or Moens-Korteweg velocity [19], follows from the Moens-Korteweg formula [23,17],

(2)  PWV = √( E h / (ρ_B d) ) .

The PWV depends on the three parameters vascular diameter d, wall thickness h, and the vessel wall's Young's modulus, or distensibility, E, if the reasonable approximation of a constant density of the blood, ρ_B, is made. Equation (2) models vessels as elastic tubes with isotropic walls [40]. Pressure gradients, which would be required to determine blood flow, do not appear in the Moens-Korteweg formula.
In other words, PWVs can be modelled independently from blood flow velocity and in fact can be one to two orders of magnitude faster [19].
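For illustration, the Moens-Korteweg relation can be evaluated directly; the sketch below assumes the common form PWV = √(E h / (ρ_B d)) and uses purely illustrative parameter values:

```python
import math

def moens_korteweg_pwv(E, h, d, rho_b=1060.0):
    """Pulse wave velocity (m/s) via the Moens-Korteweg formula,
    c = sqrt(E * h / (rho_B * d)):
    E     ... Young's modulus of the vessel wall (Pa)
    h     ... wall thickness (m)
    d     ... vessel diameter (m)
    rho_b ... blood density (kg/m^3), assumed constant."""
    return math.sqrt(E * h / (rho_b * d))

# Illustrative (order-of-magnitude) arterial values:
c = moens_korteweg_pwv(E=0.5e6, h=0.5e-3, d=5.0e-3)
print(round(c, 2))   # a few m/s, consistent with arterial PWVs

# Stiffer walls increase the PWV, everything else being equal:
print(moens_korteweg_pwv(1.0e6, 0.5e-3, 5.0e-3) > c)   # → True
```

Note how blood flow velocity does not enter: the PWV is determined by wall and geometry parameters alone.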
For a blood vessel along the direction of v, Equation (1) describes the pulsatile component of the blood flow velocity along that direction, with v being the PWV. Importantly, the same advection equation would also hold for blood pressure waves [8], which, via the Windkessel model [29], are a function of the integrated net flow into the vessel reservoir [1]. Since the local blood volume is related to local blood pressure via the compliance C as dV = C dp, the same advection equation applies to the pulsatile component of the blood volume as well. For the rest of this paper, ρ(x, y, z, t) should be understood as the MRI signal variability attributable to volume change, and we also interpret the three-dimensional velocity vector v as a PWV.
Though ρ(x, y, z, t) depends on the blood flow velocity via the relationship between pulsatile flow, pressure, and volume, the constant component of the flow, or the average blood flow, decouples from Equation (1), as it does not cause any spatial or temporal signal change. It could cause signal variability, however, if blood is not assumed to have homogeneous properties in the model domain Ω, which may be caused by variations in oxygenation or temperature, or any other property that might affect the particular MRI contrast. Here we assume that the MRI signal is not affected by any of these properties.

2.2. EPI data. There are only a few in-vivo options for imaging vascular dynamics in the human brain. Arterial spin labelling (ASL) is the most advanced method and has high spatial resolution [37]. It provides quantitative blood flow values in the capillary bed, assuming steady flow. However, arterial compliance depends on the pulsatile component of the flow, which can be imaged with specific ASL pulse sequences [36,38]. Pulsatile flow components can also be imaged over the whole brain with 4D phase contrast angiography [11]. In this contribution we aim at deriving PWV related quantities from echo planar imaging (EPI) [31] data. EPI is the method of choice for functional MRI [25]. Here we are less interested in the functional aspects of EPI than in the fact that EPI can also yield fast dynamic data over the whole brain. As all MRI signal intensity is proportional to the total amount of resonant spins within a voxel, it reflects the local proton distribution, which shows pulsatile information around vessels. This has been demonstrated with phase coherence maps before [34,33], as well as with statistical parametric mapping [4,32].
In general, the MR signal S(k) in physical k-space is proportional to the inverse Fourier transform of the spin density in a voxel at position r, ρ(r), i.e.,

(3)  S(k) ∝ ∫ ρ(r) e^{i k·r} dr ,

where k is the k-vector. The image S(r) is the magnitude of the Fourier inversion of this expression. The vectors r and k are, depending on the acquisition scheme, two or three-dimensional (see, for example, [21]). In EPI, images are acquired very rapidly to allow for whole brain coverage within seconds. A typical EPI data set consists of three-dimensional volumes acquired repeatedly in time with a repetition time TR. Each volume contains either the whole brain or always the same part of it. Volumes are acquired slice-by-slice, i.e., the vector r is two-dimensional. The typical spatial resolution or voxel volume depends on the field-of-view, in-plane or slice matrix size, and slice thickness [7]. The field of view typically is 20 to 24 cm in order to include the whole head in axial slices. Slices are either acquired sequentially or interleaved. In sequential acquisition, first slice 1, then slice 2, etc., are acquired, where slice 1 is adjacent to slice 2 and so on. In interleaved acquisition, for example, first all odd and then all even slices are acquired. Therefore, care has to be taken to assign the correct acquisition time to each slice, and the model that we are proposing can take the slice acquisition order into account. It is described for sequential, specifically ascending, slice acquisition, but can easily be adapted to other acquisition schemes. It might be worth mentioning that we are not pursuing a "slice-time correction" [30], which is often performed in the analysis of functional MRI data and consists of an interpolation of intensity values to an evenly sampled time grid. Such a procedure is always approximative in nature and would not sufficiently take into account the fast dynamics of, for example, travelling pulse waves.
3. Mathematical model. Our goal is to estimate velocities of travelling waves in blood vessels from spatiotemporal MRI data. As a first approximation, we will neglect any frequency dependence of the velocity, or dispersion, as well as reflected pulses travelling against the main blood flow direction. The latter assumption means that the back flow amplitude is considered to be much smaller than the forward flow amplitude. Under those assumptions, the authors of [34] considered the following local model, defined on small subdomains Ω_S of the model domain Ω, e.g. 3 × 3 × 3 voxels in size. On each subdomain, the dynamic MRI signal ρ(x, y, z, t) is assumed to fulfill the advection equation

(4)  ∂ρ/∂t (x, y, z, t) + v̄ · ∇ρ(x, y, z, t) = 0 ,

where ∇ is the gradient with respect to the space variables (x, y, z) and v̄ = v̄(Ω_S) is a velocity assumed to be constant on each subdomain Ω_S. Using finite difference approximations of the derivatives of ρ, the authors of [34] used a multiple regression approach to obtain estimates for the local velocities v̄. Although yielding maps of velocity estimates that reflect the main cerebral arteries, those estimates were not quantitative. Furthermore, the local regression matrices used there were ill-conditioned for many of the data points. Additionally, the finite difference operators in z-direction used to derive those matrices did not take into account the limited data due to the slice-time acquisition procedure and therefore led to crude approximations of the z-derivatives.
In this paper, we use an approach which is global in nature, retains the underlying advection equation and avoids the numerical instabilities of the regression approach. Following the physical arguments of [35], one can see that ρ is in essence assumed to be a conserved quantity, for which there holds the following continuity equation

(5)  ∂ρ/∂t (x, y, z, t) + div (ρ(x, y, z, t) v(x, y, z)) = 0 ,

where v = v(x, y, z) is a constant-in-time velocity field now defined on the entire model domain Ω. Assuming v to be divergence-free, i.e., div v = 0, which is reasonable since we consider a basically incompressible carrier medium (blood), the product rule yields

(6)  ∂ρ/∂t (x, y, z, t) + v(x, y, z) · ∇ρ(x, y, z, t) = 0 ,

which is again an advection equation, now defined on the entire model domain Ω. Given measurements of ρ(x, y, z, t), we want to recover the global, but now space-dependent, velocity vector field v satisfying the above equation. This is an ill-posed problem, one reason being that derivatives of ρ are taken in (6), which, as the data ρ is subject to measurement errors, is an ill-posed procedure in itself. Note that v is assumed to be independent of the time t. This assumption stems from the fact that the pulse-wave velocity is primarily dependent on time-independent quantities such as vessel wall property parameters, see (2).
Note now that we are trying to reconstruct the vector valued quantity v from one single scalar equation. Even worse, assuming that v is a solution of (6), every v + h, where h satisfies h · ∇ρ = 0, is a solution as well. However, following again the physical arguments of [34], velocities h satisfying h · ∇ρ = 0 are of no interest to us and are in fact not detectable by our algorithm.
We could now consider the inverse problem in the continuous setting, first defining a nonlinear operator mapping between suitable function spaces, then choosing a solution method and finally discretizing. This approach turns out to be highly complicated, as the solution theory of advection equations with non-Lipschitz velocity vector fields is quite involved, see e.g., [6]. Most problematic is the fact that the Lax-Milgram framework commonly used for PDEs is no longer applicable in that case, resulting in solution concepts which are hard to handle.
Hence, we will use a first-discretize-then-regularize approach, which simplifies the subsequent computations significantly. However, we will use one fact from the classical theory, see [6], namely that v should be at least an H 1 , or locally H 1 vector field.
4. Discretization. Motivated by the above considerations, we assume that as the pulse wave travels through the brain, the dynamic MRI signal ρ = ρ(x, y, z, t) fulfills the advection equation, i.e.,

(7)  ∂ρ/∂t (x, y, z, t) + v(x, y, z) · ∇ρ(x, y, z, t) = 0 ,

where v = (v_1, v_2, v_3) is a velocity vector field assumed to be independent of the time t and which is additionally assumed to be divergence-free, i.e., satisfies div v = 0. The aim is to estimate v from measurements of ρ. We assume (for simplicity) that the brain domain Ω is a cuboid and that pointwise measurements are available at certain points. Unfortunately (see Section 2.2), the time coordinate is linked to the z-coordinate, i.e., measurements are available only at the points (x_i, y_j, z_k, t_{k,l}), where

(8)  x_i := x_0 + i∆x ,  y_j := y_0 + j∆y ,  z_k := z_0 + k∆z ,  t_{k,l} := (k + (K + 1)l)∆t ,

with 0 ≤ i ≤ I, 0 ≤ j ≤ J, 0 ≤ k ≤ K and 0 ≤ l ≤ L.
This corresponds to ascending slice acquisition and means that in each time step, only one z-slice can be measured and that after a full cycle, the measurement process restarts.
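The acquisition times of this sampling scheme are easy to compute; in the following sketch, the values of K and ∆t are hypothetical:

```python
def acquisition_time(k, l, K, dt):
    # t_{k,l} = (k + (K+1)*l) * dt: slice k (0..K) of volume l (0..L)
    # under ascending slice acquisition.
    return (k + (K + 1) * l) * dt

K, dt = 29, 0.1   # hypothetical: 30 slices, 0.1 s per slice

# Within one volume, consecutive slices are dt apart ...
print(acquisition_time(1, 0, K, dt) - acquisition_time(0, 0, K, dt))
# ... but consecutive measurements of the *same* slice are (K+1)*dt apart,
# which is the smallest time step available for a time derivative there:
gap = acquisition_time(5, 1, K, dt) - acquisition_time(5, 0, K, dt)
print(gap)
```

This makes explicit why the time derivative at a fixed spatial position cannot be resolved finer than one full volume cycle.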
In a next step, the equation (7) is discretized according to the data, which leads to a space-time discretization, as we will see below. The derivative with respect to t is approximated by a backwards differential quotient, the derivatives with respect to x, y, z by central quotients in the interior and by forward or backward quotients at the boundary. We then get the following discretized system of equations:

(9)  (ρ_{i,j,k,l} − ρ_{i,j,k,l−1}) / ((K + 1)∆t) + v_{1,i,j,k} (D_{x_i} ρ) + v_{2,i,j,k} (D_{y_j} ρ) + v_{3,i,j,k} (D_{z_k} ρ) = 0 .

Here, ρ_{i,j,k,l} := ρ(x_i, y_j, z_k, t_{k,l}), and D_{x_i}, D_{y_j}, D_{z_k} are suitably defined approximations of the derivatives with respect to x, y, z (see appendix, formulas (60), (61), (63) for details). Note that the denominator (K + 1)∆t in the approximation of the time derivative in (9) seems rather large. However, due to the slice-time acquisition procedure, i.e., since consecutive measurements at the same spatial position are made with a time difference of exactly (K + 1)∆t, this time step is the smallest one available. We will observe the consequences of this fact in the numerical simulations presented in Section 6.

We want to write the equations (9) in matrix-vector form. For this, we first collect all ρ_{i,j,k,l} values (l ≥ 1) in the vector ρ and all ρ_{i,j,k,l} values (l = 0) in the vector ρ_0, where we use the lexicographic ordering with respect to (i, j, k, l) to sort the values inside ρ and ρ_0. The vector ρ then has length m := (I + 1)(J + 1)(K + 1)L and the vector ρ_0 has length n := (I + 1)(J + 1)(K + 1). If we define the indices

ind^m_{i,j,k,l} := i(J + 1)(K + 1)L + j(K + 1)L + kL + l ,
ind^n_{i,j,k} := i(J + 1)(K + 1) + j(K + 1) + k + 1 ,

then the relationship between ρ_{i,j,k,l} and ρ and ρ_0 can be written precisely via

(10)  ρ_{i,j,k,l} = ( ρ)_{ind^m_{i,j,k,l}} ,  l ≥ 1 ,
(11)  ρ_{i,j,k,0} = ( ρ_0)_{ind^n_{i,j,k}} .

Next, we collect all v_{s,i,j,k} values in a vector v of length 3n, using again a lexicographic ordering but now with respect to (s, i, j, k), which leads to the relation

(12)  v_{s,i,j,k} = ( v)_{(s−1)n + ind^n_{i,j,k}} .

We want to write (9) in the following matrix-vector form:

(13)  A( v) ρ = b( v, ρ_0) ,

where A( v) is an m × m matrix and b( v, ρ_0) is a vector of length m.
This is possible, since ρ_{i,j,k,l} appears only linearly in (9). In order to assemble the system matrix A( v), note first that (9) naturally divides into four parts, each part corresponding to the differential quotient with respect to one of the variables t, x, y or z. Hence, the system matrix A( v) naturally splits up into four parts, i.e.,

(14)  A( v) = A_t + A_x( v) + A_y( v) + A_z( v) ,

where A_t, A_x( v), A_y( v) and A_z( v) are m × m matrices corresponding to the differential quotients. They can be assembled by looping over all possible values of (i, j, k, l) and setting suitable values at the positions implicitly defined by the difference quotients (see appendix for details). From the assembly procedure, one can see that each of these four matrices is sparse. Hence, the system matrix A( v) is sparse as well, with only 11 non-zero (off-)diagonals (note that all four matrices share the non-zero main diagonal).
As for the right-hand side in (13), one could again loop over all indices (i, j, k, l) to assemble it, or alternatively use a closed formula (see (63) in the appendix).
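The lexicographic index maps used in the assembly can be sketched in a few lines (grid sizes are illustrative); both maps are bijections onto {1, …, m} and {1, …, n}, respectively:

```python
# Lexicographic index maps ind_m (for the vector rho, l = 1..L) and ind_n
# (for the vector rho_0, l = 0); grid sizes below are illustrative.
I, J, K, L = 2, 3, 4, 5

def ind_m(i, j, k, l):
    # 1-based position of rho_{i,j,k,l} (l >= 1) inside rho
    return i * (J + 1) * (K + 1) * L + j * (K + 1) * L + k * L + l

def ind_n(i, j, k):
    # 1-based position of rho_{i,j,k,0} inside rho_0
    return i * (J + 1) * (K + 1) + j * (K + 1) + k + 1

m = (I + 1) * (J + 1) * (K + 1) * L
n = (I + 1) * (J + 1) * (K + 1)

all_m = {ind_m(i, j, k, l) for i in range(I + 1) for j in range(J + 1)
         for k in range(K + 1) for l in range(1, L + 1)}
all_n = {ind_n(i, j, k) for i in range(I + 1) for j in range(J + 1)
         for k in range(K + 1)}
print(all_m == set(range(1, m + 1)), all_n == set(range(1, n + 1)))  # → True True
```

The assembly loops over (i, j, k, l) then write matrix entries at exactly these positions.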
The forward problem consists in calculating ρ for given v and given initial data ρ 0 , via solving (13). Let us denote this solution by ρ( v, ρ 0 ).
Note that in order to guarantee unique solvability of (6), one usually prescribes boundary conditions on ∂Ω. However, since for our problem sufficient boundary data are not available, we used forward and backward differential quotients in the definition of D_{x_i}, D_{y_j} and D_{z_k} at the boundary. It can easily be seen that this amounts to a linear extrapolation of ρ, and it is also the reason why (13) turns out to be solvable.

4.1. The inverse problem. Let us now turn to the inverse problem. It consists in calculating the velocity v and the initial data ρ_0 for given measurements of ρ and ρ_0. Introducing the nonlinear operator

(15)  F( v, ρ_0) := ( ρ( v, ρ_0), ρ_0) ,

our inverse problem can be written in the standard form

(16)  F( v, ρ_0) = ( ρ, ρ_0) .

The additional equation for ρ_0 in (16) seems to be superfluous at first. Note, however, that as a result of measurement errors, we are not really given ρ and ρ_0, but only noisy data ρ^δ and ρ_0^δ, and hence including this equation becomes necessary. Concerning the solvability of (16), note that in essence we are trying to reconstruct v ∈ (R^n)^3 from ρ ∈ R^m, which means that we are given m data points and try to solve for 3n unknowns. Hence, in general one can only hope for a unique solution to this problem in case m ≥ 3n, which is always satisfied if data with a large enough L, i.e., a long enough scanning time, are used. This will always be the case in the tests below. If (16) happens to be overdetermined, our solution approach presented below will pick a suitable solution out of all possible ones. Moreover, it could happen that even for m ≥ 3n, problem (16) is underdetermined (compare with the well-known aperture problem). As described above, the main reason for this is that the velocity vector field v could be orthogonal to ∇ρ at certain points in space for the entire scanning period. In this very unlikely case, velocity components orthogonal to ∇ρ could not be detected at those points. However, as noted above, those components are not of interest to us, and the reconstruction algorithm introduced below will compute approximations of the velocity vector field without those orthogonal components in this case.
In order to solve (16), we will need the derivative of F and its adjoint. For this, we consider F as an operator from X to Y, where

(17)  X := R^{3n} × R^n ,  Y := R^m × R^n .

We equip Y with the Euclidean inner product and X with the inner product

(18)  ⟨( v, ρ_0), ( w, w_0)⟩_X := v^T H w + ρ_0^T w_0 ,

where H is a positive definite 3n × 3n matrix chosen such that the inner product is an approximation of the H^1-inner product of the functions v, see also Section 4.3.

Before we proceed with the derivation of the Frechet derivative and its adjoint, we introduce the following notation: whenever we have an arbitrary Frechet-differentiable function G between suitable spaces A and B and we are given x ∈ A and ∆x ∈ A, then we denote by G′(x)∆x the Frechet derivative of G at x in the direction of ∆x. This notation will be used multiple times in the following.

Lemma 4.1. Let F : X → Y be given as in (15) and let ( v, ρ_0) ∈ R^{3n+n} and (∆ v, ∆ ρ_0) ∈ R^{3n+n}. Then for the Frechet derivative of F there holds

(19)  F′( v, ρ_0)(∆ v, ∆ ρ_0) = ( ρ′( v, ρ_0)(∆ v, ∆ ρ_0), ∆ ρ_0) ,

where ρ′( v, ρ_0)(∆ v, ∆ ρ_0), the Frechet derivative of ρ, is given as the solution of

(20)  A( v) ρ′( v, ρ_0)(∆ v, ∆ ρ_0) = b′( v, ρ_0)(∆ v, ∆ ρ_0) − (A′( v)∆ v) ρ( v, ρ_0) ,

where A′ and b′ are the Frechet derivatives of A and b, respectively.

Proof. First, note that (19) follows immediately from the definition of the Frechet derivative. Now, from equation (13), we know that

A( v) ρ( v, ρ_0) = b( v, ρ_0) .

Applying the Frechet derivative at the point ( v, ρ_0) in the direction of (∆ v, ∆ ρ_0) to this equation and using the chain rule yields

(A′( v)∆ v) ρ( v, ρ_0) + A( v) ( ρ′( v, ρ_0)(∆ v, ∆ ρ_0)) = b′( v, ρ_0)(∆ v, ∆ ρ_0) ,

from which the statement of the lemma now immediately follows.
It follows from (19) and (20) that in order to evaluate the Frechet derivative of F, the linear system (20), with the vectors (A′( v)∆ v) ρ( v, ρ_0) and b′( v, ρ_0)(∆ v, ∆ ρ_0) on its right-hand side, needs to be solved. It is possible to calculate those vectors without assembling the matrices behind (A′( v)∆ v) and b′( v, ρ_0). However, as we will see in the lemma below, the assembly of three specific matrices will be inevitable for calculating the adjoint of the derivative of F, and those matrices can then also be used to compute the two required vectors.
To arrive at these matrices, note first that it follows from the assembly procedure, together with (14) and the definition of the Frechet derivative, that A′( v)∆ v is not only linear in ∆ v but also independent of v. As a result, it is possible to find a matrix D_A( ρ) ∈ R^{m×3n} such that

(24)  (A′( v)∆ v) ρ = D_A( ρ) ∆ v .

As for the other two matrices, note that once one has assembled b′( v, ρ_0), which can be calculated easily using (63), this matrix can be split up into two sub-matrices (25), one acting on ∆ v and one acting on ∆ ρ_0. Thanks to the special structure of A( v) and b( v, ρ_0), when following the above derivation steps in detail, one finds that most of the elements of these matrices are zero, with at most three non-zero elements per row in both cases.
Using the above derivations, we can now prove the following lemma.

Lemma 4.2. Let F : X → Y be given as in (15) and let ( v, ρ_0) ∈ R^{3n+n} and ( w, w_0) ∈ R^{m+n}. Then the adjoint of the Frechet derivative of F can be expressed explicitly in terms of the matrix D_A( ρ) and the sub-matrices of b′( v, ρ_0) from (24) and (25).

Proof. To compute the adjoint, consider first (25); inserting it together with (24) into (19) and (20) and using the definition of the inner product in X, the statement follows.
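Adjoints with respect to the weighted inner product on X can be validated numerically by a dot-product test. The sketch below uses random stand-ins for the actual matrices; for a linear map M and the inner product ⟨v, w⟩_X = vᵀHw, the adjoint is H⁻¹Mᵀ:

```python
import numpy as np

# Dot-product test: for a linear map M : X -> Y, <M v, w>_Y must equal
# <v, M* w>_X with M* = H^{-1} M^T when X carries the weighted inner
# product <v, w>_X = v^T H w. Toy sizes; H is a random SPD stand-in for
# the H^1 matrix of Section 4.3.
rng = np.random.default_rng(0)
n, m = 6, 9
M = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))
H = B @ B.T + n * np.eye(n)          # symmetric positive definite

v = rng.standard_normal(n)
w = rng.standard_normal(m)

lhs = (M @ v) @ w                    # <M v, w>_Y (Euclidean)
Mstar_w = np.linalg.solve(H, M.T @ w)
rhs = v @ (H @ Mstar_w)              # <v, M* w>_X = v^T H (H^{-1} M^T w)
print(abs(lhs - rhs) < 1e-10)        # → True
```

The same test, applied to finite-difference approximations of F′ and its adjoint, is a standard sanity check for implementations of iterations like the ones in Section 5.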

4.2. Incorporating the divergence-free condition. Up to now, the divergence-free condition div v = 0 on the velocity field to be reconstructed did not enter the reconstruction method. However, it is a modelling assumption and has to be taken care of.
One possible way to do so would be to incorporate the condition into the space X, i.e., to allow only divergence-free vector fields in X. Essentially, i.e., except at the boundary, this implies v_3 = v_3(v_1, v_2). This changes the derivative of F and its adjoint in a computationally unfavourable way, and hence we avoid this approach.
Instead, we enforce the divergence-free condition in a weak way, by changing F to

(28)  F( v, ρ_0) := ( ρ( v, ρ_0), ρ_0, D v) ,

where D is a matrix representing the divergence-free condition. The operator F now maps from X to an accordingly enlarged space Y, on which we use the canonical extension of the previous inner product. The resulting nonlinear inverse problem now reads as

(30)  F( v, ρ_0) = ( ρ, ρ_0, 0) .

An analogous calculation as before yields the Frechet derivative of F and its adjoint, which now contain an additional term involving D. As for the choice of the matrix D, note that, due to the Divergence Theorem, the divergence-free condition can be imposed in integrated form over the cells of the grid (33). Assuming that each component of v is piecewise linear (tri-linear), being divergence-free then amounts to a set of linear conditions (34) involving suitably defined operators derived from (33) (see appendix, formulas (65), (66), (67) for their definitions). The (sparse!) matrix D is now built such that D v = 0 is equivalent to (34). Whenever we speak of the weak divergence-free option in subsequent chapters, we mean that we use F defined as in (28). As our numerical tests below will show, using this option has a significant effect on the reconstructed solutions.
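The effect of a discrete divergence operator can be illustrated with central differences on a small grid; this is only a stand-in for the matrix D, whose concrete form follows the formulas in the appendix:

```python
import numpy as np

# Discrete divergence of a velocity field on a regular grid, via central
# differences in the interior and one-sided differences at the boundary.
# Divergence-free fields should (approximately) satisfy div v = 0.
nx, ny, nz, h = 8, 8, 8, 1.0

def divergence(v1, v2, v3, h):
    return (np.gradient(v1, h, axis=0) +
            np.gradient(v2, h, axis=1) +
            np.gradient(v3, h, axis=2))

ones = np.ones((nx, ny, nz))
zeros = np.zeros((nx, ny, nz))
_, Y, _ = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                      indexing='ij')

const_field = divergence(ones, ones, ones, h)               # constant flow
shear_field = divergence(Y.astype(float), zeros, zeros, h)  # v = (y, 0, 0)

# Both test fields are divergence-free, so the discrete divergence vanishes:
print(np.allclose(const_field, 0.0), np.allclose(shear_field, 0.0))  # → True True
```

Flattening such an operator into matrix form yields a sparse matrix of the same flavour as D, with a handful of non-zero entries per row.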

4.3. Choosing the matrix H. We now turn to the choice of the matrix H in the inner product of X. From the theory of transport equations (see e.g. [6]), we know that v should be at least an H^1 velocity field. Assuming as above that each component of v is piecewise linear and can hence be written in the form

(35)  v_s(x, y, z) = Σ_{i,j,k} v_{s,i,j,k} ψ_{i,j,k}(x, y, z) ,  s = 1, 2, 3 ,

where the ψ_{i,j,k} are the 3D hat functions commonly used to form a basis in H^1-FEM, we find that the optimal choice of H in this case would be the block-diagonal matrix consisting of three copies of the matrix H̃ with entries

(36)  H̃_{(i,j,k),(i′,j′,k′)} := c_s ⟨ψ_{i,j,k}, ψ_{i′,j′,k′}⟩_{H^1} ,

where c_s is a suitable scaling constant. As can easily be seen, H̃ is (up to the scaling c_s) the FEM system matrix of the equation −∆u + u = f. However, in the computation of the adjoint of F we need to apply H^{−1}, or equivalently three times H̃^{−1}, which amounts to solving three perturbed Laplace equations in each iteration step. This is far too costly and hence we need to find a suitable alternative for inverting the matrix H.
One possibility is to approximate H̃ by its diagonal part, which leads to a diagonal matrix H that is easy to assemble and to invert. The scaling constant c_s is then chosen such that the two terms on the right-hand side of (18) are balanced and such that the H^1 and the L^2 norm approximations of constant vectors coincide.

Another, more sophisticated possibility is to use an orthogonal system ψ_{i,j,k}, e.g., wavelets, since then the matrix H̃ becomes diagonal. One can see that applying H̃^{−1} coincides with applying the operator i_1^*, where i_1 : H^1 → L^2 is the embedding operator. For a given wavelet system {φ, ψ}, every function f ∈ L^2 can be expanded as

(39)  f = Σ_k ⟨f, φ_{0k}⟩ φ_{0k} + Σ_{j≥0} Σ_k ⟨f, ψ_{jk}⟩ ψ_{jk} ,

where φ_{0k} = φ(t − k) and ψ_{jk}(t) = 2^{j/2} ψ(2^j t − k). If the wavelet system is sufficiently smooth, then for every Sobolev space H^s, the H^s inner product of two functions f and g is equivalent to

(40)  Σ_k ⟨f, φ_{0k}⟩⟨g, φ_{0k}⟩ + Σ_{j≥0} Σ_k 2^{2sj} ⟨f, ψ_{jk}⟩⟨g, ψ_{jk}⟩ .

Following [27], we see that the adjoint i_s^* of the embedding i_s : H^s → L^2 is given by

(41)  i_s^* f = Σ_k ⟨f, φ_{0k}⟩ φ_{0k} + Σ_{j≥0} Σ_k 2^{−2sj} ⟨f, ψ_{jk}⟩ ψ_{jk} .

Using this, we can, instead of applying H̃^{−1} to the components of v, compute their discrete wavelet transforms, weight the resulting coefficients according to (41) and then apply the inverse discrete wavelet transforms. The computation of v^T H w in (18) is then replaced by a scaled version of (40), using again the discrete wavelet transform. Thus, whenever we speak of using the wavelet embedding option in subsequent chapters, we mean that this procedure is being used. Note that for the results presented below, due to the low spatial resolution, using the wavelet embedding option with s = 1 in (40) and (41) leads to an undesirably high amount of smoothing and subsequently to mediocre results. Using a smaller s, and hence less smoothing, yields much better results; therefore, the choice s = 0.1 was used in all computations below. As for the choice of wavelets, Daubechies 3 wavelets (see [5]) were used in all cases.
Both approximations of H^{−1}, via the diagonal entries of H̃ and via the wavelet embedding, are very fast (diagonal matrix inversion and O(n) wavelet decomposition, respectively). The use of wavelets has the additional advantage that it yields a very good approximation of the application of H^{−1}, as compared to using the diagonal approximation, which in essence only amounts to a scaling of the steps in the iterative solution method introduced below.
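The wavelet embedding option can be sketched with a hand-rolled orthonormal Haar transform (the paper itself uses Daubechies 3 wavelets): transform, damp the level-j detail coefficients by 2^(−2sj), transform back:

```python
import numpy as np

# Orthonormal Haar analogue of the wavelet embedding: transform a signal,
# damp the detail coefficients of scale level j by 2^(-2*s*j) (cf. the
# weighting in the adjoint embedding i_s^*), and transform back. Haar is
# used only so that the sketch stays self-contained.
def haar_dwt(f):
    coeffs, a = [], np.asarray(f, dtype=float)
    while len(a) > 1:
        s = (a[0::2] + a[1::2]) / np.sqrt(2)   # approximation part
        d = (a[0::2] - a[1::2]) / np.sqrt(2)   # detail part
        coeffs.append(d)
        a = s
    return a, coeffs[::-1]                     # details ordered coarse -> fine

def haar_idwt(a, coeffs):
    for d in coeffs:
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

def embed_adjoint(f, s):
    # here j = 0 is the coarsest detail level of this toy transform
    a, coeffs = haar_dwt(f)
    damped = [d * 2.0 ** (-2.0 * s * j) for j, d in enumerate(coeffs)]
    return haar_idwt(a, damped)

f = np.random.default_rng(1).standard_normal(16)   # length must be 2^J
print(np.allclose(embed_adjoint(f, 0.0), f))       # s = 0 is the identity
```

Since the transform is orthonormal, increasing s only damps fine-scale content, i.e., it smooths; this mirrors the observation that s = 1 over-smooths at low spatial resolution while s = 0.1 does not.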
5. Regularization approach. The stable solution of ill-posed (nonlinear) equations F(x) = y requires the use of regularization methods, see [10] for an overview. Tikhonov regularization is probably the best-known method. For nonlinear operators F, Tikhonov regularization results in the minimization of a functional consisting of a data-fit term and a penalty term. Besides this, there exists a wide array of gradient-based iterative methods such as Landweber iteration or the iteratively regularized Gauss-Newton method (see [16]). Even though Gauss-Newton and other sophisticated methods are considered to be much faster than Landweber iteration, they face a serious drawback when confronted with large-scale problems, since in every iteration step, a system of equations with a full system matrix has to be solved. Hence, since in our MRAI problems we often deal with more than one million unknowns, we resort to the Landweber iteration, which is given by (see [10])

(42)  x_{k+1}^δ = x_k^δ + ω F′(x_k^δ)^* (y^δ − F(x_k^δ)) ,

where x_0 is a suitable initial guess and ω is a scaling parameter which has to satisfy

(43)  ω ‖F′(x)‖^2 ≤ 1

for all x in a sufficiently large neighbourhood around the initial guess x_0. The superscript δ in (42) indicates that we are only given noisy data y^δ instead of y.
The most common assumption on the data is that

(44)  ‖y − y^δ‖ ≤ δ

for a known noise level δ. In essence, Landweber iteration (42) is nothing else than the gradient method applied to the residual functional ½‖F(x) − y^δ‖². Since we are dealing with an ill-posed problem with noisy data, minimizing this residual functional with (42) will lead to very bad approximations of solutions of F(x) = y unless the iteration is stopped by a suitable stopping rule, in which case Landweber iteration becomes a regularization method. Typically, one uses the discrepancy principle, i.e., the iteration is stopped after k_* steps, where k_* is the smallest index fulfilling

(45)  ‖F(x_{k_*}^δ) − y^δ‖ ≤ τδ ,

where τ is an appropriately chosen positive number. Concerning the initial guess x_0, one should, if possible, incorporate knowledge of the true solution into its choice. Where no such information is available or can be expected, as is the case for the problem under consideration, it is common to use a zero initial guess.
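For a linear toy problem, Landweber iteration with the discrepancy principle takes only a few lines; in the nonlinear case, Aᵀ is replaced by the adjoint of the Frechet derivative. All sizes and constants below are illustrative:

```python
import numpy as np

# Landweber iteration stopped by the discrepancy principle, on a small
# ill-conditioned *linear* toy problem A x = y with noisy data y_delta.
rng = np.random.default_rng(0)
n = 20
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(1.0 / (1.0 + np.arange(n))) @ U.T   # decaying singular values
x_true = rng.standard_normal(n)

delta = 1e-3                                 # noise level
noise = rng.standard_normal(n)
y_delta = A @ x_true + delta * noise / np.linalg.norm(noise)

omega = 0.9 / np.linalg.norm(A, 2)**2        # scaling parameter
tau = 2.0                                    # discrepancy-principle constant
x = np.zeros(n)                              # zero initial guess
for k in range(100000):
    residual = A @ x - y_delta
    if np.linalg.norm(residual) <= tau * delta:   # stopping rule
        break
    x = x - omega * (A.T @ residual)

print(k, np.linalg.norm(A @ x - y_delta) <= tau * delta)
```

Iterating far beyond the stopping index would start fitting the noise, which is exactly the semi-convergence behaviour the discrepancy principle guards against.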
Determining an appropriate ω in (42) is very important but difficult. Hence we use an iteration-dependent scaling parameter ω_k^δ, defined via the current descent direction s_k^δ := F′(x_k^δ)^*(y^δ − F(x_k^δ)) by

(46)  ω_k^δ := ‖s_k^δ‖² / ‖F′(x_k^δ) s_k^δ‖² ,

which leads to the so-called steepest descent method

(47)  x_{k+1}^δ = x_k^δ + ω_k^δ F′(x_k^δ)^* (y^δ − F(x_k^δ)) .

For details, see for example [16, Chapter 3.4].
As is well known, both the Landweber iteration and the steepest descent method converge rather slowly. Hence, in order to reduce the number of required iterations, we follow the idea of Nesterov's acceleration scheme [24] and introduce an intermediate step into our iteration, which now reads as follows:

(48)  z_k^δ = x_k^δ + ((k − 1)/(k + 2)) (x_k^δ − x_{k−1}^δ) ,
      x_{k+1}^δ = z_k^δ + ω_k^δ F′(z_k^δ)^* (y^δ − F(z_k^δ)) .

A similar intermediate step was also used in the highly successful FISTA algorithm for linear inverse problems [3], where additionally the convergence rate O(k^{−2}) was proven; a big improvement over the classical rate O(k^{−1}). Although the authors are not aware of any publication proving the convergence rate O(k^{−2}) for our method (48), numerical tests showed that using (48) instead of (47) leads to a significant decrease in the number of iterations, even for our nonlinear problem (16). This is remarkable, since the intermediate steps z_k^δ are simply linear combinations of previous iterates and are therefore very cheap to compute.

5.1. Implementation details. The implementation of (48) seems very straightforward at first, since we have explicit expressions for F and its adjoint available. However, even though we will be dealing with a rather coarse space discretization, since we are essentially using a space-time approach with three space dimensions, the problem becomes large-scale, with around 3.3 million unknowns for one of the real-world data sets considered below. This causes severe numerical difficulties.
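The effect of a Nesterov-type intermediate step can be observed on a linear toy problem; the sketch below compares the number of iterations needed with and without acceleration (the momentum factor (k − 1)/(k + 2) is one common choice and may differ from the one used in the actual implementation):

```python
import numpy as np

# Plain Landweber vs. Nesterov-accelerated Landweber on an ill-conditioned
# linear problem A x = y; both run until the residual drops below tol.
rng = np.random.default_rng(3)
n = 20
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(1.0 / (1.0 + np.arange(n))) @ U.T   # ill-conditioned
y = A @ rng.standard_normal(n)                       # exact data
omega = 0.9 / np.linalg.norm(A, 2)**2                # fixed step size

def iterations_needed(accelerated, tol=1e-6, kmax=200000):
    x_prev = x = np.zeros(n)
    for k in range(kmax):
        # intermediate step; without acceleration, z is simply x
        z = x + (k - 1) / (k + 2) * (x - x_prev) if accelerated else x
        x_prev, x = x, z - omega * (A.T @ (A @ z - y))
        if np.linalg.norm(A @ x - y) <= tol:
            return k + 1
    return kmax

plain = iterations_needed(False)
nesterov = iterations_needed(True)
print(plain, nesterov)   # the accelerated variant needs far fewer iterations
```

The intermediate step costs only a vector combination per iteration, which is why the acceleration is essentially free.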
Note that the calculation of one iteration step requires the solution of three large sparse linear systems of equations: one for evaluating F, one for its derivative and one for ω_k^δ. Since, due to the size of the problem, this can no longer be done directly, the iterative solver BiCGStab with an incomplete LU factorization preconditioner was used.
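In Python, an analogous solver setup could look as follows (SciPy's BiCGStab with an incomplete LU preconditioner; the tridiagonal matrix is a generic sparse stand-in, not the actual system matrix):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# BiCGStab with an incomplete-LU preconditioner on a generic sparse,
# non-symmetric, diagonally dominant stand-in matrix (the actual system
# matrices A(v) have a similarly banded sparsity structure).
n = 2000
A = sp.diags([-1.0, 2.5, -1.2], [-1, 0, 1], shape=(n, n), format='csc')
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-4)          # incomplete LU factors of A
M = spla.LinearOperator((n, n), ilu.solve)  # preconditioner M ~ A^{-1}
x, info = spla.bicgstab(A, b, M=M)

print(info == 0, np.linalg.norm(A @ x - b) < 1e-2)  # converged, small residual
```

A good preconditioner is what keeps the per-iteration cost of (48) manageable at millions of unknowns.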
The implementation of the method was done in MATLAB R2015b. Since some built-in MATLAB functions were too slow for our purposes, we had to rely on fsparse.m, a function for creating sparse matrices, see [9].

5.2. Enforcing sparse solutions. The relevant blood vessels in which the pulse waves travel constitute only a minor part of the brain. Hence, the velocity vector field v which we seek to reconstruct should take nonzero values in those blood vessels only.
In mathematical terms this means that v should be compactly supported and should have a sparse representation in the basis ψ_{i,j,k}. The reconstruction algorithm should take this into account, which leads directly to the concept of sparsity regularization.
Following [28] and [20], we seek to compute (v, ρ_0) as a minimizer of the functional

‖F(v, ρ_0) − y^δ‖² + α ( Σ_{l=1}^{3} Σ_{i,j,k} ω^{v_l}_{i,j,k} |⟨v_l, ψ_{i,j,k}⟩|^p + Σ_{i,j,k} ω^{ρ_0}_{i,j,k} |⟨ρ_0, ψ_{i,j,k}⟩|^p ),

where ω^{v_l}_{i,j,k} and ω^{ρ_0}_{i,j,k} are positive weights bounded away from zero, α is a regularization parameter and p ∈ [1, 2]. The choice p = 1 yields sparse minimizers, while 1 < p < 2 is suspected to promote sparsity as well.
For further use below, and for 1 ≤ p < ∞ and τ > 0, we define the real-valued shrinkage function S_{τ,p} : ℝ → ℝ. For p > 1, S_{τ,p} is given as the inverse G_{τ,p}^{-1} of a strictly increasing real function (see [20]), while for p = 1 it reduces to the classical soft shrinkage S_{τ,1}(x) = sign(x) max(|x| − τ, 0). For a vector x = {x_k}_{k∈Λ} and weights ω = {ω_k}_{k∈Λ} we define the shrinkage function S_{ω,p} componentwise via (S_{ω,p}(x))_k = S_{ω_k,p}(x_k). Following [20], a possible method for solving nonlinear inverse problems F(x) = y^δ involving sparsity constraints is given by the so-called iterated soft shrinkage algorithm, in which each Landweber-type gradient step is followed by an application of the shrinkage operator. For our problem we combine this approach with the accelerated gradient method (48) to arrive, after collecting the ω^{v_l} and ω^{ρ_0} into a single sequence ω^s, at an iterative scheme in which each accelerated step (48) is followed by the shrinkage S_{αω^s,p}. Note that for this algorithm, values for α, ω^s and p need to be specified. Moreover, if p > 1, 4n nonlinear equations need to be solved for approximating G_{τ,p}^{-1}. This is very costly, hence we use p = 1 only. Furthermore, since we want to weight all v_{l,i,j,k} and ρ_{i,j,k} equally, ω^s was set to 1 for the numerical examples presented below.
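The p = 1 soft-shrinkage operation used in the sparsity option is cheap and acts componentwise; a minimal sketch:

```python
import numpy as np

def soft_shrinkage(x, tau):
    """Componentwise soft shrinkage S_{tau,1}: entries with magnitude
    below tau are set to zero, the rest are shrunk towards zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.array([3.0, -0.5, 1.2, 0.0])
print(soft_shrinkage(x, 1.0))   # entries with |x| <= 1 vanish
```

Applying this after every accelerated gradient step is what drives small coefficients, and hence voxels outside the vessels, to exactly zero.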
Obviously, the above algorithm can also, with minor modifications, be applied to the case when F contains the divergence-free part as introduced in Section 4.2.
6. Results. In this section, we present several results obtained by using the method described above on simulated and real-world data. We compare different choices of parameters and the effects of the sparsity and the weak divergence-free option as well as the different approximations of H described in Section 4.3.
For all examples, a maximum intensity projection (MIP) over the z-axis of the norm of the velocity vector field was calculated. Afterwards, a colour direction MIP was created by assigning an RGB value to every pixel of the MIP. This was done by first determining, for every pixel of the MIP, the voxel whose velocity norm value was responsible for the entry of the MIP at that pixel. The absolute values of the v_1, v_2 and v_3 components of that voxel were then taken as the red, green and blue values of the RGB triplet at that pixel, respectively. This means that a red pixel in the colour direction MIP indicates movement along the x-axis, a green pixel along the y-axis and a blue pixel along the z-axis. Finally, all RGB values were divided by the maximum absolute RGB value of the colour direction MIP and the resulting map was divided by a factor of 0.6 in order to enhance colours.

6.1. Simulated data. In this subsection, we test our algorithm on simulated data. For this purpose, a phantom of size 40 × 30 × 30 was created, featuring several blood vessels, i.e., pipes, of variable thickness and orientation. A projection of this phantom over the z-axis can be seen in Figure 2, which not only shows the vessels themselves but also the norm of the velocity vector field (left figure) and a colour direction MIP of the velocity (right figure) moving through the vessels. Looking at the colour direction MIP in Figure 2, the three red horizontal vessels at the bottom of the phantom move along the x-axis and have a thickness of 1, 2 and 3 voxels, respectively. The three blue vessels above them move along the z-axis, and hence only a small part of them can be seen in the picture. Note that one of those vessels has a plus-shaped cross-section, which is also the case for the bottommost of the three red blood vessels. The three orange-red blood vessels move diagonally across the x-y-plane and have a z-thickness of 1 voxel each.
Both the orange and the red vessels lie in the middle of the z-plane, while the blue vessels extend over the entire range of the z-axis.
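The construction of the velocity norm MIP and the colour direction MIP described above can be sketched as follows; the array names and the toy velocity field are illustrative, and clipping to [0, 1] after the 0.6 enhancement is assumed for display:

```python
import numpy as np

def colour_direction_mip(v):
    """v has shape (3, nx, ny, nz): the three velocity components on the
    grid.  Returns the z-axis MIP of the velocity norm and an RGB map
    whose channels are |v1|, |v2|, |v3| of the voxel attaining the MIP."""
    norm = np.linalg.norm(v, axis=0)     # (nx, ny, nz) velocity norms
    mip = norm.max(axis=2)               # project along the z-axis
    kmax = norm.argmax(axis=2)           # responsible voxel per pixel
    ii, jj = np.indices(mip.shape)
    rgb = np.abs(v[:, ii, jj, kmax])     # (3, nx, ny)
    rgb = np.moveaxis(rgb, 0, -1)        # (nx, ny, 3)
    rgb = rgb / rgb.max() / 0.6          # normalize, then enhance colours
    return mip, np.clip(rgb, 0.0, 1.0)

v = np.zeros((3, 4, 4, 4))
v[0, 1, 1, 2] = 2.0                      # movement along x -> red pixel
mip, rgb = colour_direction_mip(v)
print(mip[1, 1], rgb[1, 1])              # 2.0 and a pure-red triplet
```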
As for the simulation of the data ρ_{i,j,k,l}, consider first the case of a signal ρ_0(x, y, z) transported via a constant velocity field v̄ = (v̄_1, v̄_2, v̄_3). It can easily be seen that in this case

(53) ρ(x, y, z, t) = ρ_0(x − v̄_1 t, y − v̄_2 t, z − v̄_3 t)

solves the advection equation with initial value ρ_0. If in each vessel we prescribe a constant velocity vector field pointing in one of the two directions of the vessel, then for a given initial signal ρ_0 we can calculate the solution of the advection equation in that vessel via (53). Summing up those solutions for all the different vessels and sampling at the correct space-time points then gives us the data ρ_{i,j,k,l}. Adding a random data error of fixed magnitude, e.g., 1%, we arrive at the final data used in the simulation. In our simulation, for a vessel with given velocity v̄, we used a sinusoidal initial signal (54) oriented along the vessel. To make the simulation procedure a bit clearer, consider the bottommost vessel in Figure 2. Prescribing for example the velocity v̄ = (c, 0, 0) in that vessel leads to

(55) ρ(x, y, z, t) = sin(6π(x − ct)/(I∆x)),

and the data ρ_{i,j,k,l}, for those (i, j, k) for which (x_i, y_j, z_k) belongs to the vessel under consideration, is then defined via

(56) ρ_{i,j,k,l} = ρ(x_i, y_j, z_k, t_{k,l}) = sin(6π(x_i − c t_{k,l})/(I∆x)).
We apply the same procedure to all the remaining vessels and set ρ_{i,j,k,l} to 0 whenever (i, j, k) does not correspond to any vessel. Finally, a randomly generated data error of magnitude δ is added. Note that the velocity vector field v underlying this data simulation is constant in each vessel and hence locally, but not globally, in H^1. Even though we derived our solution method under the assumption of a globally H^1 velocity vector field, using this simulation makes sense, since for real MRI data we also expect nonzero velocities to occur inside blood vessels only, which renders the velocity vector field to be reconstructed only locally H^1 as well.
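The data simulation for a single x-aligned vessel can be sketched as follows; the grid sizes follow the phantom above, while the helper function and its argument names are hypothetical:

```python
import numpy as np

def simulate_vessel_data(mask, c, xs, ys, zs, ts, I, dx, delta, rng):
    """Evaluate the exact advection solution for a vessel aligned with
    the x-axis and velocity (c, 0, 0), cf. (55), sample it on the
    space-time grid, and add a random data error of magnitude delta."""
    X, _, _, T = np.meshgrid(xs, ys, zs, ts, indexing='ij')
    rho = np.sin(6 * np.pi * (X - c * T) / (I * dx))
    rho *= mask[..., None]          # signal lives inside the vessel only
    return rho + delta * rng.standard_normal(rho.shape)

rng = np.random.default_rng(1)
xs, ys, zs = np.arange(40.0), np.arange(30.0), np.arange(30.0)
ts = 0.1 * np.arange(5)             # L = 4 -> 5 time samples
mask = np.zeros((40, 30, 30), dtype=bool)
mask[:, 2, 15] = True               # a one-voxel-thick vessel along x
data = simulate_vessel_data(mask, c=1.0, xs=xs, ys=ys, zs=zs, ts=ts,
                            I=39, dx=1.0, delta=0.01, rng=rng)
print(data.shape)                   # (40, 30, 30, 5)
```

Summing such arrays over all vessels (each with its own mask and velocity) yields the full simulated data set.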
For the results presented below, we have chosen ∆x = ∆y = ∆z = 1 mm and, as mentioned above, I = 39, J = 29, K = 29 for the space discretization, as well as L = 4. As for the time discretization ∆t, notice that our forward solver belongs to the class of BTCS (backward in time, central in space) finite difference methods, which are implicit and hence require no restriction on the time stepsize ∆t to achieve stability. However, in order to obtain good accuracy of the forward solver, ∆t should be chosen small enough. Denoting by ∆T the duration of a full measurement cycle, i.e., ∆T = (K + 1)∆t, it turned out that a suitable bound is given by the CFL-type condition

(57) ∆T ‖v‖_2 ≤ ∆x/10.
Using a ∆T significantly greater than this bound was found to introduce large errors in the reconstructed velocity (see Figure 12). Hence, for our tests below, we used ∆T = 0.1 s, which by (57) allows for velocities with a maximum norm of 1 mm/s. For our simulations, we have used velocities v̄ of three different magnitudes ‖v̄‖_2, which can be seen in Figure 2. The orientation of these velocity vector fields is depicted in more detail in Figure 3, which shows the values of the three velocity components, revealing also the different orientations of the simulated pulse waves.
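The CFL-type condition (57) translates directly into a maximal resolvable velocity norm for a given temporal resolution; a minimal sketch (the helper name is ours):

```python
def max_resolvable_speed(delta_T, delta_x):
    """Largest velocity norm ||v||_2 satisfying the CFL-type accuracy
    bound Delta_T * ||v||_2 <= Delta_x / 10, cf. (57) in the text."""
    return delta_x / (10.0 * delta_T)

# simulated setting: Delta_x = 1 mm, Delta_T = 0.1 s
print(max_resolvable_speed(0.1, 1.0))    # 1.0 mm/s
# natural stimulation setting: Delta_x = 1.4 mm, Delta_T = 1.25 s
print(max_resolvable_speed(1.25, 1.4))   # 0.112 mm/s, far below the PWV
```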
For all tests below, a random data error of magnitude δ was added and the iteration was stopped using the discrepancy principle (45) together with the choice τ = 1. Furthermore, if not noted otherwise, the matrix H introduced in Section 4.3 is used instead of the wavelet embedding described in the same section.

As a first test, we use our method without any special options, i.e., using neither the weak divergence-free option nor the sparsity or the wavelet embedding option. The resulting approximation, achieved after 90 iterations, can be seen in Figure 4. The structure of the vessels can clearly be identified and the reconstructed velocity is partially correct. However, the sinusoidal structure of the initial signal ρ_0 is visibly transferred to the reconstructed velocity.

Figure 5 shows the results of the second test: the weak divergence-free option was included in the reconstruction algorithm, which now stopped after 126 iterations. The reconstructed velocity is much smoother than before, now resembling the true solution much more closely. However, as could be expected, using the weak divergence-free calculation option leads to a smoothing of the solution, clearly visible in the reconstruction around the vessels.

For the next test, we apply the algorithm together with the wavelet embedding option. The stopping criterion was met after 121 iterations and the results can be seen in Figure 6. A comparison with Figure 4 shows that using this option mainly leads to a smoothing of the reconstructed velocity, comparable to but not as strong as using the weak divergence-free option. The results of combining the weak divergence-free and the wavelet embedding options can be seen in Figure 7. This time, the iteration terminated after 162 iterations and once again one can see the strong smoothing effects of the two calculation options.
As we will see in Table 1, of all the combinations of different reconstruction options, this one yields the third best result.

Next, we present some results of using the sparsity option, together with either the divergence-free or the wavelet embedding option. For this, we need to choose an α, see Section 5.2. A good choice turns out to be α = 10^{−3}, which was used for computing all presented results. Figures 8, 9, 10 and 11 show the results of the different combinations, the iteration stopping after 71, 100, 99 and 138 steps, respectively. Note that all iterations involving the sparsity option were terminated before satisfying the discrepancy principle, since the residual stopped decreasing monotonically and instead started to oscillate around a certain value.
Comparing Figure 6 and Figure 9, we see that the edges are now more sharply reconstructed, although the result itself does not look much better than when using no additional options at all. This also holds true for using the sparsity option alone, see Figure 8. However, Figure 10 clearly shows the advantages of combining the divergence-free and the sparsity options. The initial signal ρ_0 only slightly affects the reconstructed solution and the sparsity option removes some of the smearing introduced by the divergence-free option, producing very nice results. Even better results are obtained when combining all three calculation options, which can be seen in Figure 11, most notably at the plus-shaped vessel in the centre of the figure and at the diagonal vessels, especially at the middle one of the three, which, despite the crude discretization and the smoothing introduced by the divergence-free and the wavelet embedding options, is reconstructed rather nicely.

Figure 10. Result of the algorithm applied to the test problem (δ = 1%), using the weak divergence-free and the sparsity option with α = 10^{−3}. Velocity norm MIP and colour direction MIP.

Figure 11. Result of the algorithm applied to the test problem (δ = 1%), using the weak divergence-free, the wavelet embedding and the sparsity option with α = 10^{−3}. Velocity norm MIP and colour direction MIP.
In Table 1 we have collected some important information about the results presented above. The first three columns indicate the calculation options used, the fourth column contains the iteration index k* at which the algorithm was terminated, and the fifth column gives the error between the approximate and the true solution (denoted by (v†, ρ_0†)). Here we have used the standard Euclidean ℓ²-norm for measuring the error, in order to allow for a fair comparison between the results achieved using the wavelet embedding option and those not using it. Once again it can be seen that the best results are obtained using all three calculation options in the reconstruction algorithm.
For our next test in this section, we adapt the simulation parameters to better fit the natural stimulation data set considered below, i.e., we use the same simulation phantom as before but now with spatiotemporal resolution ∆x = ∆y = ∆z = 1.4 mm and ∆t = 0.0417 s (i.e., ∆T = 1.25 s; note the difference in size between phantom and natural stimulation data set) together with L = 9. Furthermore, since the PWV can reach up to 10 m/s, we multiply the velocities in all the vessels by a factor of 10^4 to simulate this fact. It is obvious that with those choices, condition (57) is far from being satisfied and hence one can no longer expect to achieve similarly good results as before. The reconstructions computed with our algorithm using all three computation options can be seen in Figure 12, the iteration having stopped after 7 iterations due to a detected increase of the residual. One can clearly see that as a result of (57) not being satisfied, the velocities are strongly underestimated, which, even if one were to ignore the increase in residual and continue the iteration, could not be overcome.

            div-free   wavelets   sparsity    k*     error
Figure 4       no         no         no       90    16.9658
Figure 5       yes        no         no      126     9.2904
Figure 6       no         yes        no      121    16.4324
Figure 7       yes        yes        no      162     8.8878
Figure 8       no         no         yes      71    17.179
Figure 9       no         yes        yes     100    16.7834
Figure 10      yes        no         yes      99     5.6577
Figure 11      yes        yes        yes     138     5.5245

Table 1. Comparison of the results of the reconstruction algorithm applied to the test problem (δ = 1%), achieved using combinations of the different computation options.

Figure 12. Result of the algorithm applied to the modified problem (δ = 1%), where all involved velocities were multiplied by a factor of 10^4, using the weak divergence-free, the wavelet embedding and the sparsity option with α = 10^{−3}. Velocity norm MIP (left) and colour direction MIP (right).
The problems related to high pulse velocities or large wavelengths, as well as the artefacts arising from them, can only be overcome by improved data acquisition techniques and have been discussed previously [33, 34]. However, the algorithm is still able to detect the location of the vessels as well as some small qualitative differences in the norm of the velocity vector field. Furthermore, it is also still able to extract some information about the directions of the PWV, especially for vessels with a thickness of one voxel. Due to the high velocities in this simulation, in vessels thicker than one voxel neighbouring voxels perpendicular to the flow direction influence each other strongly, which leads to a somewhat distorted reconstruction of the velocity vector field in them.

Figure 13. Result of the algorithm applied to the modified problem (δ = 1%), where all involved velocities were multiplied by a factor of 10^4 and the initial signal (58) was used, using the weak divergence-free and the wavelet embedding option. Velocity norm MIP (left) and colour direction MIP (right).
In the above examples, the initial signal ρ_0 defined via (54) was used to create the data. As can be seen from (55), this choice leads to a signal with a wavelength independent of the PWV. However, in the real-world data set examined below one expects the wavelength to increase with the PWV, the signal having a pulse of around 1 Hertz and a wavelength between 1 and 10 metres. In order to simulate this behaviour, we adapt our initial signal to (58), whose wavelength is proportional to the prescribed velocity: prescribing, for example, the unidirectional velocity v̄ = (c, 0, 0) as in (55), one sees that the wavelength of the transported signal now depends on the PWV in the desired way. Using the same phantom as in the previous test but with PWVs up to 10 m/s, the reconstruction algorithm using the weak divergence-free and the wavelet embedding options produces the results depicted in Figure 13. The calculation stopped after 6 iterations due to a detected increase of the residual. As expected, the periodic modulation in the horizontal vessel visible in some of the previous results disappears, but the direction estimate in the same region also becomes inaccurate; the thinnest vessel disappears and the horizontal vessel direction is no longer visible. These results might show the limits of our simplified phantom simulations but should also be considered as possible artefacts when interpreting real-world data, which will be studied next.
6.2. Natural stimulation data set. In this subsection, we test the applicability of our algorithm to real-world data sets. For this, we use a publicly available natural stimulation dynamic EPI data set obtained on a 7.0 T MRI scanner [12]. Subjects were listening to an audio version of a movie. The data set includes eight 15-minute-long segments for each subject, of which the first 20 seconds of the second one were used for analysis. The transversal slices covered most of the frontal and occipital cortex and the regions in between. Data was sampled with a pulse repetition time (TR) of 2 s and an isotropic spatial resolution of 1.4 mm. The data set also contains time-of-flight angiography images of about the same coverage as the EPI data, as well as pulse oximetry data. We want to apply our algorithm to different subjects of this data set, for which Voss et al. have already tried to reconstruct the pulse wave velocity using their multiple regression approach [34]. In order to do so, we need to adapt our stopping rule, since the discrepancy principle defined in (45) relies essentially on knowledge of δ, which, as is usually the case in real-world situations, is not given explicitly. However, an estimate of the data error can be obtained by looking at the background voxels, i.e., those voxels which are known to lie outside the brain and which therefore would have the value 0 if no noise were present. The corresponding calculations suggest that for our data set, the relative data error is approximately 2 to 3%. This estimate, combined with the discrepancy principle and a check for monotonic decrease of the residual, suggests stopping the iteration after 15 to 25 iterations.
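The background-voxel error estimate can be sketched as follows; this is a rough illustrative heuristic on synthetic data, not the paper's exact calculation:

```python
import numpy as np

def relative_error_estimate(data, background_mask):
    """Rough estimate of the relative data error from background voxels
    (outside the brain), which would be exactly zero in noise-free
    data.  Purely illustrative; the masking and normalization are
    assumptions of this sketch."""
    noise_level = np.std(data[background_mask])
    signal_level = np.std(data[~background_mask])
    return noise_level / signal_level

rng = np.random.default_rng(2)
shape = (16, 16, 8, 10)
data = 0.025 * rng.standard_normal(shape)          # noise everywhere
mask = np.zeros(shape, dtype=bool)
mask[:4] = True                                    # "background" slab
data[~mask] += rng.standard_normal((~mask).sum())  # unit-scale "signal"
print(relative_error_estimate(data, mask))         # close to 0.025
```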
The upper two figures of Figure 14 show the results of applying our algorithm with the changes described above to subject 16 of the real-world data set. Here we used the divergence-free, the wavelet embedding and the sparsity option, this time with α = 10^4, and the computation was stopped after 20 iterations. As before, the upper left image shows the velocity norm MIP over the z-axis and the upper right image the colour direction MIP of the reconstructed velocity v. One can clearly see the location of the major blood vessels and arteries, as well as their orientation, even though the expected PWV is severely underestimated (see below). The norm of the velocity vector field has maximum 0.1495, mean µ = 0.0011 and standard deviation σ = 0.0034. In order to generate the figure, a slight scaling was introduced, cutting all values in the MIP above µ + 5σ. The colour direction MIP was slightly brightened in order to enhance visibility.
For a comparison of our proposed method with the previously used one, the lower two figures of Figure 14 depict the results of the multiple regression approach, applied to the same subject 16 of the data set, using a 0.05 Hz cut-off filtering preprocessing step and all 15 minutes of the second segment of the data set. Again, a µ + 5σ scaling and a slight brightening of the colour direction MIP were used for better visibility. One can see that while the previously used algorithm mainly yields estimates of the PWV in the main arteries and blood vessels, our new algorithm is able to resolve finer structures as well, using only a fraction of the data. Note that the previously used algorithm applied to 20 seconds of data would yield a result hardly distinguishable from white noise.
For certain subjects of the data set, the regression approach of [34] yields very unsatisfactory results. This appears to be due to the heart rate of the subjects having an unfavourable frequency and to the data error being higher in those cases. The results of the regression approach and our proposed algorithm applied to one of those subjects, subject 2 of the data set, can be seen in Figure 15. The differences are quite obvious and can be attributed to two main reasons. Firstly, our algorithm works with much less data than the regression approach and secondly, by stopping the iteration after a certain number of steps, we get a regularizing effect. Consequently, the effects of the data error can partly be compensated and therefore better reconstructions are obtained.
Note also that the velocities calculated from the real-world data set, which are around 10^{−5} m/s, differ by orders of magnitude from the expected pulse wave velocity, which can exceed 10 m/s. One reason for this is the high amount of noise in the data, which can only be partially controlled using appropriate filters. Another reason is the low spatiotemporal, in particular the low temporal, resolution of the MRI data; condition (57) is far from being fulfilled and hence the algorithm, after a certain number of iterations, is no longer able to improve the approximation, which leads to underestimated velocities. The same problem has already been observed in [33], where the velocities were severely underestimated as well. This phenomenon, although most clearly understandable from the point of view of the finite difference approximation and condition (57), is quite likely to appear, in one form or another, when using other discretization techniques as well.

7. Conclusion and outlook. The advection model was intended to model travelling pulse waves, but there might be other travelling disturbances along blood vessels or nerves in the brain. For example, it has been observed that endothelially mediated vasodilation related to functional brain activation travels along small blood vessels [15]; although this occurs on spatial scales much smaller than those used here, high-resolution dynamic MRI data that would potentially be able to resolve this phenomenon with MRAI already exists [39]. Further, due to its sensitivity to pulsatile signal components and due to dramatic advances in dynamic MRI data acquisition [22], MRAI might have future potential to contribute to the modelling of the cerebrovascular system and to serve as a biomarker for cerebrovascular disease.
It should also be noted that the methods described herein are quite general and could in principle be applied to spatiotemporal dynamics across a wide range of dynamic imaging applications in medicine and other fields (with an adaptation of the PDE model to the specific situation).
Concerning our proposed velocity estimation algorithm, the numerical simulation results of Section 6.1 clearly demonstrate the good reconstruction abilities of our method, especially when used with a suitable combination of the currently available options (weak divergence-free, sparsity, wavelets). This points to an advantage over the regression approach of Voss et al., namely the high flexibility of our approach. By considering the parameter estimation problem of MRAI in the framework of inverse problems, a vast array of techniques becomes available, leading to improved results. While the regression based method is more or less inflexible, our proposed approach can easily be adapted to include different or newly developed reconstruction options.
Another advantage of our algorithm is its ability to produce appealing results with only a small amount of data. Where the multiple regression approach requires at least a couple of minutes of measurements, our algorithm, as we have seen in Section 6.2, can produce nice qualitative results from only a couple of seconds of measurements. This might prove advantageous in practice, where long scan times often need to be avoided.
Although both the regression based approach and our proposed reconstruction method work well on numerical phantom simulations, when applied to the real-world data sets they produce qualitative results only. As mentioned above, the most important reasons for this are the high amount of data error and the low spatiotemporal resolution of the data compared to the expected magnitude of the PWV. Hence, in order to achieve better results, MRI data with higher resolution and less noise needs to be used in our algorithm. One possible way towards this would be to use advanced imaging methods such as multiband EPI [22], where a whole stack of slices is acquired in the time that normally allows only for the acquisition of a single slice, leading to a much higher spatiotemporal resolution.
As the way towards quantitative results leads through higher-resolution data sets, our proposed algorithm will in the future need to be modified to be capable of dealing with the resulting huge data sets. This quite naturally leads into the realm of modern, highly efficient space-time methods, high-performance computing and parallel computing: three areas which the authors plan to combine and apply to MRAI in future work in order to improve it even further, with a hopefully much faster CG-based reconstruction method already under development.
In order to assemble the matrices A_t, A_x(v), A_y(v) and A_z(v) introduced in (14), a looping procedure is used. In the case of the matrix A_x(v), considering the definition (60) of D_{x_i}, the assembly procedure looks as follows:
• Create an all-zero m × m matrix A.
• Loop over all grid indices ind and set the corresponding entries of A, e.g., A_{ind,ind} = v_{ind}/∆x, according to the stencil of D_{x_i}.
For the matrices A_t, A_y(v) and A_z(v), the assembly procedure looks similar, with the obvious modifications due to the respective definitions of D_{y_j}, D_{z_k} and the backward time difference quotient in (9).
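The looping pattern can be sketched as follows. The stencil used here is a hypothetical first-order difference for illustration only, since the actual definition (60) of D_{x_i} is not reproduced here; collecting index/value triplets and calling the sparse constructor once mirrors the role of fsparse.m:

```python
import numpy as np
import scipy.sparse as sp

def assemble_Ax(v, dx):
    """Illustrative assembly of a sparse advection matrix A_x(v) on a
    1-D grid with a first-order difference stencil (an assumption of
    this sketch, not the paper's actual stencil D_{x_i})."""
    m = v.size
    rows, cols, vals = [], [], []
    for ind in range(m):
        rows.append(ind); cols.append(ind); vals.append(v[ind] / dx)
        if ind > 0:  # coupling to the left neighbour
            rows.append(ind); cols.append(ind - 1); vals.append(-v[ind] / dx)
    # build the sparse matrix from all triplets in one call
    return sp.coo_matrix((vals, (rows, cols)), shape=(m, m)).tocsr()

A = assemble_Ax(np.array([1.0, 2.0, 3.0]), dx=0.5)
print(A.toarray())
```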