Robust reconstruction of fluorescence molecular tomography with an optimized illumination pattern

Fluorescence molecular tomography (FMT) is an emerging powerful tool for biomedical research. There are two factors that influence FMT reconstruction most effectively. The first one is the regularization techniques. Traditional methods such as Tikhonov regularization suffer from low resolution and poor signal to noise ratio. Therefore sparse regularization techniques have been introduced to improve the reconstruction quality. The second factor is the illumination pattern. A better illumination pattern ensures the quantity and quality of the information content of the data set thus leading to better reconstructions. In this work, we take advantage of the discrete formulation of the forward problem to give a rigorous definition of an illumination pattern as well as the admissible set of patterns. We add restrictions in the admissible set as different types of regularizers to a discrepancy functional, generating another inverse problem with the illumination pattern as unknown. Both inverse problems of reconstructing the fluorescence distribution and finding the optimal illumination pattern are solved by fast efficient iterative algorithms. Numerical experiments have shown that with a suitable choice of the regularization parameters the two-step approach converges to an optimal illumination pattern quickly and the reconstruction result is improved significantly, regardless of the initial illumination setting.


Introduction 1.Background
Fluorescence molecular tomography (FMT) is a medical imaging technique with high sensitivity, noninvasiveness and low cost.The traditional planar imaging strategy suffers from a lack of depth resolution since its dimension and optical properties correspond to a fully diffusive regime.FMT solves such problems by allowing quantitative reconstruction of the three-dimensional fluorescence distribution in the intact organism.Compared to traditional radiative emission tomography such as positron emission tomography (PET) and single photon emission computer tomography (SPECT), it is a relatively cheaper and safer alternative and thus widely used in basic biomedical research as well as drug development [21,25,20,22].

Basic principles of FMT
The major components of FMT include a laser that emits near-infrared light, a highly sensitive camera that captures the outcoming light from the fluorescence probe and a scanner that directs the laser beam onto the object surface.Figure 1: A snapshot of the latest stand-alone FMT hardware system in the animal imaging center of ETH Zurich.The main components of the hardware system include a sCMOS camera (detector), the animal support (object), the laser source and the MEMS scanner.Photo credit: Andrin G. C. Rickenbacher.
A complete FMT experiment is composed of the following processes.First of all, a fluorescence probe is administrated into the object to be imaged.The object is then illuminated by an external diffuse photon wave source of wavelength λ e , giving rise to an excitation field φ e in the object.The previously injected fluorescence dye will be excited, absorbs the radiation at wavelength λ e and re-emits as a source at a longer wavelength λ f [3].Such a physical procedure can be explained as follows.When a photon is absorbed by the probe, it will excite an electron into a higher energy state.Excited electron will soon relax back to ground state, losing some of its energy and sending out another photon with a longer wavelength (Stoke's shift) and less intensity.Measurements taken at both excitation and emission processes are made by a CCD camera such that information from every corner is recorded, which is equivalent to having point detectors densely distributed on the object surface.
The data acquisition system of FMT consists of a light source that delivers near-infrared light to the object surface at different points or with certain spatial patterns and a detector system that measures the light transmitted through the tissue.The light source of FMT can be classified into the following three categories: steady-state (or continuous wave) systems for measuring the stationary excitance, time-resolved systems for measuring the temporal distribution of the excitance due to ultrashort input pulse, and frequency-domain systems which uses an intensity modulated source and measures the phase shift and modulation depth of the detected signal [26].Our FMT model belongs to the category of steady-state systems.
Particularly in our experiments, we employ a near-infrared light source at a fixed wavelength of 670nm which allows for a drastic increase in the depth of light penetration in the object [25].In fact, light that does not lie in the near-infrared range (650-900µm) will suffer from low signal-to-noise ratio due to Rayleigh scattering or photoelectric effect [20].When electromagnetic waves transmit through biological tissues, different degrees of absorption, reflection and scattering will occur.In order to compute the forward model, the scattering coefficient µ s and absorption coefficient µ a of the object to be imaged must be given a priori or measured by means of other techniques, for instance, diffuse optical tomography.Experimental and theoretical work have demonstrated that under the condition of µ a µ s which is exactly the case of our application, light propagation in the medium falls into a strongly diffusive regime.Light within the medium essentially behaves like heat [19] and the photon transport can be modeled by a diffusion approximation with sufficient accuracy [15].
Finally, the forward problem is formulated into a large linear system by using the finite element method to calculate the light propagation through the object.Together with the measurements collected by a CCD camera, the linear system is properly inversed in order to reconstruct the three-dimensional distribution of the fluorescence dye inside the object.

Current status
The aim of most studies on tomographic imaging in recent years focuses on improving the quality of the reconstruction, especially the spatial resolution, signal to noise ratio (SNR) and the reconstruction speed.Necessary regularization techniques are introduced due to the illposedness of the inverse problem.Some effort has been devoted to the development of novel regularization methods such as Tikhonov regularization to replace the traditional Algebraic Reconstruction Technique (ART) which generates diffuse, oversmooth and noisy reconstruction results at a heavy computational cost.Recently, sparse regularization methods become popular because of their advantage in promoting sparsity and thus higher spatial resolution in cases where the target to be imaged is relatively small in size compared to its background.These sparse regularization methods mainly employ the l 1 norm of the image vector combined with other penalty terms to formulate hybrid regularization methods, such as l 1 combined with total variation regularization and l 1 combined with l 2 regularization [13].Aside from encoding sparsity of the image as the regularizer, other regularization techniques that form regularizers using prior information of the object are also investigated including Laplacian-type regularization that incorporates tissue structural information into reconstruction [7], patch-based anisotropic diffusion regularization that solves the issue of oversmooth and noisy reconstruction [9], and maximum-likelihood expectation maximization combined with sparse regularization method that reduces background noise and promotes sparsity at the same time [29].
Existing work has shown an advantage of sparse regularization methods especially hybrid methods that benefit from the strengths of more than one regularizer compared to the conventional nonsparse methods.Hence, it is necessary to introduce hybrid sparsity-based regularization methods to our discrete FMT inverse model.Considering the complexity of implementation, we choose l 1 combined with l 2 regularization (also known as elastic net in linear regression) as well as pure l 1 regularization.The main advantage of the elastic net is that it promotes sparsity of the solution while preserving its smoothness at the same time, avoiding discontinuities in the reconstructed image.
Another important factor that affects the reconstruction quality is the data set, which is determined to a large extent by the configuration of detection and illumination.A lot of research groups have investigated designing of the laser/detector pattern in different imaging modalities, such as electrical impedance tomography [17] and thermo-acoustic tomography [6], etc. Regardless of the modality involved, the common issue to be addressed is the same, namely how to place the source-detector pairs onto the object surface so as to obtain the best reconstruction result.In order to solve this problem, the usual procedure is as follows.First try to find a mathematical description of the geometry of the sensors and the object boundary.Then define the admissible set of possible sensor patterns on the boundary and search for a proper criterion which assesses the quality of a certain sensor pattern.Such a criterion is often defined as a functional of the admissible set and the object boundary.Depending on the mathematical or physical meaning of the defined functional, optimal sensor patterns are obtained by either minimizing or maximizing this functional.Hence, the criterion is also called the optimality condition.
In most papers, a complex geometric analysis of the boundary of the object as well as the shape of the sensors has to be carried out [6,10,17] in order to define the admissible set.And the defined functional is only explicitly solvable in a number of ideal cases that require the object to have some simple geometric structures, for example, a two-dimensional disk.However, our FMT experiment is very different from those settings.First of all, the detectors need not to be optimized.This is because the information of light propagation is acquired by a CCD camera which is able to detect light from every corner of the surface of the object.In other words, we solely focus on optimizing the illumination pattern, namely the laser power and laser positions.Besides, each laser point can be regarded as a small two-dimensional disk projected on the object surface and thus has no thickness, so complex analysis of the geometry of lasers is not needed.Our special problem setting inspires us to define the set of admissible illumination patterns by enforcing the restrictions such as the number of lasers, the power of each laser as well as the position of each laser as constraints directly on the source vector by simply using different norms of the source vector.
As for the optimality condition, [17] employs the A-optimality or D-optimality condition from optimal experiment design which involves the minimization or maximization of a functional of estimation; [12] achieves optimal illumination pattern by improving the conditioning of the Fisher information matrix in order to maximize the information content of acquired data; [6] maximizes the observability functional that stands for the efficiency of a set of sensors to allow for best observation quality of the worst possible case, etc.To benefit most from the discrete formulation of the forward problem, we define our functional as a discrepancy between the measurements and the discrete model in the least square fashion and encode the information in the admissible set as regularizers added to the discrepancy function.Hence, the techniques from existing sparse reconstruction section can be borrowed to solve the optimization of the illumination pattern.Finally, inspired by [6], we adopt an iterative approach of using the reconstructed result to guide the design of the next illumination pattern and using the updated illumination pattern to perform the next step of reconstruction.

Scope of this work
Our work focuses on improving the quality of the reconstructed three-dimensional fluorescence distribution from two aspects.The first effort is devoted to reducing the level of diffusion in recovering the fluorescence inclusions through the implementation of novel sparse regularization techniques.The second effort goes to finding the optimal illumination pattern which is achieved by minimizing a discrepancy functional based on the reconstruction result from the previous step.Then we use the updated illumination pattern to run a new FMT experiment and perform the next round of reconstruction.Finally, by combining these two efforts to form a loop of reconstructing the fluorescence distribution and optimizing the illumination pattern, the updated illumination patterns are expected to converge to the optimal illumination pattern regardless of the initial laser setting.The optimal illumination pattern should ensure a better reconstruction result compared to that of a random unoptimized illumination pattern.

Forward model 2.1 Theoretical formulation of FMT
Light propagates in tissue with different degrees of absorption, reflection and scattering.The mathematical description of light propagation can be modeled at three different levels: Maxwell's equations at the microscopic scale, radiative transfer equation (RTE) at the mesoscale and the diffusion approximation to the radiative transfer approximation at the macroscale.Here, we model the light propagation as photon transport which is described by the diffusion equation, an asymptotic approximation to the following radiative transfer equation [2]: for (r, ξ, t) ∈ Ω × S × R + , where µ a and µ s are the absorption and scattering coefficients, c is the speed of light in the medium, S is the unit sphere, and I(r, ξ) is the specific intensity defined as the intensity at position x in the direction ξ.
The diffusion approximation to RTE is widely used in many applications.It holds when µ s µ a , the point of observation is far away from the medium boundary and the observation time is sufficiently long.If we introduce a rescaling of µ a , µ s and t, and perform the asymptotic expansion for I(r, ξ, t), then (2.1) yields the following the diffusion equation: with φ(r, t) = I(r, ξ, t)dξ being the energy (photon) density, κ(r) = 1 3(1−g)µt being the diffusion coefficient, µ t = µ a + µ s being the extinction coefficient, and g = S ξ • ξ p(ξ • ξ )dξ being the anisotropy.Here, g ∈ (−1, 1).
In our application the medium is a bounded domain, hence, it is necessary to account for boundary layers by adding the following boundary condition to (2.2): where q is the source term, ν is the outward normal to ∂Ω, and l ext is the extrapolation length calculated to be l ext = 2ζ(c)κ(r) in our case.ζ is the refractive index mismatch at the boundary.Since we consider our problem in the continuous wave case, we apply Fourier transform to (2.2), add boundary condition (2.3) and arrive at the following two coupled partial differential equations: φ e (r; ω) + 2ζ(c)κ(r) ∂φ e (r; ω) ∂ν = q(r; ω) on ∂Ω, (2.5) ) Here, (2.4)-(2.5)describe the propagation of light at the excitation wavelength λ e while (2.6)-(2.7)describe propagation of the re-emitted light at the fluorescence wavelength λ f .The internal source term q (r; ω) in (2.6) represents the portion of the photons absorbed at λ e and re-emitted at λ f during the emission process.To be precise, q (r; ω) is a product of the unknown fluorescence concentration C(r) and the photon density φ e (r; ω), that is, where η is a constant determined by the fluorescence and the absorption cross section at the excitation wavelength λ e .The detectable flux on the boundary for both excitation and emission processes is the normal current J n (ξ; ω) across the boundary: (2.9)

FEM formulation
Analytical solutions of the two systems of equations ((2.4) and (2.5), (2.6) and (2.7))only exist for simple geometries.For practical applications, a numerical model must be employed in order to incorporate inhomogeneous parameter distributions and complex boundaries.A comprehensive computational model of the diffusion equation has been developed in [4,26,27] where a Galerkin finite element method is used to compute the numerical solution of the coupled systems.The details of the FEM formulation is presented as follows.
Since the excitation process and the emission process share the same physical principles with only different source terms, we take the first system (2.4)-(2.5)as an example to derive its discrete formulation.(2.6)-(2.7)can be dealt with using the same strategy.For convenience we omit the superscript for φ e .
The Galerkin method proceeds as follows.First write (2.4) in the variational formulation: (2.4) has a solution if there exists a solution of where H 1 (Ω) is the set of square-integrable functions with square-integrable weak derivatives in Ω.
Integration by parts yields where ν denotes the outward normal to ∂Ω.
Combining the detectable flux on the boundary (2.9) and boundary condition (2.5), we obtain To apply the finite element method, one defines an N -dimensional subspace U h ⊂ H 1 (Ω) and uses an N-dimensional polynomial basis expansion {u i } N i=1 ∈ U h with local support over domain Ω.A piecewise polynomial approximation φ h of the field φ can be expressed as which is defined by the vector Φ ∈ C N of nodal coefficients in the excitation process.The problem thus becomes: find holds for all u j , j = 1, . . ., N .Formulation (2.14) can be expressed by the following linear system: where K, C, A, and B are sparse symmetric positive definite system matrices and Q(ω) denotes the nodal values of the source.Individual integrals over an element Ω el i are given by and for the external source in the excitation process for the internal source in the emission process u i (r)q (r; ω)dr.
Define the stiff matrix Then the discrete formulation of (2.4)-(2.5)can be simply written as (omitting the frequency ω) (2.16) Similarly, the system of equations (2.6)-(2.7)can be written in the discrete form where the superscript f represents the emission process.
In the course of the experiment, the position of the light source is varied in order to generate enough independent observations to allow for reconstruction.
Suppose there are L illumination points.For the lth illumination point I r , denote by Φ e l , Φ f l ∈ C N and Q e l , Q f l ∈ C N the (discrete) photon densities and the source terms, respectively.We obtain L discrete coupled systems: Source Q f l of the lth emission process is given by the following formula: where C ∈ C N is the vector of the unknown nodal value of the fluorescence distribution to be recovered.The operation diag turns the vector C into a diagonal matrix by putting all entries of the vector on the diagonal of the square matrix.Define the matrices The systems (2.18) and (2.19) can be written in a compact form The compactly written systems (2.21) and (2.22) complete the FEM formulation of FMT.

Measurements
Early FMT experiments adopted a contact configuration, meaning that the fibers used for photon collection need to be in contact with the tissue or require the tissue to conform to simple geometries such as a slab or a cylinder.The most common approach to overcome this problem is to insert the subject in some fluid of similar optical properties [24].This approach however introduces additional absorption and scattering, significantly degrading the resolution and the signal to noise ratio unnecessarily.In the ideal configuration, the detectors should be detached from the object to be imaged, offering simple experimental procedures and yielding large numbers of tomographic projections.Hence, a non-contact configuration is employed in our detection by placing the detectors away from the object surface.This allows imaging of arbitrary geometries in the absence of matching fluid, thus boosting the resolution and quantification accuracy to the maximum.

Free-space light propagation model
A free-space light propagation model to describe the intensity distribution from a diffusive volume of arbitrary shape Ω to a set of noncontact detectors is added to (2.21) and (2.22).Assume that there are M effective pixels on the detector plane.For both the excitation and emission processes during lth illumination, l = 1, . . ., L, the measured power P e l , P f l ∈ C M can be expressed by the product of the field Φ l ∈ C N and a transportation matrix Γ ∈ C M ×N of free-space light propagation.Γ is assumed identical for both processes: Define the matrices . . .Φ e L ] ∈ C N ×L .The system (2.23) and (2.24) can be written compactly as The calculation of the transportation matrix Γ can be found in [23].

Linear representation of a FMT experiment
(2.21), (2.22), (2.25) and (2.26) give a complete description of the whole procedure of FMT in discrete form.The formulation of the forward problem can thus be derived.During a complete FMT experiment, after one illumination measurements are made for both the excitation and emission processes.In order to cancel out location-dependent gaining factors, we calculate the measurement vector Y using a normalization technique.
For the lth illumination, define the measurement Y l,k at the kth effective pixel by the ratio of the measured power P f l (k) (kth entry of the vector P f l ) of the emission process at kth pixel and the measured power P e l (k) of the excitation process at the kth pixel: Here, Γ k denotes the kth column vector of Γ, (S f ) −1 (i), i = 1, . . ., N is the ith column vector of the inverse of the matrix S f during the emission process, C is the unknown vector of fluorescence distribution and C(i), i = 1, . . ., N denotes the ith entry of C.
Define the row vector Then a single measurement point Y l,k at the kth pixel during lth illumination can be expressed by the inner product of two vectors W l,k and C. Stack all M row vectors W l,k , k = 1, . . ., M during lth illumination to get the lth weighting matrix T ∈ C M ×N , and define the lth measurement vector a linear system representing the complete FMT experiment with L illumination processes.

Design matrix V
To be precise, designing an optimal illumination pattern refers to optimizing the spatial positions of the lasers used in the experiment as well as the power of each laser.Both the location information and the laser power are encoded in the group of external source vectors {Q e l } L l=1 defined in (2.18).In order to find out how the external source impacts the reconstruction result, we rewrite the measurement Y l,k from (2.27) as a function of the external source vector Q e l .Let (S e ) −1 be the inverse of the matrix S e .According to (2.18), Φ e l = (S e ) −1 Q e l , hence, Define the row vector V l,k := Then the measurement point Y l,k can be written as the inner product of two vectors V l,k and Q e l .Using the same strategy as the last section by stacking all M row vectors V l,k to get the lth design matrix T ∈ C M ×N associated with the lth illumination I l , it is easy to see that Once again assemble all L measurement vectors into a linear system that reveals the relation between the external source Q e and the measurement Y .Note that V is a function of the fluorescence distribution vector C, which indicates that designing a new illumination pattern requires the information of the reconstructed fluorescence distribution as a prior.(2.28) and (2.30) constitute a complete forward model for FMT with a designed illumination pattern.

Definition of the illumination pattern Σ
The illumination pattern physically refers to the spatial distribution of the lasers (illumination points) on the object surface.Moreover, it also encodes the information of the power strength of each laser.Mathematically, the illumination pattern vector Σ is defined by means of the set of external source vectors {Q e l } L l=1 , L is the number of illumination points used in an FMT experiment.Recall that the illumination process in an FMT experiment proceeds as follows: the first laser beam shoots at location ξ 1 ∈ ∂Ω on the surface of Ω.Light propagates through the tissue exciting the fluorescence inclusion and then reaches the detector plane, measured power P e 1 is recorded.Re-emitted light with shifted wavelength λ f due to the relaxation of the excited fluorephore reaches the detector plane and measured power P f 1 is recorded.Then the second laser beam shoots at location ξ 2 ∈ ∂Ω and measured power P e 2 and P f 2 are recorded and so on, until the Lth laser beam.
To improve the quality of the reconstruction result, a straightforward strategy is to use a large number of illumination points to increase the amount of detected information.However, this strategy is not always numerically tractable due to the size of the data set.Moreover, it leads to a significant increase in the acquisition time [11].To remain compatible with in vivo measurements, we need to limit the number of illumination points.In other words, the number of illumination points L is expected to be relatively small and should be under control.
Notice that during each illumination, the source vector Q e l contains exactly one 1 nonzero entry at the corresponding location ξ l ∈ ∂Ω.Hence, the vectors {Q e l } L l=1 are all sparse.Adding up all L vectors we define the illumination pattern Σ := L i=1 Q e l that is an N -dimensional column vector with exactly L nonzero entries.
Under the FEM framework, Ω is discretized into a set Ω el of N elements.All the elements on the surface of Ω constitute the surface element set ∂Ω el and the rest of the elements belong to the inner element set Ω el in = Ω el \∂Ω el .Regard Σ as a function mapping an element Here we only concern the location information carried by an element Ω el i .It is easy to see that Σ maps the set Ω el in to {0} because only the surface of the target can be illuminated.For the surface set ∂Ω el , only L elements have nonzero function values since there are L illumination points.Additionally, we demand that the number of illumination points to be small, hence, L N .Meanwhile, the laser power should be under a maximal value due to safety reasons.In other words, ||Σ|| ∞ ≤ p max with p max the maximally allowed value of the laser power.
In summary, the set ∆ of admissible illumination patterns is defined as follows: 1 This is based on the assumption that the laser power perfectly concentrates on a node ξ l of the finite element mesh.In a real FMT experiment, the laser power decays around its center obeying some certain rule.In most cases, Gaussian distribution is accurate enough to describe such a phenomenon.Hence, we can amend our statement to be more rigorous: Q e l has only a small number of nonzero entries.
where || • || 0 denotes the l 0 norm that counts the nonzero entries of a vector, || • || ∞ denotes the supremum norm that returns the maximal amplitude of a vector.Note that we impose the inequality constraint ||Σ|| 0 ≤ L instead of the equality constraint ||Σ|| 0 = L to allow for more freedom in controlling the number of illumination points.

Inverse model: Sparse reconstruction and illumination pattern optimization
This section dedicates to solving two inverse problems.The first one is reconstructing the fluorescence distribution with sparse regularizers.The second one is finding the optimal illumination pattern given a reconstructed fluorescence distribution.Both problems involve an objective function to be minimized.We first define the objective functions and then design suitable iterative algorithms to solve these optimization problems.In the end, we explain in detail our two-step approach of reconstruction with an optimized illumination pattern.

Recover fluorescence distribution C
In this section we formulate the inverse problem for recovering the fluorescence distribution, discuss the choice of suitable sparse regularizers, and then present the algorithm we use for this inverse problem.

Formulation of the inverse problem
Given an illumination pattern Σ, we aim to recover the fluorescence distribution in the object Ω so as to image the location and shape of the target of interest.This corresponds to reconstructing the unknown vector C in (2.28): Such a problem is usually ill-posed.Instead of direct inversion, we take an optimal design approach which formulates the inversion of (2.28) as an equivalent optimization problem: arg min where the objective function J is defined by a data-fitting term A(C; Σ) plus a regularization term R(C).A(C; Σ) is defined in the least-squared sense: where Σ denotes the illumination pattern defined in (3.2).

Basic choice of R(C): Lasso
In most imaging problems we are considering, the target is confined in a small region.In other words, the fluorescence distribution C is a sparse vector.We can thus employ l 1 regularization to promote sparsity and define: where

Advanced choice of R(C): Elastic net
Numerical experiments show that while pure l 1 regularization captures the sparsity of the solution, it tends to bring discontinuities into the reconstructed image.In order to improve the smoothness of the solution, we replace the pure l 1 penalty with the elastic net penalty which makes a compromise between the l 2 and the l 1 penalties: 2 ), where α ∈ [0, 1] is a constant.
Such a combination of norms takes advantage of both regularizers and finds a balance between sparsity and the integrity of the image.
The optimization problem thus writes as follows: arg min Note that when α = 1, (3.4) becomes pure l 1 regularization, and when α = 0, (3.4) is Tikhonov regularization.In summary, the problem of reconstructing the fluorescence distribution C is formulated as the following minimization problem: arg min where R(C) can be chosen between λ||C|| 1 and λ(α||C||

Algorithm for the inverse problem: proximal gradient descent
This section focuses on designing a fast and accurate iterative algorithm to solve (3.5) based on gradient descent methods.Notice that both the Lasso and the elastic net regularizers are convex nondifferentiable due to the l 1 -norm, so it is necessary to seek a variation of the usual gradient descent method in order to handle the nondifferentiality.
Recall that the gradient descent for a convex differentiable function f (x) usually takes the form x t+1 = x t − s t ∇f (x t ), t = 0, 1, 2 . . ., where s t > 0 is the step size.
(3.6) has the following alternative representation: where •, • is the euclidean scalar product.Notice that the objective function J in (3.5) is nondifferentiable but J can be decomposed into a sum of a convex differentiable part (3.2) and a convex nondifferentiable part R. Hence, the usual gradient step (3.6) needs to be generalized: where f has the aforementioned type of decomposition: The update (3.10) has an explicit formula when h is the lasso or the elastic net penalty.We simply take a gradient step and then perform elementwise soft-thresholding.To be precise, when where τ θ : R N → R N is the soft-thresholding shrinkage operator defined by with θ ∈ R a given threshold.(x) + := max{x, 0} is the rectifier function.
Following the idea of proximal gradient descent plus an accelerating process [5], we implement the following algorithm for the reconstruction procedure.Due to sparsity, the initial guess of C is set to be the zero vector.
Here, L is the largest eigenvalue of W T W .

Design illumination pattern Σ
This section first introduces the optimality condition and then formulates the minimization problem by adding the constraints from the admissible set ∆ as different regularizers one by one.In the end, the algorithm to solve this constrained minimization problem is explained.

Optimality condition
Each entry of the vector Σ encodes the location as well as the laser power information at the corresponding node on the object surface.Given a reconstructed fluorescence distribution vector C, we calculate the design matrix V (C) defined in (2.30).The design of the optimal illumination pattern is achieved by solving the following linear system: subject to the restriction Σ ∈ ∆.In other words, we want to solve the following optimization problem: (3.17)

Solving the constrained minimization problem (3.17)
We try to solve this multi-constrained problem by adding one constraint at a time to (3.16).First add ||Σ|| 0 ≤ L and we arrive at Note that (3.18) is nonconvex due to the 0-norm and thus computationally intractable.To make it solvable, consider its l 1 relaxation formulated as the following basis-pursuit program: Problem (3.19) can be written in the equivalent Lagrangian form: where µ > 0 is the regularization weight.Solving the basis-pursuit program (3.19) is only equivalent to solving the original problem (3.18) under specific conditions, see chapter 10 of [16] for details.In general, the solutions of (3.19) and (3.18) do not coincide.This is because l 0 norm and l 1 norm act differently as a regularizer: l 1 norm penalizes large coefficients more heavily than smaller coefficients in terms of magnitude.In order to rectify the influence of the l 1 norm, we can adopt a reweighted l 1 minimization approach to make l 1 norm behave more like l 0 norm [8].
The idea is as follows: replace where H is a N × N diagonal matrix with nonnegative diagonal entries h 1 , . . ., h N defined inversely proportional to Σ 1 , . . ., Σ N , respectively.An iterative procedure that alternates between recovering Σ and redefining H can be used to solve (3.21).
is a parameter that should be set slightly smaller than the expected entries of Σ to avoid numerical overflow.Now we only need to solve (3.22) plus the remaining constraints, i.e., arg min Notice that the objective function f (Σ) in (3.23) has an additive decomposition: where g(Σ) = 1 2 ||Y − V (C)Σ|| 2 2 is convex differentiable, and the univariate functions d j (Σ(j)) = µh j |Σ(j)| are convex.
Coordinate descent algorithm is perfectly tailored for such a separable structure as the separability shown in (3.24) guarantees convergence to a global minimizer.Meanwhile, coordinate descent also makes it easy to set upper and lower bounds on Σ(j) by simply computing the coordinate update and setting the Σ(j) that violates the bounds to the closest boundary.To this end, the second constraint is successfully added.
The last constraint can be added by applying the Lagrange multiplier to the linear system Y = V Σ to arrive at the following extended linear system Ỹ = Ṽ Σ, (3.25) where Ṽ = V S p and Ỹ = Y O .
Here, S p ∈ R N ×N is a diagonal matrix with only a few entries of value 1 on the diagonal and all the remaining entries 0. These nonzero entries correspond to all the internal nodes of the mesh.O ∈ R N is a zero vector.Such an extended linear system will force all the entries in Σ that correspond to the internal nodes to have value 0. Hence, the last constraint is satisfied.
In summary, we are dealing with (3.25) and trying to solve the following minimization problem: by using a coordinate descent method.In the next section, a detailed coordinate descent algorithm for solving (3.26) is presented.

Coordinate descent as the main solver
Coordinate descent is an iterative algorithm that updates from the tth step Σ t to next step Σ t+1 by choosing one coordinate to update and then performing a univariate minimization over this coordinate [16].More precisely, if the coordinate j is chosen at iteration t, the update is given by As we cycle through all the coordinates a complete update Σ t+1 is obtained.The algorithm is hence usually referred to as Cyclic Coordinate Descent.An explicit solution for (3.26) can be derived using the soft-thresholding operator τ introduced in previous sections .

Combining fluorescence reconstruction and illumination pattern optimization
It is easy to notice that the problem of recovering C and of determining Σ are mutually dependent, each takes the output of the other as an input.This inspires us to design the following two-step approach in order to obtain the best possible reconstruction quality.By iterating back and forth between step 2 and step 3, we are updating the illumination pattern for the next reconstruction based on the reconstructed fluorescence distribution from the previous step.In the end, this two-step approach is expected to produce a better reconstruction result compared to that of a fixed and unoptimized illumination pattern.

Numerical experiments
The two-step approach from the last section contains a fluorescence distribution reconstruction step and an illumination pattern optimization step.A combination of these two steps constitutes a complete loop for updating the illumination pattern and is referred to as one round.Several rounds are performed in order to improve the final reconstruction quality.We validate our two-step approach by performing numerical experiments on a virtual cubic phantom under the reflection mode, i.e. the laser array and detector array are restricted to the same face of the cubic phantom as opposed to the transmission mode where lasers and detectors are placed on the opposite faces of the phantom.

Experiment setup
Table 4.1: Left: 15×15×15 mm 3 phantom with absorption coefficient µ a = 0.01 and scattering coefficient µ s = 1.Blue region represents the virtual detector plane that is made up of a 60 by 30 detector array.Each detector has size 1×1 mm 2 and an efficiency of 100%.Red points represent the 10 by 10 laser array.Each laser point has power 1 W/mm 2 and wavelength 670 nm.The setup of the phantom is performed by the TOAST++ software [27].Right: birdview of the laser and detector array from the top surface of the phantom.Laser array is placed at the center of the detector plane.
During the reconstruction step, l 1 regularization together with FISTA is used to recover the fluorescence distribution vector C. Regularization weight λ during the reconstruction is empirically chosen to be 10 −4 and kept the same throughout all rounds.During the illumination pattern optimization step, the regularization weight µ is chosen to be 1.5 × 10 −8 and kept the same throughout all rounds of the experiment.

Result: updated illumination patterns at each round
It can be seen in Table 4.3 that with the value of µ = 1.5 × 10 −8 , the number of lasers is restricted to approximately 70.After the first round of iteration, the illumination pattern is already very different from the initial spatial layout of the lasers, instead, it is shaping into the optimal illumination pattern as is shown in the following updated illumination patterns.It seems that most lasers are taking their positions near the edge of the top surface and only a few of them are intruding the center of the top surface.After the second round of iteration, both the number and the spatial distribution of the lasers already tend to stabilize with only a few laser points' position changed.After the 9th rounds of iteration the illumination pattern is still very stable.Besides, the laser intensity profiles demonstrate that after the second round of iteration, all lasers have reached maximal possible intensity 1W/mm 2 , in other words, the laser intensity has a homogeneous distribution.One plausible explanation is that with the limited number of lasers available, all the lasers reach maximal value of intensity at their optimal position so that more useful information can be obtained.In some sense, the intensity of the lasers may not be the dominating factor compared to the spatial layout of lasers in terms of improving the reconstruction result.

Result: reconstructed fluorescence distribution at each round
It is easy to see that the reconstructed fluorescence distribution (first row) based on the initial laser setting loses information of the object's edges, especially in 1(a).The reason for this might be that in the initial laser setting all illumination points are densely distributed at the center of the top surface so that information about the edges can not be effectively captured.After the second round of iteration, with the illumination points moving towards the edges of the top surface, more information is obtained and the reconstruction result gets improved.This is seen from 2(b) and 2(c) where images are brighter than those in the first row.However, 2(a) is still problematic.Fortunately, as we run more rounds of updates, the quality of the reconstruction results drastically exceeds that of the initial laser setting.The shape, location and the concentration of the fluorescence distribution are well recovered.What's more, the reconstruction results are quite stable up to 10th round of iteration.This is in accordance with the updated illumination patterns in Table 4.3 where the illumination patterns tend to stabilize after 9 rounds of updates.

Error analysis
This section carries out quantitative error analysis of the reconstruction results to evaluate the quality of the reconstructed three-dimensional images.We first define the four metrics used for quantifying the error and then present the error analysis results for the experiment.

Metrics for error analysis
We employ four most common metrics for image processing: mean square error (MSE), Dice similarity (Dice), volume ratio (VR) [28] and signal to noise ratio (SNR) [14].Suppose image vector x is the reconstructed image and x * is the ground truth of the same dimension as x.In case x and x * are two-dimensional or three-dimension image matrices, serialize the image matrix into a vector first.Denote N the total number of entries of x.Define the region of interest (ROI) the set of entries that have intensity greater than one third of the maximal intensity of the image x.In other words, ROI(x) = {x i : The four metrics are defined as follows: • The mean square error (MSE) describes the general deviation of the reconstruction result from the ground truth: The smaller the MSE, the better the reconstruction result.Here, | • | denotes the cardinality of a set.Note that Dice ranges from 0 to 1. Hence, If Dice is close to 1, the reconstructed image is well overlapping with the ground truth; otherwise if Dice is close to 0, the reconstructed image is not recovering the true image at the correct location or shape.
• Volume Ratio (VR) measures the ratio between cardinality of the true region of interest (ROI) and that of the reconstructed ROI: In ideal conditions, VR should be 1.If VR is smaller than 1, it indicates that the reconstructed image is oversparsified; if VR is greater than 1, the reconstruction is diffuse.
• Signal to Noise Ratio (SNR) expressed in decibels (dB) measures how well the reconstructed image is distinguished from the background noise: .
The higher the SNR the better image we have produced.

Reconstruction error at each round
Using the metrics defined in the last section, the error of the reconstruction results from the 7 rounds of iterations are displayed in Table 4.6.
According to the definition of the four metrics, the reconstruction result at the third round has the best quality since it has the lowest MSE, highest Dice similarity and SNR, and an almost perfect volume ratio (≈ 1).It is safe to say that the reconstruction result is indeed improved after merely two rounds of iterations.
It is interesting to notice that the reconstruction result produced by the first updated illumination pattern is actually worse than that produced by the initial laser setting with a fairly low Dice similarity and a negative SNR. Negative SNR means that the signal is drowning in the noise.However, after updating the illumination pattern again based on this bad reconstruction result, the produced reconstruction result enjoys a big improvement in quality.In the following fourth and fifth rounds, the reconstruction quality is slightly declining.Hence, the best reconstructed result is achieved after two rounds of experiments.As we increase the number of iteration rounds the quality of the reconstruction result becomes stable.

Uniqueness of the optimal illumination pattern
Now that we have achieved the optimal illumination pattern for our experiment, it is natural to ask if the optimal illumination pattern depends on the initial illumination pattern.To be precise, using the same phantom as in Table 4.1, the same fluorescence ground truth as in Table 4.2 and the same regularization weight µ = 1.5 × 10 −8 , does the optimal illumination pattern change as the initial illumination pattern varies?If the optimal illumination pattern does not depend on the initial laser setting, it is unique.We perform another set of experiments starting from a different initial laser pattern to find out if the updated illumination patterns converge to the optimal one from the previous experiment.

A different initial laser setting
Without loss of generality, the initial illumination pattern is still the 10 by 10 laser array but with the initial intensity of each laser changed to half the intensity of the previous experiment namely 0.5W/mm 2 .In addition, the center of the laser array is moved to the bottom left corner of the top surface of the phantom:

Results and error analysis for the new laser setting
The updated illumination patterns behave almost the same as the ones in the previous experiment: quickly shaping into the optimal illumination pattern as before and the updated patterns become very stable as we iterate up to 10 rounds.
The laser intensity profiles of every round tend to resemble those from the previous experiment as the number of iteration rounds increases.Slightly different from Table 4.4, the intensity of each laser in all rounds of iteration is identical and reaches the maximal intensity 1W/mm 2 .
The reconstruction results in the new experiment are displayed in Table 4.10.The first reconstruction result that is based on the new initial illumination pattern does not capture the complete information of the fluorescence distribution, especially in 1(a).The result after the first updated illumination pattern has more information obtained as is seen in 2(a).Interestingly, the information content in 2(a) is not complete either, and the lost part seems to be that in 1(a).The reason for this might be explained by the comparing the initial illumination pattern and the first updated illumination pattern in Table 4.8.It can be observed that all the lasers are inclined to run away from the the left bottom corner where all lasers were in the initial setting, so the information from the left bottom corner is consequently missing.However, as more iterations are performed, the lasers quickly find their optimal positions on the top surface, hence the corresponding reconstruction results are improved.In the end, the results remain unchanged in Illumination 4, 5 and 9 because the updated illumination patterns in these rounds are the same as well.Even though the new initial illumination pattern produces worse first reconstruction results 1(a), 1(b), and 1(c) in Table 4.10, the updated illumination patterns and the reconstruction results still converge to the optimal one very quickly.
Error analysis in Table 4.11 shows that the 4th round produces the best reconstruction result.Furthermore, by comparing the quantitative error using the four metrics defined before between the best reconstruction result in the previous experiment (appears in the 3rd round of iteration) and this experiment (appears in the 4th round of iteration) as is shown in Table 4.12, the quality of the best result in both experiments are nearly the same.Besides, the reconstruction quality is greatly improved compared the that of the initial setting in both cases.Summing up the conclusions before, it is safe to say that the optimal illumination pattern is independent of the initial laser setting.

Discussion
When actually implemented in the numerical experiments section, both FISTA algorithm and the Cyclic Coordinate Descent prove to be quite fast.Notice that Cyclic Coordinate Descent is very similar to stochastic gradient descent (SGD) which randomly chooses one coordinate to calculate the gradient at each step.In our case, Cyclic Coordinate Descent has an excellent performance in terms of computational speed, so SGD is not considered.In practice, if the system becomes larger and Cyclic Coordinate Descent becomes costly, we can simply transform it into SGD to save computational time.
It is also worth pointing out that µ needs to have different values for design matrix V and matrix S p if we wish to restrict lasers on a certain region of the phantom surface.If we regard µ as a first-level parameter, after determining appropriate µ, a second-level parameter that restricts the lasers to the desired region of the object surface needs to be determined as well.During phantom experiments, more penalty was enforced on the matrix S p so that the nonzero entries of the solution are confined solely on the top surface of the phantom.Overall, the proper approach for the choice of the value of the regularization weight should be that we first adjust µ to make sure the solution is stable and meaningful, in other words, a convergence-type analysis for µ is needed; then slightly adjust the finer penalty parameter placed on S p to attain the desirable number and location of lasers according to our need.However, choosing the proper value of µ is non-trivial because µ is a highly sensitive parameter.Suitable values of µ produce desired number of lasers whereas tiniest changes in µ can lead to the solution oscillating between no Table 4.9: Laser intensity profiles of each updated illumination pattern starting from the new initial laser setting.First column from top to bottom: laser profile for 1st, 3rd, and 5th updated illumination patterns.Second column from top to bottom: laser profile for 2nd, 4th, and 9th updated illumination pattern.lasers at all and over-numbered lasers.Currently, there exists no systematic way of determining the regularization weight µ.One possibility might be to tune the value of µ using machine learning techniques.Further investigation of the factors that influence the value of µ will also shed light on the choice of µ and thus enable our method to adapt to various experiment setups in practice.
In addition, the imaging of time-varying targets is becoming a trendy topic, too.Dynamical imaging methods have been developed in fields such as ultrafast ultrasound imaging [1] in which the reconstruction is combined with some tracking technique, and electrical impedance tomography [18] where the determination of optimal current pattern in cases of imaging timevariant targets is made possible by using statistical tools.These dynamical methods provide good inspiration for extending the static methods developed in this work to dynamical FMT in the future.

Conclusion
We start with a complete theoretical analysis of the diffusion equation that describes the physical processes of FMT and also the discrete formulation of the equation systems.Together with the free-space light propagation model due to the noncontact configuration, a comprehensive forward model is derived.Based on the forward model, we introduced sparse regularization techniques for the inverse problem of recovering the fluorescence distribution.Sparse regularizers have an advantage in promoting the sparsity of the solution to the linear system thus producing reconstruction results with higher spatial resolution.
Biologists have noticed that designing optimized illumination patterns plays a very key role in improving the reconstruction quality of FMT.In order to design the optimal illumination pattern, we first introduce a proper mathematical definition of an illumination pattern Σ making use of the discrete formulation.With such a definition, we regard the restrictions on the laser properties including intensity, location and total number as restrictions on different norms of Σ and hence define the admissible set ∆ of illumination patterns.Designing the optimal illumination pattern thus becomes solving another linear system with the illumination pattern vector Σ as the unknown subject to all the constraints in the admissible set.We also performed a detailed analysis on the inverse problem of designing Σ and proposed a fast iterative algorithm, the Cyclic Coordinate Descent algorithm, to solve the inverse problem.
At the end, we combine the sparse regularization techniques and designing the optimal illumination pattern as a two-step approach to further improve the reconstruction quality.Numerical experiments on a small cubic phantom are performed on different initial laser settings.It turns out that with suitable choice of µ the updated illumination pattern after each round converges quickly and remains very stable after merely four or five rounds.In addition, the optimal illumination pattern does not depend on the initial laser setting.At the same time, the reconstruction result corresponding to the optimal illumination pattern proves to be better than the reconstruction result based on the initial unoptimized illumination pattern in all metrics.The spatial resolution is increased with a higher Dice similarity and lower mean square error.The signal to noise ratio is greatly increased as well.Numerical experiments have demonstrated that our twostep approach outperforms the existing nonsparse reconstruction methods without optimizing the illumination pattern.

a
linear representation of the lth FMT experiment.In order to obtain sufficient measurement data, put together all L measurement vectors by stacking all column vectors Y l , l = 1, . . ., L into a long column vector Y = Y T 1 , Y T 2 , . . ., Y T L T ∈ C LM to obtain the final measurement vector Y .Perform the same technique for all L weighting matrices W l , l = 1, . . ., L to get the final weighting matrix W λ > 0 is the regularization weight.(3.1) thus becomes the following Lasso estimator: arg min

Table 4 . 2 :
Fluorescence ground truth setting sliced at reference point (13, 9, 6) viewed from three directions: top, left and front of the phantom (from left to right).The fluorescence ground truth is composed of two identical bars of size 1×1×10 mm 3 and fluorescence intensity 100 a.u..The two bars are both embedded at a depth of 3 mm from the top surface of the phantom and 6mm away from each other.

Table 4 . 3 :
Updated spatial distribution of lasers after round 1, 2, 3, 4, 5 and 9. Red points represent the lasers.Laser positions are attached to nodes of the finite element mesh on the top surface of the phantom.The number of lasers (illumination points) are indicated above each figure.

Table 4 . 4 :
Laser intensity profiles of each updated illumination pattern.First column from top to bottom: laser profile for 1st, 3rd, and 5th updated illumination patterns.Second column from top to bottom: laser profile for 2nd, 4rd, and 9th updated illumination pattern.On each panel the x−axis indicates the node number of lasers and the y−axis indicates the intensity (W/mm 2 ) of each laser.

Table 4 . 5 :
Reconstruction results sliced at the same reference point(13,9,6) as the ground truth in Table4.2.The first row (Illumination 0) contains reconstruction results of the initial laser setting viewed from the top (a), left (b) and front (c) of the phantom.Row 2 to row 7 are reconstruction results from the 1st, 2nd, 3rd, 4th, 5th and 9th updated illumination pattern.• Dice similarity (Dice) between the reconstructed ROI and the true ROI measures the shape and location accuracy of the reconstructed image: Dice = 2|ROI(x) ∩ ROI(x * )| |ROI(x)| + |ROI(x * )| .

Table 4 . 6 :
Mean square error, Dice similarity, volume ratio and signal to noise ratio of the reconstruction results at round 1-6 and round 10.The error of the reconstruction result based on the initial laser setting, i.e., the first round of iteration is indicated by the red bars.The blue bars represent the error of following rounds of iterations that are based on updated illumination patterns.
Table 4.7: A different initial illumination pattern viewed from the top surface of the same phantom.Except for the initial laser intensities and the location of the laser array, everything else remains the same as the previous experiment.The detector array is the same 60 by 30 array represented by the blue area; the laser array is the 10 by 10 point array represented by the red points.

Table 4 . 8 :
Updated spatial layout of illumination points after round 1, 2, 3, 4, 5 and 9 of iteration starting from the new initial illumination pattern in Table 4.7.Red points represent the lasers.The number of lasers (illumination points) are indicated above each figure.

Table 4 . 11 :
Mean square error, Dice similarity, volume ratio and signal to noise ratio of the reconstruction results at round 1-6 and round 10.The error of the reconstruction result based on the initial laser setting, i.e. the first round of iteration is indicated by the red bars.The blue bars represent the error of following rounds of iterations that are based on updated illumination patterns.
1. Input an initial illumination pattern Σ 0 ; 2. Use Σ k (k ≥ 0) to calculate the weighting matrix W k , solve (3.5) to obtain the fluorescence distribution C k ; 3. Use C k to calculate the extended design matrix Ṽk (C), solve (3.17) to obtain the updated illumination pattern Σ k+1 ; 4. Check if Σ k+1 varies much from is Σ k , otherwise, return to step 2.

Table 4 .
12: Comparison of quantitative error under the four metrics between the first reconstruction result based on the initial laser setting and the best reconstruction result of the first and the second experiments.