Anisotropic Total Variation Regularized L^1-Approximation and Denoising/Deblurring of 2D Bar Codes

We consider variations of the Rudin-Osher-Fatemi functional which are particularly well suited to denoising and deblurring of 2D bar codes. These functionals consist of an anisotropic total variation favoring rectangles and a fidelity term which measures the L^1 distance to the signal, both with and without the presence of a deconvolution operator. Based upon the existence of a certain associated vector field, we find necessary and sufficient conditions for a function to be a minimizer. We apply these results to 2D bar codes to find explicit regimes, in terms of the fidelity parameter and the smallest length scale of the bar codes, for which a perfect bar code is recoverable via minimization of the functionals. Via a discretization reformulated as a linear program, we perform numerical experiments for all functionals, demonstrating their denoising and deblurring capabilities.


Introduction
In this article we study the application of total variation-based energy minimization for denoising and deblurring of 2D bar codes. A 2D bar code is a collection of non-overlapping black squares, the lengths of whose sides are all bounded below by some value ω, placed on a white backdrop. Analogous to the terminology for 1D bar codes, we call the lower bound ω the X-dimension of the bar code. Examples include stacked and matrix 2D bar codes illustrated in Figure 1 (see [22] for a thorough description of 2D bar code symbologies). When these bar codes are scanned, the resulting signal will be a blurred and noisy version of the original bar code. Efficient and robust techniques to recover the original bar code are needed to retrieve the information from the code.
The problem of denoising and deblurring images via variational methods has received much attention in the literature since the introduction of the Rudin-Osher-Fatemi (ROF) functional in [24]. This functional is the sum of a so-called fidelity term, which measures the L^2 distance between the argument u of the functional and the given (measured) signal f, and the total variation of u, which acts as a regularization term. In this paper we study variations of this functional that take into account the a priori knowledge that the original image we want to recover is a 2D bar code.
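For reference, the functional described above can be written as follows (a standard form; the placement and normalization of the fidelity parameter vary across the literature, and here λ is placed on the fidelity term to match the functionals introduced below):

```latex
\mathrm{ROF}(u) \;:=\; \int_{\mathbb{R}^2} |\nabla u| \;+\; \lambda \int_{\mathbb{R}^2} (u - f)^2 \, dx,
\qquad u \in BV(\mathbb{R}^2).
```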
Let f ∈ L^1(R^2) denote an observed signal and let λ ≥ 0 be the so-called fidelity parameter. Throughout, |v(x)|_∞ denotes the maximum norm of the vector v(x) ∈ R^2; this notation should not be confused with the L^∞ norm of a vector field v ∈ L^∞(R^2; R^2), denoted by ‖v‖_∞, which is the supremum of |v(x)| over all x ∈ R^2, where |·| denotes the standard Euclidean norm. We consider the following functionals.
• Denoising: F_1(u) := ∫_{R^2} (|u_x| + |u_y|) + λ ∫_{R^2} |u − f|, for u ∈ BV(R^2). Minimizing this functional includes no deblurring effects, but the regularizing effect of the anisotropic total variation leads to denoising.
• Denoising and slight deblurring: F̄_1(u) := ∫_{R^2} (|u_x| + |u_y|) + λ ∫_{R^2} |u − f|, for u ∈ BV(R^2; {0, 1}). Note here the smaller domain of binary functions, which entails a very crude attempt at deblurring. Whereas the functional is no longer convex over its domain, it has a convenient convex reformulation (cf. Lemma 5.1) via the following functional: F_2(u) := ∫_{R^2} (|u_x| + |u_y|) + λ ∫_{R^2} (|1 − f| − |f|) u, for u ∈ BV(R^2; [0, 1]).
• Deblurring and denoising: F_3(u) := ∫_{R^2} (|u_x| + |u_y|) + λ ∫_{R^2} |Ku − f|, for u ∈ BV(R^2). Here K : L^1(R^2) → L^1(R^2) is a positive, normalized, bounded linear operator, (2) e.g. convolution with a suitable blurring kernel, which is commonly referred to (cf. [7]) as the PSF (point spread function). Our main interest here is when f is of the form f = Ku_0, where u_0 ∈ BV(R^2), and in particular where u_0 is the characteristic function of a bar code (we could hence consider the larger class of operators K defined on BV(R^2)). The linear operator K models the blurring of the bar code signal, and deblurring is introduced by the action of K on u prior to comparison with f.
These functionals are variations of the original ROF functional with the following modifications:
(i) The standard isotropic total variation of a characteristic function u ∈ BV(R^2, {0, 1}) gives the length of the perimeter of the set {u = 1}. Using this as regularization term leads to a rounding off of corners in the end result, because a rounded-off corner has less interface than a sharp one. Our aim is to recover bar codes with sharp corners, and hence we use in the F_i this particular anisotropic total variation, whose corresponding Wulff shape (e.g. [25], [12]) is a square. The anisotropic total variation we use gives, for a characteristic function u, the sum of the lengths of the projections of the perimeter of {u = 1} onto the coordinate axes. Moreover, it allows one to reformulate the discretized minimization problems as linear programs, which can be solved quickly and efficiently (see Section 7).
(ii) For the fidelity term we use the L^1 distance. As addressed in [5], the use of an L^2 fidelity term, as in the ROF functional, leads to a loss of contrast when minimizing over all of BV(R^2).
From the point of view of image processing, the usefulness of this variational approach lies in the ability to denoise and deblur signals via minimization of the functionals. This has the potential to work well if the functionals are in fact faithful to the underlying images sought, in the sense that if we input a perfect signal, minimization of the appropriate functional will indeed yield back the perfect signal. We say a functional F_i, i ∈ {1, 2}, is faithful to a signal f if there exists some explicit regime for λ such that the signal f is the unique minimizer of F_i. We say F_3 is faithful to a signal u_0 if there exists some explicit regime for λ such that u_0 is the unique minimizer of F_3 with f = Ku_0. We thus focus on the following four questions:
1. If the parameter λ is chosen too small, the lack of enforced fidelity to the measured signal will lead to the trivial minimizer u = 0. For which values of λ is this the case?
2. Are the functionals faithful to a clean 2D bar code and what are the associated values of λ? How do these threshold values for λ depend on the X-dimension of the bar code and the properties of the blurring operator K?
3. Are the F i faithful to other binary signals? This is particularly relevant to judge the denoising properties of our functionals: if only clean 2D bar code signals are returned unchanged by minimization, this is an indication that noisy bar code signals will be denoised.
4. What do numerical simulations for minimization of these functionals yield? Particularly, how do these minimization algorithms perform in the presence of noise?
Question 1 is answered in Lemma 2.2. The basis for answering questions 2 and 3 lies in the existence of a certain vector field (cf. the definition of V in (3)). Following an argument initially presented in [19], and elaborated on in [5], we find that a sufficient condition for u_0 to be a minimizer is the existence of a vector field v ∈ V(u_0) (cf. Theorem 3.2). Arguments from convex analysis ([10], [23]) show that the existence of such a vector field is also a necessary condition (cf. Lemma 3.1). We then use the sufficiency, and a particular vector field construction, to show that if λ > 4/ω, a bar code z is the unique minimizer of F_1 and F_2 with f = z (cf. Corollaries 4.2 and 5.5). If K represents convolution with any positive kernel (PSF) of unit mass, the same condition on λ ensures that z is the unique minimizer of F_3 (cf. the first part of Theorem 6.5). We use the necessity to answer question 3: the only binary signals f which F_1 is faithful to are clean 2D bar codes (cf. Lemma 4.4), and the only binary signals u_0 which F_3 with f = Ku_0 is faithful to are bar codes (cf. the second part of Theorem 6.5). Question 4 is discussed in Sections 7 and 8.
(Our choice of an anisotropic total variation does have a significant drawback: with the anisotropy along the coordinate axes, it assumes that the measured bar code is aligned with the coordinate axes (see also the definition of bar code in Section 4). In practice this assumption does not necessarily hold, and the bar code might be rotated or even seen from a skewed perspective. Either a preprocessing step which aligns the bar code with the axes or the use of a rotated form of the anisotropic total variation might be in order (e.g. [3, 9, 26]).)
Because of the importance these vector fields play in our analysis, we introduce special notation for them (compare with the extremal pairs in [19, Section 1.14, Proposition 5]). For a fixed u ∈ BV(R^2) we define
V(u) := { v ∈ L^∞(R^2; R^2) : conditions 1–3 below hold },   (3)
where the three conditions are
1. |v(x)|_∞ ≤ 1 for almost every x ∈ R^2;
2. div v ∈ L^∞(R^2);
3. −∫_{R^2} u div v = ∫_{R^2} |u_x| + |u_y|.
There is a large and growing literature on the analysis of these types of functionals (cf. [7]) and indeed, our work is strongly guided by similar work of Chan, Esedoḡlu, Meyer, Osher, Ring, and others [24, 19, 12, 5, 23]. We are unaware of any analysis of this particular combination of L^1 fidelity and anisotropic total variation.
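As a concrete illustration of the anisotropic total variation used throughout (and of why it favors axis-aligned rectangles), the following sketch computes the discrete quantity Σ |u_x| + |u_y| for a binary image; for the characteristic function of a rectangle it returns the summed lengths of the projections of the perimeter onto the coordinate axes. This is our own minimal Python sketch, assuming a uniform grid of spacing h and forward differences:

```python
import numpy as np

def aniso_tv(u, h=1.0):
    # discrete anisotropic total variation: sum of |u_x| + |u_y|
    # (forward differences; each unit jump contributes length h)
    ux = np.diff(u, axis=1)
    uy = np.diff(u, axis=0)
    return h * (np.abs(ux).sum() + np.abs(uy).sum())

u = np.zeros((32, 32))
u[8:20, 8:20] = 1.0          # characteristic function of a 12 x 12 square
# projections of the perimeter onto the axes: 2*12 + 2*12 = 48
print(aniso_tv(u))           # -> 48.0
```

For a binary image this coincides with the summed axis projections of the interface, which is exactly the quantity the analysis above works with.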

Trivial minimizer
We first state an elementary lemma involving the operator K. Its proof follows easily by decomposing u into its positive and negative parts and using the triangle inequality.
Lemma 2.1. Let u ∈ L^1(R^2), let K be as in (2), and let Ω ⊂ R^2 be open. Then ∫_Ω |Ku| ≤ ∫_{R^2} |u|.
Lemma 2.2. Let λ_0 be the threshold determined by the isoperimetric constant C from Lemma A.2. If 0 ≤ λ < λ_0, then u = 0 is the unique minimizer of F_1 and of its binary-constrained variant F̄_1. If in addition ker K = {0}, then u = 0 is the unique minimizer of F_3 as well.
Proof. The proof is essentially the same as that of [5, Proposition 5.7]. Since it is brief, we present it for F_3. Let u be a minimizer of F_3, and hence F_3(u) ≤ F_3(0). By Lemma A.2 this yields a bound on ∫_{R^2} |Ku|: Lemma 2.1 and Hölder's inequality control this integral by the anisotropic total variation of u. Applying this to the left-hand side of the calculation above, and using that λ < λ_0, we conclude ∫_{R^2} |Ku| = 0. Since ker K = {0}, it follows that u = 0.

Minimizers of F_1
We first explore the consequences of the simple property from convex analysis (cf. [10]) that, for a convex functional F defined over a topological vector space V,
u_0 minimizes F over V if and only if 0 ∈ ∂F(u_0),
where the subdifferential of F at u_0 (with F(u_0) < ∞) is defined by
∂F(u_0) := { v* ∈ V* : F(u) ≥ F(u_0) + ⟨v*, u − u_0⟩ for all u ∈ V }.
Here V* denotes the topological dual space, with pairing ⟨·,·⟩ between V* and V. There are some subtleties involved in choosing the right space V on which to define our functional(s). Our choice here is V = BV(R^2). This has the advantage that it allows for some basic subdifferential calculus, but forces us to work with the dual space BV(R^2)* and its associated pairing with BV(R^2), both of which lack a simple general explicit description. However, we only need this dual space structure in a specific case, in which we are able to give an explicit description of the dual element and its action on smooth functions in BV(R^2); see (10). An alternative approach would be to take V = L^2(R^2) after extending all our functionals to L^2(R^2) by setting their value to +∞ on L^2(R^2) \ BV(R^2). For the anisotropic total variation by itself this would work well; for example, its subgradient over V = L^2(R^2) was calculated in [20, Theorem 12]. However, the L^1 fidelity term then lacks the necessary continuity properties to be treated separately, and we would need to compute the subgradient of the functional as a whole. Defining the functionals over L^1(R^2) would solve this issue, but would still leave us with a complication in the computation of the subgradient of the anisotropic total variation, because the gradient operator is not continuous on L^1(R^2).
We introduce some notation. Let M_2(R^2) denote the set of all vector-valued Radon measures on R^2 with two components, and denote by ∇* the transpose of the gradient operator. For v ∈ M_2(R^2)* and µ ∈ M_2(R^2) we can write the coupling as ∫_{R^2} v dµ. In particular, if µ is absolutely continuous with respect to the Lebesgue measure, we can identify µ, via the Radon-Nikodym derivative, with a function in L^1(R^2; R^2), which we again denote by µ. In that case the coupling with v can be written as ∫_{R^2} v · µ dL^2 (we will leave out dL^2 where there is no confusion).
Furthermore we introduce the anisotropic norm ‖·‖_a on M_2(R^2), for which ‖∇u‖_a = ∫_{R^2} |u_x| + |u_y| for u ∈ BV(R^2).

Lemma 3.1. If u_0 ∈ BV(R^2) is a minimizer of F_1 over BV(R^2), then there exists a vector field v ∈ V(u_0), and λ ≥ ‖div v‖_{L^∞(R^2)}.
Proof. This proof follows similar lines as the proofs of [4, Theorem 2.3] and [23, Proposition 3]. Note that the result is trivially true if u_0 = 0. We explore the consequences of 0 ∈ ∂F_1(u_0). Because both terms in F_1 are continuous in u, the subdifferential ∂F_1(u_0) is given by the sum of the subdifferentials of the separate terms [10, Chapter I, Proposition 5.6]. The functional u ↦ ∫_{R^2} |u_x| + |u_y| can be written as the composition ‖·‖_a ∘ ∇. Because ‖·‖_a is a continuous functional on M_2(R^2) and ∇ is a continuous linear mapping from BV(R^2) to M_2(R^2), we can apply [10, Chapter I, Proposition 5.7]:

This means that
Next we turn to the subdifferential of the fidelity term in F_1. We deduce that there exist v and ψ as above such that the corresponding conditions hold. Choosing µ = 0 and µ = 2∇u_0 in the right-hand side statement of (4) yields an equality; substituting this back into (4) gives an inequality valid for all µ ∈ M_2(R^2). If we restrict to µ = (µ_1, µ_2) ∈ L^1(R^2; R^2), this reads pointwise, and hence |v(x)|_∞ ≤ 1 almost everywhere, i.e. condition 1 in (3) holds. Let ũ ∈ BV(R^2) and choose u = ±ũ + u_0 in the right-hand side statement of (5) (and then drop the tilde); we then obtain an identity valid for all u ∈ BV(R^2). We use (8) to show that there exists an L^∞(R^2) function such that the action of ψ on test functions is given by integration against this function. To this end, note that ψ ∈ W^{1,1}(R^2)*; using a mollified test function as u in (8), a substitution of variables produces an expression involving a function q. We claim that this implies q = 0 a.e. To see this, note that since u, and hence ∇u, has compact support, we can replace, for ε small, q by q̃ := q|_{y + supp u} in the preceding integral. For any p ≥ 1 we have q̃ ∈ L^p(R^2; R^2), and hence there exists an approximating sequence {q̃_n}. Since {q̃_n}_{n=1}^∞ is a compact set of continuous functions defined on a compact set, by the Arzelà-Ascoli theorem it is equicontinuous. This allows us to interchange the limits in (9). We then apply the dominated convergence theorem; because L^p convergence implies pointwise convergence almost everywhere and q̃_n(y) does not depend on the integration variable x, we conclude that the limit vanishes for almost every y ∈ R^2. The function u was chosen arbitrarily, hence q = 0 a.e., and the claimed representation holds for all u ∈ C^∞_c(R^2). We thus have p = ψ in the sense of distributions, and hence by equation (6) there exists r ∈ L^∞(R^2), which as a distribution may be identified with ∇*v, such that −div v = r ∈ L^∞(R^2) in the sense of distributions. This gives condition 2 in (3).
Also by equation (6): −div v + λψ = 0 in the sense of distributions. Recall that we can rewrite (7) accordingly. The vector field v satisfies all the conditions of Lemma A.3, and hence there exists an approximating sequence of smooth vector fields. Since −∞ < |∇u_0|(R^2) < ∞, the constant function C is integrable against |∇u_0|, from which we conclude that condition 3 in (3) holds. Finally, combining (8), (6), and the fact that ψ as a distribution is represented by an L^∞(R^2) function, we obtain λ ≥ ‖div v‖_{L^∞(R^2)}.
Theorem 3.2. Let f be the measured signal in F_1 and let u_0 ∈ BV(R^2). Then the following two statements are equivalent:
1. u_0 is a minimizer of F_1 over BV(R^2);
2. there exists a vector field v ∈ V(u_0) with λ ≥ ‖div v‖_{L^∞(R^2)}.
Moreover, if such a vector field exists and the inequality is strict, then u_0 is the unique minimizer of F_1 over BV(R^2).
Proof. The implication 1 ⇒ 2 follows directly from Lemma 3.1. For the reverse direction, we follow [5, Lemma 5.5]: for any u ∈ BV(R^2), Corollary A.4 yields the comparison F_1(u) ≥ F_1(u_0); if λ > ‖div v‖_{L^∞(R^2)}, the last inequality is strict.

F_1 and 2D bar codes
We define some further notation to make the idea of a 2D bar code precise. By a 2D bar code we mean a bounded set S ⊂ R^2 such that ∂S is the union of a finite number of non-intersecting polygons, each a union of horizontal and vertical line segments. For s ∈ R, define the horizontal and vertical lines l_h(s) := {(x, s) : x ∈ R} and l_v(s) := {(s, y) : y ∈ R}; the X-dimension of a bar code is defined via the lengths of the connected components of the intersections of these lines with S and with its complement. In words: the X-dimension is the shortest horizontal and vertical length scale of both the black squares and the white background. We denote the set of 2D bar codes by S and the set of 2D bar codes with prescribed X-dimension larger than or equal to ω by S_ω. We identify the 2D bar codes with their characteristic functions (χ_S is the characteristic function of the set S): B := {χ_S : S ∈ S} and B_ω := {χ_S : S ∈ S_ω}. By ∂*S we denote the reduced boundary of a set S, i.e. all points in ∂S for which there is a well-defined normal vector (see [2, Definition 3.54]). For example, if S is a square, its reduced boundary consists of all boundary points except the corners.
Proof. Let n(x) be the outward normal to S at x ∈ ∂*S, and define the following subsets of ∂*S (see Figure 2 for an illustration).
for all i, then x^{(i)} ∈ V_+ for odd i and x^{(i)} ∈ V_− for even i. The analogous statement holds for the horizontal counterparts. The function v_1(·, s) is Lipschitz continuous for each s ∈ R, and ∂v_1 satisfies the required bound; v_2 is defined in a similar way on each vertical line l_v(s). In particular v_2(s, ·) is Lipschitz continuous for each s ∈ R and satisfies the analogous bound.
Proof. This follows directly from Theorems 3.2 and 4.1.
Proof. Note that if f = 0 the result is trivially true. We now assume that f ≠ 0 and argue by contradiction. For some A ⊂ R^2 with A ∉ S, let f = χ_A ∈ B. By Lemma A.8 we obtain approximating sets A_j with polygonal boundaries. Since A ∉ S, ∂*A contains a non-horizontal and non-vertical curve of positive H^1 measure. In other words, there is a connected subset B ⊂ ∂*A such that H^1(B) > 0 and the generalized normal vector (cf. [2, Definition 3.54]) to ∂*A satisfies |n_1| < 1 and |n_2| < 1 almost everywhere on B. Then for every j there is a connected subset B_j ⊂ ∂A_j such that H^1(B_j) is uniformly bounded away from zero and the generalized normal vector n^j to ∂A_j satisfies the same inequalities as above. We denote the extensions of the normals n^j into smooth vector fields on R^2 with compact support by n^j as well.
Let c_j ∈ C^∞_c(B_j) be a non-negative function such that |(1 + c_j) n^j_1| ≤ 1 and |(1 + c_j) n^j_2| ≤ 1 almost everywhere on B_j, and such that c_j is uniformly bounded away from zero on a subset B*_j ⊂ B_j of positive H^1 measure. Let w_j ∈ C^1_c(R^2; R^2) be such that |w_j(x)|_∞ ≤ 1 for all x, w_j = n^j on ∂A_j \ B*_j, and w_j = (1 + c_j) n^j on B*_j. Because f minimizes F_1 over BV(R^2), we know by Theorem 3.2 that there exists a vector field v ∈ L^∞(R^2; R^2) such that |v(x)|_∞ ≤ 1 for almost all x ∈ R^2. Because |n(x)|_∞ < 1 and |v(x)|_∞ ≤ 1 almost everywhere on B, comparing the fluxes of w_j and v through ∂A_j yields a strict inequality, which is a contradiction.

Minimizers of F̄_1 and F_2

Next we investigate minimizers of the binary-constrained functional F̄_1. Due to the binary constraint on admissible functions, the minimization problem is no longer convex. However, following Chan and Esedoḡlu [5] (see also [6]), there is a simple and elegant convex reformulation of the problem.
Proof. The proof is essentially a repetition of [6,Theorem 2].
where C does not depend on v. Hence, together with Lemma A.7, we can rewrite F_2(v) in terms of the super level sets E(t) := {x ∈ R^2 : v(x) > t}. Since v is a minimizer of F_2, we find that for almost every t ∈ [0, 1], χ_{E(t)} is a minimizer of the binary-constrained functional F̄_1.
We now turn to minimizers of F_2. In order to stay within the general framework of convex analysis, we have to define our functional on a vector space. Thus we define, for all u ∈ BV(R^2), the functional
F̃_2(u) := ∫_{R^2} (|u_x| + |u_y|) + λ ∫_{R^2} (|1 − f| − |f|) u + ζ_{BV(R^2; [0,1])}(u),
where for a given set A the indicator function ζ_A is defined as ζ_A(u) := 0 if u ∈ A, and ζ_A(u) := +∞ otherwise. Note that minimizing F̃_2 over BV(R^2) is equivalent to minimizing F_2 over BV(R^2; [0, 1]).
We will now formulate results that tell us which conditions are necessary and/or sufficient for u ∈ BV (R 2 ) to be a minimizer of F 2 . First we address the implications from convex analysis for minimizers of F 2 .
and in addition there exists ξ ∈ BV(R^2)* such that the stated identity holds.
Proof. Following Lemma 3.1, we consider the consequences of 0 ∈ ∂F̃_2(u_0), where F̃_2 denotes F_2 extended by the indicator term ζ_{BV(R^2;[0,1])}, focusing on each of the three terms separately. The continuity of F_2 ensures that, even though ζ_{BV(R^2;[0,1])} is not continuous with respect to the topology on BV(R^2), we can still use [10, Chapter I, Proposition 5.6] to compute the subdifferential of each term separately and then add them to find ∂F̃_2(u_0). The subdifferential of the functional BV(R^2) → R : u ↦ ∫_{R^2} |u_x| + |u_y| was analyzed in Lemma 3.1, where we characterized the elements χ ∈ ∂(‖·‖_a ∘ ∇)(u_0). Since the second term, i.e. the functional BV(R^2) → R : u ↦ ∫_{R^2} (|1 − f| − |f|) u, is Gâteaux differentiable, we find (cf. [10, Chapter I, Proposition 5.3]) that its subdifferential is the singleton {|1 − f| − |f|}. To be precise, the subdifferential at u_0 has exactly one element ψ, which satisfies, for all u ∈ BV(R^2), ⟨ψ, u⟩ = ∫_{R^2} (|1 − f| − |f|) u. Turning to the last term in F̃_2, its subdifferential follows directly from the definition of ζ_{BV(R^2;[0,1])}. Adding the three computed subdifferentials gives the result.
Remark 5.3. Note that the conclusion of Lemma 5.2 does not allow us to deduce that ∇*v = −div v ∈ L^∞(R^2), as we could conclude in Theorem 3.2.
We now turn to a sufficient condition for u to be a minimizer which can be adapted to deal with binary u as well. Here more regularity on the vector field is required as explained in the next lemma, which is based upon [12, Proposition 3.3] and [5, Lemma 5.5].
If the inequality in (18) holds, the computation above applies. Finally, if the inequality in (18) is strict, it follows by the computation above that the inequality in (17) is strict as well, and hence by Lemma 5.4 f is the unique minimizer of F_2 over BV(R^2). Because f is binary, all its super level sets are the same, and it follows by Lemma 5.1 that f is the unique minimizer of F̄_1.
Note that, as expected, the condition on λ we found for minimizing F_1, i.e. λ ≥ ‖div v‖_{L^∞(R^2)}, implies condition (18). One could ask whether the new condition on λ is weaker in practice than the old one; this is only the case if div v …

Minimizers of F_3

Lemma 6.1. Let K be as in (2). If u_0 ∈ BV(R^2) is a minimizer of F_3 over BV(R^2), then there exists a vector field v ∈ V(u_0) such that (19) holds for all w ∈ BV(R^2).
Proof. The proof is almost identical to the proof of Lemma 3.1; here we point out the differences. The subdifferential of the anisotropic total variation is computed as before, and hence the existence of a v ∈ L^∞(R^2; R^2) satisfying condition 1 in (3) follows as before. For the subdifferential of the fidelity term at u_0, we use that Ku_0 ∈ L^1(R^2) and choose u = ±w + u_0 in the second inequality on the right-hand side above; we then compute the resulting estimate for all w ∈ BV(R^2). The scaling arguments following equation (8) in the proof of Theorem 3.2 apply as before, and we find that as a distribution ψ ∈ BV(R^2)* can be represented by a function ψ ∈ L^∞(R^2), that −div v + λψ = 0 as distributions, and that the identity extends by density. Combining this with the first inequality in (20) gives (19). From here on all the arguments follow as in the proof of Lemma 3.1.
Remark 6.2. It is noteworthy that (19) implies λ ≥ ‖div v‖_{L^∞(R^2)}: bounding the right-hand side of (19) in terms of ∫ |w|, (19) allows us to repeat the argument in (12).

Theorem 6.3. Let u_0 ∈ BV(R^2) and let K be as in (2). Define f := Ku_0. Then u_0 is a minimizer of F_3 over BV(R^2) if and only if there exists a vector field v ∈ V(u_0) such that, for all w ∈ BV(R^2), (19) holds. Moreover, if such a vector field exists and the inequality in (19) is strict, then u_0 is the unique minimizer of F_3 over BV(R^2).
Proof. By Lemma 6.1, it suffices to prove that if the vector field v exists, then u_0 is a minimizer of F_3 over BV(R^2). Let u ∈ BV(R^2). By Corollary A.4 we can bound the anisotropic total variation of u from below by the pairing with v. Also note that by condition 3 in (3) we have F_3(u_0) = −∫_{R^2} u_0 div v. Using these results we find F_3(u) ≥ F_3(u_0), where the inequality follows directly from (19). Finally, if the inequality in (19) is strict, the above inequality is strict as well.

The following lemma shows that if the linear operator K is given by convolution with a blurring kernel/PSF, condition (19) is satisfied for λ sufficiently large.
The last inequality is strict if λ > ‖div v‖_{L^∞(R^2)}.
When we compare Theorem 6.3 to Theorem 3.2, we see that there are two differences in the conditions on the vector field v: (i) For Theorem 6.3 condition 3 in (3) involves z instead of f , as it did for Theorem 3.2; (ii) The combined condition (19) on λ and K from Theorem 6.3 is stronger than the condition on λ we had before in Theorem 3.2. Hence with Lemma 6.4 in mind, we can transfer all the results in Section 4 which we derived for F 1 with f ∈ B ω to F 3 with f = k * z for z ∈ B ω . In particular, we have Theorem 6.5.
1. Let z ∈ B_ω, let Ku := k * u for u ∈ BV(R^2) and a nonnegative k ∈ L^1(R^2) satisfying ∫_{R^2} k = 1, and let f = k * z. If λ ≥ 4/ω, then z is a minimizer of F_3 over BV(R^2). If the inequality is strict, then z is the unique minimizer of F_3 over BV(R^2).
Remark 6.6. It is interesting to note in Theorem 6.5 that the conditions on λ for recovery of the bar code do not depend on properties of the blurring/deblurring kernel k. In [8], we considered the problem of deblurring of 1D bar codes. In 1D there is no difference between our anisotropic and the regular isotropic total variation; however, we did employ an L^2 instead of an L^1 fidelity term. While our main focus there was on deblurring and blurring kernels of different size, a corollary of our results was that when the two coincided (analogous to F_3), the functional was faithful to the clean bar code for blurring kernels with modest supports (on the order of the X-dimension). Numerical results suggested that this bound on the size of the blurring kernel was not optimal. Our Theorem 6.5 can readily be adapted to 1D bar codes, showing that regardless of the support size of the kernel (or the standard deviation of an infinitely supported kernel), deconvolution with the same blurring kernel always recovers the bar code for λ ≥ 2/ω (via the vector field construction of Theorem 4.1 reduced to 1D) when using an L^1 fidelity term. However, as we note in Section 8, this threshold value for λ is very sensitive to noise, and indeed this sensitivity grows with the support size of the blurring kernel.

Numerical Implementation
We have numerically tested the performance of the convex functionals F_i. Since the functionals are convex, their minimization can be approximated by finite-dimensional convex optimization problems, and global minimizers of these problems can be found using standard software packages. We chose such an implementation because it allows us to find global minimizers without writing custom algorithms for each functional, which in turn allows for convenient comparison and experimentation with different functionals. In some cases other options are available; for example, for the functional F_1, gradient descent on a regularized functional can be used as in [5], [6]. We are not aware of direct methods for F_3.
The discretization is obtained by using standard forward finite differences and quadrature. Because of the particular form of the anisotropic total variation, each of the problems can be reformulated as a linear program, which is usually more tractable than a general convex optimization problem. By comparison, the standard isotropic total variation involves a term of the form √(u_x^2 + u_y^2), which does not allow for a linear program reformulation and leads to a more challenging optimization problem.
Let us show how to discretize and reformulate F_3 as a linear program; the functionals F_1 and F_2 can be discretized and minimized in a similar manner (in fact F_1 is a special case of F_3, where the convolution matrix is the identity). For a given small parameter h, approximate the minimizer u(x, y) by a suitable (piecewise linear or piecewise constant) interpolation u_h(x, y) of a grid function U, defined on the grid with spacing h, where i, j = 1, ..., N and U_{i,j} = u(ih, jh). Next, approximate the terms u_x and u_y by forward finite differences. For the purpose of building matrices for the linear operators, reindex U as a column vector of length N^2. Write D_x, D_y for (h times) the matrices which correspond to the finite difference operators above. Similarly, the convolution operator can be approximated by convolution with a discrete kernel K_h; let M_k represent the matrix of the convolution operator, also scaled by the factor h. The matrices D_x, D_y, M_k each have N^2 columns but different numbers of rows; denote the number of rows of each matrix by m_1, m_2, m_3, respectively. Let F be the grid function which corresponds to the blurred and noisy signal, represented as a column vector of length m_3. For simplicity, we use piecewise constant quadrature to approximate the integrals in the functional F_3, and, after dropping a factor of h, we are left with a fully discrete convex function of the grid function U.
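The matrices D_x and D_y can be assembled, for instance, with Kronecker products. The sketch below is our own construction (the paper's experiments used MATLAB with CVX/MOSEK); it assumes row-major vectorization of U and omits the scaling factor h:

```python
import numpy as np
import scipy.sparse as sp

def diff_matrices(N):
    """Forward-difference matrices acting on a grid function U that has been
    reindexed (row-major) as a column vector of length N^2."""
    # 1D forward difference: (N-1) x N
    d = sp.diags([-np.ones(N - 1), np.ones(N - 1)], [0, 1],
                 shape=(N - 1, N), format="csr")
    I = sp.identity(N, format="csr")
    Dx = sp.kron(I, d, format="csr")  # differences along x (within each grid row)
    Dy = sp.kron(d, I, format="csr")  # differences along y (across grid rows)
    return Dx, Dy

# m1 = m2 = N (N - 1) rows, N^2 columns
Dx, Dy = diff_matrices(4)
print(Dx.shape, Dy.shape)  # (12, 16) (12, 16)
```

Applied to the vectorized characteristic function of a square, the sum of the absolute entries of D_x U and D_y U reproduces the summed axis projections of its perimeter, consistent with the anisotropic total variation.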
To formulate the equivalent linear program, define the matrix and vector
M := (D_x; D_y; λM_k),    b := (0_x; 0_y; λF),
obtained by stacking the blocks vertically. Here 0_x, 0_y are column vectors of zeros of lengths m_1, m_2, respectively. Using this definition, we can rewrite F_3^h(U) = ‖MU − b‖_1. Next, changing notation slightly (the following notation applies just to this section), we show how to recast the problem as a linear program. Define the new variable y^T = ((x^+)^T, (x^-)^T, x^T), where x^+, x^- are column vectors in R^{N^2}, and let e = (1, ..., 1)^T ∈ R^{N^2}. Then consider the equivalent problem
minimize_{y ∈ R^{3N^2}} c^T y subject to Mx − x^+ + x^- = b, x^+, x^- ≥ 0,
obtained by splitting Mx − b into a positive and a negative part and summing their 1-norms. This is a linear program: it involves the minimization of the linear function c^T y, for c = (e^T, e^T, 0)^T, subject to a linear equation and non-negativity constraints on a subset of the variables. We performed the minimization using two convex optimization packages, CVX and MOSEK, both callable from MATLAB. The first, CVX [18, 17], is a package for specifying and solving general convex optimization problems. The second, MOSEK [21], requires the user to reformulate each problem as a standard-form optimization problem (in this case as a linear program), but it is able to solve larger problems. For the problems we present, CVX was able to solve the problem in a few seconds for F_1, F_2 and in a few minutes for smaller instances of F_3; MOSEK was able to solve the small instances of F_3 in seconds, and the largest problems in a few minutes.
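The same splitting can be prototyped with scipy's linprog (HiGHS) in place of the CVX/MOSEK setup used in the paper; the helper name min_l1 and the box constraint on x are our own choices:

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse import csr_matrix, hstack, identity

def min_l1(M, b, box=(0, 1)):
    """Solve min_x ||M x - b||_1 as a linear program by writing
    M x - b = xplus - xminus with xplus, xminus >= 0 and minimizing
    the sum of the entries of xplus + xminus."""
    m, n = M.shape
    I = identity(m, format="csr")
    # variables ordered as [x, xplus, xminus]; equality M x - xplus + xminus = b
    A_eq = hstack([csr_matrix(M), -I, I], format="csr")
    c = np.concatenate([np.zeros(n), np.ones(2 * m)])
    bounds = [box] * n + [(0, None)] * (2 * m)
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=bounds, method="highs")
    return res.x[:n]

# sanity check: with M the identity, the minimizer simply reproduces b
x = min_l1(np.eye(3), np.array([0.2, 0.9, 0.5]))
```

In the actual deblurring problem, M would be the stacked matrix (D_x; D_y; λM_k) and b the corresponding right-hand side from the text.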

Numerical Results
We compared the performance of the functionals F_i applied to a number of 2D bar codes corrupted by some combination of noise and blurring. To produce blurred and noisy images, convolution with blurring kernels of variable sizes was followed by additive Gaussian noise. The objective was to test the conditions under which bar codes can be recovered and to compare the performance of the various functionals, not to optimize the numerical implementation. The bar code images had a resolution of either 8 × 8 or 12 × 12 pixels per square of size ω × ω; that is, we choose h = ω/p, where p is the number of pixels per square. In all cases, we give values for the dimensionless fidelity parameter λh; abusing notation, we continue to write λ for this dimensionless parameter below.
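The corruption pipeline described above can be sketched as follows. This is a sketch under our own assumptions: a normalized box PSF stands in for the paper's hat-function kernel, and the noise amplitude a multiplies i.i.d. standard Gaussians:

```python
import numpy as np
from scipy.signal import fftconvolve

def degrade(u, r, a, rng):
    """Blur a clean bar code image u with a normalized (2r+1) x (2r+1)
    box PSF, then add Gaussian noise of amplitude a."""
    k = np.full((2 * r + 1, 2 * r + 1), 1.0)
    k /= k.sum()                      # unit mass, as required of the PSF
    f = fftconvolve(u, k, mode="same")
    return f + a * rng.standard_normal(u.shape)

rng = np.random.default_rng(0)
u = np.zeros((24, 24))
u[8:16, 8:16] = 1.0                   # one 8 x 8 pixel square
f = degrade(u, r=4, a=0.35, rng=rng)
```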

Noisy Images
We begin with a comparison of the performance of the two functionals F_1 and F_2 on noisy images. We show the same noisy bar code denoised with the two methods (Figure 3). In each run we used λ = 75 for F_1 and λ = 2 for F_2. In contrast to F_1, minimization of F_2 resulted in a binary image (even without taking a level set). We found that the functional F_2 was robust under fairly large noise, and indeed the reconstruction is nearly perfect even in the presence of significant amounts of Gaussian noise. The performance of F_2 is superior to that of F_1, as it retains both the shape and the binary character of the image. We note that simply thresholding the result of F_1 can introduce additional errors.

Blurred Images
The blurring kernel was chosen to be a piecewise linear hat function on a square. (In one dimension, for a given kernel radius r, the kernel is the normalization of the pixel function with values (1, 2, ..., r, ..., 2, 1). The two-dimensional kernel is the Cartesian product of that one-dimensional kernel with itself.) In each run, we used λ = 10 for F_3.

Figure 3: First row: Gaussian noise (amplitude a = .2, signal to noise ratio 18.2 dB). The reconstruction using F_1 is patchy; the reconstruction with F_2 is nearly perfect. Second row: Gaussian noise (amplitude a = .35, signal to noise ratio 7.0 dB). The reconstruction using F_1 has deteriorated further; the reconstruction with F_2 is correct, except for a few switched pixels.

The first result (Figure 4) is for the bar code with a single square, blurred with a kernel whose radius is equal to the width of the square and with Gaussian noise added. The reconstruction has close to the correct shape, but has lost some contrast. The second result involves blurring without noise (Figure 5). The third result (Figure 6) involves two different kernels. After thresholding, the recovered image is quite close to the original even in the presence of substantial noise. For the wider kernel there is some deterioration along diagonal patterns of squares. One might ask how our values for λ compare with the results of our theorems, which basically state that if λω > 4, then the global minimizer is the underlying bar code (i.e. if the dimensionless λ > 4/p, where p is the number of pixels per unit square). Comparison here is delicate because of the presence of noise. First, there is an upper limit to acceptable values of λ, as large values will bring in too much unwanted fidelity to the noise. More importantly, in all cases we found the lower threshold value for λ to be very sensitive to noise: for a given bar code signal, even adding noise with amplitude 10^{-6} increased the threshold value.
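The hat-function kernel described above can be generated directly; the sketch below (our own helper, with the hypothetical name hat_kernel_2d) normalizes the 1D profile (1, 2, ..., r, ..., 2, 1) to unit mass and takes the outer product:

```python
import numpy as np

def hat_kernel_2d(r):
    """2D blurring kernel of radius r: the Cartesian (outer) product of the
    normalized 1D hat profile (1, 2, ..., r, ..., 2, 1) with itself."""
    k1 = np.concatenate([np.arange(1, r + 1), np.arange(r - 1, 0, -1)]).astype(float)
    k1 /= k1.sum()
    return np.outer(k1, k1)

K = hat_kernel_2d(3)    # 1D profile (1, 2, 3, 2, 1) / 9
print(K.shape)          # (5, 5)
# K has unit mass, as required of a PSF in (2)
```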
Thus even in the simulations underlying Figure 5, which involve no externally added noise, numerical round-off errors can introduce enough noise to change the threshold for λ. This explains why the threshold required for exact recovery was larger than predicted by Theorem 6.5 (by a factor of 8). A simulation with a narrower kernel on a smaller image resulted in a critical value of λ that was off by a (smaller) factor of 1.4. We expect that with wider kernels and larger problems, this value can be even more sensitive to noise.
Figure 6: Results of F 3 . Each square in the bar code is 12 × 12 pixels. The Gaussian noise amplitude a is either .02 or .2. The blurring kernel radius r is 8 or 12 pixels. For the mildest example, the reconstruction is almost perfect. Errors observed include deterioration along diagonal patterns of squares and incorrect placement of the boundaries of squares. However, even the last case is not that bad, given that the blurring kernel with r = 12 has support spread over two unit squares.

In conclusion, in the presence of a known blurring kernel, the functional F 3 followed by thresholding can recover close to the exact bar code for blurring kernels with diameters about 1.5 times the X-dimension and small noise. Increasing the radius of the kernel or the amplitude of the noise leads to errors in the recovered image. The types of errors include incorrect locations of the boundaries of squares and additional features along squares arranged in diagonal patterns. Not surprisingly, computations on blurred images (not presented) using F 1 or F 2 were less effective at recovering the bar code.
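When the kernel is known and separable, as the hat kernel above is, it can be assembled into a sparse matrix K acting on the raveled image, which is what a discretized fidelity term ‖Ku − f‖ 1 for F 3 requires. The sketch below is our own; the zero-padded boundary handling is an illustrative choice.

```python
import numpy as np
from scipy.sparse import diags, kron

def conv_matrix_1d(h, n):
    """n x n banded matrix applying centered 1D convolution with an
    odd-length symmetric kernel h (zero padding at the boundary)."""
    r = len(h) // 2
    return diags([np.full(n - abs(k - r), h[k]) for k in range(len(h))],
                 [k - r for k in range(len(h))], shape=(n, n))

def conv_matrix_2d(h, m, n):
    """Separable 2D blurring operator acting on row-major raveled
    m x n images: K = kron(K_rows, K_cols)."""
    return kron(conv_matrix_1d(h, m), conv_matrix_1d(h, n)).tocsr()
```

Thresholding the minimizer of the resulting LP at 1/2 then gives the binary reconstructions discussed above.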

Discussion
We have presented three functionals for the L 1 approximation of signals with anisotropic total variation regularization and applied them to the denoising and deblurring of bar codes. Our analytical results show that the fidelity parameter λ should be chosen above a certain threshold in order to avoid the trivial minimizer. When comparing F 1 and F 2 , the analytical results in the absence of noise or blurring show that there is only a slight relaxation of the threshold for λ needed to recover the bar code. The numerical experiments clearly show that in practical situations with noise and blurring, F 1 (or actually F 2 ) is preferred, since it has binary output. If the blurring operator K is known, it is even more advantageous to choose F 3 . Our analysis shows that in the absence of noise, but with blurring present, we can recover the clean bar code by using F 3 regardless of the precise form of (the known) K. The numerical experiments also show the best results for F 3 .

There are several avenues for future work:

• We have shown that the only binary signals that the functionals F 1 and F 3 are faithful to are clean 2D bar codes. It would be interesting to extend this result to any signal in L 1 (R 2 ), not just binary ones. Due to the technical difficulties introduced by the restriction on the admissible functions for F 2 , we have at present no result for F 2 (or F 1 ).
• The convexification method we used for F 1 is not applicable to the restriction of F 3 to BV (R 2 ; {0, 1}). It would be valuable to see whether there is another way to incorporate the binary constraint into F 3 , apart from the a posteriori thresholding applied in the numerical experiments.
• Blind deconvolution: In practice, the blurring kernel underlying the measured signal f may be unknown. Future analytical work could focus on how well F 3 performs if the exact blurring operator is not known, and hence K is only an approximation of the operator hidden in the measured signal f . One possibility could be to first determine certain statistics of the unknown convolution kernel (such as its standard deviation), and then use F 3 with a K consisting of convolution with a fixed kernel possessing the same statistics. Within a Gaussian ansatz, blind deconvolution was addressed variationally for 1D bar codes in [11], and a similar approach could also be adopted for 2D bar codes.
• The numerical experiments suggest that, for certain choices of λ, minimization of these functionals works well with both significant noise and blurring. In the case of non-noisy bar codes our theorems give sufficient thresholds for acceptable values of λ but the numerics show that these are sensitive to noise. It would therefore be interesting to explore whether or not one can analyze the dependence of these thresholds on small perturbations of the signal f .
• Nonlocal total variation: A possible alternative regularization term instead of the anisotropic total variation is the anisotropic nonlocal total variation (cf. [14, 15])

∫_{R 2} ∫_{R 2} |u(x) − u(y)| w(x, y) dx dy

for some well-chosen weight function w. Nonlocal total variation does not restrict itself to local information, but compares patches from all over the image, and hence is well suited to regularizing images containing recurring structures, like bar codes. It would be interesting to see what analysis and simulations can tell us about the improvement this would offer over the local anisotropic total variation.
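As an illustration of the suggested nonlocal term, the discrete anisotropic nonlocal total variation and a simple patch-based Gaussian weight can be sketched as follows; the patch size, the weight formula, and the helper names are our own choices and are not taken from [14, 15].

```python
import numpy as np

def patch_weights(u, p=1, sigma=0.5):
    """w_ij = exp(-||P_i - P_j||^2 / (2 sigma^2)), where P_i is the
    (2p+1) x (2p+1) patch around pixel i (edge-replicated at the boundary)."""
    up = np.pad(u, p, mode='edge')
    m, n = u.shape
    P = np.array([up[i:i + 2 * p + 1, j:j + 2 * p + 1].ravel()
                  for i in range(m) for j in range(n)])
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def nonlocal_tv(u, w):
    """Discrete version of the double integral of |u(x) - u(y)| w(x, y)."""
    v = u.ravel()
    return np.sum(np.abs(v[:, None] - v[None, :]) * w)
```

Because the weights compare patches rather than neighboring pixels, identical squares far apart in a bar code reinforce each other in this regularizer.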

A Properties of the anisotropic total variation
In this appendix we collect some properties of our anisotropic total variation (1), most of which follow from the analogous properties of the standard isotropic total variation which defines the space BV (R 2 ): u ∈ L 1 (R 2 ) is in the space BV (R 2 ) iff

sup { ∫_{R 2} u div v dx : v ∈ C 1 c (R 2 ; R 2 ), |v(x)| ≤ 1 for all x ∈ R 2 } < ∞.

We start by pointing out that the anisotropic total variation is a seminorm equivalent to the isotropic total variation defined above.
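For intuition, on a pixel grid the anisotropic total variation of the indicator of an axis-aligned rectangle is exactly its perimeter, which is one way to see why this regularizer favors rectangles. A short discrete check (our own illustration, not part of the paper's analysis):

```python
import numpy as np

def aniso_tv(u):
    """Discrete anisotropic TV: sum |u_x| + sum |u_y| via forward differences."""
    return np.abs(np.diff(u, axis=1)).sum() + np.abs(np.diff(u, axis=0)).sum()
```

For a 3 × 5 rectangle of ones the value is 2·3 + 2·5 = 16, its perimeter in pixel units.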
Lemma A.2. There exists a constant C > 0 such that for all u ∈ BV (R 2 ),

C ‖u‖ L 2 (R 2 ) ≤ ∫_{R 2} |u x | + |u y |.
The following approximation lemma allows us to replace the C 1 c vector fields with L ∞ ones in the definition of our anisotropic total variation.

Lemma A.3. Let v ∈ L ∞ (R 2 ; R 2 ) with div v ∈ L ∞ (R 2 ) and |v(x)| ∞ ≤ 1. Then there exists a sequence {v j } ∞ j=1 ⊂ C ∞ c (R 2 ; R 2 ) such that, as j → ∞, v j ⇀∗ v in L ∞ (R 2 ; R 2 ) and div v j ⇀∗ div v in L ∞ (R 2 ). That is, for all u ∈ L 1 (R 2 ; R 2 ) and w ∈ L 1 (R 2 ), as j → ∞,

∫_{R 2} u · v j dx → ∫_{R 2} u · v dx and ∫_{R 2} w div v j dx → ∫_{R 2} w div v dx.

In addition, for each j and all x ∈ R 2 , |v j (x)| ∞ ≤ 1.
An immediate corollary of Lemma A.3 is that the supremum in the definition of the anisotropic total variation may equivalently be taken over vector fields v ∈ L ∞ (R 2 ; R 2 ) with div v ∈ L ∞ (R 2 ) and |v(x)| ∞ ≤ 1.

Four important properties of the standard isotropic total variation also hold for the anisotropic total variation: lower semicontinuity, approximation by smooth functions, the co-area formula, and smooth approximation of sets of finite perimeter. The proofs follow those of the isotropic case (cf. [16, Theorems 1.9, 1.17], [13, 5.5 Theorem 1], [2, Theorem 3.42]) with the obvious modifications.