Well-posed Bayesian inverse problems and heavy-tailed stable quasi-Banach space priors

This article extends the framework of Bayesian inverse problems in infinite-dimensional parameter spaces, as advocated by Stuart (Acta Numer. 19:451-559, 2010) and others, to the case of a heavy-tailed prior measure in the family of stable distributions, such as an infinite-dimensional Cauchy distribution, for which polynomial moments are infinite or undefined. It is shown that analogues of the Karhunen-Loève expansion for square-integrable random variables can be used to sample such measures on quasi-Banach spaces. Furthermore, under weaker regularity assumptions than those used to date, the Bayesian posterior measure is shown to depend Lipschitz continuously in the Hellinger metric upon perturbations of the misfit function and observed data.


1. Introduction. The Bayesian perspective on inverse problems has attracted much mathematical attention in recent years [11,23]. Particular attention has been paid to Bayesian inverse problems (BIPs) in which the parameter to be inferred lies in an infinite-dimensional space U, a typical example being a scalar or tensor field coupled to some observed data via an ordinary or partial differential equation. Numerical solution of such infinite-dimensional BIPs must necessarily be performed in an approximate manner on a finite-dimensional subspace, but it is profitable to delay discretisation to the last possible moment and consider the original infinite-dimensional problem as the primary object of study, since infinite-dimensional well-posedness results and algorithms descend to any finite-dimensional subspace in a discretisation-independent way, whereas careless early discretisation may lead to a sequence of well-posed finite-dimensional BIPs or algorithms whose stability properties degenerate as the discretisation dimension increases. Well-posedness results for Banach U have been established for infinite-dimensional Gaussian priors [23], for Besov priors [8], and for log-concave priors with exponentially thin tails [10]. There is a parallel approach of discretisation invariance, introduced by Lehtinen in the 1990s and advanced by e.g. [13,14], in which the finite-dimensional BIP is the primary object, but care is taken to ensure the existence of a well-defined continuum limit independent of the discretisation. A common assumption in these works is some exponential integrability of the prior, and one purpose of this article is to relax this by permitting the prior to be heavy-tailed in the sense of only having finite polynomial moments of order 0 ≤ p < α for some α < ∞, and to explicitly identify the growth rates in the misfit potential that are permissible in such a setting. This article also permits U to be only a quasi-normed complete space, i.e. a quasi-Banach space.
A prototypical heavy-tailed distribution is the Cauchy distribution C(δ, γ) on R with location parameter δ ∈ R and width parameter γ > 0, which has Lebesgue density
\[ \frac{1}{\pi \gamma} \, \frac{1}{1 + ((x - \delta)/\gamma)^2} . \]
C(δ, γ) has no well-defined mean, even though it is 'obviously' centred on δ, nor indeed polynomial moments of any order greater than α = 1. Despite this, the Cauchy distribution arises naturally in even quite elementary applications. For example, the quotient of two independent centred Gaussian random variables is Cauchy. More geometrically, if uniform measure on a circle is projected radially onto any line not passing through the centre of the circle, as in Figure 1, then the image measure is Cauchy.

Markkanen et al. [16] have recently reported numerical results on the use of heavy-tailed priors for edge-preserving Bayesian inversion in X-ray tomography, where the seemingly natural choice of a total variation regularisation term cannot be interpreted as a discretisation-invariant Bayesian prior [13]. In a Bayesian context, the use of a heavy-tailed prior model in preference to one with exponentially small tails corresponds to a prior belief that large deviations are not exponentially rare events. For example, in a wavelet basis of $L^2([0, 1], dx)$, it is not rare to draw samples with localised large deviations (see Figure 2); physically, these might correspond to inclusions in an otherwise relatively homogeneous material matrix, or edges in a piecewise smooth image.

The asymmetry between the two models is starkly illustrated by the following information-theoretic calculation of the Kullback-Leibler divergences (relative entropy distances) between a standard normal and a standard Cauchy distribution on R:
\[ D_{\mathrm{KL}}\bigl( C(0, 1) \,\|\, N(0, 1) \bigr) = +\infty, \qquad D_{\mathrm{KL}}\bigl( N(0, 1) \,\|\, C(0, 1) \bigr) \approx 0.259 . \]
Thus, the approximation of a heavy-tailed Cauchy prior by a thin-tailed Gaussian prior represents an infinite loss of information. By contrast, the 'defensive' adoption of a Cauchy prior in place of a Gaussian one represents a mild loss of information, with which one gains access to large deviations that would be exponentially rare in the Gaussian model.
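The asymmetry of the two relative entropies can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the integration window and the step counts are illustrative assumptions rather than anything prescribed by the article. It integrates the divergence of the standard normal from the standard Cauchy by the trapezoidal rule, and shows that the reverse divergence, truncated to [-R, R], keeps growing with R.

```python
import math

def log_normal_pdf(x):
    # Log-density of N(0, 1).
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def log_cauchy_pdf(x):
    # Log-density of C(0, 1).
    return -math.log(math.pi * (1.0 + x * x))

def trapezoid(f, a, b, steps):
    # Composite trapezoidal rule on [a, b].
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def kl_normal_vs_cauchy(half_width=12.0, steps=24000):
    # D_KL(N(0,1) || C(0,1)); the Gaussian mass outside the window is negligible.
    def integrand(x):
        lp = log_normal_pdf(x)
        return math.exp(lp) * (lp - log_cauchy_pdf(x))
    return trapezoid(integrand, -half_width, half_width, steps)

def kl_cauchy_vs_normal_truncated(radius, steps_per_unit=20):
    # Integral over [-radius, radius] only; the full integral diverges.
    def integrand(x):
        lp = log_cauchy_pdf(x)
        return math.exp(lp) * (lp - log_normal_pdf(x))
    return trapezoid(integrand, -radius, radius, int(2 * radius * steps_per_unit))

kl_nc = kl_normal_vs_cauchy()
truncated = [kl_cauchy_vs_normal_truncated(r) for r in (10.0, 100.0, 1000.0)]
print("D_KL(N || C) is approximately", round(kl_nc, 4))
print("truncated D_KL(C || N) for R = 10, 100, 1000:", truncated)
```

The truncated values grow roughly linearly in R, because the integrand behaves like $x^2 / (2 \pi (1 + x^2))$ for large |x|.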
The family of stable distributions generalises both the Cauchy and Gaussian examples. Because the stable family is, by definition, closed under linear combinations of independent members, it is an attractive model for spatial or temporal phenomena that decompose in an additive way over disjoint subsets of space or time. So, for example, a stable distribution is a natural modelling choice for the net external forces imparted on a passive tracer particle in some medium over a time interval: a Gaussian model leads to Brownian motion, whereas other stable models lead to Lévy flights.
Figure 1. Uniform angular measure on a circle projects radially to give Cauchy measure with width parameter γ on any line at distance γ from the centre of the circle.

Thus, after establishing some background and notation in Section 2, the purpose of this article is twofold:

Section 3 shows how to define quasi-Banach space analogues of heavy-tailed stable distributions via Karhunen-Loève-like random series, and studies their convergence and integrability properties. The usual variance-based arguments cannot be applied directly, but the situation can be repaired using Kolmogorov's three series theorem, and notably the same conditions on the decay of the coefficients suffice for the heavy-tailed stable case as in the Gaussian case.

Section 4 shows that the usual results on the Hellinger well-posedness of BIPs with respect to perturbations of the observed data and the misfit functional (negative log-likelihood) hold in the case of a heavy-tailed prior, under weaker continuity assumptions than those used to date. Non-trivial growth of lower bounds on the misfit functional, which is typically enjoyed in applications, can and should be used to offset growth in other errors and retain well-posedness of the BIP.

2. Background and notation.
2.1. General notation. The setting for the inference problems in this paper will be a real and separable Banach or quasi-Banach space U. Observed data will take values in another real and separable Banach or quasi-Banach space Y. Recall that in a quasi-Banach space the triangle inequality only holds in the weaker form
\[ \| u + v \| \le C \bigl( \| u \| + \| v \| \bigr) \]
for some constant $C \ge 1$. Typical examples of quasi-Banach spaces that are not Banach spaces include the sequence spaces $\ell^p$ and the Lebesgue spaces $L^p$ for 0 < p < 1.
Occasionally, we will need to make reference to an underlying probability space (Ω, F, P) as a common domain of definition for all the R-, U-, and Y-valued random variables of interest. 1[P] denotes the indicator function of a measurable set or logical predicate P, which takes the value 1 where P holds and 0 elsewhere.
A property will be said to hold almost surely if it fails only on a subset of a measurable set of measure zero, and this will be abbreviated to "a.s." If µ is a probability measure on U and f : U → R is measurable, then $\mathbb{E}_{u \sim \mu}[f(u)]$ or simply $\mathbb{E}[f]$ denotes the expected value (Lebesgue integral) of f with respect to µ:
\[ \mathbb{E}_{u \sim \mu}[f(u)] := \int_U f(u) \, \mathrm{d}\mu(u) . \]

Figure 2. Samples of a random wavelet series on [0, 1], truncated at level J, in which each coefficient $u_{j,k}$ is $(j + 1)^{-2} 2^{-j}$ times a standard Cauchy or normal draw, and ψ denotes the mother wavelet. The plots show 20 i.i.d. samples with J = 10. Theorem 3.4 ensures a.s. convergence in $L^2([0, 1])$ as J → ∞. To enable easy comparisons between plots, the ensemble has been translated and linearly scaled to take values u(x) ∈ [0, 1], and the same random seed is used in each case. Note well the large local deviations in the Cauchy case.
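Equality in distribution (equality in law) for random variables u and v will be denoted $u \stackrel{d}{=} v$. The set of all Borel probability measures on U will be denoted $\mathcal{M}_1(U)$, and $d_H$ denotes the Hellinger metric on $\mathcal{M}_1(U)$, defined by
\[ d_H(\mu, \nu)^2 := \int_U \left( \sqrt{\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}} - \sqrt{\frac{\mathrm{d}\nu}{\mathrm{d}\lambda}} \right)^2 \mathrm{d}\lambda , \tag{3} \]
where λ is any σ-finite Borel measure on U with respect to which both µ and ν are absolutely continuous, e.g. λ := µ + ν. By Kraft's inequality [12], the Hellinger topology coincides with the total variation topology; by Pinsker's inequality [20], the Hellinger topology is strictly weaker than the Kullback-Leibler (relative entropy) topology; all these topologies are strictly stronger than the topology of weak convergence of measures. Expected values of square-integrable functions are Lipschitz continuous with respect to the Hellinger metric:
\[ \bigl| \mathbb{E}_{\mu}[f] - \mathbb{E}_{\nu}[f] \bigr| \le \sqrt{2} \, \sqrt{ \mathbb{E}_{\mu}[|f|^2] + \mathbb{E}_{\nu}[|f|^2] } \; d_H(\mu, \nu) . \tag{4} \]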

2.2. Bayesian inverse problems. This paper is concerned with inverse problems of the following form: given spaces U and Y, and a known forward operator G : U → Y, recover u ∈ U from a randomly corrupted observation y ∈ Y of G(u). A simple example is an inverse problem with additive noise, e.g.
\[ y = G(u) + \eta , \tag{5} \]
where η is a draw from a Y-valued random variable; crucially, we assume knowledge of the probability distribution of η, but not its exact value. Inverse problems are typically ill-posed in the sense of having no solution, or multiple solutions, or solutions that depend sensitively upon the observed data y. While there is a long tradition dating back to Tikhonov [24] and others of addressing such problems using regularisation, the Bayesian approach [11,23] is to interpret both u and y as random variables, and relations such as (5) as defining the conditional distribution of y given u. First, one must posit prior beliefs about u independent of y in the form of a prior distribution $\mu_0 \in \mathcal{M}_1(U)$. Then, the Bayesian inverse problem (BIP) is to compute the posterior distribution $\mu^y \in \mathcal{M}_1(U)$, i.e. the conditional distribution of u given y. Naturally, one hopes to do this through an appropriate version of the Bayes formula, e.g. for probability densities with respect to Lebesgue measure on $\mathbb{R}^n$,
\[ p(u \mid y) = \frac{p(y \mid u) \, p(u)}{p(y)} . \]
However, in the case dim U = ∞, in which there is no canonical choice of reference measure such as Lebesgue measure, this formula must be treated with some care.
As observed by [23, Section 6.6], the correct statement of the Bayes formula when $\mu_0$ is supported on an infinite-dimensional parameter space U is that the posterior $\mu^y$ has a probability density (Radon-Nikodým derivative) with respect to $\mu_0$, and this density is proportional to the conditional probability density of y|u. It is both mathematically and computationally convenient to express this relationship in exponential form. That is, Φ : U × Y → R will denote the misfit or negative log-likelihood, meaning that, under the hypothesis that u ∈ U is 'correct', the probability distribution of y|u is
\[ \mathbb{P}[ y \in E \mid u ] = \int_E \exp( - \Phi(u; y) ) \, \mathrm{d}\varrho(y) , \]
where $\varrho$ is some σ-finite reference measure on Y; it is implicitly assumed that y|u is absolutely continuous with respect to $\varrho$ for every u ∈ U.
In this setting, the generalised Bayes formula is
\[ \frac{\mathrm{d}\mu^y}{\mathrm{d}\mu_0}(u) = \frac{\exp( - \Phi(u; y) )}{Z(y)} , \qquad Z(y) := \int_U \exp( - \Phi(u; y) ) \, \mathrm{d}\mu_0(u) . \]
However, care must still be taken to check that this formula does define a probability measure $\mu^y$ on U; in particular, the normalisation constant Z(y) must be strictly positive and finite, and verifying this property for the stable priors $\mu_0$ of interest in this paper is the business of Theorem 4.3.
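The role of the normalisation constant Z(y) can be illustrated in one dimension. The sketch below assumes the generalised Bayes formula in its standard form, with posterior density proportional to exp(-Φ(u; y)) with respect to the prior. It takes U = Y = R, a standard Cauchy prior, the identity forward map and Gaussian noise, all of which are illustrative choices and not the article's examples. It estimates Z(y) and a posterior mean by self-normalised weighting of prior draws; `misfit` and `posterior_summary` are hypothetical helper names.

```python
import math
import random

def misfit(u, y, sigma=0.5):
    # Quadratic misfit for y = u + eta with eta ~ N(0, sigma^2) (assumed model).
    return 0.5 * ((y - u) / sigma) ** 2

def posterior_summary(y, n_samples=50000, seed=0):
    rng = random.Random(seed)
    # Prior draws from C(0, 1) by inverting the Cauchy distribution function.
    prior = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n_samples)]
    # Unnormalised density of the posterior with respect to the prior.
    unnorm = [math.exp(-misfit(u, y)) for u in prior]
    # Monte Carlo estimate of Z(y) = E_prior[exp(-Phi(u; y))].
    z_hat = sum(unnorm) / n_samples
    # Self-normalised weights, which sum to one by construction.
    weights = [w / (n_samples * z_hat) for w in unnorm]
    mean = sum(w * u for w, u in zip(weights, prior))
    return z_hat, weights, mean

z_hat, weights, post_mean = posterior_summary(y=1.0)
print("estimated Z(y):", z_hat)
print("estimated posterior mean:", post_mean)
```

Although the prior has no mean, the posterior mean is finite here because the Gaussian likelihood decays quickly enough to make exp(-Φ) times any polynomial integrable against the prior.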
If dim Y is infinite and η ∼ N(0, Σ) is a Gaussian random variable on Y with Cameron-Martin space $\operatorname{ran}(\Sigma^{1/2})$, then the quadratic misfit $\Phi(u; y) = \tfrac{1}{2} \| \Sigma^{-1/2} (y - G(u)) \|_Y^2$ is a.s. infinite, since $y \notin \operatorname{ran}(\Sigma^{1/2})$ a.s. It is then necessary to 'subtract off the infinite part of Φ' by using the Cameron-Martin formula for translations of η [23, Remark 3.8].

2.3. Stable distributions.
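Stable distributions have been studied extensively in the statistical and probabilistic literature. A random variable u is stable if, whenever $u_1, \dots, u_n$ are independent copies of u and $a_1, \dots, a_n > 0$, there exist c > 0 and d ∈ R such that
\[ a_1 u_1 + \dots + a_n u_n \stackrel{d}{=} c u + d . \]
The random variable is strictly stable if this holds with d = 0 for all choices of the $a_i$. This relation can be made more quantitatively precise: for non-degenerate stable u there is a unique index of stability α ∈ (0, 2] such that, for each n, $u_1 + \dots + u_n \stackrel{d}{=} n^{1/\alpha} u + d_n$ for some $d_n \in \mathbb{R}$. Equivalently, in terms of the law µ of u and the rescaling $\mu_n(E) := \mu(n^{-1/\alpha} E)$, the n-fold convolution of µ with itself is a translate of $\mu_n$.

Stability is a particularly appealing property if the aim is to construct prior measures for BIPs that are 'physically consistent' in the sense of remaining in the same model class regardless of discretisation or coordinate choices, at least when the 'physical quantity' obeys an additive law.

Example 2.2. Suppose that the aim is to model (and later infer, in a Bayesian fashion) the distribution of electrical charge in some domain $\Omega \subseteq \mathbb{R}^3$. For computational purposes, Ω is approximated by a triangulation T. Consider two elements $T_1, T_2$ of T: if the charges $\mathrm{charge}(T_1)$ and $\mathrm{charge}(T_2)$ are modelled as independent stable random variables with a common index, then the total charge $\mathrm{charge}(T_1 \cup T_2) = \mathrm{charge}(T_1) + \mathrm{charge}(T_2)$ is again stable with that index. The charge density $\mathrm{charge}(T_i) / \mathrm{volume}(T_i)$ behaves similarly. Thus, we remain in the same stable model class if we coarsen or refine the mesh T; this would not be true for an unstable random model of the charge, and this would complicate computational modelling in an undesirable fashion.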
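The defining closure property is easy to observe empirically in the Cauchy case (α = 1), where a positive combination $a_1 u_1 + a_2 u_2$ of independent standard Cauchy variables is again Cauchy with width $a_1 + a_2$. The following minimal sketch uses only the standard library; the helper names are hypothetical, and the median of |x| is used as the width estimate because it equals γ exactly for C(0, γ).

```python
import math
import random
import statistics

def cauchy(rng, location=0.0, scale=1.0):
    # Draw from C(location, scale) by inverting the distribution function.
    return location + scale * math.tan(math.pi * (rng.random() - 0.5))

def empirical_width(samples):
    # For C(0, gamma), the median of |x| is exactly gamma.
    return statistics.median(abs(x) for x in samples)

rng = random.Random(1)
a1, a2 = 1.0, 2.0
n = 100000

# Positive linear combination of two independent standard Cauchy draws.
combo = [a1 * cauchy(rng) + a2 * cauchy(rng) for _ in range(n)]
# Direct draws from the predicted law C(0, a1 + a2).
direct = [cauchy(rng, scale=a1 + a2) for _ in range(n)]

width_combo = empirical_width(combo)
width_direct = empirical_width(direct)
print("width of combination:", width_combo)
print("width of C(0, a1 + a2):", width_direct)
```

For comparison, the corresponding Gaussian combination (α = 2) would have scale $(a_1^2 + a_2^2)^{1/2}$ rather than $a_1 + a_2$, in line with the general rule $c = (a_1^\alpha + a_2^\alpha)^{1/\alpha}$ for strictly stable laws.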
Stable and strictly stable distributions on Banach spaces U, and indeed on locally convex topological vector spaces, can be defined in the same way as in the univariate case, by reference to sums of independent copies or convolutions of their laws [3,Section 4.2]. It can be shown that µ ∈ M 1 (U) is stable of order α precisely when all of its finite dimensional projections are stable of order α, and if all one-dimensional projections of µ are strictly stable of order α, then so is µ. These facts motivate further examination of stable distributions on R.
One further theoretical argument in favour of modelling using stable random variables, particularly from a limiting mesh refinement point of view, is that the stable distributions are precisely the central limits of independent and identically distributed random variables:

Theorem 2.4 (Generalised central limit theorem: [17, Theorem 1.20]). A nondegenerate random variable u is S(α, β, γ, δ; 0) if and only if there is a sequence of i.i.d. random variables $x_1, x_2, \dots$ and constants $a_n > 0$, $b_n \in \mathbb{R}$ such that $b_n + a_n \sum_{i=1}^{n} x_i$ converges in distribution to u as n → ∞.

The literature contains many further application-specific arguments for or against the use of stable distributions in optimisation and inference. [18] gives a general perspective on modelling with heavy-tailed distributions. [21] treats applications to physics, biology, and electrical engineering, particularly for the modelling of signals and noises with occasional sharp spikes or bursts, as in Figure 2. [25,1] treat applications to communications and image processing, while [26] discusses applications to economics. [9] discusses applications to optimisation.
Of particular relevance to this article is the recent work of [16], which proposes the use of heavy-tailed priors for edge-preserving (i.e. non-smoothing) Bayesian inversion in X-ray tomography; in essence, a Cauchy prior is placed on the gradient of the image to be reconstructed, thereby allowing for jump discontinuities in the image. One objective of this article is to provide a well-posedness theory in the style of [23] to underwrite the numerical investigations of [16].
3. Karhunen-Loève expansions for stable distributions on quasi-Banach spaces. Now consider the problem of constructing and sampling heavy-tailed stable probability measures on a real quasi-Banach space U, for example a vector space of summable sequences or a Sobolev space of fields of specified smoothness. Supposing that one already has access to a generator of real-valued stable random variables [5], it is natural to try to realise a U-valued stable random variable via an infinite random series of the form
\[ u := \sum_{n \in \mathbb{N}} u_n \psi_n , \tag{10} \]
where the $\psi_n$ are a basis for U and the $u_n$ are R-valued stable random variables; this is the strategy used to generate the examples shown in Figure 2. The natural question is, when does (10) define a bona fide U-valued random variable?
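A truncated version of such a random series is straightforward to sample. The sketch below is an illustration under stated assumptions, not the article's construction: it takes the standard basis of the sequence space $\ell^1$, independent Cauchy coefficients with widths $\gamma_n = n^{-r}$, and a fixed truncation level. The function names and the decay exponent are hypothetical choices.

```python
import math
import random

def sample_cauchy_series(n_terms, decay=2.0, seed=0):
    # Coefficients u_n = gamma_n * (standard Cauchy draw), gamma_n = n^(-decay),
    # of a truncated series in the standard basis of the sequence space l^1.
    rng = random.Random(seed)
    coeffs = []
    for n in range(1, n_terms + 1):
        gamma_n = n ** (-decay)
        u_hat = math.tan(math.pi * (rng.random() - 0.5))
        coeffs.append(gamma_n * u_hat)
    return coeffs

def partial_l1_norms(coeffs):
    # Running l^1 norms of the partial sums; these are the quantities whose
    # almost sure convergence the section is concerned with.
    running, norms = 0.0, []
    for c in coeffs:
        running += abs(c)
        norms.append(running)
    return norms

coeffs = sample_cauchy_series(n_terms=2000)
norms = partial_l1_norms(coeffs)
for n in (10, 100, 1000, 2000):
    print("terms:", n, "running l^1 norm:", norms[n - 1])
```

Occasional very large individual coefficients are to be expected with Cauchy draws; whether the running norm nevertheless settles down depends on how fast the widths decay.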
The Gaussian case is a useful reference point. Suppose that C is a positive-semi-definite and self-adjoint operator on a Hilbert space U with an eigensystem $(\lambda_n, \psi_n)_{n \in \mathbb{N}}$, and that $(\lambda_n)_{n \in \mathbb{N}} \in \ell^1$, i.e. C is a trace-class operator. Then the series (10) with $u_n \sim N(0, \lambda_n)$ (i.e. with $u_n = \lambda_n^{1/2} \hat{u}_n$ and $\hat{u}_n \sim N(0, 1)$) converges a.s., and is a draw from the Gaussian measure N(0, C) on U with covariance operator C. Similar expansions with different powers of $\lambda_n$ and $\hat{u}_n$ having density proportional to $\exp(-|\hat{u}_n|^p)$ are used to define draws from Besov measures [8].
The focus here, however, is on $u_n$ with heavy tails, so the usual variance-based arguments that are used to prove a.s. convergence of the series (10) are not applicable. Nevertheless, Theorem 3.4 below shows that the series (10) indeed converges almost surely in U under the assumption that the scale parameters $\gamma_n$ of the stable random coefficients $u_n$ are α-summable, modulo a logarithmic correction term in the case α = 1. The proof of this rests on the following result, which is a synthesis of two classical results from probability theory, and gives a necessary and sufficient condition for the convergence of random series:

Theorem 3.1 (Kolmogorov's zero-one law and three series theorem). Let $(x_n)_{n \in \mathbb{N}}$ be a sequence of independent R-valued random variables. Then the series $\sum_{n \in \mathbb{N}} x_n$ either converges a.s. or diverges a.s., and a.s. convergence holds if and only if, for some A > 0, the following series are all convergent:
\[ \sum_{n \in \mathbb{N}} \mathbb{P}\bigl[ |x_n| > A \bigr] , \qquad \sum_{n \in \mathbb{N}} \mathbb{E}\bigl[ x_n 1[ |x_n| \le A ] \bigr] , \qquad \sum_{n \in \mathbb{N}} \operatorname{Var}\bigl[ x_n 1[ |x_n| \le A ] \bigr] . \]

Definition 3.2. Let U be a real quasi-Banach space with countable, unconditional, normalised, Schauder basis $(\psi_n)_{n \in \mathbb{N}}$. Let α ∈ (0, 2], $\beta = (\beta_n)_{n \in \mathbb{N}} \subset (-1, 1)$, $\gamma = (\gamma_n)_{n \in \mathbb{N}} \subset \mathbb{R}_+$, and $\delta = (\delta_n)_{n \in \mathbb{N}} \subset \mathbb{R}$. Let $u_n \sim S(\alpha, \beta_n, \gamma_n, \delta_n; 0)$ be independent for each n ∈ N. Then we shall say that $u := \sum_{n \in \mathbb{N}} u_n \psi_n$ is a stable U-valued random variable and write u ∼ S(α, β, γ, δ; 0).

Theorem 3.4 will justify the terminology of Definition 3.2 by showing that, under suitable summability conditions on γ and δ, u ∼ S(α, β, γ, δ; 0) is indeed a well-defined U-valued random variable. First, it is necessary to make an assumption on the geometry of the basis $(\psi_n)_{n \in \mathbb{N}}$.

Assumption 3.3. The basis $(\psi_n)_{n \in \mathbb{N}}$ and q > 0 are such that the synthesis operator $S_\psi : v := (v_n)_{n \in \mathbb{N}} \mapsto \sum_{n \in \mathbb{N}} v_n \psi_n$ is a continuous embedding of the sequence space $\ell^q$ of coefficients into U, i.e. there is a constant C such that
\[ \Bigl\| \sum_{n \in \mathbb{N}} v_n \psi_n \Bigr\|_U \le C \| v \|_{\ell^q} \quad \text{for all } v \in \ell^q . \tag{11} \]
When U is a Banach space, Assumption 3.3 holds with q = 1 for any choice of basis $(\psi_n)_{n \in \mathbb{N}}$, since it is just the triangle inequality for an unconditionally convergent series in U. Since $0 < p \le q \le \infty \implies \| \cdot \|_{\ell^q} \le \| \cdot \|_{\ell^p}$, whenever (11) holds for q it also holds with q replaced by any p ∈ (0, q]. If inequality (11) can be reversed, possibly with a different constant, then the basis $(\psi_n)_{n \in \mathbb{N}}$ is known as a q-frame for U [6]. The case q = 2 is the well-known notion of a Riesz basis.

Theorem 3.4 (Well-definedness of U-valued stable random variables). Let u ∼ S(α, β, γ, δ; 0) with α ∈ (0, 2), β ⊂ (−1, 1), $\gamma \in \ell^\alpha$, $\delta \in \ell^q$ and, in addition, let γ satisfy the summability condition (12). Then u ∈ U a.s.
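Proof. For each n ∈ N, let $\hat{u}_n \sim S(\alpha, \beta_n, 1, 0; 0)$, so that $u_n \stackrel{d}{=} \gamma_n \hat{u}_n + \delta_n$. By Assumption 3.3 and the quasi-triangle inequality in $\ell^q$, there is a constant C such that, for M < N,
\[ \Bigl\| \sum_{n = M + 1}^{N} u_n \psi_n \Bigr\|_U \le C \Bigl( \sum_{n = M + 1}^{N} | \delta_n |^q \Bigr)^{1/q} + C \Bigl( \sum_{n = M + 1}^{N} | \gamma_n \hat{u}_n |^q \Bigr)^{1/q} . \]
If $\delta \in \ell^q$, then the dominated convergence theorem implies that the (deterministic) first sum on the right-hand side converges to 0 as N, M → ∞. Therefore, it remains only to show that the assumptions on γ are sufficient to ensure that the (random) second sum on the right-hand side converges a.s. to 0 as N, M → ∞; it will then follow that the partial sums of u are a.s. Cauchy in the quasi-Banach space U, and hence a.s. convergent to a well-defined limit in U.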
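To that end, it will be shown that $\sum_{n \in \mathbb{N}} | \gamma_n \hat{u}_n |^q$ converges a.s. in R. Let A > 0 be large enough that the asymptotic properties (8) and (9) apply, so that
\[ \mathbb{P}\bigl[ | \gamma_n \hat{u}_n |^q > A \bigr] = \mathbb{P}\bigl[ | \hat{u}_n | > A^{1/q} / \gamma_n \bigr] \le C \gamma_n^{\alpha} A^{-\alpha / q} , \]
where C depends only on α and $\beta_n$. Since $\gamma \in \ell^\alpha$, the series $\sum_{n \in \mathbb{N}} \mathbb{P}[ | \gamma_n \hat{u}_n |^q > A ]$ is convergent. By (9), for p = 1, 2, the truncated p-th moments $\mathbb{E}\bigl[ | \gamma_n \hat{u}_n |^{qp} 1[ | \gamma_n \hat{u}_n |^q \le A ] \bigr]$ are bounded by C times a function of $\gamma_n$ alone, where C depends on α, $\beta_n$, p, and A but is independent of $\gamma_n$. The assumptions on γ ensure that these truncated moments are both summable over all n ∈ N for p = 1 and p = 2. Therefore, Theorem 3.1 implies that $\sum_{n \in \mathbb{N}} | \gamma_n \hat{u}_n |^q$ converges a.s. in R, and so (10) converges a.s. in U.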
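Example 3.5. Suppose that U is a Banach space (so we may take q = 1, but perhaps no greater). If the coefficients $u_n$ in (10) are independent Cauchy random variables, $u_n \stackrel{d}{=} \gamma_n \hat{u}_n \sim C(0, \gamma_n) = S(1, 0, \gamma_n, 0; 0)$, then the tail probabilities and truncated moments are
\[ \mathbb{P}\bigl[ | \gamma_n \hat{u}_n | > A \bigr] = \frac{2}{\pi} \arctan \frac{\gamma_n}{A} , \]
\[ \mathbb{E}\bigl[ | \gamma_n \hat{u}_n | \, 1[ | \gamma_n \hat{u}_n | \le A ] \bigr] = \frac{\gamma_n}{\pi} \log \Bigl( 1 + \frac{A^2}{\gamma_n^2} \Bigr) , \qquad \mathbb{E}\bigl[ | \gamma_n \hat{u}_n |^2 \, 1[ | \gamma_n \hat{u}_n | \le A ] \bigr] = \frac{2 \gamma_n}{\pi} \Bigl( A - \gamma_n \arctan \frac{A}{\gamma_n} \Bigr) . \]
Consistent with Theorem 3.4, the corresponding three series all converge if $\| \gamma \|_{\ell^1}$ and $[\gamma]_{\log}$ are finite, and in particular if $\gamma_n = O(n^{-r})$ for some r > 1. When this convergence holds, the random series (10) converges a.s. in U, and thereby defines a U-valued Cauchy random variable.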
This example also shows that Theorem 3.4 is sharp: for $U = \ell^1$ with its standard Euclidean basis $(\psi_n)_{n \in \mathbb{N}}$, with $u_n \sim C(0, \gamma_n)$, Condition (12), requiring in this case that the Orlicz-type quantity $[\gamma]_{\log}$ be finite, cannot generally be weakened to just requiring that $\gamma \in \ell^1$. For example, for $\gamma_n := n^{-1} (\log n)^{-2}$, the integral test reveals that $\sum_{n \ge 2} | \gamma_n | < \infty$ but $\sum_{n \ge 2} | \gamma_n \log \gamma_n | = \infty$; in this situation, summability of the truncated first absolute moments of the coefficients $\gamma_n \hat{u}_n$ is no longer assured. However, for polynomial γ, the $\ell^1$ and log criteria do coincide: for $\gamma_n = C n^{-r}$, $\| \gamma \|_{\ell^1}$ is finite once r > 1, and then $[\gamma]_{\log}$ is also finite.
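The two applications of the integral test in this counterexample can be spelled out as follows. This is a worked check under the assumption, suggested by the text, that the logarithmic criterion is the finiteness of the sum of $| \gamma_n \log \gamma_n |$; the restriction to n ≥ 3 in the second line is made only so that log log n is non-negative.

```latex
% Summability of gamma_n = 1 / (n (log n)^2): the terms are decreasing for n >= 2.
\sum_{n \ge 3} \frac{1}{n (\log n)^2}
  \le \int_2^\infty \frac{\mathrm{d}x}{x (\log x)^2}
  = \Bigl[ - \frac{1}{\log x} \Bigr]_2^\infty
  = \frac{1}{\log 2} < \infty .

% Divergence of the logarithmic sum: for n >= 3,
% - log gamma_n = log n + 2 log log n >= log n, so
\sum_{n \ge 3} | \gamma_n \log \gamma_n |
  \ge \sum_{n \ge 3} \frac{1}{n \log n}
  \ge \int_3^\infty \frac{\mathrm{d}x}{x \log x}
  = \bigl[ \log \log x \bigr]_3^\infty
  = \infty .
```

The divergence is extremely slow, of order log log N for the partial sums, so it would not be visible in a naive numerical experiment.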
It is worth noting in passing that, like Gaussians, infinite-dimensional Cauchy distributions of this type satisfy a Cameron-Martin-type theorem. It follows from [3, Theorem 5.2.1 and Example 5.2.3] that the law of u with $u_n \sim C(0, \gamma_n)$ is mutually equivalent with the law of the shifted random variable v with $v_n \sim C(h_n, \gamma_n)$ precisely when $(h_n / \gamma_n)_{n \in \mathbb{N}} \in \ell^2$. This Hilbert shift quasi-invariance space also coincides with the domain of Fomin differentiability for the law of u.
Remark 3.6. An immediate consequence of the stability of each of the coefficients u n in the basis {ψ n } n∈N is that U-valued random variables in the sense of Definition 3.2 are stable in the general sense of e.g. [3,Section 4.2].
Remark 3.7 (Values in Hilbert scales). Suppose that U is a Hilbert space and $(\psi_n)_{n \in \mathbb{N}}$ is an orthonormal basis or normalised Riesz basis (2-frame) of U. Theorem 3.4 shows that u ∼ S(α, β, γ, δ; 0) takes values in U a.s. when $\gamma \in \ell^\alpha$ and $\delta \in \ell^2$. If, say, $U = L^2(D)$ for some domain $D \subseteq \mathbb{R}^d$, then this Hilbert setting offers an easy way to have u a.s. take values that are fields of specified smoothness by the well-established technique of a Hilbert scale [4]. For a positive-definite bounded linear operator C on U, the scaled space $U^s$ is defined to be the completion of U with respect to the inner product $\langle u, v \rangle_{U^s} := \langle C^{-s} u, C^{-s} v \rangle_U$. A standard example is that $C = (-\Delta)^{-1/2}$, which generates the scale of Sobolev spaces on D. If the basis $(\psi_n)_{n \in \mathbb{N}}$ is taken to be the eigenbasis of C with eigenvalues $(\lambda_n)_{n \in \mathbb{N}}$ in decreasing order and tending to 0, then
\[ \| u \|_{U^s}^2 = \sum_{n \in \mathbb{N}} \lambda_n^{-2s} | u_n |^2 , \]
and $u \in U^s$ a.s. when $(\gamma_n / \lambda_n^s)_{n \in \mathbb{N}} \in \ell^\alpha$ and $(\delta_n / \lambda_n^s)_{n \in \mathbb{N}} \in \ell^2$.
The final objective of this section is to show that u ∼ S(α, β, γ, δ; 0) has finite fractional lower-order moments $\mathbb{E}[ \| u \|_U^p ]$ for 0 < p < α, as in the real-valued case.
Theorem 3.8 (p-th mean convergence and fractional lower-order moments). Let u ∼ S(α, β, γ, δ; 0) satisfy the assumptions of Theorem 3.4, and suppose that $(\psi_n)_{n \in \mathbb{N}}$ satisfies (11) for some q > 0. Let 0 < p ≤ q and p < α. Then $\sum_{n=1}^{N} u_n \psi_n \to u$ in $L^p(\Omega, \mathbb{P}; U)$ as N → ∞ and, in particular, $\mathbb{E}[ \| u \|_U^p ]$ is finite and satisfies the bound (16).

Proof. To save space, $\| u \|_{L^p} := \bigl( \mathbb{E}[ \| u \|_U^p ] \bigr)^{1/p}$ denotes the quasinorm in $L^p(\Omega, \mathbb{P}; U)$.
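With $\hat{u}_n$ as in the proof of Theorem 3.4, so that $u_n \stackrel{d}{=} \gamma_n \hat{u}_n + \delta_n$, there is a constant C such that, for M < N,
\[ \Bigl\| \sum_{n = M + 1}^{N} u_n \psi_n \Bigr\|_{L^p} \le C \Bigl( \Bigl\| \sum_{n = M + 1}^{N} \delta_n \psi_n \Bigr\|_{L^p} + \Bigl\| \sum_{n = M + 1}^{N} \gamma_n \hat{u}_n \psi_n \Bigr\|_{L^p} \Bigr) = C \Bigl( \Bigl\| \sum_{n = M + 1}^{N} \delta_n \psi_n \Bigr\|_{U} + \Bigl\| \sum_{n = M + 1}^{N} \gamma_n \hat{u}_n \psi_n \Bigr\|_{L^p} \Bigr) , \]
where the inequality follows from the generalised triangle inequality for the quasinorm in $L^p(\Omega, \mathbb{P}; U)$ and the equality follows from δ being a deterministic sequence. By Assumption 3.3, $\bigl\| \sum_{n = M + 1}^{N} \delta_n \psi_n \bigr\|_U \le C \bigl\| (\delta_n)_{n = M + 1}^{N} \bigr\|_{\ell^q}$; since $\delta \in \ell^q$, the dominated convergence theorem implies that the first term on the right-hand side of the previous display tends to zero as M, N → ∞, so now we consider the second, random term.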
Since $\gamma \in \ell^\alpha$, the dominated convergence theorem implies that the right-hand side of the resulting bound on the random term tends to zero as M, N → ∞. Thus, the partial sums of the series $\sum_{n \in \mathbb{N}} u_n \psi_n$ are Cauchy in the quasi-Banach space $L^p(\Omega, \mathbb{P}; U)$, which implies that they converge to $u \in L^p(\Omega, \mathbb{P}; U)$. The estimate (16) follows from the above and Fatou's lemma.
The next section uses Theorem 3.8 in the form that, when $\mu_0$ is the law of an α-stable u, $\exp( p \log \| \cdot \|_U ) \in L^1(U, \mu_0)$ for 0 < p < α.

Remark 3.9. Other series representations of stable Banach-valued random variables are possible. In particular, [15, Sections 5.1 and 5.2] uses series with random coefficients coming from the jumps of a Poisson process and the spectral measure of the random vector, and provides estimates for the strong and weak $L^p(\Omega, \mathbb{P}; U)$ norms of the induced random variable.
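In the scalar Cauchy case the fractional lower-order moments are known in closed form: for a standard Cauchy variable x and 0 < p < 1, $\mathbb{E}[ |x|^p ] = 1 / \cos(\pi p / 2)$. The following minimal sketch compares a Monte Carlo estimate with this value. It is a one-dimensional illustration only; the choice p = 0.3 is an assumption made so that the estimator itself has finite variance, which requires 2p < 1.

```python
import math
import random

def fractional_moment_mc(p, n_samples=200000, seed=0):
    # Monte Carlo estimate of E|x|^p for x ~ C(0, 1), drawn by inversion.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = math.tan(math.pi * (rng.random() - 0.5))
        total += abs(x) ** p
    return total / n_samples

def fractional_moment_exact(p):
    # Closed form for the standard Cauchy law, valid for |p| < 1.
    return 1.0 / math.cos(0.5 * math.pi * p)

p = 0.3
estimate = fractional_moment_mc(p)
exact = fractional_moment_exact(p)
print("Monte Carlo:", estimate, "closed form:", exact)
```

As p increases towards α = 1 the closed-form value blows up, which matches the loss of integrability at the index of stability.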

4. Well-posedness of Bayesian inverse problems on quasi-Banach spaces.
This section establishes conditions for the BIP with an arbitrary prior $\mu_0$ to be well-posed in the sense that, for each y ∈ Y, the posterior distribution $\mu^y$ of u|y is a well-defined probability measure on U (Theorem 4.3), which changes continuously when either the observed data is perturbed to y′ ≈ y (Theorem 4.4) or the misfit function is perturbed to $\Phi_N \approx \Phi$ (Theorem 4.6). It is natural to seek robustness of the BIP to such perturbations: a perturbation of y to y′ may arise through observational error, whereas a perturbation of Φ to $\Phi_N$ may arise through a numerical approximation of the forward model (e.g. a PDE solution operator) G by a numerical solution operator $G_N$. As in the earlier works following [23], the mapping $y \mapsto \mu^y$ is shown to be $(\| \cdot \|_Y, d_H)$-Lipschitz, and the convergence $\mu^y_N \to \mu^y$ in $d_H$ inherits the same convergence rate as the convergence $\Phi_N \to \Phi$, so that the numerical analysis of the forward problem transfers to the BIP.
A notable feature of the results presented in this section -like all well-posedness results in the style of [23] -is that a careful tradeoff of growth rates of Φ is necessary in order to ensure well-definedness and well-posedness of the Bayesian posterior measure µ y . Indeed, this tradeoff is a desirable feature, since 'good' behaviour of one growth rate can be used to compensate for 'bad' behaviour of another. In the case of a heavy-tailed prior µ 0 , this tradeoff can be a particularly delicate task, since the class of integrable functions may be quite small.
The results of this section are not particular to stable heavy-tailed priors, and the relaxed regularity assumptions used here provide additional understanding of the previously-studied Gaussian and Besov cases, in which well-posedness holds even when $M_{1,r}(t) \to -\infty$ at a polynomial rate as t → ∞. The proof strategies used here are very similar to those of [23, Section 4] and [7, Section 4]. Although the results of [7, Section 4] have a similar level of generality in terms of Φ (modulo continuity/measurability assumptions), they are only explicitly applied there to uniform, Gaussian, and Besov priors on Banach spaces. Thus, the stable case considered here broadens the set of applications, and Example 4.8 later on shows that logarithmic growth rates are appropriate for stable priors, cf. quadratic rates for Gaussian priors. The relaxation of the usual assumption that U and Y are Banach spaces to allow them to be quasi-Banach spaces appears to be new, even though it introduces no significant complications in the proof.
Assumption 4.1. U and Y are separable quasi-Banach spaces over R and the misfit function Φ : U × Y → R satisfies the following:

(A0) Φ is a locally bounded Carathéodory function, i.e. Φ(u; · ) is continuous for each u ∈ U, Φ( · ; y) is measurable for each y ∈ Y, and for every r > 0, there exists $M_{0,r} \in \mathbb{R}$ such that, for all (u, y) ∈ U × Y with $\| u \|_U < r$ and $\| y \|_Y < r$,
\[ | \Phi(u; y) | \le M_{0,r} . \]

(A1) For every r > 0, there exists a measurable $M_{1,r} : \mathbb{R}_+ \to \mathbb{R}$ such that, for all (u, y) ∈ U × Y with $\| y \|_Y < r$,
\[ \Phi(u; y) \ge M_{1,r}( \| u \|_U ) . \]

(A2) For every r > 0, there exists a measurable $M_{2,r} : \mathbb{R}_+ \to \mathbb{R}_+$ such that, for all u ∈ U and y, y′ ∈ Y with $\| y \|_Y, \| y' \|_Y < r$,
\[ | \Phi(u; y) - \Phi(u; y') | \le \exp\bigl( M_{2,r}( \| u \|_U ) \bigr) \, \| y - y' \|_Y . \]

Furthermore, for each N ∈ N, $\Phi_N : U \times Y \to \mathbb{R}$ is an approximation to Φ that satisfies (A0)-(A2) with $M_{i,r}$ independent of N, and such that

(A3) $\Psi : \mathbb{N} \to \mathbb{R}_+$ is such that, for every r > 0, there exists a measurable $M_{3,r} : \mathbb{R}_+ \to \mathbb{R}_+$ such that, for all (u, y) ∈ U × Y with $\| y \|_Y < r$,
\[ | \Phi_N(u; y) - \Phi(u; y) | \le \exp\bigl( M_{3,r}( \| u \|_U ) \bigr) \, \Psi(N) . \]

Remark 4.2. Assumptions (A0)-(A3) have been re-ordered relative to their counterparts in earlier works, such as [23,8]. The numbering and placement of (A0) (usually assumptions 2 and 3 in the previous works) highlights its role as a mild measurability assumption, so that (A1)-(A3) (usually assumptions 1, 4, and 5) form a natural sequence of statements about the growth rates $M_{i,r}$. (A0) is weaker than the corresponding assumptions in previous works, in which it is assumed that Φ is locally Lipschitz continuous [23, Assumption 2.6] or continuous [7, Assumptions 4.2]. However, close inspection of the proofs in those works reveals that continuity is used only in order to ensure that $e^{-\Phi( \cdot ; y)}$ is locally $\mu_0$-integrable, so that it can serve as a density of the non-normalised posterior with respect to the prior. The above assumptions imply that Φ( · ; y) and $e^{-\Phi( \cdot ; y)}$ are locally bounded measurable functions; since $\mu_0$ is a probability measure, this yields the desired local integrability. Furthermore, the separability assumptions on U and Y and (A0) imply, by [2, Lemma 4.51], that Φ(u; y) and $e^{-\Phi(u; y)}$ are jointly measurable in (u, y).
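However, (A2) remains as a continuity assumption, since this is necessary in order to establish Hellinger continuity of the posterior with respect to y.

Theorem 4.3. Let (A0) and (A1) hold, let y ∈ Y, and suppose that, for some $r > \| y \|_Y$,
\[ \mathbb{E}_{u \sim \mu_0}\bigl[ \exp\bigl( - M_{1,r}( \| u \|_U ) \bigr) \bigr] < \infty . \tag{17} \]
Then
\[ \frac{\mathrm{d}\mu^y}{\mathrm{d}\mu_0}(u) = \frac{\exp( - \Phi(u; y) )}{Z(y)} , \qquad Z(y) := \int_U \exp( - \Phi(u; y) ) \, \mathrm{d}\mu_0(u) , \tag{18} \]
defines a Borel probability measure $\mu^y$ on U, which is tight in the sense that $\mu^y(E) = \sup \{ \mu^y(K) \mid K \subseteq E \text{ and } K \text{ is compact} \}$ for all measurable E ⊆ U.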
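Proof. As discussed in Remark 4.2, exp(−Φ( · ; y)) is locally integrable with respect to $\mu_0$. Therefore, by the Radon-Nikodým theorem, setting
\[ \nu(E) := \int_E \exp( - \Phi(u; y) ) \, \mathrm{d}\mu_0(u) \]
for each measurable set E ⊆ U defines a countably additive measure ν on U; what remains is to check that ν can be normalised to yield the probability measure $\mu^y$, i.e. it is necessary to show that 0 < Z(y) ≡ ν(U) < ∞. Let $r > \| y \|_Y$ be as in (17). Then
\[ Z(y) \le \mathbb{E}_{u \sim \mu_0}\bigl[ \exp\bigl( - M_{1,r}( \| u \|_U ) \bigr) \bigr] < \infty \]
by (A1) and (17); and, for every ρ ≥ r,
\[ Z(y) \ge \exp( - M_{0,\rho} ) \, \mu_0\bigl( B_\rho(0; \| \cdot \|_U) \bigr) \]
by (A0). Since $\mu_0$ is a countably additive Borel probability measure,
\[ 1 = \mu_0(U) = \sum_{n = 0}^{\infty} \mu_0\bigl( B_{n+1}(0; \| \cdot \|_U) \setminus B_n(0; \| \cdot \|_U) \bigr) , \]
and so it is impossible for all the summands on the right-hand side to vanish. Since at least one of the annuli $B_{n+1}(0; \| \cdot \|_U) \setminus B_n(0; \| \cdot \|_U)$ has strictly positive measure, it follows that $\mu_0\bigl( B_\rho(0; \| \cdot \|_U) \bigr) > 0$ once ρ > 0 is large enough. Hence, Z(y) > 0, and so $\mu^y$ is a well-defined Borel probability measure on U, with Radon-Nikodým derivative with respect to $\mu_0$ given by (18). In any Polish space, and hence in the separable quasi-Banach space U, every finite-mass measure is tight [2, Theorem 12.7]. Hence, $\mu_0$ and $\mu^y$ are both tight.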
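Theorem 4.4 (Perturbation of observed data). Suppose that r > 0 is such that (A0)-(A2) hold with
\[ S_{1,2,r} := \mathbb{E}_{u \sim \mu_0}\bigl[ \exp\bigl( 2 M_{2,r}( \| u \|_U ) - M_{1,r}( \| u \|_U ) \bigr) \bigr] < \infty . \tag{19} \]
Then there exists a constant C, which may depend on r, $S_{1,2,r}$, and the constants and functions in (A0)-(A2), such that, whenever $\| y \|_Y, \| y' \|_Y < r$,
\[ | Z(y) - Z(y') | \le C \| y - y' \|_Y \tag{20} \]
and
\[ d_H\bigl( \mu^y, \mu^{y'} \bigr) \le C \| y - y' \|_Y . \tag{21} \]

Remark 4.5. By Kraft's inequality [12,22], the assumptions of Theorem 4.4 also imply well-posedness in the total variation metric: $d_{\mathrm{TV}}\bigl( \mu^y, \mu^{y'} \bigr) \le C \| y - y' \|_Y$, with a possibly different constant C.

Proof of Theorem 4.4. First, consider the normalising constant Z(y) as a function of y. Note that (19) implies (17), so 0 < Z(y) < ∞. Furthermore, whenever $\| y \|_Y, \| y' \|_Y < r$, (A1), (A2) and the non-negativity of $M_{2,r}$ yield
\[ | Z(y) - Z(y') | \le \int_U e^{ - M_{1,r}( \| u \|_U ) } \, | \Phi(u; y) - \Phi(u; y') | \, \mathrm{d}\mu_0(u) \le S_{1,2,r} \, \| y - y' \|_Y , \]
which establishes (20). Now, from the definition (3) of $d_H$,
\[ d_H\bigl( \mu^y, \mu^{y'} \bigr)^2 = \int_U \Bigl( \frac{ e^{ - \Phi(u; y) / 2 } }{ \sqrt{Z(y)} } - \frac{ e^{ - \Phi(u; y') / 2 } }{ \sqrt{Z(y')} } \Bigr)^2 \mathrm{d}\mu_0(u) \le I_1 + I_2 , \]
\[ I_1 := \frac{2}{Z(y)} \int_U \bigl( e^{ - \Phi(u; y) / 2 } - e^{ - \Phi(u; y') / 2 } \bigr)^2 \mathrm{d}\mu_0(u) , \qquad I_2 := 2 \, \bigl| Z(y)^{-1/2} - Z(y')^{-1/2} \bigr|^2 \, Z(y') , \]
where the inequality follows from the algebraic inequality $(a + b)^2 \le 2 a^2 + 2 b^2$. For the first term,
\[ I_1 \le \frac{1}{2 Z(y)} \int_U e^{ - M_{1,r}( \| u \|_U ) } \, | \Phi(u; y) - \Phi(u; y') |^2 \, \mathrm{d}\mu_0(u) \le \frac{ S_{1,2,r} }{ 2 Z(y) } \, \| y - y' \|_Y^2 \]
by (A1), (A2), and (19).

For the second term, (20) implies that
\[ I_2 \le C \max\bigl\{ Z(y)^{-3}, Z(y')^{-3} \bigr\} \, | Z(y) - Z(y') |^2 \le C \| y - y' \|_Y^2 . \]
Thus, $d_H\bigl( \mu^y, \mu^{y'} \bigr)^2 \le C \| y - y' \|_Y^2$, and taking square roots completes the proof.

Theorem 4.6 (Perturbation of likelihood). Let Φ and $\Phi_N$ satisfy (A0)-(A3), and suppose that, for some r > 0,
\[ S_{1,3,r} := \mathbb{E}_{u \sim \mu_0}\bigl[ \exp\bigl( 2 M_{3,r}( \| u \|_U ) - M_{1,r}( \| u \|_U ) \bigr) \bigr] < \infty . \tag{22} \]
Then there exists a constant C, which may depend on r, $S_{1,3,r}$, and the constants and functions in (A0)-(A3) but is independent of N, such that the posteriors $\mu^y$ and $\mu^y_N$, arrived at using the same data y with $\| y \|_Y < r$ but the misfit functions Φ and $\Phi_N$ respectively, satisfy $d_H\bigl( \mu^y, \mu^y_N \bigr) \le C \Psi(N)$.

Proof. The proof is very similar to that of Theorem 4.4, and is omitted.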
Remark 4.7. It is interesting to note the range of applicability of Theorems 4.3, 4.4, and 4.6 when the prior $\mu_0$ is the probability law of a U-valued α-stable random variable. Under the assumption (11), Theorem 3.8 implies that (17) is satisfied if $M_{1,r}(t) \ge C - p \log t$ for some constant C and some 0 < p < α, i.e. Φ( · ; y) is permitted to diverge to −∞ at a logarithmic rate controlled by the index of stability of $\mu_0$. Similarly, (19) is satisfied if $2 M_{2,r}(t) - M_{1,r}(t) \le C + p \log t$, and (22) is satisfied if $2 M_{3,r}(t) - M_{1,r}(t) \le C + p \log t$.
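The first claim of Remark 4.7 amounts to a one-line estimate, written out below. It assumes that condition (17) is the integrability of $\exp( - M_{1,r}( \| u \|_U ) )$ against the prior, as the surrounding text indicates; C and p are the constants named in the remark.

```latex
% If M_{1,r}(t) >= C - p log t with 0 < p < alpha, then pointwise
% exp(-M_{1,r}(t)) <= exp(-C) exp(p log t) = exp(-C) t^p, and hence
\mathbb{E}_{u \sim \mu_0}\bigl[ \exp\bigl( - M_{1,r}( \| u \|_U ) \bigr) \bigr]
  \le e^{-C} \, \mathbb{E}_{u \sim \mu_0}\bigl[ \exp\bigl( p \log \| u \|_U \bigr) \bigr]
  = e^{-C} \, \mathbb{E}_{u \sim \mu_0}\bigl[ \| u \|_U^p \bigr]
  < \infty ,
% where finiteness of the p-th moment for p < alpha is Theorem 3.8.
```

The conditions for (19) and (22) follow in the same way, with $2 M_{2,r} - M_{1,r}$ or $2 M_{3,r} - M_{1,r}$ in place of $- M_{1,r}$.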
Since p < 2, the satisfaction of condition (23) depends crucially on the behaviour of $g_\pm(t)$ as t → ∞. For example, suppose that the following slowly-growing lower bound and power-law upper bound on $\| G(u) \|_Y$ hold: Then the BIP is well-posed with respect to y if $2 \kappa - \sigma - c_- \le p$. Informally, this holds if the lower bounds on $\Sigma^{-1}$ and G are far enough from zero compared to the upper bound growth rate κ.
As usual, similar arguments apply to approximation of Φ by Φ N .
Remark 4.9. The question of whether or not $\mu^y$ depends continuously upon the prior measure $\mu_0$ is a delicate one. First, probability measures on infinite-dimensional spaces are highly prone to mutual singularity even when they are related by surprisingly simple operations such as translation or dilation, cf. the Cameron-Martin and Feldman-Hájek theorems. Secondly, it is known that small perturbations of $\mu_0$ in the weak, total variation, or Hellinger topologies can lead to discontinuous changes in posterior expected values of pre-chosen integrands. On the other hand, at least for finite-dimensional U, with respect to the Kullback-Leibler topology, small perturbations in $\mu_0$ lead to small perturbations in $\mu^y$. For a more thorough treatment of this highly involved topic, see e.g. [19, Section 1] and the references cited therein.