Approximation of Lyapunov Functions from Noisy Data

Methods have previously been developed for the approximation of Lyapunov functions using radial basis functions. However, these methods assume that the evolution equations are known. We consider the problem of approximating a given Lyapunov function using radial basis functions where the evolution equations are not known, but we instead have sampled data which is contaminated with noise. We propose an algorithm in which we first approximate the underlying vector field, and use this approximation to then approximate the Lyapunov function. Our approach combines elements of machine learning/statistical learning theory with the existing theory of Lyapunov function approximation. Error estimates are provided for our algorithm.


Introduction
Ordinary differential equations model large classes of applications such as planetary motion, chemical reactions, population dynamics or consumer behaviour. A breakthrough in the understanding of ordinary differential equations was initiated by Poincaré and Lyapunov in the late 19th century, who developed an approach that embraced the use of topological and geometrical techniques for the study of dynamical systems. A key component of this theory is Lyapunov functions, which can be used to determine the basin of attraction of an asymptotically stable equilibrium.
In general, it is not possible to find an explicit analytical expression for a Lyapunov function associated to a nonlinear differential equation. Many methods have been proposed to numerically construct Lyapunov functions; see [13] for a recent review. These methods include the SOS (sums of squares) method, which constructs a polynomial Lyapunov function by semidefinite optimization [26]. Another method constructs a continuous piecewise affine (CPA) Lyapunov function using linear optimization [16]. A further method is based on Zubov's equation and computes a solution of this partial differential equation [6]. Lyapunov functions can also be constructed using set oriented methods [15]. The method used in this paper is based on approximating the solution of a PDE using radial basis functions [11]. All of these methods to approximate Lyapunov functions rely on knowledge of the right hand side of the differential equation.
In this paper, we develop a method to approximate Lyapunov functions where the right hand side is unknown, but we have sampled data of the system, which is contaminated by noise. We will first approximate the right hand side of the differential equation, and then use this approximation to approximate the Lyapunov function. Our approach combines and develops previous results from statistical learning theory [28,29,30] together with existing methods using radial basis functions [11,14], which use the framework of reproducing kernel Hilbert spaces (RKHS).
The use of RKHS spaces to approximate important quantities in dynamical systems has previously been exploited by Smale and Zhou to approximate a hyperbolic dynamical system [31]. Bouvrie and Hamzi also use RKHS spaces to approximate some key quantities in control and random dynamical systems [4,5].

Setting of the Problem and Main Result
We consider ordinary differential equations of the form ẋ = f*(x), (2.1) where f* : R^d → R^d is a smooth vector field and the dot denotes differentiation with respect to time. We denote by ϕ_{f*}(ψ, t) the flow of (2.1), i.e. the solution of (2.1) at time t with initial condition ψ. We assume that (2.1) has a fixed point x that is exponentially asymptotically stable. Define the basin of attraction as A(x) := {ψ ∈ R^d | lim_{t→∞} ϕ_{f*}(ψ, t) = x}. Note that A(x) ≠ ∅ and A(x) is open. Subsets of the basin of attraction can be determined by the use of Lyapunov functions, which are functions decreasing along solutions of (2.1). We consider two types of Lyapunov functions V and T, as described in Theorems 3.2 and 3.3 below. These Lyapunov functions satisfy ⟨∇V(x), f*(x)⟩ = −p(x) and ⟨∇T(x), f*(x)⟩ = −c, where p is a smooth function which is positive away from the equilibrium and vanishes at the equilibrium, and c is a positive constant. The scalar products on the left hand sides are called the orbital derivatives of V and T with respect to (2.1), which are the derivatives of V and T along solutions of (2.1). The orbital derivatives of V and T are negative, which implies that V and T are decreasing along solutions. We assume that the function f* is unknown, but we have sampled data of the form (x_i, y_i) in X × R^d, i = 1, …, m, with y_i = f*(x_i) + η_{x_i}. We assume that the one-dimensional random variables η^k_{x_i} ∈ R, where i = 1, …, m and k = 1, …, d, are independent random variables drawn from a probability distribution with zero mean and variance (σ^k_{x_i})² bounded by σ². Here X is a nonempty and compact subset of R^d with C¹ boundary.
In §5 we provide an algorithm to approximately reconstruct the Lyapunov functions V and T by functions V̂ and T̂. The following main theorem provides error estimates in a compact set D ⊂ A(x) ∩ X, which depend on the density of the data, measured by two key quantities: the fill distance of the data h_x (see Definition 6.3) and the norm of the volume weights w corresponding to the Voronoi tessellation of the data (see Definition 5.1).
Theorem 2.1 Let Ω ⊂ A(x) be a compact set and D := Ω \ B_ε(x) ⊂ X, with ε > 0 small enough so that D ≠ ∅. For h_x, ||w||_{R^m} and h_q sufficiently small, the following holds: 1. For every 0 < δ < 1, the reconstruction V̂ of the Lyapunov function V defined in Theorem 3.2 satisfies the following estimate with probability 1 − δ: where Ω_V ⊃ D is a certain compact subset of A(x), and 1/2 < r ≤ 1.
2. For every 0 < δ < 1, the reconstruction T̂ of the Lyapunov function T defined in Theorem 3.3 satisfies the following estimates with probability 1 − δ: where Γ is a non-characteristic hypersurface on which T has prescribed values (see Definition 3.4), Ω_T ⊃ D is a certain compact subset of A(x), and 1/2 < r ≤ 1.
The main point is that the expressions on the right hand side of (2.2)-(2.4) can be made arbitrarily small as the data density increases and for suitably chosen λ (see equation (6.13)). Therefore the orbital derivatives of our Lyapunov function approximations V̂ and T̂ become arbitrarily close in the infinity norm to those of V and T respectively. Estimate (2.2) implies that the orbital derivative of V̂ will be negative in D (which does not contain a small neighbourhood of the equilibrium x), since ⟨∇V(x), f*(x)⟩ = −p(x), where p is a positive definite function (see Theorem 3.2). The analogous statement is true for T̂, since ⟨∇T(x), f*(x)⟩_{R^d} = −c < 0. In principle the neighbourhood B_ε(x) can shrink as the data density increases (as h_x and ||w||_{R^m} tend to zero).
The above estimates contain λ > 0, a regularisation parameter of our algorithm, and h_q, the fill distance of a set of sampled points in Ω_V (resp. Ω_T) of our choosing. Similarly, h_q̄ is the fill distance of a set of sampled points on Γ, which we are also able to choose. The constants in the above estimates depend on d, σ, the choice of function spaces for approximation and the vector field f*.
The rest of the paper is organised as follows. In §3 we provide the converse theorems for the Lyapunov functions V and T. In §4 we set out the framework for the function spaces that are used to approximate the Lyapunov functions, as well as previous results on the approximation of Lyapunov functions when the right hand side of (2.1) is known. The algorithms used to compute V̂ and T̂ are detailed in §5. In §6 we provide an estimate for our approximation of the right hand side of (2.1), which is then used in the proof of Theorem 2.1 in §7.

Converse theorems for Lyapunov functions
The concept of a Lyapunov function dates back to 1893, when Lyapunov introduced these functions for the stability analysis of an equilibrium of a given differential equation, without explicit knowledge of the solutions [19]. Many converse theorems have been proved that guarantee the existence of a Lyapunov function under certain conditions; see [11,18,21] for an overview. Massera [22] provided the first main converse theorem for C¹ vector fields where A(x) = R^d, with further developments by several authors to prove the existence of smooth Lyapunov functions under weak smoothness assumptions on the right hand side (see e.g. [17,20,33]).
The existence of a Lyapunov function for system (2.1) with given values of the orbital derivative has been shown by Bhatia [2,3], as stated in the following theorems (see also [11]). We also refer to [12] for a proof that the conditions on the function p given here are sufficient to define the Lyapunov function V , in contrast to the conditions given in [11]. First we make the following definition.
Theorem 3.2 Consider the autonomous system of differential equations ẋ = f(x). We assume the system to have an exponentially asymptotically stable equilibrium x. Let p ∈ C^{ν_1}(R^d, R) be a function with the following properties:

Then there exists a Lyapunov function V ∈ C^{ν_1}(A(x), R) such that ⟨∇V(x), f(x)⟩ = −p(x) (3.1)
holds for all x ∈ A(x). The Lyapunov function V is uniquely defined up to a constant.
We may also choose p(x) to be a positive constant in equation (3.1) to obtain a Lyapunov function T, defined on A(x)\{x}, for which T(x) → −∞ as x tends to the equilibrium x.

Theorem 3.3 Consider the autonomous system of differential equations ẋ = f(x).
We assume the system to have an exponentially asymptotically stable equilibrium x. Then for all c ∈ R⁺ there exists a Lyapunov function T ∈ C^{ν_1}(A(x)\{x}, R) satisfying ⟨∇T(x), f(x)⟩ = −c. The Lyapunov function T will be uniquely defined if its values are given on a non-characteristic hypersurface Γ ⊂ A(x) [11] by a function ξ_T ∈ C^{ν_1}(Γ, R); that is, T(x) = ξ_T(x) for all x ∈ Γ.

Definition 3.4 (Non-characteristic hypersurface) A compact set Γ = {x ∈ A(x)\{x} | h(x) = 0}, where h ∈ C^{ν_1}(R^d, R), is called a non-characteristic hypersurface if ⟨∇h(x), f(x)⟩ ≠ 0 holds for all x ∈ Γ.
An example of a non-characteristic hypersurface is the level set of a (local) Lyapunov function, see [11]. In what follows we assume that we have chosen a non-characteristic hypersurface Γ together with a function ξ T ∈ C ν 1 (Γ, R) so that T is uniquely defined.

Remark 3.5 The Lyapunov function V is smooth and defined on the entire basin of attraction A(x).
However, its orbital derivative vanishes at the equilibrium x, and therefore estimates that bound the error of the orbital derivative for numerical approximations of V cannot guarantee a negative orbital derivative arbitrarily close to the equilibrium x. On the other hand, the Lyapunov function T is not even defined at x and is unbounded near the equilibrium. However, its definition has the advantage that we are not required to know where the equilibrium is.
In our approach to approximate Lyapunov functions directly from data, we will provide estimates for the approximation of both V and T . The strategy is to first approximate f * from the data, and use this in turn to approximate the Lyapunov function.

Reproducing kernel Hilbert spaces
The function spaces that we use to search for our approximations to both f * and the Lyapunov functions V and T will be reproducing kernel Hilbert spaces (RKHS). For a survey of the main properties of RKHS spaces mentioned in this section, we refer to [7].
In order to define an RKHS function space we first fix a continuous, symmetric, positive definite function (a "kernel") K : X × X → R, and set K_x := K(·, x). Define the Hilbert space H_K by first considering all finite linear combinations of functions K_x, that is, sums Σ_i a_i K_{x_i} with x_i ∈ X and finitely many a_i ∈ R nonzero. An inner product ⟨·, ·⟩_K on this space is defined by ⟨K_x, K_y⟩_K := K(x, y), extended bilinearly. One takes the completion to obtain H_K.
Alternatively, an equivalent definition of an RKHS is as a Hilbert space of real-valued functions on X for which the evaluation functional δ_x : f ↦ f(x) is continuous for every x ∈ X. Finite dimensional subspaces of H_K can also be naturally defined by taking a finite number of points x := {x_1, …, x_m} ⊂ X and considering the linear span of {K_{x_1}, …, K_{x_m}}. In practice we will seek functions in these finite dimensional subspaces as approximations for f*.
Within these Hilbert spaces the reproducing property holds: ⟨f, K_x⟩_K = f(x) for all f ∈ H_K and x ∈ X. (4.1) If we denote κ² := sup_{x∈X} K(x, x), then H_K ⊂ C(X) and it follows that ||f||_{L∞(X)} ≤ κ||f||_K. (4.2) The RKHS H_K can also be defined by means of an integral operator. Let ρ be any (finite) strictly positive Borel measure on X (e.g. Lebesgue measure) and L²_ρ(X) be the Hilbert space of square integrable functions on X. Then define the linear operator L_K : C(X) → C(X) by (L_K f)(x) := ∫_X K(x, y) f(y) dρ(y). (4.3) When composed with the inclusion C(X) ↪ L²_ρ(X) we obtain an operator from L²_ρ(X) to L²_ρ(X), which we also denote by L_K. L_K is then a self-adjoint compact operator, and it is also positive if the kernel K is positive definite. Moreover, the map L_K^{1/2} : L²_ρ(X) → H_K defines an isomorphism of Hilbert spaces; L_K^{1/2} is well defined as an operator on L²_ρ(X) via the spectral theorem, since L_K is positive, self-adjoint and compact.
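As a quick numerical illustration of the reproducing property, one can verify ⟨f, K_x⟩_K = f(x) for a function f in the span of kernel sections. The sketch below uses a Gaussian kernel purely for convenience (an assumption of this illustration; it is not the Wendland kernel used later in the paper):

```python
import numpy as np

def K(x, y, ell=1.0):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 ell^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * ell ** 2))

rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 2))      # centres x_1, ..., x_5 in R^2
a = rng.normal(size=5)             # coefficients a_i

def f(x):
    """f = sum_i a_i K_{x_i}, an element of the pre-Hilbert space."""
    return sum(a[i] * K(x, pts[i]) for i in range(5))

x0 = np.array([0.3, -0.7])
# <f, K_{x0}>_K = sum_i a_i <K_{x_i}, K_{x0}>_K = sum_i a_i K(x_i, x0)
inner = sum(a[i] * K(pts[i], x0) for i in range(5))
assert np.isclose(inner, f(x0))    # reproducing property: <f, K_x>_K = f(x)
```

Here the inner product is evaluated via bilinearity and ⟨K_{x_i}, K_{x_0}⟩_K = K(x_i, x_0), so the check reduces to an identity between two kernel sums.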

Sobolev space RKHS
In this paper we will work with reproducing kernel Hilbert spaces that are Sobolev spaces. Given k ∈ N and 1 ≤ p < ∞, the Sobolev space W^k_p(B) consists of all functions f with weak derivatives D^α f ∈ L^p(B), |α| ≤ k, with the (semi-)norms ||f||_{W^k_p(B)} := (Σ_{|α|≤k} ||D^α f||^p_{L^p(B)})^{1/p} and |f|_{W^k_p(B)} := (Σ_{|α|=k} ||D^α f||^p_{L^p(B)})^{1/p}. With p = ∞ the norms are defined in the natural way, taking the maximum over the multi-indices α. We will also use fractional order Sobolev spaces; for a detailed discussion see e.g. [1]. The Sobolev embedding theorem states that for τ > d/2, W^τ_2(R^d) can be embedded into C(R^d), and therefore it follows that W^τ_2(R^d) is an RKHS, using the fact that the pointwise evaluation functional is then continuous. Several kernel functions are known to generate RKHS spaces that are norm-equivalent to Sobolev spaces [24,25]. We will choose to work with the Wendland functions [34]. These are positive definite, compactly supported radial basis function kernels that are represented by a univariate polynomial on their support.

Definition 4.1 (Wendland function)
Let l ∈ N, k ∈ N₀. We define by recursion ψ_{l,0}(r) := (1 − r)^l_+ and ψ_{l,k+1}(r) := ∫_r^∞ t ψ_{l,k}(t) dt. Setting l := ⌊d/2⌋ + k + 1, the Wendland functions are characterised by a smoothness index k ∈ N, and belong to C^{2k}(R^d). For a domain D ⊂ R^d with a Lipschitz boundary, the Wendland function kernel is given by K(x, y) := ψ_{l,k}(c||x − y||) for a scaling parameter c > 0. The Wendland function kernel generates an RKHS consisting of the same functions as the Sobolev space W^τ_2(D) with τ = k + (d + 1)/2, with an equivalent norm [35, Corollary 10.48]. Therefore the generated Sobolev space is of integer order when d is odd, and integer plus one half when d is even.
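For concreteness, in dimension d ≤ 3 with smoothness index k = 1 one has l = ⌊d/2⌋ + k + 1 = 3, and the Wendland function has the well-known closed form ψ_{3,1}(r) = (1 − r)⁴₊(4r + 1). A minimal sketch of the resulting compactly supported kernel (the scaling c and the sample points are assumptions of this illustration):

```python
import numpy as np

def wendland_31(r):
    """Wendland function psi_{3,1}(r) = (1 - r)_+^4 (4 r + 1).
    Positive definite on R^d for d <= 3, C^2 smooth, supported on [0, 1]."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def kernel(x, y, c=1.0):
    """Radial kernel K(x, y) = psi_{3,1}(c ||x - y||), support radius 1/c."""
    return wendland_31(c * np.linalg.norm(np.asarray(x) - np.asarray(y)))

# The kernel is 1 on the diagonal and vanishes beyond the support radius.
assert kernel([0.0, 0.0], [0.0, 0.0]) == 1.0
assert kernel([0.0, 0.0], [2.0, 0.0]) == 0.0

# A small Gram matrix on pairwise distinct points is positive definite.
pts = np.array([[0.0, 0.0], [0.3, 0.1], [0.5, 0.6], [0.9, 0.2]])
A = np.array([[kernel(p, q) for q in pts] for p in pts])
assert np.all(np.linalg.eigvalsh(A) > 0)
```

The compact support makes the Gram matrix sparse when the data are dense relative to the support radius, which is one practical reason for preferring Wendland kernels over globally supported ones.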
In the second part of our algorithm we approximate the Lyapunov function. For this, we use the RKHS space H_{K_2} defined on Ω_V (resp. Ω_T) corresponding to the Wendland function kernel K_2 with smoothness index k_2, such that ν_2 = ⌈τ_2⌉, where τ_2 := k_2 + (d + 1)/2. These function spaces consist of smooth functions as a consequence of the following generalised Sobolev inequality (for a proof, see e.g. [9, Chapter 5.7, Theorem 6]).

Generalised interpolant and approximation theorems
In this section we introduce the generalised interpolant that is used to approximate the Lyapunov functions V and T. Consider a general interpolation setting where Ω ⊂ R^d is a bounded domain having a Lipschitz boundary. Let L be a linear differential operator and q := {q_1, …, q_M} ⊂ Ω be a set of pairwise distinct points which do not contain any singular points of L (cf. [14]). We define linear functionals λ_j := δ_{q_j} ∘ L, j = 1, …, M, and the generalised interpolant s(x) := Σ_{j=1}^M α_j (δ_{q_j} ∘ L)^y K(x, y), where (δ_{q_j} ∘ L)^y denotes the linear functional applied to one argument of the kernel. The coefficient vector α is the solution of Aα = β = (β_i), with the interpolation matrix A ∈ R^{M×M} given by A_{ij} := (δ_{q_i} ∘ L)^x (δ_{q_j} ∘ L)^y K(x, y).

Remark 4.5
According to the above definition, we have Ls(q i ) = β i . In addition, it can be shown that the generalised interpolant above is the unique norm-minimal interpolant in H K , see [11,14]. The matrix A is guaranteed to be invertible due to our choice of the Wendland function kernel K [11, Section 3.2.2], and since q does not contain any singular points.
We conclude this section by citing the following theorem from [14], which provides convergence estimates for approximating Lyapunov functions V_g (resp. T_g) with the generalised interpolants s_1 (resp. s_2) as above, for a known dynamical system ẋ = g(x). Theorem 4.6 Let ν_2 := ⌈τ_2⌉ with τ_2 = k_2 + (d + 1)/2, where k_2 is the smoothness index of the compactly supported Wendland function. Let k_2 > 1/2 if d is odd or k_2 > 1 if d is even. Consider the dynamical system defined by the ordinary differential equation ẋ = g(x), where g ∈ C^{ν_2}(R^d, R^d). Let x ∈ R^d be an equilibrium such that the real parts of all eigenvalues of Dg(x) are negative, and suppose g is bounded in A(x).
Let Γ be a non-characteristic hypersurface as in Definition 3.4, and let V_g and T_g be the Lyapunov functions of Theorems 3.2 and 3.3 for the system ẋ = g(x).
Given pairwise distinct sets of points q := {q_1, …, q_M} and q̄ := {q̄_1, …, q̄_N} in Γ, let Ω ⊂ A(x) be a bounded domain with Lipschitz boundary, with q ⊂ Ω. Let s_1 and s_2 be the generalised interpolants satisfying the corresponding interpolation conditions, and let the fill distance of the set of points q in Ω and q̄ in Γ be h_q and h_q̄ respectively. Then for h_q, h_q̄ sufficiently small, the following estimates hold: (4.10)

The Algorithm
Here we present the algorithm for which the estimate given in Theorem 2.1 holds. The algorithm is split into two parts. The first part computes f_{z,λ} as an approximation to f* (Algorithm 1), and the second part computes V̂ or T̂ as an approximation to the Lyapunov functions V or T respectively, given in Theorems 3.2 and 3.3 (Algorithm 2A or 2B). As discussed in §4.2, we will use two RKHS spaces H_{K_1} and H_{K_2} for the two parts of the algorithm, corresponding to the Wendland function kernels K_1 and K_2 with smoothness indices k_1 and k_2 respectively. We recall that the smoothness indices are chosen such that τ_1 = k_1 + (d + 1)/2 with ⌈τ_1⌉ = ν_1, and τ_2 = k_2 + (d + 1)/2 with ⌈τ_2⌉ = ν_2. Recall that our sampled data values z := (x_i, y_i)_{i=1}^m satisfy y_i = f*(x_i) + η_{x_i}, where η_x is a random variable drawn from a probability distribution with zero mean and variance σ²_x.
Our approximation scheme for f * employs a regularised least squares algorithm (see e.g. [10] and its references) to approximate each component f * ,k , k = 1, . . . , d. We also introduce a weighting w = {w x i } m i=1 corresponding to the Voronoi tessellation associated with the points {x i } m i=1 [32].
Definition 5.1 (Voronoi weights) For i = 1, …, m, let V_i(x) := {y ∈ X | ||y − x_i|| ≤ ||y − x_j|| for all j ≠ i} denote the Voronoi cell of the data site x_i in X. We define the weights w_{x_i} := ρ(V_i(x)), where ρ is the strictly positive Borel measure from §4.1.
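Exact ρ-volumes of Voronoi cells are awkward to compute in general. When ρ is Lebesgue measure and X is a box (both assumptions of this illustration), a simple Monte Carlo sketch approximates them by assigning each uniform sample to its nearest data site:

```python
import numpy as np

def voronoi_weights_mc(sites, lo, hi, n_samples=200_000, seed=0):
    """Approximate Lebesgue volumes of the Voronoi cells of `sites` inside
    the box X = [lo, hi]^d by Monte Carlo: each uniform sample is assigned
    to its nearest site, and w_i = (fraction in cell i) * vol(X)."""
    sites = np.asarray(sites, dtype=float)
    d = sites.shape[1]
    rng = np.random.default_rng(seed)
    samples = rng.uniform(lo, hi, size=(n_samples, d))
    # nearest-site index for every sample
    dists = np.linalg.norm(samples[:, None, :] - sites[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)
    vol_X = (hi - lo) ** d
    counts = np.bincount(nearest, minlength=len(sites))
    return counts / n_samples * vol_X

# Two sites splitting the unit square symmetrically: each cell has volume 1/2.
w = voronoi_weights_mc([[0.25, 0.5], [0.75, 0.5]], lo=0.0, hi=1.0)
assert abs(w[0] - 0.5) < 0.01 and abs(w[1] - 0.5) < 0.01
```

By construction the weights sum to the ρ-volume of X, so ||w||_{R^m} becomes small exactly when the data sites fill X densely.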

Algorithm 1
Fix a regularisation parameter λ > 0, and define D_w ∈ R^{m×m} as the diagonal matrix with diagonal elements w_{x_i}, i = 1, …, m. The approximation f_{z,λ} for f* is constructed componentwise. That is, for each k = 1, …, d, we approximate the k-th component f^{*,k} by f^k_{z,λ} := Σ_{i=1}^m c^k_i K_1(·, x_i), where the coefficient vector c^k may be calculated as the solution to the matrix equation (A_x D_w A_x + λA_x) c^k = A_x D_w y^k. Here A_x ∈ R^{m×m} is the interpolation matrix with entries (A_x)_{ij} = K_1(x_i, x_j), and y^k := (y^k_i)_{i=1}^m. We note here that due to our choice of RKHS the matrix A_x is positive definite [14, Proposition 3.3], and therefore the matrix (A_x D_w A_x + λA_x) is invertible, as it is the sum of two positive definite matrices. The error in our approximation of f* by f_{z,λ} is studied in §6. We will show that this error may be bounded in the supremum norm on the domain X, depending on the density of the data in X and the choice of regularisation parameter λ.
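A minimal sketch of Algorithm 1, assuming a Gaussian kernel in place of the Wendland kernel K_1 and equal Voronoi weights (both assumptions of this illustration). Since A_x is invertible, the system (A_x D_w A_x + λA_x)c = A_x D_w y^k is equivalent to the better-conditioned system (D_w A_x + λI)c = D_w y^k, which is what the code solves:

```python
import numpy as np

def gaussian_kernel(X, Y, ell=0.5):
    """Gram matrix K(x_i, y_j) for a Gaussian kernel (stand-in for K_1)."""
    d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2 * ell ** 2))

def algorithm1(x, y, w, lam, ell=0.5):
    """Weighted regularised least squares (a sketch of Algorithm 1).
    Returns a callable f_{z,lambda}(.) = sum_i c_i K(., x_i), where c solves
    (D_w A_x + lam I) c = D_w y, equivalent to (A_x D_w A_x + lam A_x) c
    = A_x D_w y because A_x is invertible."""
    A = gaussian_kernel(x, x, ell)
    Dw = np.diag(w)
    c = np.linalg.solve(Dw @ A + lam * np.eye(len(x)), Dw @ y)
    return lambda xs: gaussian_kernel(np.atleast_2d(xs), x, ell) @ c

# Noisy samples of one component of a vector field, f^{*,1}(x) = -x_1.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(60, 2))
y = -x[:, 0] + 0.01 * rng.normal(size=60)   # y_i = f^{*,1}(x_i) + noise
w = np.full(60, 4.0 / 60)                   # crude equal-volume weights on [-1,1]^2
f_hat = algorithm1(x, y, w, lam=1e-4)

# The reconstruction should be close to the true component on interior points.
test = rng.uniform(-0.8, 0.8, size=(20, 2))
assert np.max(np.abs(f_hat(test) - (-test[:, 0]))) < 0.2
```

The same routine is applied once per component k = 1, …, d of the sampled vector field.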
Once we have our approximation f_{z,λ}, we construct our Lyapunov function approximation with the generalised interpolant as in Theorem 4.6, setting g = f_{z,λ}.
Therefore we have the following Algorithm 2A for V̂, and Algorithm 2B for T̂. These algorithms involve sampling our approximation f_{z,λ} at a discrete set of points.

Algorithm 2A (Approximation of V) First run Algorithm 1 on the sampled data set z = (x_i, y_i)_{i=1}^m.
Define a set of pairwise distinct points q := (q_i)_{i=1}^M, with f_{z,λ}(q_i) ≠ 0 for all i = 1, …, M. Here B_q ∈ R^{M×M} is a symmetric matrix defined by (B_q)_{i,j} = λ^{f_{z,λ}}_{i,x} λ^{f_{z,λ}}_{j,y} K_2(x, y), where λ^{f_{z,λ}}_i denotes the orbital derivative functional v ↦ ⟨∇v(q_i), f_{z,λ}(q_i)⟩, and p = (p(q_i))_{i=1}^M.

The approximation V̂ ∈ H_{K_2} for the Lyapunov function V (see Theorem 3.2) is given by V̂(x) = Σ_{j=1}^M α_j λ^{f_{z,λ}}_{j,y} K_2(x, y), where the coefficient vector α is computed as the solution of B_q α = −p.
As before, the choice of RKHS guarantees that the matrix B_q will be positive definite, provided that f_{z,λ}(q_i) ≠ 0 for all i = 1, …, M.
To approximate T, we assume that a non-characteristic hypersurface for f* has been defined as Γ = {x ∈ A(x)\{x} | h(x) = 0}, with h ∈ C^{ν_1}(R^d, R).

Algorithm 2B (Approximation of T) First run Algorithm 1 on the sampled data set z = (x_i, y_i)_{i=1}^m.
Define a set of pairwise distinct points q := (q_i)_{i=1}^{M+N}, with f_{z,λ}(q_i) ≠ 0 for all i = 1, …, M, and q_i ∈ Γ for i = M + 1, …, M + N. The approximation T̂ ∈ H_{K_2} for the Lyapunov function T

(see Theorem 3.3) is given by T̂(x) = Σ_{j=1}^{M+N} α_j λ_{j,y} K_2(x, y), where λ_j is the orbital derivative functional v ↦ ⟨∇v(q_j), f_{z,λ}(q_j)⟩ for j = 1, …, M and the point evaluation δ_{q_j} for j = M + 1, …, M + N. The coefficients α are computed as the solution to the matrix equation C_{q,q̄} α = β, where (C_{q,q̄})_{i,j} = λ_{i,x} λ_{j,y} K_2(x, y). The vector β is given by β_i = −c, i = 1, …, M and β_i = ξ_T(q_i), i = M + 1, …, M + N.
It may again be shown that the matrix C_{q,q̄} will be positive definite, provided that f_{z,λ}(q_i) ≠ 0 for all i = 1, …, M; for details see [11, Section 3.2.2].
The error in our approximations V̂ and T̂ will depend primarily on the error induced by Algorithm 1, which in turn depends on the density of the data, as well as the regularisation parameter. In addition, there will be an error due to the discrete sampling of f_{z,λ} in Algorithms 2A and 2B. This error will depend on the density of the sample points q (chosen by the user), which in principle can be entirely independent of the original set of points x provided by the data. The overall error is the subject of §7, where we prove the estimate given in Theorem 2.1.

Error estimate for ||f_{z,λ} − f*||_{L∞(X)}
In this section we estimate the error ||f^k_{z,λ} − f^{*,k}||_{L∞(X)} for each k = 1, …, d. We have sampled data of the form (x_i, y_i), i = 1, …, m, with y_i = f*(x_i) + η_{x_i}. We assume that the one-dimensional random variables η^k_{x_i} ∈ R, where i = 1, …, m and k = 1, …, d, are independent random variables drawn from a probability distribution with zero mean and variance (σ^k_{x_i})² bounded by σ². In order to ease notation, and since each f^k_{z,λ} is calculated independently for each k, we shall henceforth drop the superscript k and consider the data to be of the form z := (x_i, y_i)_{i=1}^m ∈ (X × R)^m. Note that in this section we shall only be working with the RKHS H_{K_1}.

Definition 6.1 (Sampling Operator) Given a set of pairwise distinct points x = {x_1, …, x_m} ⊂ X, the sampling operator S_x : H_{K_1} → R^m is defined by S_x(f) := (f(x_i))_{i=1}^m.
The adjoint operator S*_x can also be derived as follows. Let c ∈ R^m; then we have ⟨S*_x c, f⟩_{K_1} = ⟨c, S_x f⟩_{R^m} = Σ_{i=1}^m c_i f(x_i) = ⟨Σ_{i=1}^m c_i K_{x_i}, f⟩_{K_1}. The final equality follows from the reproducing property (4.1), so S*_x c = Σ_{i=1}^m c_i K_{x_i}. The following Lemma shows that the function f_{z,λ} calculated in Algorithm 1 is the minimiser of a regularised cost function. We omit the proof, which is similar to that contained in [29, Theorem 1], except for the introduction of the weights w.

Lemma 6.2 For λ > 0, the minimiser f_{z,λ} of the regularised cost function Σ_{i=1}^m w_{x_i}(f(x_i) − y_i)² + λ||f||²_{K_1} over f ∈ H_{K_1} exists and is given by f_{z,λ} = (S*_x D_w S_x + λI)^{−1} S*_x D_w y.
Our strategy to prove convergence of the estimate f_{z,λ} to f* combines and adapts results contained in [28,29,30]. The main difference between our case and the standard assumptions in learning theory is that the data z = (x_i, y_i)_{i=1}^m is not necessarily generated from an underlying probability distribution. Instead, our data is generated by the underlying dynamical system (2.1), and the data sites may be situated arbitrarily: they could be chosen deterministically, or indeed may be generated randomly. The assumptions made in [29] correspond to this setting, but we would like to provide estimates in terms of the density of the data. This is why we have introduced the weights w_{x_i} corresponding to the ρ-volume Voronoi tessellation (cf. also [28], where a weighting scheme is introduced in a different setting).

Definition 6.3 (Fill distance) Let C ⊂ R^d be a compact set and x = {x_1, …, x_m} ⊂ C a set of points. The fill distance of x in C is defined as h_x := sup_{y∈C} min_{i=1,…,m} ||y − x_i||.
In particular, for all y ∈ C there is a grid point x_i with ||y − x_i|| ≤ h_x. In order to provide an estimate for ||f_{z,λ} − f*||_{K_1}, we first define f_{x,λ}, f_λ ∈ H_{K_1} by f_{x,λ} := J(S_x f*) (6.4) and f_λ := arg min_{f ∈ H_{K_1}} { ||f − f*||²_ρ + λ||f||²_{K_1} }, (6.5) where ||f||²_ρ = ∫_X |f(x)|² dρ(x). Recall from Section 4.1 that ρ is a finite strictly positive Borel measure on X.
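The fill distance can be approximated numerically by replacing the supremum over C with a maximum over a fine grid; a sketch for C = [0,1]² (the grid resolution and the lattice of sites are assumptions of this illustration):

```python
import numpy as np

def fill_distance(sites, C_grid):
    """Approximate the fill distance h_x = sup_{y in C} min_i ||y - x_i||
    by evaluating the inner minimum on a fine grid of points covering C."""
    sites = np.asarray(sites, dtype=float)
    d = np.linalg.norm(C_grid[:, None, :] - sites[None, :, :], axis=2)
    return d.min(axis=1).max()

# C = [0,1]^2 discretised finely; sites on a coarse 5x5 lattice.
g = np.linspace(0, 1, 101)
C_grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
lattice = np.linspace(0, 1, 5)
sites = np.stack(np.meshgrid(lattice, lattice), axis=-1).reshape(-1, 2)

h = fill_distance(sites, C_grid)
# For a 5x5 lattice with spacing 0.25, the farthest point of C from the sites
# is a cell centre, at distance 0.125 * sqrt(2) from its nearest site.
assert abs(h - 0.125 * np.sqrt(2)) < 0.01
```

A small fill distance means no point of C is far from a data site, which is exactly the density notion the error estimates are expressed in.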
The function f x,λ may be seen as a 'noise-free' version of f z,λ , such that in the case of no noise, i.e. η x = 0 for all x ∈ X, then we would have f z,λ = f x,λ . Correspondingly, the function f λ can be seen as a 'data-free' limit of f x,λ , or the limiting function as the data sites x become arbitrarily dense in X. Finally, if f * ∈ H K 1 , then f * is the 'regularisation-free' version of f λ -the limit of f λ as λ → 0.
The strategy is to break down the estimate of ||f_{z,λ} − f*||_{L∞(X)} according to ||f_{z,λ} − f*||_{L∞(X)} ≤ ||f_{z,λ} − f_{x,λ}||_{L∞(X)} + ||f_{x,λ} − f_λ||_{L∞(X)} + ||f_λ − f*||_{L∞(X)}, and estimate each of the three terms in the inequality. These three terms correspond to errors incurred by the noise (sample error), the finite set of data sites (integration error) and the regularisation parameter λ (regularisation error).

Sample error
An estimate for ||f_{z,λ} − f_{x,λ}||²_{K_1} for an unweighted approximation scheme is given in Theorem 2 of [29]. A similar result is given in the following lemma, which incorporates the weights of our scheme. The proof follows that of [29], and here we just sketch the main adaptations. Importantly, our estimate provides convergence of f_{z,λ} to f_{x,λ} as the data sites x become more dense, or as the quantity ||w||_{R^m} tends to zero. Lemma 6.4 For λ > 0, for every 0 < δ < 1, with probability 1 − δ, we have where σ² := sup_{x∈X} σ²_x and κ² := sup_{x∈X} K_1(x, x).
PROOF: First note that if λ > 0 then (S*_x D_w S_x + λI) is invertible. This follows since it is the sum of a positive and a strictly positive operator on H_{K_1}. Similarly to [29, Theorem 2], we have Since we have assumed that the η_x random variables are independent with zero mean, we have that where the inequality follows from our assumption that σ² := sup_{x∈X} σ²_x < ∞. Then it follows that The operator (S*_x D_w S_x + λI)^{−1} is estimated analogously to [29, Proposition 1] to obtain Then we have Finally, for 0 < δ < 1, application of the Markov inequality to the random variable ||f_{z,λ} − f_{x,λ}||²_{K_1} yields the claimed bound. Combining the above together with (4.2) proves the lemma.

Integration error
To establish our estimate for the integration error we need to make additional assumptions on the choice of Borel measure ρ. Namely, we will require that it is strongly continuous (cf. [27]):

Definition 6.5 A Borel measure ρ is strongly continuous if ρ(H) = 0 for every hyperplane H ⊂ R^d.
Note that this requirement implies that the boundaries of the Voronoi tessellation have ρ-measure zero. Lebesgue measure is still an example measure that satisfies all of our assumptions; recall Section 4.1. An estimate for the integration error ||f_{x,λ} − f_λ||_{L∞(X)} is given in the following lemma. Lemma 6.6 Let ρ be a (finite) strictly positive, strongly continuous Borel measure on X. For λ > 0, PROOF: It is shown in [8] that the solution to (6.5) for λ > 0 is given by f_λ = (L_{K_1} + λI)^{−1} L_{K_1} f*, (6.9) where L_{K_1} was defined in equation (4.3).
where the second equality follows from (6.9) and the final equality follows from the definition of S_x and its adjoint. Recall that we have chosen the weighting w to be equal to the ρ-volume of the Voronoi tessellation associated to the data sites x; that is, w_{x_i} = ρ(V_i(x)). Also, since ρ is strongly continuous, it holds that The second equality follows from the fact that (f* − f_λ) and K_x belong to H_{K_1}, and are therefore bounded and Lipschitz on X (cf. Corollary 4.3). This, together with (6.7), proves the lemma.

Regularisation error
For the regularisation error we recall the following result from [30, Lemma 3] (cf. also [29, Theorem 4]): Altogether, from Lemmas 6.4, 6.6 and 6.7 and equation (4.2) we have, with probability 1 − δ, Applying equation (6.10) again yields where the constant C depends on f*, d, σ and the choice of RKHS H_{K_1}. Now it is clear that the above bound can be made arbitrarily small as ||w||_{R^m} and h_x tend to zero, if λ also tends to zero at an appropriate rate. With the choice of regularisation parameter given in (6.13), we obtain, with probability 1 − δ, the estimate (6.14).

Proof of Theorem 2.1
Let V ∈ C^{ν_1}(A(x), R) and T ∈ C^{ν_1}(A(x)\{x}, R) be the Lyapunov functions for f* as defined in Theorems 3.2 and 3.3. Then we have ⟨∇V(x), f*(x)⟩ = −p(x) for all x ∈ A(x), with p(x) as defined in Theorem 3.2. Similarly, ⟨∇T(x), f*(x)⟩ = −c for all x ∈ A(x)\{x} and T(x) = ξ_T(x) for x ∈ Γ, for c > 0, where Γ = {x ∈ A(x)\{x} | h(x) = 0} is a non-characteristic hypersurface according to Definition 3.4 (with h ∈ C^{ν_1}(R^d, R)), and ξ_T ∈ C^{ν_1}(Γ, R). As stated in Theorem 3.2, the Lyapunov function V is uniquely defined up to a constant. We will fix V by setting V(x) = 0 at the equilibrium x. The Lyapunov function T is uniquely defined according to the above properties.
The following Lemma provides an alternative characterisation of the Lyapunov function V, which will be useful later in the section. Lemma 7.1 Let V ∈ C^{ν_1}(A(x), R) be the uniquely defined Lyapunov function as above, and let Γ = {x ∈ A(x)\{x} | h(x) = 0} (with h ∈ C^{ν_1}(R^d, R)) be a non-characteristic hypersurface according to Definition 3.4. Also recall that ϕ_{f*}(t, ·) denotes the flow operator of (2.1), and define the function θ_{f*} by ϕ_{f*}(t, x) ∈ Γ ⇔ t = θ_{f*}(x), for x ∈ A(x)\{x}.
PROOF: It is shown in [11,Theorem 2.38] that the function θ f * is well-defined and belongs to C ν 1 (A(x)\{x}, R). Also in [11,Theorem 2.46] it is shown that which proves the Lemma.

Remark 7.2 Similarly, the Lyapunov function T has the representation
with ξ T ∈ C ν 1 (Γ, R) as above.
We aim to approximate the Lyapunov functions V and T in a compact subset of the basin of attraction A(x). This subset is given by D := Ω \ B ε (x) for a given ε > 0, where Ω is compact, cf. Theorem 2.1. See Figure 1 for a sketch of these domains.
For the approximation of V, we define a compact set Ω̃_V (and correspondingly Ω̃_T for T). Recall that Γ is a non-characteristic hypersurface for f* (and thus also for f_{z,λ}, if f_{z,λ} and f* are sufficiently close in supremum norm). Additionally, we may define Γ̃ := ϕ_{f*}(T, Γ) with T > 0 sufficiently large so that Γ̃ ⊂ B_ε(x). Note that Γ̃ is a non-characteristic hypersurface for f* (and also for f_{z,λ}), defined by some h̃ ∈ C^{ν_1}(R^d, R). Then we define the Lipschitz domains Ω_V := Ω̃_V ∩ {x ∈ A(x) | h̃(x) ≥ 0} and Ω_T := Ω̃_T ∩ {x ∈ A(x) | h̃(x) ≥ 0} (cf. Theorem 4.6), and note that D ⊂ Ω_V and D ⊂ Ω_T. Also note that all orbits (for f* and f_{z,λ}) enter and exit Ω_V (and Ω_T) only once.
Our algorithm detailed in §5 computes the generalised interpolant approximations V̂ and T̂ corresponding to the vector field approximation f_{z,λ} (as in Theorem 4.6 with g = f_{z,λ}).
Note that for max_k ||f^k_{z,λ} − f^{*,k}||_{L∞(X)} sufficiently small (recall that the superscript k denotes the k-th component), f_{z,λ} does not have any equilibria in Ω_V (resp. Ω_T). Similarly, Γ and Γ̃ are both non-characteristic hypersurfaces for f_{z,λ}, and all trajectories in Ω_V (resp. Ω_T) eventually enter (and stay in) the region defined by {x ∈ A(x) | h̃(x) < 0}.
In fact, for ||w|| R m and h x sufficiently small, f z,λ and f * are even close in a C ν 2 sense, as we will show in the following Lemmas 7.3 and 7.5.

Lemma 7.3
For λ > 0, for every 0 < δ < 1, with probability 1 − δ, we have f^k_{x,λ} = arg min We will show that ||f^k_{x,λ}||_{K_1} is bounded independently of (x, λ). To see this, first note that, due to the choice of the positive definite Wendland function kernel K_1, it is always possible to find a norm-minimal function g^k_0 ∈ H_{K_1} that interpolates the data. That is, g^k_0 is the solution to the problem min Therefore we have that ||g^k_0||_{K_1} ≤ ||f^{*,k}||_{K_1}. Now from (7.5) we have the following: In Lemma 6.4 we have provided a bound for ||f^k_{z,λ} − f^k_{x,λ}||_{K_1} for a given probability 1 − δ. Then together we find, with probability 1 − δ, which proves the Lemma.
We will need the following convergence result in Lemma 7.5.
≤ ε̃. Then by Lemma 4.2 (also using arguments similar to Corollary 4.3, since X is closed), we have the corresponding estimate on X, and setting ε = Cε̃ proves the Lemma. It follows that, for sufficiently small ||w||_{R^m} and h_x, f_{z,λ} will (with probability 1 − δ) have an equilibrium close to x which is also exponentially asymptotically stable. In addition, the non-characteristic hypersurfaces Γ and Γ̃ for f* will also be non-characteristic hypersurfaces for f_{z,λ} (as will the level sets {x ∈ A(x) | V(x) = R}, resp. {x ∈ A(x) | T(x) = R}). In this case we can define the following 'Lyapunov-type' functions for f_{z,λ}. Definition 7.6 Let ϕ_{z,λ}(t, ·) denote the flow operator for the system ẋ = f_{z,λ}(x). For ||w||_{R^m} and h_x sufficiently small, λ > 0 chosen according to (6.13), and 0 < δ < 1, Γ will be a non-characteristic hypersurface for f_{z,λ} with probability 1 − δ. Then the function θ_{z,λ} : Ω_V → R given by ϕ_{z,λ}(t, x) ∈ Γ ⇔ t = θ_{z,λ}(x) is well-defined. By a slight abuse of notation we will also similarly define θ_{z,λ} on Ω_T. We define the functions V_{z,λ} : Ω_V → R and T_{z,λ} : Ω_T → R, where ξ_V ∈ C^{ν_1}(Γ, R) and ξ_T ∈ C^{ν_1}(Γ, R) are as in Lemma 7.1 and equation (7.3) respectively.

Remark 7.8
It follows from the proof of Lemma 7.7 and from (7.7) and (7.8) that, provided θ_{z,λ} is well defined (which is guaranteed with probability 1 − δ), we have the representations (7.11) and (7.12) for x ∈ Ω_V and x ∈ Ω_T respectively. We now define a pairwise distinct, discrete set of points q := (q_i)_{i=1}^M ⊂ Ω_V (resp. Ω_T). Note that these points need not be the same as x. Let h_q be the fill distance of q in Ω_V (resp. Ω_T). We compute our approximations V̂ and T̂ according to our algorithm given in §5. We have (for V̂; the arguments for T̂ are similar) Then we have, for x ∈ D, where we recall that τ_2 := k_2 + (d + 1)/2 is the degree of the Sobolev RKHS H_{K_2}. The last inequality above follows from Remark 7.8, Theorem 4.6 and D ⊂ Ω_V ⊂ X. Recall that the superscript k denotes the k-th component of a d-dimensional vector. Now we use an estimate similar to (3.16) from [12, Lemma 3.9]: recall that V̂ ∈ W^{τ_2}_2(Ω_V). Then from Corollary 4.3 we have Recall that V̂ is the norm-minimal generalised interpolant to V_{z,λ} in H_{K_2} (since V_{z,λ} satisfies (7.11)), and H_{K_2} is norm-equivalent to W^{τ_2}_2(Ω_V). Then ||V̂||_{W^{τ_2}_2(Ω_V)} ≤ C||V_{z,λ}||_{W^{τ_2}_2(Ω_V)}. In addition, Lemma 7.7 shows that and so ||V_{z,λ}||_{W^{τ_2}_2(Ω_V)} ≤ C||V||_{W^{τ_2}_2(Ω_V)} for sufficiently small ||w||_{R^m}, h_x with probability 1 − δ. Then it follows that We may similarly show Combining (6.12) with (7.13)-(7.15) proves Theorem 2.1.

Remark 7.9
Furthermore, as ||f^k_{z,λ} − f^{*,k}||_{L∞(X)} converges to zero, we can shrink the ball B_ε(x) (and therefore also Γ̃) towards x. The domains Ω_V and Ω_T will converge towards Ω̃_V and Ω̃_T respectively, and therefore V_{z,λ} and T_{z,λ} will converge to V and T respectively. However, we do not give estimates for how fast h_x and w would need to converge to zero relative to ε.