Kernel Methods for the Approximation of Some Key Quantities of Nonlinear Systems

We introduce a data-based approach to estimating key quantities which arise in the study of nonlinear control systems and random nonlinear dynamical systems. Our approach hinges on the observation that much of the existing linear theory may be readily extended to nonlinear systems, with a reasonable expectation of success, once the nonlinear system has been mapped into a high or infinite dimensional feature space. In particular, we develop computable, non-parametric estimators approximating the controllability and observability energy functions of nonlinear systems, and study the ellipsoids they induce. In all cases the relevant quantities are estimated from simulated or observed data. It is then shown that the controllability energy estimator provides a key means for approximating the invariant measure of an ergodic, stochastically forced nonlinear system.


Introduction
Personal computing has developed to the point where, in many cases, it is easier to simulate a dynamical system and analyze the resulting empirical data than to study the system analytically. Indeed, for large classes of nonlinear systems, numerical analysis may be the only viable option. Yet the mathematical theory necessary to analyze dynamical systems on the basis of observed data is still largely underdeveloped. In previous work, the authors proposed a linear, data-based approach to model reduction of nonlinear control systems [3]. The approach is based on lifting simulated trajectories of the system into a high or infinite dimensional feature (Hilbert) space where the evolution of the original system may be reasonably modeled as linear. One may then implicitly carry out linear balancing, truncation, and model reduction in the feature space while retaining nonlinearity in the original statespace.
In this paper, we continue under this setting and explore data-based definitions of key concepts for nonlinear control and random dynamical systems. We propose an empirical approach for estimation of the controllability and observability energies for stable nonlinear control systems, as well as invariant measures and their supports for ergodic nonlinear stochastic differential equations. Our methodology applies the relevant linear theory in a feature space where it is assumed that the original nonlinear system behaves approximately linearly. In this case we leverage the well-known connection between the controllability gramian of a linear control system and the invariant measure associated to the corresponding linear stochastic differential equation. This relationship was previously identified as useful for finding the controllability energy for certain nonlinear control systems given the invariant measure of the corresponding randomly forced dynamical system [23]. The approach in [23], however, requires solving a Fokker-Planck equation and so applies to only a narrow class of systems. Our approach takes the reverse direction in a data-driven setting: given an empirical estimate of the controllability energy function, one can obtain an estimate of the invariant measure. In particular, we will propose a consistent, data-based estimator for the controllability energy function of a nonlinear control system, and show how it can be used to estimate the invariant measure and its support for the corresponding stochastic differential equation (SDE).
The essential point of this paper is to illustrate that it is possible to find data-based estimates of linear objects in order to understand nonlinear control and random dynamical systems, without having to solve a Hamilton-Jacobi-Bellman or Lyapunov equation in the case of nonlinear control systems, or a Fokker-Planck equation in the case of nonlinear SDEs. The approach proposed here also highlights the close interaction between control and random dynamical systems, and demonstrates how control-theoretic objects can be useful for studying random dynamical systems. Our contribution should be seen as a step towards a mathematical, data-driven theory of dynamical systems which can be used to analyze and predict random dynamical systems, and to design control strategies for nonlinear systems on the basis of observed data rather than a pre-specified model.

Linear Systems as a Paradigm for Working in RKHS: Background
In this section we give a brief overview of some important background concepts in linear control, random dynamical systems and reproducing kernel Hilbert spaces (RKHS). We will make use of the linear theory that follows after mapping the state variable of a nonlinear system into a suitable RKHS, thereby harnessing RKHS theory as a framework for extending linear tools to nonlinear systems. The following background material closely follows [12,24,5].

Linear Control Systems
Consider a linear control system

ẋ(t) = Ax(t) + Bu(t),  y(t) = Cx(t),  (1)

where x ∈ R^n, u ∈ R^q, y ∈ R^p, (A, B) is controllable, (A, C) is observable and A is Hurwitz. We define the controllability and the observability Gramians as, respectively,

W_c = ∫_0^∞ e^{At} BB^⊤ e^{A^⊤ t} dt,  W_o = ∫_0^∞ e^{A^⊤ t} C^⊤C e^{At} dt.

These two matrices can be viewed as a measure of the controllability and the observability of the system [22]. For instance, consider the past energy [27], L_c(x_0), defined as the minimal energy required to reach x_0 from 0 in infinite time,

L_c(x_0) = min_{u ∈ L²(−∞,0], x(−∞)=0, x(0)=x_0} (1/2) ∫_{−∞}^0 ‖u(t)‖² dt,  (2)

and the future energy [27], L_o(x_0), defined as the output energy generated by releasing the system from its initial state x(t_0) = x_0 with zero input u(t) = 0 for t ≥ 0, i.e.

L_o(x_0) = (1/2) ∫_0^∞ ‖y(t)‖² dt.  (3)
In the linear case, it can be shown that

L_c(x_0) = (1/2) x_0^⊤ W_c^{-1} x_0,  (4)
L_o(x_0) = (1/2) x_0^⊤ W_o x_0.  (5)

Moreover, W_c and W_o satisfy the following Lyapunov equations [12]:

A W_c + W_c A^⊤ = −BB^⊤,  A^⊤ W_o + W_o A = −C^⊤C.  (6)

These energies are directly related to the controllability and observability operators.
Definition 2.1 [12]. Given a controllable pair (A, B) with A Hurwitz, the controllability operator Ψ_c : L²(−∞, 0] → C^n is defined as

Ψ_c u = ∫_{−∞}^0 e^{−Aτ} B u(τ) dτ.

The significance of this operator is made evident via the following optimal control problem: given the linear system ẋ(t) = Ax(t) + Bu(t) defined for t ∈ (−∞, 0) with x(−∞) = 0, and for x(0) ∈ C^n, what is the minimum energy input u which drives the state to x(0) = x_0 at time zero?
That is, what is the u ∈ L²(−∞, 0] solving Ψ_c u = x_0 with smallest norm ‖u‖_2? If (A, B) is controllable, then Ψ_c Ψ_c^* =: W_c is nonsingular, and the answer to the preceding question is

u_opt(t) = B^⊤ e^{−A^⊤ t} W_c^{-1} x_0,  t ∈ (−∞, 0].

The input energy is given by

‖u_opt‖_2² = ⟨W_c^{-1} x_0, x_0⟩.

Moreover, the reachable set through u_opt, i.e. the set of final states x_0 = Ψ_c u that can be reached given an input u ∈ L²(−∞, 0] of unit norm, {Ψ_c u : u ∈ L²(−∞, 0] and ‖u‖_2 ≤ 1}, may be defined as

E_c = {W_c^{1/2} x : ‖x‖ ≤ 1}.

Similarly, for the autonomous system

ẋ(t) = Ax(t),  y(t) = Cx(t),

where A is Hurwitz, the observability operator is defined as follows.
Definition 2.2 [12]. Given a matrix pair (A, C), where A is Hurwitz, the observability operator Ψ_o : C^n → L²[0, ∞) is defined as

(Ψ_o x_0)(t) = C e^{At} x_0,  t ≥ 0.

The corresponding observability ellipsoid is given by

E_o = {W_o^{1/2} x : ‖x‖ ≤ 1}.

The energy of the output signal y = Ψ_o x_0, for x_0 ∈ C^n, can then be computed as

‖y‖_2² = ⟨Ψ_o x_0, Ψ_o x_0⟩ = ⟨W_o x_0, x_0⟩,  where W_o = Ψ_o^* Ψ_o.
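These quantities are directly computable for concrete systems. The following sketch (a small two-dimensional system invented purely for illustration; NumPy and SciPy assumed available) solves the Lyapunov equation for W_c, compares it with the defining integral, and checks numerically that the minimum-energy input u_opt(t) = B^⊤e^{−A^⊤t}W_c^{-1}x_0 indeed reaches x_0 with input energy ⟨W_c^{-1}x_0, x_0⟩:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# A small Hurwitz example system (illustrative only, not from the text).
A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
B = np.array([[1.0],
              [1.0]])

# Controllability gramian from the Lyapunov equation A Wc + Wc A^T = -B B^T.
Wc = solve_continuous_lyapunov(A, -B @ B.T)

# Check against the defining integral, truncated to [0, T] (midpoint rule).
T, N = 25.0, 5000
dt = T / N
ts = np.linspace(0.0, T, N, endpoint=False) + dt / 2
Wc_int = sum(expm(A * t) @ B @ B.T @ expm(A.T * t) for t in ts) * dt
assert np.allclose(Wc, Wc_int, atol=1e-4)

# Minimum-energy input reaching x0 at time 0: u_opt(t) = B^T e^{-A^T t} Wc^{-1} x0.
x0 = np.array([1.0, -0.5])
w = np.linalg.solve(Wc, x0)
ts_past = -ts                                                  # t in (-T, 0]
u_opt = np.array([B.T @ expm(-A.T * t) @ w for t in ts_past])

# Psi_c u_opt = int_{-inf}^0 e^{-A t} B u(t) dt reproduces x0 ...
x_reached = sum(expm(-A * t) @ B @ u for t, u in zip(ts_past, u_opt)) * dt
assert np.allclose(x_reached, x0, atol=1e-3)

# ... with input energy ||u_opt||_2^2 = x0^T Wc^{-1} x0 = 2 L_c(x0).
energy = np.sum(u_opt ** 2) * dt
assert np.isclose(energy, x0 @ w, rtol=1e-3)
```

The Lyapunov route and the quadrature route agree to discretization error, and the optimal input's energy matches the closed form (4) up to the same error.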

Linear Stochastic Differential Equations
In this section, we review the relevant background for stochastically forced differential equations (see e.g. [24,5] for more detail). Here we will consider stochastically excited dynamical control systems affine in the input,

ẋ = f(x) + G(x)u,  y = h(x),  (9)

where f : R^n → R^n is a smooth vector field, G : R^n → R^{n×q} is a smooth matrix-valued function and x ∈ R^n. We replace the control inputs by sample paths of white Gaussian noise processes, giving the corresponding stochastic differential equation

dX_t = f(X_t) dt + G(X_t) dW_t^{(q)},  (10)

with W_t^{(q)} a q-dimensional Brownian motion. The solution X_t to this SDE is a Markov stochastic process with transition probability density ρ(t, x). The time evolution of the probability density ρ(t, x) is described by the Fokker-Planck (or forward Kolmogorov) equation

∂ρ/∂t = −∑_{i=1}^n ∂/∂x_i [f_i(x) ρ] + (1/2) ∑_{i,j=1}^n ∂²/∂x_i ∂x_j [(GG^⊤)_{ij}(x) ρ] =: Lρ.  (11)

The differential operator L on the right-hand side is referred to as the Fokker-Planck operator associated to (10). The steady-state probability density ρ_∞ for (10) is a solution of the equation

Lρ_∞ = 0.  (12)

In the context of linear Gaussian theory, where we are given an n-dimensional system of the form

dX = AX dt + B dW_t^{(q)},  (13)

with A ∈ R^{n×n}, B ∈ R^{n×q}, the transition density is Gaussian. It is therefore sufficient to find the mean and covariance of the solution X(t) in order to uniquely determine the transition probability density.
Since A is Hurwitz, the mean of the solution to (13) decays to zero, and the stationary covariance Q = lim_{t→∞} E[X_t X_t^⊤] satisfies AQ + QA^⊤ = −BB^⊤, so that we may find Q by solving this Lyapunov system. Thus the solution Q is exactly the controllability gramian W_c, which is positive definite if and only if the pair (A, B) is controllable [5]. Combining the above facts, the steady-state probability density is given by

ρ_∞(x) = Z^{-1} exp(−(1/2) x^⊤ W_c^{-1} x) = Z^{-1} exp(−L_c(x)),  (14)

using (2) and letting Z = [(2π)^n det(W_c)]^{1/2}. Equation (14) suggests the following key observations in the linear setting:

• Given an approximation L̂_c of L_c, we obtain an approximation for ρ_∞ of the form

ρ̂_∞(x) = Z^{-1} exp(−L̂_c(x)).  (15)

• Given an approximation ρ̂_∞ of ρ_∞, we obtain an approximation for L_c(x) by solving

L̂_c(x) = −log[Z ρ̂_∞(x)].  (16)

We note that these approximations have been used in different contexts to study nonlinear control and random dynamical systems. For instance, in [6], Equation (15) was used to find explicit solutions of the Fokker-Planck equation for systems where a Lyapunov equation for the unforced system can be found and solved. In [23], Equation (16) was used, given an explicit solution to the Fokker-Planck equation, to approximate the controllability energy, and subsequently applied to the problem of model reduction for nonlinear control systems.
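The chain of identities above can be verified numerically in the linear setting. The sketch below (an illustrative two-dimensional example; NumPy and SciPy assumed) simulates (13) by Euler-Maruyama and checks that the empirical stationary covariance approaches the controllability gramian solving AQ + QA^⊤ = −BB^⊤, so that the steady-state density (14) is the zero-mean Gaussian with covariance W_c:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)

A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
B = np.array([[1.0],
              [0.5]])

# Stationary covariance = controllability gramian: A Q + Q A^T = -B B^T.
Q = solve_continuous_lyapunov(A, -B @ B.T)

# Euler-Maruyama simulation of dX = A X dt + B dW.
dt, n_steps = 1e-3, 400_000
x = np.zeros(2)
samples = np.empty((n_steps, 2))
sq = np.sqrt(dt)
for k in range(n_steps):
    x = x + (A @ x) * dt + (B[:, 0] * sq) * rng.standard_normal()
    samples[k] = x

emp_cov = np.cov(samples[n_steps // 10:].T)    # discard burn-in
assert np.allclose(emp_cov, Q, rtol=0.5, atol=0.05)

# The steady-state density (14) is then the Gaussian N(0, Wc):
Z = np.sqrt((2 * np.pi) ** 2 * np.linalg.det(Q))
rho_inf = lambda v: np.exp(-0.5 * v @ np.linalg.solve(Q, v)) / Z
```

The generous tolerance reflects Monte Carlo error; longer simulations tighten the agreement.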
Although the above relationship between ρ_∞ and L_c holds exactly for only a small class of systems (e.g. linear and some Hamiltonian systems), by mapping a nonlinear system into a suitable reproducing kernel Hilbert space we may reasonably extend this connection to a broad class of nonlinear systems. We will return to this topic in Section 5, after defining reproducing kernel Hilbert spaces and introducing gramians in RKHS.

Reproducing Kernel Hilbert Spaces
We give a brief overview of reproducing kernel Hilbert spaces as used in statistical learning theory. The discussion here borrows heavily from [8,28,30]. Early work developing the theory of RKHS was undertaken by N. Aronszajn [1].

Definition 2.3. Let H be a Hilbert space of functions on a set X. Denote by ⟨f, g⟩ the inner product on H and let ‖f‖ = ⟨f, f⟩^{1/2} be the norm in H, for f and g ∈ H. We say that H is a reproducing kernel Hilbert space (RKHS) if there exists a function K : X × X → R such that:
i. K_x := K(x, ·) ∈ H for all x ∈ X;
ii. K spans H: H is the closure of span{K_x : x ∈ X};
iii. K has the reproducing property: f(x) = ⟨f, K_x⟩ for all f ∈ H, x ∈ X.
K will be called a reproducing kernel of H. H_K will denote the RKHS H with reproducing kernel K where it is convenient to explicitly note this dependence.

The important properties of reproducing kernels are summarized in the following proposition.

Proposition 2.1. If K is the reproducing kernel of a Hilbert space H of functions on X, then:
i. K(x, y) is uniquely determined by H;
ii. K(x, y) = K(y, x) for all x, y ∈ X (symmetry);
iii. ∑_{i,j=1}^m α_i α_j K(x_i, x_j) ≥ 0 for all m ∈ N, α_i ∈ R and x_i ∈ X (positive definiteness);
iv. ⟨K_x, K_y⟩ = K(x, y) for all x, y ∈ X.
Theorem 2.1. Let K : X × X → R be a symmetric and positive definite function. Then there exists a Hilbert space of functions H defined on X admitting K as a reproducing kernel. Conversely, let H be a Hilbert space of functions f : X → R in which, for every x ∈ X, the evaluation functional f ↦ f(x) is bounded. Then H has a reproducing kernel K.
Theorem 2.2. Let K(x, y) be a positive definite kernel on a compact domain or a manifold X. Then there exists a Hilbert space F and a function Φ : X → F such that

K(x, y) = ⟨Φ(x), Φ(y)⟩_F  for x, y ∈ X.

Φ is called a feature map, and F a feature space¹.
Given Theorem 2.2 and property iv. in Proposition 2.1, note that we can take Φ(x) := K_x := K(x, ·), in which case F = H: the "feature space" is the RKHS itself, as opposed to an isomorphic space. We will make extensive use of this feature map. The fact that Mercer kernels are positive definite and symmetric is also key; these properties ensure that kernels induce positive, symmetric matrices and integral operators, reminiscent of similar properties enjoyed by gramians and covariance matrices. Finally, in practice one typically chooses a Mercer kernel first in order to choose an RKHS: Theorem 2.1 guarantees the existence of a Hilbert space admitting such a function as its reproducing kernel.
A key observation, however, is that working in RKHS allows one to immediately find nonlinear versions of algorithms which can be expressed in terms of inner products. Consider an algorithm expressed in terms of the inner product ⟨x, x′⟩_X with x, x′ ∈ X. Now assume that instead of looking at a state x, we look at its Φ image in H,

x ↦ Φ(x) = K_x = K(x, ·).  (17)

In the RKHS, the inner product between feature-mapped states is, by the reproducing property,

⟨Φ(x), Φ(x′)⟩_H = ⟨K_x, K_{x′}⟩ = K(x, x′).

Hence, a nonlinear variant of the original algorithm may be implemented using kernels in place of inner products on X.

¹ The dimension of the feature space can be infinite, for example in the case of the Gaussian kernel.
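As a minimal concrete instance of this substitution, the homogeneous polynomial kernel K(x, y) = ⟨x, y⟩² on R² has the explicit feature map Φ(x) = (x_1², x_2², √2 x_1 x_2), so kernel evaluations reproduce feature-space inner products, and inner-product-based quantities (for example, squared distances in feature space) can be computed from kernel values alone:

```python
import numpy as np

# Polynomial kernel K(x, y) = (x.y)^2 on R^2 and its explicit feature map
# Phi(x) = (x1^2, x2^2, sqrt(2) x1 x2).
def K(x, y):
    return float(x @ y) ** 2

def phi(x):
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x, y = np.array([1.0, 2.0]), np.array([-0.5, 3.0])

# The kernel evaluates the feature-space inner product without forming Phi.
assert np.isclose(K(x, y), phi(x) @ phi(y))

# A "kernelized" computation: squared feature-space distance from kernels alone.
d2 = K(x, x) - 2 * K(x, y) + K(y, y)
assert np.isclose(d2, np.sum((phi(x) - phi(y)) ** 2))
```

The same substitution pattern is what turns the linear gramian constructions below into their kernel counterparts.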

Empirical Gramians in RKHS
In this section we recall empirical gramians for linear systems [22], as well as the notion of empirical gramians in RKHS for nonlinear systems introduced in [3]. The goal of the construction described here is to provide meaningful, data-based empirical controllability and observability gramians for nonlinear systems. In [3], these observability and controllability gramians were used for balanced model reduction; here, however, we will use these quantities to analyze nonlinear control properties and random dynamical systems. We note that a related notion of gramians for nonlinear systems is briefly discussed in [16]; however, no method for computing or estimating them was given there.

Empirical Gramians for Linear Systems
To compute the Gramians for the linear system (1), one can attempt to solve the Lyapunov equations (6) directly, although this can be computationally prohibitive. For linear systems, the gramians may be approximated by way of matrix multiplications implementing primal and adjoint systems (see the method of snapshots, e.g. [26]). Alternatively, for any system, linear or nonlinear, one may take the simulation-based approach introduced by B.C. Moore [22] for reduction of linear systems, and subsequently extended to nonlinear systems in [20]. The method proceeds by exciting each coordinate of the input with impulses from the zero initial state (x_0 = 0). The system's responses are sampled, and the sample covariance is taken as an approximation to the controllability gramian. Denote the set of canonical orthonormal basis vectors in R^n by {e_i}_i. Let u_i(t) = δ(t)e_i be the input signal for the i-th simulation, and let x_i(t) be the corresponding response of the system. Form the matrix X(t) = [x_1(t) ··· x_q(t)] ∈ R^{n×q}, so that X(t) is seen as a data matrix with column observations given by the respective responses x_i(t). Then the (n × n) controllability gramian is given by

W_c = ∫_0^∞ X(t) X(t)^⊤ dt.

We can approximate this integral by sampling the matrix function X(t) within a finite time interval [0, T], assuming for instance the regular partition t_i = iT/N, i = 1, ..., N. This leads to the empirical controllability gramian

Ŵ_c = (T/N) ∑_{i=1}^N X(t_i) X(t_i)^⊤.  (20)

The observability gramian is estimated by fixing u(t) = 0, setting x_0 = e_i for i = 1, ..., n, and measuring the corresponding system output responses y_i(t). Now assemble the output responses into a matrix Y(t) = [y_1(t) ··· y_n(t)] ∈ R^{p×n}. The (n × n) observability gramian W_{o,lin} and its empirical counterpart Ŵ_{o,lin} are respectively given by

W_{o,lin} = ∫_0^∞ Ỹ(t) Ỹ(t)^⊤ dt  and  Ŵ_{o,lin} = (T/N) ∑_{i=1}^N Ỹ(t_i) Ỹ(t_i)^⊤,  (22)

where Ỹ(t) = Y(t)^⊤. The matrix Ỹ(t_i) ∈ R^{n×p} can be thought of as a data matrix with column observations d_j(t_i) ∈ R^n, j = 1, ..., p, i = 1, ..., N, so that d_j(t_i) corresponds to the response at time t_i of the single output coordinate j to each of the (separate) initial conditions x_0 = e_k, k = 1, ..., n.
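The empirical gramian construction is straightforward to implement. In the sketch below (an illustrative stable linear system; NumPy and SciPy assumed), the impulse-response snapshots X(t) = e^{At}B are sampled on [0, T] and the resulting empirical controllability gramian (20) is compared against the Lyapunov-equation solution; the observability side is entirely analogous:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Illustrative Hurwitz system with q = 2 inputs.
A = np.array([[-1.0, 0.0],
              [1.0, -3.0]])
B = np.eye(2)

# Impulse response to u_i(t) = delta(t) e_i from x0 = 0 is x_i(t) = e^{At} B e_i,
# so the snapshot matrix at time t is simply X(t) = e^{At} B.
T, N = 15.0, 3000
dt = T / N
ts = np.linspace(0.0, T, N, endpoint=False) + dt / 2   # midpoint rule
Wc_emp = np.zeros((2, 2))
for t in ts:
    X = expm(A * t) @ B
    Wc_emp += X @ X.T
Wc_emp *= dt

# Reference solution of A Wc + Wc A^T = -B B^T.
Wc = solve_continuous_lyapunov(A, -B @ B.T)
assert np.allclose(Wc_emp, Wc, atol=1e-4)
```

For a genuinely nonlinear system, the snapshots x_i(t) come from simulation rather than matrix exponentials, but the sum (20) is formed identically.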

Empirical Gramians in RKHS Characterizing Nonlinear Systems
Consider the generic nonlinear system

ẋ = F(x, u),  y = h(x),  (24)

with x ∈ R^n, u ∈ R^q, y ∈ R^p, F(0, 0) = 0 and h(0) = 0. Assume that the linearization of (24) around the origin is controllable and observable, and that A = (∂F/∂x)|_{x=0} is asymptotically stable (Hurwitz). RKHS counterparts to the empirical quantities (20), (22) defined above for the system (24) can be defined by considering feature-mapped lifts of the simulated samples in H_K. In the following, and without loss of generality, we assume the data are centered in feature space, and that the observability samples and controllability samples are centered separately. See ([28], Ch. 14) for a discussion of implicit data centering in RKHS with kernels.
First, observe that the gramians Ŵ_c, Ŵ_o can be viewed as the sample covariances of collections of N·q and N·p vectors in R^n, respectively, scaled by T. Then applying Φ to the samples as in (17), we obtain the corresponding gramians in the RKHS associated to K, as bounded linear operators on H_K:

Ŵ_c = (T/N) ∑_{i=1}^N ∑_{j=1}^q Φ(x_j(t_i)) ⊗ Φ(x_j(t_i)),  (25)
Ŵ_o = (T/N) ∑_{i=1}^N ∑_{j=1}^p Φ(d_j(t_i)) ⊗ Φ(d_j(t_i)),  (26)

where the samples x_j, d_j are as defined in Section 3.1, and a ⊗ b = a⟨b, ·⟩ denotes the tensor product in H. From here on we will use the notation W_c, W_o to refer to RKHS versions of the true (integrated) gramians, and Ŵ_c, Ŵ_o to refer to RKHS versions of the empirical gramians. Let Ψ denote the matrix whose columns are the (scaled) observability samples mapped into feature space by Φ, and let Φ be the matrix similarly built from the feature space representation of the controllability samples. Then we may alternatively express the gramians above as Ŵ_c = ΦΦ^⊤ and Ŵ_o = ΨΨ^⊤, and define two other important quantities:

• The controllability kernel matrix K_c ∈ R^{Nq×Nq} of kernel products,

(K_c)_{µν} = ⟨Φ(x_µ), Φ(x_ν)⟩ = K(x_µ, x_ν),  µ, ν = 1, ..., Nq,

where we have re-indexed the set of vectors {x_j(t_i)}_{i,j} = {x_µ}_µ to use a single linear index.

• The observability kernel matrix K_o ∈ R^{Np×Np},

(K_o)_{µν} = ⟨Φ(d_µ), Φ(d_ν)⟩ = K(d_µ, d_ν),  µ, ν = 1, ..., Np,

where we have again re-indexed the set {d_j(t_i)}_{i,j} = {d_µ}_µ for simplicity.
Note that K_c and K_o may be highly ill-conditioned. The SVD may be used to show that Ŵ_c and K_c (respectively Ŵ_o and K_o) have the same singular values, up to additional zeros.
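The shared-spectrum fact is easy to illustrate with a finite-dimensional feature map. Taking the linear kernel K(x, y) = ⟨x, y⟩ (so Φ is the identity, with all scaling constants set to 1 for clarity), the feature-space outer-product matrix and the kernel matrix built from the same samples have identical nonzero spectra:

```python
import numpy as np

rng = np.random.default_rng(1)

# m = 6 samples in R^3 with the linear kernel, so Phi(x) = x.
Xs = rng.standard_normal((3, 6))
W = Xs @ Xs.T          # 3 x 3 gramian (sum of feature-space outer products)
Kc = Xs.T @ Xs         # 6 x 6 kernel matrix (pairwise inner products)

ev_W = np.sort(np.linalg.eigvalsh(W))[::-1]
ev_K = np.sort(np.linalg.eigvalsh(Kc))[::-1]

# Nonzero spectra coincide; Kc carries 3 extra (numerically) zero eigenvalues.
assert np.allclose(ev_W, ev_K[:3])
assert np.allclose(ev_K[3:], 0.0, atol=1e-10)
```

This is the basic reason kernel matrices can stand in for the (possibly infinite-dimensional) RKHS gramians in computations.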

Nonlinear Control Systems in RKHS
In this section, we introduce empirical versions of the controllability and observability energies (2)-(3) for stable nonlinear control systems of the form (9), that can be estimated from observed data. Our underlying assumption is that a given nonlinear system may be treated as if it were linear in a suitable feature space. That reproducing kernel Hilbert spaces provide rich representations capable of capturing strong nonlinearities in the original input (data) space lends validity to this assumption. In general, little is known about the energy functions in the nonlinear setting. However, Scherpen [27] has shown that, on a neighborhood W of the origin, the observability energy L_o(x) defined in (3) is the unique smooth solution of

(∂L_o/∂x)(x) f(x) + (1/2) h^⊤(x) h(x) = 0,  L_o(0) = 0,  (30)

under the assumption that (30) has a smooth solution on W. Furthermore, for all x ∈ W, L_c(x) defined in (2) is the unique smooth solution of

(∂L_c/∂x)(x) f(x) + (1/2) (∂L_c/∂x)(x) g(x) g^⊤(x) (∂L_c/∂x)^⊤(x) = 0,  L_c(0) = 0,  (31)

under the assumption that (31) has a smooth solution L̄_c on W and that the origin is an asymptotically stable equilibrium of −(f(x) + g(x)g^⊤(x)(∂L_c/∂x)^⊤(x)) on W. We would like to avoid solving the PDEs (30)-(31) explicitly, and instead find good estimates of their solutions directly from simulated or observed data.

Energy Functions
Following the linear theory developed in Section 2.1, we would like to define analogous controllability and observability energy functions paralleling (4)-(5), but adapted to the nonlinear setting. We first treat the controllability function. Let µ_∞ on the statespace X denote the unknown invariant measure of the nonlinear system (24) when driven by white Gaussian noise. We will consider here the case where the controllability samples {x_i}_{i=1}^m are i.i.d. random draws from µ_∞, and X is a compact subset of R^n. The former assumption is implicitly made in much of the empirical balancing literature, and if a system is simulated for long time intervals, it should hold approximately in practice. If we take Φ(x) = K_x, the infinite-data limit of (25) is given by

W_c = ∫_X K_x ⊗ K_x dµ_∞(x).  (32)

In general neither W_c nor its empirical approximation Ŵ_c is invertible, so to define a controllability energy similar to (4) one is tempted to define L_c on H as L_c(h) = ⟨W_c^† h, h⟩, where A^† denotes the pseudoinverse of the operator A. However, the domain of W_c^† is equal to the range of W_c, and so in general K_x may not be in the domain of W_c^†. We will therefore introduce the orthogonal projection P := W_c^† W_c mapping H → range(W_c), and define the nonlinear controllability energy on H as

L_c(x) := (1/2) ⟨W_c^† P K_x, P K_x⟩.  (33)

We will consider finite sample approximations to (33); a further complication, however, is that Ŵ_c^† Ŵ_c may not converge to W_c^† W_c in the limit of infinite data (taking the pseudoinverse is not a continuous operation), and Ŵ_c^† can easily be ill-conditioned in any event. Thus one needs to impose regularization, and we replace the pseudoinverse A^† with a regularized inverse (A + λI)^{-1}, λ > 0, throughout. We note that the preceding observations were also made in [11]. Intuitively, regularization prevents the estimator from overfitting to a bad or unrepresentative sample of data. We thus define the estimator L̂_c:

L̂_c(x) := (1/2) ⟨(Ŵ_c + λI)^{-1} K_x, K_x⟩,  (34)

where λ > 0 is the regularization parameter.
Towards deriving an equivalent but computable expression for L̂_c defined in terms of kernels, we recall the sampling operator S_x of [29] and its adjoint. Let x = {x_i}_{i=1}^m denote a generic sample of m data points. To x we associate the operators

S_x : H → R^m,  (S_x f)_i = ⟨f, K_{x_i}⟩ = f(x_i),
S_x^* : R^m → H,  S_x^* c = ∑_{i=1}^m c_i K_{x_i}.

If x is the collection of m = Nq controllability samples, so that Ŵ_c = (T/N) S_x^* S_x and S_x S_x^* = K_c, one can check that

L̂_c(x) = (1/(2λ)) [K(x, x) − (T/N) k_c(x)^⊤ ((T/N) K_c + λI)^{-1} k_c(x)],

where k_c(x) = S_x K_x = (K(x_1, x), ..., K(x_{Nq}, x))^⊤ is the Nq-dimensional column vector containing the kernel products between x and the controllability samples.
Similarly, letting x now denote the collection of m = Np observability samples, we can approximate the future output energy by

L̂_o(x) = (1/2) ⟨Ŵ_o K_x, K_x⟩ = (T/(2N)) k_o(x)^⊤ k_o(x),

where k_o(x) = (K(d_1, x), ..., K(d_{Np}, x))^⊤ is the Np-dimensional column vector containing the kernel products between x and the observability samples. We collect the above results into the following definition.

Definition 4.1. Given a nonlinear control system of the form (24), we define the kernel controllability energy function and the kernel observability energy function as, respectively,

L̂_c(x) = (1/(2λ)) [K(x, x) − (T/N) k_c(x)^⊤ ((T/N) K_c + λI)^{-1} k_c(x)],  (38)
L̂_o(x) = (T/(2N)) k_o(x)^⊤ k_o(x).  (39)

Note that the kernels used to define L̂_c and L̂_o need not be the same.
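For a kernel with an explicit finite-dimensional feature map, the kernel expression in Definition 4.1 can be checked directly against the feature-space formula it rewrites. The sketch below uses the linear kernel and sets the scaling constant to 1, so the identity being verified is the Woodbury-type rewriting (ΦΦ^⊤ + λI)^{-1} = λ^{-1}(I − Φ(Φ^⊤Φ + λI)^{-1}Φ^⊤):

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear kernel on R^3, so Phi(x) = x and the "controllability samples" {x_mu}
# can be stored as columns; scaling constants are set to 1 for clarity.
m, lam = 8, 0.1
Xs = rng.standard_normal((3, m))
Wc_hat = Xs @ Xs.T                      # empirical gramian in feature space

def Lc_feature(x):
    # direct feature-space evaluation: 0.5 <(W + lam I)^{-1} K_x, K_x>
    return 0.5 * x @ np.linalg.solve(Wc_hat + lam * np.eye(3), x)

def Lc_kernel(x):
    # kernel-only evaluation via the Woodbury identity
    Kc = Xs.T @ Xs                      # m x m kernel matrix
    kx = Xs.T @ x                       # kernel products K(x_mu, x)
    kxx = float(x @ x)                  # K(x, x)
    return 0.5 / lam * (kxx - kx @ np.linalg.solve(Kc + lam * np.eye(m), kx))

x = np.array([0.3, -1.2, 0.7])
assert np.isclose(Lc_feature(x), Lc_kernel(x))
```

The kernel-only form never touches the (in general infinite-dimensional) feature space, which is what makes (38) computable in practice.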

Consistency
We now turn to showing that the estimator L̂_c is consistent, but note that we do not address the approximation error between the energy function estimates and the true but unknown underlying energy functions.
Controlling the approximation error requires making specific assumptions about the nonlinear system, and we leave this question open. In the following we will make an important set of assumptions regarding the kernel K and the RKHS H it induces.
Assumption 4.1. The reproducing kernel K defined on the compact statespace X ⊂ R^n is locally Lipschitz, measurable and defines a completely regular RKHS. Furthermore, the diagonal of K is uniformly bounded: there exists κ < ∞ such that sup_{x∈X} K(x, x) ≤ κ².

Separable RKHSes are induced by continuous kernels on separable spaces X. Since X ⊂ R^n is separable and locally Lipschitz functions are continuous, H will always be separable. Completely regular RKHSes are introduced in [11], and the reader is referred to this reference for details. Briefly, complete regularity ensures recovery of the level sets of any distribution in the limit of infinite data. The Gaussian kernel does not define a completely regular RKHS, but the ℓ¹ exponential and Laplacian kernels do [11]. We introduce some additional notation. Let Ŵ_{c,m} = (1/m) ∑_{i=1}^m K_{x_i} ⊗ K_{x_i} denote the empirical RKHS gramian formed from a sample of m observations, and let the corresponding control energy estimate in Definition 4.1, involving Ŵ_{c,m} and regularization parameter λ, be denoted by L̂^λ_{c,m}. The following preliminary lemma provides finite sample error bounds for Hilbert-Schmidt covariance operators on real, separable reproducing kernel Hilbert spaces.

Lemma 4.1. Under Assumption 4.1, with probability at least 1 − δ, δ ∈ (0, 1],

‖Ŵ_{c,m} − W_c‖_HS ≤ (2√2 κ² / √m) √(ln(2/δ)).
The following theorem establishes consistency of the estimator L λ c,m , the proof of which follows the method of integral operators developed by [29,7] and subsequently adopted in the context of density estimation by ( [11], Theorem 1).

Theorem 4.2. Let Assumption 4.1 hold, and let {σ_i, φ_i} denote the eigenvalues and eigenfunctions of W_c.

(i) Fix λ > 0. For each x ∈ X, with probability at least 1 − δ,

|L̂^λ_{c,m}(x) − L^λ_c(x)| ≤ (√2 κ⁴ / λ²) √(ln(2/δ) / m).

(ii) If x ∈ X satisfies

∑_i σ_i^{-1} ⟨K_x, φ_i⟩² < ∞,  (42)

then lim_{λ→0} L^λ_c(x) = L_c(x).

(iii) If λ = λ_m → 0 is chosen such that λ_m² √m → ∞ as m → ∞, then for all x ∈ X satisfying (42), lim_{m→∞} L̂^{λ_m}_{c,m}(x) = L_c(x) almost surely.

Proof. For (i), the sample error, we have

2|L̂^λ_{c,m}(x) − L^λ_c(x)| = |⟨[(Ŵ_{c,m} + λI)^{-1} − (W_c + λI)^{-1}] K_x, K_x⟩|
 = |⟨(Ŵ_{c,m} + λI)^{-1} (W_c − Ŵ_{c,m}) (W_c + λI)^{-1} K_x, K_x⟩|
 ≤ κ² ‖(Ŵ_{c,m} + λI)^{-1} (W_c − Ŵ_{c,m}) (W_c + λI)^{-1}‖
 ≤ (κ²/λ²) ‖Ŵ_{c,m} − W_c‖_HS,

where ‖·‖ refers to the operator norm. The second line follows from the resolvent identity A^{-1} − B^{-1} = A^{-1}(B − A)B^{-1}, and the second inequality from spectral calculus together with the bound on the diagonal of K in Assumption 4.1. The third line follows making use of the estimates ‖(Ŵ_{c,m} + λI)^{-2}‖ ≤ λ^{-2}, ‖(W_c + λI)^{-2}‖ ≤ λ^{-2}, ‖W_c‖_HS ≤ κ², ‖Ŵ_{c,m}‖_HS ≤ κ² (and the fact that λ > 0, so that the relevant quantities are invertible). Part (i) then follows applying Lemma 4.1 to the quantity ‖Ŵ_{c,m} − W_c‖_HS. For (ii), the approximation error, note that the compact self-adjoint operator W_c can be expanded onto an orthonormal basis {σ_i, φ_i}. We then have

2|L^λ_c(x) − L_c(x)| = |⟨[(W_c + λI)^{-1} − W_c^† P] K_x, K_x⟩| = ∑_i [λ / (σ_i(σ_i + λ))] ⟨K_x, φ_i⟩².

The last quantity above can be seen to converge to 0 as λ → 0, since λ/(σ_i(σ_i + λ)) ≤ σ_i^{-1} and the sum converges for all x under the condition (42), so that dominated convergence applies. Lastly, for part (iii), we see that if m → ∞ and λ_m → 0 with λ_m² decaying more slowly than m^{-1/2}, then the sample error in (i) goes to 0, while (ii) also holds. For almost sure convergence in part (i), we additionally require that for any ε ∈ (0, ∞),

∑_m P(|L̂^{λ_m}_{c,m}(x) − L^{λ_m}_c(x)| > ε) < ∞.

The choice λ_m = log^{-1/2} m satisfies this requirement, as can be seen from the fact that for large enough M < ∞, ∑_{m>M} e^{−m/log² m} ≤ ∑_{m>M} e^{−√m} < ∞.
We note that the condition (42) required in part (ii) of the theorem has also been discussed in the context of support estimation in forthcoming work from the authors of [11].
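The admissible schedule in part (iii) can be sanity-checked numerically: λ_m = log^{-1/2} m decreases to 0 while the sample-error factor 1/(λ_m² √m) = log(m)/√m also decreases to 0 (the values of m below are arbitrary illustrative sample sizes):

```python
import numpy as np

# Schedule lambda_m = (log m)^{-1/2} from the proof of Theorem 4.2 (iii).
ms = np.array([1e2, 1e4, 1e6, 1e8])
lams = 1.0 / np.sqrt(np.log(ms))
sample_err_factor = 1.0 / (lams ** 2 * np.sqrt(ms))   # = log(m) / sqrt(m)

assert np.all(np.diff(lams) < 0)                # lambda_m decreasing toward 0
assert np.all(np.diff(sample_err_factor) < 0)   # error factor decreasing toward 0
```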

Observability and Controllability Ellipsoids
Given the preceding, we can estimate the reachable and observable sets of a nonlinear control system as level sets of the RKHS energy functions L̂_c, L̂_o from Definition 4.1.

Definition 4.2. Given a nonlinear control system (9), its reachable set can be estimated as

R̂ = {x ∈ X : L̂_c(x) ≤ τ},  (43)

and its observable set can be estimated as

Ô = {x ∈ X : L̂_o(x) ≥ τ′},  (44)

for suitable choices of the threshold parameters τ, τ′.
If the energy function estimates above are replaced with the true energy functions, and τ = τ′ = 1/2, one obtains a finite sample approximation to the controllability and observability ellipsoids defined in Section 2.1 if the system is linear. In general, τ may be chosen empirically based on the data, using for instance a cross-validation procedure. Note that in the linear setting, the ellipsoid of strongly observable states is more commonly characterized as E_o = {W_o^{1/2} x : ‖x‖ ≤ 1}.
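For a linear system the estimated reachable set reduces to the controllability ellipsoid, which is easy to verify numerically: points W_c^{1/2}v with ‖v‖ = 1 sit exactly on the level set L_c = 1/2, and shrinking v moves them strictly inside (illustrative system; NumPy and SciPy assumed):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, sqrtm

# Illustrative stable single-input system (not taken from the text).
A = np.array([[-2.0, 1.0],
              [0.0, -1.0]])
B = np.array([[0.0],
              [1.0]])
Wc = solve_continuous_lyapunov(A, -B @ B.T)

def Lc(x):
    return 0.5 * x @ np.linalg.solve(Wc, x)

# Boundary and interior of the reachable ellipsoid {Wc^{1/2} v : ||v|| <= 1}.
S = np.real(sqrtm(Wc))
rng = np.random.default_rng(3)
for _ in range(5):
    v = rng.standard_normal(2)
    v /= np.linalg.norm(v)
    assert np.isclose(Lc(S @ v), 0.5)       # boundary: L_c = 1/2
    assert Lc(S @ (0.5 * v)) < 0.5          # interior: L_c < 1/2
```

With the kernel estimator L̂_c in place of L_c, the same sublevel-set test yields the estimate (43) for nonlinear systems.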

Estimation of Invariant Measures for Ergodic Nonlinear SDEs
In this section we consider ergodic nonlinear SDEs of the form (10), for which the invariant (or "stationary") measure is a key quantity providing a great deal of insight. In the context of control, the support of the stationary distribution corresponds to the reachable set of the nonlinear control system, and may be estimated by (43). Solving a Fokker-Planck equation of the form (11) is one way to determine the probability distribution describing the solution to an SDE. However, for nonlinear systems, finding an explicit solution to the Fokker-Planck equation (or even its steady-state solution) is a challenging problem. The study of existence of steady-state solutions can be traced back to the 1960s [15,31]; however, explicit formulas for steady-state solutions of the Fokker-Planck equation exist in only a few special cases (see [6,10,15,17,21,24] for example), often conservative or second-order vector fields. Hartmann [18], among others, has studied balanced truncation in the context of linear SDEs, where empirical estimation of gramians plays a key role. We propose here a data-based, non-parametric estimate of the solution to the steady-state Fokker-Planck equation (12) for a nonlinear SDE, obtained by combining the relation (15) with the control energy estimate (38). Following the general theme of this paper, we make use of the theory from the linear Gaussian setting described in Section 2.2, but in a suitable reproducing kernel Hilbert space. Other estimators have of course been proposed in the literature for approximating invariant measures, and for density estimation from data more generally (see e.g. [2,13,14,19,11]); however, we are not aware of any estimation techniques which combine RKHS theory and nonlinear dynamical control systems.
An advantage of our approach over other non-parametric methods is that an invariant density is approximated by way of a regularized fitting process, giving the user an additional degree of freedom in the regularization parameter.
Our setting adopts the perspective that the nonlinear stochastic system (10) behaves approximately linearly when mapped via Φ into the RKHS H, and as such may be modeled by an infinite dimensional linear system in H. Although this system is unknown, we know that it is linear and that we can estimate its gramians and control energies from observed data. Furthermore, we know that the invariant measure of the system in H is zero-mean Gaussian with covariance given by the controllability gramian. Thus the original nonlinear system's invariant measure on X should be reasonably approximated by the pullback along Φ of the Gaussian invariant measure associated with the linear infinite dimensional SDE in H.
We summarize the setting in the following modeling assumption.

Assumption 5.1. Let H be a real, possibly infinite-dimensional RKHS satisfying Assumption 4.1.
(i) Given a suitable choice of kernel K, if the R^n-valued stochastic process x(t) is a solution to the (ergodic) stochastically excited nonlinear system (10), the H-valued stochastic process (Φ ∘ x)(t) =: X(t) can be reasonably modeled as an Ornstein-Uhlenbeck (OU) process

dX(t) = AX(t) dt + C^{1/2} dW(t),  (45)

where A is linear, negative, and the infinitesimal generator of a strongly continuous semigroup e^{tA}; C is linear, continuous, positive and self-adjoint; and W(t) is a cylindrical Wiener process on H.
(ii) The measure P ∞ is the invariant measure of the OU process (45) and P ∞ is the pushforward along Φ of the unknown invariant measure µ ∞ on the statespace X we would like to approximate.
(iii) The measure µ ∞ is absolutely continuous with respect to Lebesgue measure, and so admits a density.
We will proceed in deriving an estimate of the invariant density under these assumptions, but note that there are interesting systems for which the assumptions may not always hold in practice. For example, uncontrollable systems may not have a unique invariant measure. In these cases one must interpret the results discussed here as heuristic in nature.
It is known that a mild solution X(t) to the SDE (45) exists and is unique ([10], Thm. 5.4, pg. 121). Furthermore, the controllability gramian associated to (45),

Q_∞ = ∫_0^∞ e^{sA} C e^{sA^*} ds,  (46)

is trace class ([9], Lemma 8.19), and the unique measure P_∞ invariant with respect to the Markov semigroup associated to the OU process has characteristic function ([9], Theorem 8.20)

P̂_∞(h) = exp(−(1/2)⟨Q_∞ h, h⟩),  h ∈ H.  (47)

We will use the notation P̂ to refer to the Fourier transform of the measure P. The law of the solution X(t) to problem (45) given initial condition X(0) = 0 is Gaussian with zero mean and covariance operator Q_t = ∫_0^t e^{sA} C e^{sA^*} ds. Thus

Q_∞ = ∫_H h ⊗ h dP_∞(h) = ∫_X K_x ⊗ K_x dµ_∞(x) = W_c,

where the last integral follows pulling P_∞ back to X via Φ, establishing the equivalence between (46) and (32). Given that the measure P_∞ has Fourier transform (47) and by Assumption 5.1 is interpreted as the pushforward of µ_∞ (that is, for Borel sets B ∈ B(H), P_∞(B) = (Φ_* µ_∞)(B) = µ_∞(Φ^{-1}(B)) formally), we have that

µ_∞(x) ∝ exp(−(1/2)⟨W_c^† K_x, K_x⟩).

The invariant measure µ_∞ is defined on a finite dimensional space, so together with part (iii) of Assumption 5.1 we may consider the corresponding (Radon-Nikodym) density

ρ_∞(x) = Z^{-1} exp(−(1/2)⟨W_c^† K_x, K_x⟩)

whenever the condition (42) holds. If (42) does not hold, or if we are considering a finite data sample, then we regularize to arrive at

ρ̂_∞(x) = Z^{-1} exp(−(1/2)⟨(W_c + λI)^{-1} K_x, K_x⟩),  λ > 0,

as discussed in Section 2.2 (see Eq. (15)) and Section 4.1. This density may be estimated from data {x_i}_{i=1}^N since the controllability energy may be estimated from data: at a new point x, we have

ρ̂_∞(x) = Z^{-1} exp(−L̂_c(x)),

where L̂_c is the empirical approximation computed according to Definition 4.1, and the constant Z may be either computed analytically in some cases, or simply estimated from the data sample to enforce integration to unity. We may also estimate, for example, level sets of ρ_∞ (such as the support) by considering level sets of the regularized control energy estimator, {x ∈ X : L̂^λ_{c,m}(x) ≤ τ}.
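The full pipeline (simulate the stochastically forced system, form the kernel matrix from the samples, evaluate L̂_c, and exponentiate) can be sketched in a few lines. The example below uses a scalar double-well SDE; the Laplacian kernel bandwidth, the regularization λ, the thinning factor and all other constants are illustrative choices, and the scaling corresponds to the sample-covariance normalization Ŵ_c = (1/m)∑_i K_{x_i} ⊗ K_{x_i}:

```python
import numpy as np

rng = np.random.default_rng(4)

# Double-well SDE dX = -V'(X) dt + sigma dW, V(x) = (x^2 - 1)^2 / 4, simulated
# by Euler-Maruyama (all constants are illustrative).
dt, n_steps, sigma = 1e-2, 60_000, 0.7
x = 0.0
traj = np.empty(n_steps)
for k in range(n_steps):
    x += -x * (x * x - 1.0) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    traj[k] = x
samples = traj[::100]                    # thinned "controllability samples"
m, lam, s = samples.size, 1e-3, 0.3

# Laplacian kernel (completely regular, unlike the Gaussian kernel).
K = lambda a, b: np.exp(-np.abs(a - b) / s)

Kc = K(samples[:, None], samples[None, :])
Kinv = np.linalg.inv(Kc + lam * m * np.eye(m))

def Lc_hat(z):
    # kernel expression for 0.5 <(Wc + lam I)^{-1} K_z, K_z>, K(z, z) = 1 here
    kz = K(samples, z)
    return 0.5 / lam * (1.0 - kz @ Kinv @ kz)

# The energy is small on the support (near the wells x = +/-1) and large far
# outside it, so sublevel sets of Lc_hat estimate the support of mu_inf.
assert Lc_hat(3.0) > Lc_hat(1.0)
assert Lc_hat(-3.0) > Lc_hat(-1.0)

# Density estimate rho_hat ~ exp(-Lc_hat), with Z fixed by grid normalization.
grid = np.linspace(-2.5, 2.5, 501)
vals = np.array([np.exp(-Lc_hat(g)) for g in grid])
rho_hat = vals / (vals.sum() * (grid[1] - grid[0]))
```

Here Z is estimated from the grid rather than computed analytically, matching the normalization-from-data option described above.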

Conclusion
To summarize our contributions, we have introduced estimators for the controllability/observability energies and the reachable/observable sets of nonlinear control systems. We showed that the controllability energy estimator may be used to approximate the stationary solution of the Fokker-Planck equation governing nonlinear SDEs (and its support). The estimators we derived were based on applying linear methods for control and random dynamical systems to nonlinear control systems and SDEs, once mapped into an infinite-dimensional RKHS acting as a "linearizing space". These results collectively argue that there is a reasonable passage from linear dynamical systems theory to a data-based nonlinear dynamical systems theory through reproducing kernel Hilbert spaces.
We leave for future work the formulation of data-based estimators for Lyapunov exponents and the controllability/observability operators Ψ c , Ψ o associated to nonlinear systems.