
Unsupervised learning of observation functions in state space models by nonparametric moment methods

  • *Corresponding author: Fei Lu


MM, YGK and FL are partially supported by grants DE-SC0021361 and FA9550-21-1-0317. FL is partially funded by NSF Award DMS-1913243.

Abstract / Introduction Full Text(HTML) Figure(8) / Table(2) Related Papers Cited by
  • We investigate the unsupervised learning of non-invertible observation functions in nonlinear state space models. Assuming abundant data of the observation process along with the distribution of the state process, we introduce a nonparametric generalized moment method to estimate the observation function via constrained regression. The major challenge comes from the non-invertibility of the observation function and the lack of data pairs between the state and the observation. We address the fundamental issue of identifiability from quadratic loss functionals and show that the function space of identifiability is the closure of an RKHS that is intrinsic to the state process. Numerical results show that the first two moments and temporal correlations, along with upper and lower bounds, can identify functions ranging from piecewise polynomials to smooth functions, leading to convergent estimators. The limitations of this method, such as non-identifiability due to symmetry and stationarity, are also discussed.

    Mathematics Subject Classification: Primary: 62G05, 68Q32, 62M15.
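The generalized moment idea in the abstract can be illustrated with a short sketch. We assume a loss that sums the squared mismatches of the first moments $ E[f(X_{t_l})] $ versus $ E[Y_{t_l}] $ and of the temporal correlations $ E[f(X_s)f(X_t)] $ versus $ E[Y_sY_t] $; this unweighted form is our own choice for illustration, and the moment matrices in (2.6)-(2.7) may be defined differently. The state samples and the observation samples come from independent runs, so no $ (X,Y) $ pairs are ever formed. The names `moment_loss`, `simulate` and `trig_basis` are hypothetical.

```python
import numpy as np

def moment_loss(c, basis, x_paths, y_paths):
    """Squared mismatch of means and time correlations for f = sum_i c_i phi_i.

    x_paths : (M1, L) samples of the state process at t_1..t_L, simulated
              from its known law.
    y_paths : (M2, L) observed trajectories of Y; they are never paired with
              x_paths, only their moments enter the loss.
    basis   : maps an array of shape (M, L) to shape (M, L, n).
    """
    fx = basis(x_paths) @ c                                   # f(X_{t_l}), shape (M1, L)
    mean_gap = fx.mean(axis=0) - y_paths.mean(axis=0)         # E f(X_t) - E Y_t
    corr_gap = fx.T @ fx / len(fx) - y_paths.T @ y_paths / len(y_paths)  # E[f f] - E[Y Y]
    return float(np.sum(mean_gap**2) + np.sum(corr_gap**2))


# Usage: X_t = X_0 + B_t with X_0 ~ Unif(0, 1), observed through f(x) = sin(x).
rng = np.random.default_rng(0)

def simulate(m, n_steps=10, dt=0.1):
    x0 = rng.uniform(0.0, 1.0, size=(m, 1))
    return x0 + np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(m, n_steps)), axis=1)

def trig_basis(x):
    return np.stack([np.ones_like(x), np.sin(x), np.cos(x)], axis=-1)

c_true = np.array([0.0, 1.0, 0.0])
x_paths = simulate(20000)                        # draws from the known state law
y_paths = trig_basis(simulate(20000)) @ c_true   # independent run: no (X, Y) pairs
loss_true = moment_loss(c_true, trig_basis, x_paths, y_paths)
loss_wrong = moment_loss(np.array([0.5, 0.0, 1.0]), trig_basis, x_paths, y_paths)
print(loss_true, loss_wrong)
```

The loss is close to zero at the true coefficients (only Monte Carlo error remains) and large at a wrong guess. Because the correlation terms are quadratic in $ f $, the loss is quartic in $ c $, which is consistent with optimizing from multiple initial conditions as in Algorithm 1.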


  • Figure 1.  Empirical densities from the data trajectories of the process $ (X_{t_l}) $ in (4.2) and of the observation processes $ (Y_{t_l}) $ with $ f_* = f_i $, where the $ f_i $ are the three observation functions in (4.3). Since no data pairs $ (X_{t_l}^{(m)},Y_{t_l}^{(m)}) $ are available, these empirical densities are all the information the data provide. Our goal is to find the function $ f_* $ in the operator that maps the densities of $ \{X_{t_l}\} $ to the densities of $ \{Y_{t_l}\} $.

    Figure 2.  Learning results of Sine function $ f_1(x) = \sin(x) $ with model (4.2)

    Figure 3.  Learning results of Sine-Cosine function $ f_2(x) = 2\sin(x) + \cos(6x) $ with model (4.2)

    Figure 4.  Learning results of Arch function $ f_3 $ with model (4.2)

    Figure 5.  Learning results of Arch function $ f_3 $ with model (4.2) and i.i.d. Gaussian observation noise

    Figure 6.  Learning results of $ {{f_*}}(x) = \sin(x) $ with the state space model $ X_t = B_t+ X_0 $, where $ X_{0}\sim \mathrm{Unif}(0,1) $. Due to the symmetry with respect to the line $ x = \frac{1}{2} $, the estimator $ \widehat{f}(x) $ and its reflection $ \widehat{f}(1-x) $ are indistinguishable by the loss functional, and they lead to similar predictions of the distribution of $ \{Y_{t_l}\} $.
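The symmetry in Figure 6 can be checked directly. Since $ 1-X_0\sim\mathrm{Unif}(0,1) $ and $ -B_t $ is again a Brownian motion independent of $ X_0 $, the process $ 1-X_t $ has the same law as $ X_t $, so $ \{f(X_{t_l})\} $ and $ \{f(1-X_{t_l})\} $ share all moments. The Monte Carlo sketch below compares the means and time correlations of the two observation processes; the sample size, the time grid and the shifted alternative $ \sin(x+0.5) $ used for contrast are our own choices.

```python
import numpy as np

rng = np.random.default_rng(1)
M, L, dt = 50000, 8, 0.05

def simulate(m):
    """X_{t_l} = X_0 + B_{t_l}, l = 1..L, with X_0 ~ Unif(0, 1)."""
    x0 = rng.uniform(0.0, 1.0, size=(m, 1))
    return x0 + np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(m, L)), axis=1)

def moment_gap(y1, y2):
    """Largest difference in the means and in the time correlations E[Y_s Y_t]."""
    mean_gap = np.abs(y1.mean(axis=0) - y2.mean(axis=0)).max()
    corr_gap = np.abs(y1.T @ y1 / len(y1) - y2.T @ y2 / len(y2)).max()
    return float(max(mean_gap, corr_gap))

y_f = np.sin(simulate(M))                # observations through f(x) = sin(x)
y_reflected = np.sin(1.0 - simulate(M))  # observations through f(1 - x)
y_shifted = np.sin(simulate(M) + 0.5)    # a non-symmetric alternative

gap_reflect = moment_gap(y_f, y_reflected)  # Monte Carlo noise only
gap_shift = moment_gap(y_f, y_shifted)      # a genuine difference
print(gap_reflect, gap_shift)
```

Any loss built from these moments therefore cannot separate $ f(x) $ from $ f(1-x) $ under this state process.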

    Figure 7.  Learning results of $ {{f_*}}(x) = \sin(x) $ with a stationary Ornstein-Uhlenbeck process. Due to the limited information provided by the moments, the estimator is inaccurate.
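One way to see the loss of information in Figure 7: if the state process is stationary, $ a_l = E[\phi(X_{t_l})] $ does not depend on $ l $, so a first-moment normal matrix of the form $ \sum_l a_l a_l^\top $ has rank at most one, however many time points are used (likewise, $ E[f(X_s)f(X_t)] $ depends only on $ t-s $). The sketch below assumes this form of the matrix (our reading, not necessarily (2.6)), the OU parameters $ \theta=\sigma=1 $ and the monomial basis $ (1,x,x^2) $. It simulates the OU process exactly, starting from its stationary law, and prints the normalized singular values.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma = 1.0, 1.0              # dX = -theta X dt + sigma dW (assumed values)
var = sigma**2 / (2.0 * theta)       # stationary variance
M, L, dt = 20000, 20, 0.1

decay = np.exp(-theta * dt)
step_sd = np.sqrt(var * (1.0 - decay**2))
x = rng.normal(0.0, np.sqrt(var), size=M)   # start in the stationary law

a_rows = []
for _ in range(L):
    x = decay * x + step_sd * rng.normal(size=M)          # exact OU transition
    phi = np.stack([np.ones_like(x), x, x**2], axis=1)    # basis (1, x, x^2)
    a_rows.append(phi.mean(axis=0))                       # a_l = E phi(X_{t_l})
a = np.array(a_rows)                 # shape (L, 3)
A1 = a.T @ a                         # sum_l a_l a_l^T
s = np.linalg.svd(A1, compute_uv=False)
ratio = s[1] / s[0]
print(s / s[0])   # one dominant singular value; the others are Monte Carlo noise
```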

    Figure 8.  Selection of the dimension and the degree of the B-spline basis functions for the Sine-Cosine function. In (a), the 2-Wasserstein distance reaches its minimum over all cases when the degree is 2 and the knot number is 15, which is also where the $ L^2({\overline \rho_T}^L) $ error reaches its minimum. Panel (b) shows the cross-validating error indicator $ g $ (defined in (B.3)) for selecting the dimension range $ N $, suggesting an upper bound $ N = 60 $ with the threshold
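Figure 8 selects the degree and the number of knots of a B-spline basis. A minimal Cox-de Boor sketch of a clamped B-spline basis on the support $ [R_{min}, R_{max}] $ follows. We assume uniformly spaced knots and read the "knot number" as the count of distinct knots including both endpoints; the construction in Section 2.3 may differ. `bspline_basis` is a hypothetical helper.

```python
import numpy as np

def bspline_basis(x, r_min, r_max, n_knots, degree):
    """Clamped B-spline basis with n_knots uniform knots on [r_min, r_max].

    Returns an array of shape (len(x), n_knots + degree - 1); column i holds
    phi_i evaluated at x. Points outside [r_min, r_max] map to 0.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    knots = np.linspace(r_min, r_max, n_knots)
    t = np.concatenate([np.full(degree, r_min), knots, np.full(degree, r_max)])

    # Degree 0: indicators of the half-open intervals [t_j, t_{j+1}).
    B = np.zeros((x.size, len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    last = np.flatnonzero(t[:-1] < t[1:])[-1]
    B[x == r_max, last] = 1.0            # close the final interval at r_max

    # Cox-de Boor recursion: raise the degree one step at a time.
    for k in range(1, degree + 1):
        B_next = np.zeros((x.size, len(t) - k - 1))
        for j in range(len(t) - k - 1):
            if t[j + k] > t[j]:
                B_next[:, j] += (x - t[j]) / (t[j + k] - t[j]) * B[:, j]
            if t[j + k + 1] > t[j + 1]:
                B_next[:, j] += (t[j + k + 1] - x) / (t[j + k + 1] - t[j + 1]) * B[:, j + 1]
        B = B_next
    return B


xs = np.linspace(-1.0, 2.0, 101)
Phi = bspline_basis(xs, -1.0, 2.0, n_knots=15, degree=2)
print(Phi.shape)
```

With 15 knots and degree 2 this gives 16 basis functions, which are nonnegative and sum to one on the support; varying `n_knots` and `degree` produces the kind of grid searched in panel (a).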

    Table Algorithm 1.  Estimating the observation function by nonparametric generalized moment methods

    Input: The state space model and data $ \{Y_{t_0:t_L}^{(m)} \}_{m=1}^M $ consisting of multiple trajectories of the observation process.
    Output: Estimator $ \widehat f $.
    $\;\;$1:$\;\;$Estimate the empirical density $ {\overline \rho_T}^L $ in (2.16) and find its support $ [R_{min}, R_{max}] $.
    $\;\;$2:$\;\;$Select a basis type, Fourier or B-spline, with an estimated dimension range $ [1,N] $ (by Algorithm 2), and compute the basis functions as described in Section 2.3 using the support of $ {\overline \rho_T}^L $.
    $\;\;$3:$\;\;$for $ n =1:N $ do
    $\;\;$4:$\;\;\;\;\;\;$Compute the moment matrices in (2.6)-(2.7) and the vectors $ b_{k,l}^M $ in (2.11).
    $\;\;$5:$\;\;\;\;\;\;$Find the estimator $ \widehat c_n $ by optimization with multiple initial conditions. Compute and record the values of the loss functional and the 2-Wasserstein distances.
    $\;\;$6:$\;\;$Select the optimal dimension $ n $ (and degree if B-spline basis) that has the minimal 2-Wasserstein distance in (B.5). Return the estimator $ \widehat f = \sum_{i = 1}^{n} c^i_{n} \phi_i $.
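Step 6 of Algorithm 1 ranks the candidate estimators by a 2-Wasserstein distance. The exact form of (B.5) is not reproduced here; as an assumption, the sketch compares the empirical distribution of $ \widehat f_n(X) $ with that of the observed $ Y $ at a single time, using the one-dimensional quantile formula $ W_2(\mu,\nu)^2=\int_0^1 |F_\mu^{-1}(s)-F_\nu^{-1}(s)|^2\,ds $. The candidate functions are placeholders for fitted estimators, and `w2_empirical` is a hypothetical name.

```python
import numpy as np

def w2_empirical(u, v, n_levels=1000):
    """Approximate 2-Wasserstein distance between two 1-D samples.

    The quantile-function integral is approximated by the midpoint rule
    on the empirical quantiles of u and v.
    """
    s = (np.arange(n_levels) + 0.5) / n_levels
    return float(np.sqrt(np.mean((np.quantile(u, s) - np.quantile(v, s)) ** 2)))


# Hypothetical use in step 6: compare samples of f_n(X) with the observed Y.
rng = np.random.default_rng(3)
x_samples = rng.normal(size=5000)            # draws from the known state law
y_observed = np.sin(rng.normal(size=5000))   # unpaired observations, f_* = sin
candidates = {1: lambda x: 0.5 * x, 2: np.cos, 3: np.sin}   # stand-ins for fitted f_n
dists = {n: w2_empirical(f(x_samples), y_observed) for n, f in candidates.items()}
best_n = min(dists, key=dists.get)
print(dists, best_n)
```

Only the two marginal samples are needed, which fits the unpaired setting: the distance can be evaluated without ever matching a state sample to an observation.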

    Table Algorithm 2.  Cross-validating Estimation of Dimension Range (CEDR) for hypothesis space

    Input: The state space model and data $ \{Y_{t_0:t_L}^{(m)} \}_{m=1}^M $.
    Output: A range $ [1,N] $ for the dimension of the hypothesis space for further selection.
    $\;\;$1:$\;\;$Estimate the empirical density $ {\overline \rho_T} $ in (2.16) and find its support $ [R_{min}, R_{max}] $.
    $\;\;$2:$\;\;$Set $ n=1 $ and $ g(n)=0 $. Estimate the threshold $ \tau $ in (B.4).
    $\;\;$3:$\;\;$While $ g(n)\leq \tau $ do
    $\;\;$4:$\;\;\;\;\;\;$Set $ n\leftarrow n+1 $. Update the basis functions, Fourier or B-spline, as in Section 2.3.
    $\;\;$5:$\;\;\;\;\;\;$Compute the normal matrix $ \overline{A}_1 $ in (2.6) by Monte Carlo. Also, compute $ b $ and $ b' $ in (B.1).
    $\;\;$6:$\;\;\;\;\;\;$Compute the eigen-decomposition of $ \overline{A}_1 $ as in (B.2): $ \overline{A}_1 =\sum_{i=1}^n u_i \sigma_i u_i^\top $ with $ u_i^\top B u_j= \delta_{i,j} $.
    $\;\;$7:$\;\;\;\;\;\;$Compute the Picard projection ratios: $ r_i = \frac{|u_i^\top (b-b')|}{\sigma_i} $ for $ i=1,\ldots,n $ and $ g(n)= \sum_{i=1}^n r_i^2 $.
    $\;\;$8:$\;\;$Return $ N=n $.
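Steps 5 to 7 of Algorithm 2 can be sketched with NumPy. We read step 6 as the generalized eigenproblem $ \overline{A}_1 u = \sigma B u $ with $ u_i^\top B u_j=\delta_{i,j} $ and solve it through a Cholesky factor of $ B $, which assumes $ B $ is symmetric positive definite. `build_system(n)` is a hypothetical callback returning $ \overline A_1 $, $ B $, $ b $ and $ b' $ for dimension $ n $, and the cap `n_max` is our addition to guarantee termination.

```python
import numpy as np

def picard_indicator(A1, B, b, b_prime):
    """g = sum_i r_i^2 with r_i = |u_i^T (b - b')| / sigma_i.

    (sigma_i, u_i) solve A1 u = sigma B u, normalized so that U^T B U = I.
    """
    Lc = np.linalg.cholesky(B)            # B = Lc Lc^T (B assumed SPD)
    Linv = np.linalg.inv(Lc)              # explicit inverse kept for readability
    sigma, W = np.linalg.eigh(Linv @ A1 @ Linv.T)
    U = Linv.T @ W                        # generalized eigenvectors, U^T B U = I
    r = np.abs(U.T @ (b - b_prime)) / sigma
    return float(np.sum(r**2))

def cedr(build_system, tau, n_max=200):
    """Grow n until g(n) exceeds tau (or n reaches n_max); return N = n."""
    n, g = 1, 0.0
    while g <= tau and n < n_max:
        n += 1
        A1, B, b, b_prime = build_system(n)
        g = picard_indicator(A1, B, b, b_prime)
    return n

# Synthetic check: eigenvalues 1, 0.1, 0.01, ... and a fixed data discrepancy.
def toy_system(n):
    A1 = np.diag(10.0 ** -np.arange(n))
    return A1, np.eye(n), np.full(n, 0.01), np.zeros(n)

N = cedr(toy_system, tau=0.5)
print(N)
```

In the toy system, $ g(2)\approx 0.0101 $ stays below the threshold, while the third eigenvalue $ 0.01 $ amplifies the discrepancy to $ g(3)\approx 1.0101 $, so the range stops at $ N=3 $: the indicator grows once small eigenvalues start magnifying the sampling difference between $ b $ and $ b' $.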