Advanced Search
Article Contents
Article Contents

Dimension reduction in recurrent networks by canonicalization

  • *Corresponding author: Juan-Pablo Ortega

    *Corresponding author: Juan-Pablo Ortega

JPO acknowledges partial financial support coming from the Research Commission of the Universität Sankt Gallen and the Swiss National Science Foundation (grant number 200021 175801/1)

Abstract Full Text(HTML) Related Papers Cited by
  • Many recurrent neural network machine learning paradigms can be formulated using state-space representations. The classical notion of canonical state-space realization is adapted in this paper to accommodate semi-infinite inputs so that it can be used as a dimension reduction tool in the recurrent networks setup. The so-called input forgetting property is identified as the key hypothesis that guarantees the existence and uniqueness (up to system isomorphisms) of canonical realizations for causal and time-invariant input/output systems with semi-infinite inputs. Additionally, the notion of optimal reduction coming from the theory of symmetric Hamiltonian systems is implemented in our setup to construct canonical realizations out of input forgetting but not necessarily canonical ones. These two procedures are studied in detail in the framework of linear fading memory input/output systems. {Finally, the notion of implicit reduction using reproducing kernel Hilbert spaces (RKHS) is introduced which allows, for systems with linear readouts, to achieve dimension reduction without the need to actually compute the reduced spaces introduced in the first part of the paper.

    Mathematics Subject Classification: Primary: 68T05, 68T07; Secondary: 93-08, 93Bxx.


    \begin{equation} \\ \end{equation}
  • 加载中
  • [1] A. C. Antoulas, Mathematical System Theory. The Influence of R. E. Kalman, Springer-Verlag, Berlin, 1991. doi: 10.1007/978-3-662-08546-2.
    [2] P. Barančok and I. Farkaš, Memory capacity of input-driven echo state networks at the edge of chaos, in Artificial Neural Networks and Machine Learning – ICANN 2014, Lecture Notes in Computer Science, 8681, Springer, Cham, 2014, 41–48. doi: 10.1007/978-3-319-11179-7_6.
    [3] L. E. Baum and T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, Ann. Math. Statist., 37 (1966), 1554-1563.  doi: 10.1214/aoms/1177699147.
    [4] G. Blankenstein and T. S. Ratiu, Singular reduction of implicit Hamiltonian systems, Rep. Math. Phys., 53 (2004), 211-260.  doi: 10.1016/S0034-4877(04)90013-4.
    [5] A. M. Bloch, Nonholonomic Mechanics and Control, 2$^{nd}$ edition, Interdisciplinary Applied Mathematics, 24, Springer, New York, 2015. doi: 10.1007/978-1-4939-3017-3.
    [6] S. Boyd and L. O. Chua, Fading memory and the problem of approximating nonlinear operators with Volterra series, IEEE Trans. Circuits and Systems, 32 (1985), 1150-1161.  doi: 10.1109/TCS.1985.1085649.
    [7] F. Bullo and A. D. Lewis, Geometric Control of Mechanical Systems. Modeling, Analysis, and Design for Simple Mechanical Control Systems, Applied Mathematics, 49, Springer-Verlag, New York, 2005. doi: 10.1007/978-1-4899-7276-7.
    [8] P. ButeneersD. VerstraetenB. V. NieuwenhuyseD. Stroobandt and R. Raedt, et al., Real-time detection of epileptic seizures in animal models using reservoir computing, Epilepsy Res., 103 (2013), 124-134.  doi: 10.1016/j.eplepsyres.2012.07.013.
    [9] A. S. Charles, D. Yin and C. J. Rozell, Distributed sequence memory of multidimensional inputs in recurrent networks, J. Mach. Learn. Res., 18 (2017), 37pp.
    [10] R. Couillet, G. Wainrib, H. Sevi and H. Tiomoko Ali, The asymptotic performance of linear echo state neural networks, J. Mach. Learn. Res., 17 (2016), 35pp.
    [11] C. Cuchiero, L. Gonon, L. Grigoryeva, J.-P. Ortega and J. Teichmann, Discrete-time signatures and randomness in reservoir computing, IEEE Trans. Neural Netw. Learn. Syst., (2021), 1–10. doi: 10.1109/TNNLS.2021.3076777.
    [12] J. Dambre, D. Verstraeten, B. Schrauwen and S. Massar, Information processing capacity of dynamical systems, Scientific Reports, 2 (2012). doi: 10.1038/srep00514.
    [13] H. Dang Van Mien and D. Normand-Cyrot, Nonlinear state affine identification methods: Applications to electrical power plants, Automatica, 20 (1984), 175-188.  doi: 10.1016/0005-1098(84)90023-2.
    [14] K. Doya, Bifurcations in the learning of recurrent neural networks, Proc. IEEE Internat. Symposium on Circuits Syst., San Diego, CA, 1992. doi: 10.1109/ISCAS.1992.230622.
    [15] M. Duflo, Random Iterative Models, Applications of Mathematics (New York), 34, Springer-Verlag, Berlin, 1997.
    [16] J. Durbin and S. J. Koopman, Time Series Analysis by State Space Methods, Oxford Statistical Science Series, 38, Oxford University Press, Oxford, 2012. doi: 10.1093/acprof:oso/9780199641178.001.0001.
    [17] I. FarkašR. Bosák and P. Gergel', Computational analysis of memory capacity in echo state networks, Neural Networks, 83 (2016), 109-120.  doi: 10.1016/j.neunet.2016.07.012.
    [18] M. Fliess and D. Normand-Cyrot, A group-theoretic approach to discrete-time non-linear controllability, 20th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes, San Diego, CA, 1981. doi: 10.1109/CDC.1981.269266.
    [19] S. GanguliD. Huh and H. Sompolinsky, Memory traces in dynamical systems, PNAS, 105 (2008), 18970-18975.  doi: 10.1073/pnas.0804451105.
    [20] F. Gay-Balmaz and T. S. Ratiu, Clebsch optimal control formulation in mechanics, J. Geom. Mech, 3 (2011), 41-79.  doi: 10.3934/jgm.2011.3.41.
    [21] L. Gonon, L. Grigoryeva and J.-P. Ortega, Approximation error estimates for random neural networks and reservoir systems, preprint, arXiv: 2002.05933.
    [22] L. Gonon, L. Grigoryeva and J.-P. Ortega, Memory and forecasting capacities of nonlinear recurrent networks, Phys. D, 414 (2020), 13pp. doi: 10.1016/j.physd.2020.132721.
    [23] L. Gonon, L. Grigoryeva and J.-P. Ortega, Risk bounds for reservoir computing, J. Mach. Learn. Res., 21 (2020), 61pp.
    [24] L. Gonon and J.-P. Ortega, Fading memory echo state networks are universal, Neural Networks, 138 (2021), 10-13.  doi: 10.1016/j.neunet.2021.01.025.
    [25] L. Gonon and J.-P. Ortega, Reservoir computing universality with stochastic inputs, IEEE Trans. Neural Netw. Learn. Syst., 31 (2020), 100-112.  doi: 10.1109/TNNLS.2019.2899649.
    [26] A. Goudarzi, S. Marzen, P. Banda, G. Feldman, M. R. Lakin, C. Teuscher and D. Stefanovic, Memory and information processing in recurrent neural networks, preprint, arXiv: 1604.06929.
    [27] A. Graves, A.-R. Mohamed and G. Hinton, Speech recognition with deep recurrent neural networks, IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 2013. doi: 10.1109/ICASSP.2013.6638947.
    [28] L. Grigoryeva, A. Hart and J.-P. Ortega, Chaos on compact manifolds: Differentiable synchronizations beyond the Takens theorem, Phys. Rev. E, 103 (2021), 12pp. doi: 10.1103/physreve.103.062204.
    [29] L. Grigoryeva, A. Hart and J.-P. Ortega, Learning strange attractors with reservoir systems, preprint, arXiv: 2108.05024.
    [30] L. GrigoryevaJ. HenriquesL. Larger and J.-P. Ortega, Nonlinear memory capacity of parallel time-delay reservoir computers in the processing of multidimensional signals, Neural Comput., 28 (2016), 1411-1451.  doi: 10.1162/NECO_a_00845.
    [31] L. GrigoryevaJ. HenriquesL. Larger and J.-P. Ortega, Optimal nonlinear information processing capacity in delay-based reservoir computers, Scientific Rep., 5 (2015), 1-11.  doi: 10.1038/srep12858.
    [32] L. GrigoryevaJ. HenriquesL. Larger and J.-P. Ortega, Stochastic time series forecasting using time-delay reservoir computers: Performance and universality, Neural Networks, 55 (2014), 59-71.  doi: 10.2139/ssrn.2350331.
    [33] L. Grigoryeva, J. Henriques and J.-P. Ortega, Reservoir computing: Information processing of stationary signals, IEEE International Conference on Computational Science and Engineering (CSE), Paris, France, 2016. doi: 10.1109/CSE-EUC-DCABES.2016.231.
    [34] L. Grigoryeva and J.-P. Ortega, Differentiable reservoir computing, J. Mach. Learn. Res., 20 (2019), 62pp.
    [35] L. Grigoryeva and J.-P. Ortega, Echo state networks are universal, Neural Networks, 108 (2018), 495-508.  doi: 10.1016/j.neunet.2018.08.025.
    [36] L. Grigoryeva and J.-P. Ortega, Universal discrete-time reservoir computers with stochastic inputs and linear readouts using non-homogeneous state-affine systems, J. Mach. Learn. Res., 19 (2018), 40pp.
    [37] J. W. Grizzle and S. I. Marcus, The structure of nonlinear control systems possessing symmetries, IEEE Trans. Automat. Control, 30 (1985), 248-258.  doi: 10.1109/TAC.1985.1103927.
    [38] J. Hanson and M. Raginsky, Universal approximation of input-output maps by temporal convolutional nets, preprint, arXiv: 1906.09211.
    [39] A. HartJ. Hook and J. Dawes, Embedding and approximation theorems for echo state networks, Neural Networks, 128 (2020), 234-247.  doi: 10.1016/j.neunet.2020.05.013.
    [40] A. G. Hart, J. L. Hook and J. H. P. Dawes, Echo state networks trained by Tikhonov least squares are $L^2(\mu)$ approximators of ergodic dynamical systems, Phys. D, 421 (2021), 9pp. doi: 10.1016/j.physd.2021.132882.
    [41] S. Haykin, Neural Networks and Learning Machines, Pearson, Addison Wesley, 2009.
    [42] M. Hermans and B. Schrauwen, Memory in linear recurrent neural networks in continuous time., Neural Networks, 23 (2010), 341-355.  doi: 10.1016/j.neunet.2009.08.008.
    [43] R. A. Horn and C. R. Johnson, Matrix Analysis, 2$^{nd}$ edition, Cambridge University Press, Cambridge, 2013.
    [44] G.-B. HuangQ.-Y. Zhu and C.-K. Siew, Extreme learning machine: Theory and applications, Neurocomputing, 70 (2006), 489-501.  doi: 10.1016/j.neucom.2005.12.126.
    [45] C. E. Hutchinson, The Kalman filter applied to aerospace and electronic systems, IEEE Trans. Aerospace Electron. Syst., AES-20 (1984), 500-504.  doi: 10.1109/TAES.1984.4502068.
    [46] H. Jaeger, The "Echo State" Approach to Analysing and Training Recurrent Neural Networks with an Erratum Note, GMD Report, German National Research Center for Information Technology, 2010.
    [47] H. Jaeger, Short term memory in echo state networks, Fraunhofer Institute for Autonomous Intelligent Systems, 152.
    [48] H. Jaeger and H. Haas, Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication, Science, 304 (2004), 78-80.  doi: 10.1126/science.1091277.
    [49] B. Jakubczyk and E. D. Sontag, Controllability of nonlinear discrete-time systems: A Lie-algebraic approach, SIAM J. Control Optim., 28 (1990), 1-33.  doi: 10.1137/0328001.
    [50] W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, in Conference in Modern Analysis and Probability, Contemp. Math., 26, Amer. Math. Soc., Providence, RI, 1984,189–206. doi: 10.1090/conm/026/737400.
    [51] R. E. Kalman, Canonical structure of linear dynamical systems, Proc. Nat. Acad. Sci. U.S.A., 48 (1962), 596-600.  doi: 10.1073/pnas.48.4.596.
    [52] R. E. Kalman, Lectures on controllability and observability, in Controllability and Observability, C.I.M.E. Summer Sch., 46, Springer, Heidelberg, 2010, 1–149. doi: 10.1007/978-3-642-11063-4_1.
    [53] R. E. Kalman, A new approach to linear filtering and prediction problems, Trans. ASME Ser. D. J. Basic Engrg., 82 (1960), 35-45.  doi: 10.1115/1.3662552.
    [54] R. E. Kalman and J. E. Bertram, General synthesis procedure for computer control of single-loop and multiloop linear systems (an optimal sampling system), Trans. Amer. Inst. Electrical Engineers, Part II: Appl. Industry, 77 (1959), 602-609.  doi: 10.1109/TAI.1959.6371508.
    [55] R. E. Kalman and J. E. Bertram, A unified approach to the theory of sampling systems, J. Franklin Inst., 267 (1959), 405-436.  doi: 10.1016/0016-0032(59)90093-6.
    [56] R. E. Kalman and R. S. Bucy, New results in linear filtering and prediction theory, Trans. ASME Ser. D. J. Basic Engrg., 83 (1961), 95-108.  doi: 10.1115/1.3658902.
    [57] P. E. Kloeden and M. Rasmussen, Nonautonomous Dynamical Systems, Mathematical Surveys and Monographs, 176, American Mathematical Society, Providence, RI, 2011. doi: 10.1090/surv/176.
    [58] I. Kolář, P. W. Michor and J. Slovák, Natural Operations in Differential Geometry, Springer-Verlag, Berlin, 1993. doi: 10.1007/978-3-662-02950-3.
    [59] B. Kostant, Orbits, symplectic structures and representation theory, in Proc. U.S.-Japan Seminar in Differential Geometry, Nippon Hyoronsha, Tokyo, 1966.
    [60] P. D. Lax, Functional Analysis, Pure and Applied Mathematics, Wiley-Interscience, New York, 2002.
    [61] A. Lewis, A brief on controllability of nonlinear systems, 2002.
    [62] A. Lindquist and G. Picci, Linear Stochastic Systems. A Geometric Approach to Modeling, Estimation and Identification, Series in Contemporary Mathematics, 1, Springer, Heidelberg, 2015.
    [63] L. LiviF. M. Bianchi and C. Alippi, Determination of the edge of criticality in echo state networks through Fisher information maximization, IEEE Trans. Neural Netw. Learn. Syst., 29 (2018), 706-717.  doi: 10.1109/TNNLS.2016.2644268.
    [64] Z. Lu, B. R. Hunt and E. Ott, Attractor reconstruction by machine learning, Chaos, 28 (2018), 9pp. doi: 10.1063/1.5039508.
    [65] M. Lukoševičius and H. Jaeger, Reservoir computing approaches to recurrent neural network training, Comput. Sci. Rev., 3 (2009), 127-149.  doi: 10.1016/j.cosrev.2009.03.005.
    [66] W. Maass, Liquid state machines: Motivation, theory, and applications, in Computability in Context, Imp. Coll. Press, London, 2011,275-296. doi: 10.1142/9781848162778_0008.
    [67] W. MaassT. Natschläger and H. Markram, Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Comput., 14 (2002), 2531-2560.  doi: 10.1162/089976602760407955.
    [68] G. Manjunath, P. Tiňo and H. Jaeger, Theory of input driven dynamical systems, ESANN 2012 Proceedings, 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 1–12.
    [69] J. Marsden and A. Weinstein, Reduction of symplectic manifolds with symmetry, Rep. Mathematical Phys., 5 (1974), 121-130.  doi: 10.1016/0034-4877(74)90021-4.
    [70] J. E. Marsden, G. Misiołek, J.-P. Ortega, M. Perlmutter and T. S. Ratiu, Hamiltonian Reduction by Stages, Lecture Notes in Mathematics, 1913, Springer, Berlin, 2007. doi: 10.1007/978-3-540-72470-4.
    [71] S. Marzen, Difference between memory and prediction in linear recurrent networks, Phys. Rev. E, 96 (2017), 1-7.  doi: 10.1103/PhysRevE.96.032308.
    [72] M. B. Matthews, On the Uniform Approximation of Nonlinear Discrete-Time Fading-Memory Systems Using Neural Network Models, Ph.D thesis, ETH Zürich, 1992. Available from: https://www.research-collection.ethz.ch:443/handle/20.500.11850/140592.
    [73] M. B. Matthews and G. S. Moschytz, The identification of nonlinear discrete-time fading-memory systems using neural network models, IEEE Trans. Circuits Syst. II: Analog and Digital Signal Process., 41 (1994), 740-751.  doi: 10.1109/82.331544.
    [74] M. Mohri, A. Rostamizadeh and A. Tawalkar, c, 2$^{nd}$ edition, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, 2018.
    [75] K. S. Narendra and K. Parthasarathy, Identification and control of dynamical systems using neural networks, IEEE Trans. Neural Networks, 1 (1990), 4-27.  doi: 10.1109/72.80202.
    [76] H. Nijmeijer and A. van der Schaft, Controlled invariance for nonlinear systems, IEEE Trans. Automat. Control, 27 (1982), 904-914.  doi: 10.1109/TAC.1982.1103025.
    [77] D. Normand-Cyrot, Théorie et Pratique des Systèmes Non Linéaires en Temps Discret, Ph.D thesis, Université Paris-Sud, 1983.
    [78] T. Ohsawa, Symmetry reduction of optimal control systems and principal connections, SIAM J. Control Optim., 51 (2013), 96-120.  doi: 10.1137/110835219.
    [79] J.-P. Ortega, The symplectic reduced spaces of a Poisson action, C. R. Math. Acad. Sci. Paris, 334 (2002), 999-1004.  doi: 10.1016/S1631-073X(02)02394-4.
    [80] J.-P. Ortega and T. S. Ratiu, Momentum Maps and Hamiltonian Reduction, Progress in Mathematics, 222, Birkhäuser Boston, Inc., Boston, MA, 2004. doi: 10.1007/978-1-4757-3811-7.
    [81] J.-P. Ortega and T. S. Ratiu, The optimal momentum map, in Geometry, Mechanics, and Dynamics, Springer, New York, 2002,329–362. doi: 10.1007/0-387-21791-6_11.
    [82] R. Pascanu, C. Gulcehre, K. Cho and Y. Bengio, How to construct deep recurrent neural networks, preprint, arXiv: 1312.6026.
    [83] J. Pathak, B. Hunt, M. Girvan, Z. Lu and E. Ott, Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach, Phys. Rev. Lett., 120 (2018). doi: 10.1103/PhysRevLett.120.024102.
    [84] J. Pathak, A. Wikner, R. Fussell, S. Chandra, B. R. Hunt, M. Girvan and E. Ott, Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge-based model, \emphChaos, 28 (2018), 9pp. doi: 10.1063/1.5028373.
    [85] A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in Neural Information. Available from: http://people.eecs.berkeley.edu/ brecht/papers/07.rah.rec.nips.pdf.
    [86] S. Särkkä, Bayesian Filtering and Smoothing, Institute of Mathematical Statistics Textbooks, 3, Cambridge University Press, Cambridge, 2013. doi: 10.1017/CBO9781139344203.
    [87] B. Schölkopf and A. J. Smola, Learning with Kernels, MIT Press, 2002.
    [88] S. Smale, Topology and mechanics. I., Invent. Math., 10 (1970), 305-331.  doi: 10.1007/BF01418778.
    [89] E. D. Sontag, Mathematical Control Theory. Deterministic Finite-Dimensional Systems, 2$^{nd}$ edition, Texts in Applied Mathematics, 6, Springer-Verlag, New York, 1998. doi: 10.1007/978-1-4612-0577-7.
    [90] E. D. Sontag, Realization theory of discrete-time nonlinear systems. I. The bounded case, IEEE Trans. Circuits and Systems, 26 (1979), 342-356.  doi: 10.1109/TCS.1979.1084646.
    [91] J.-M. Souriau, Quantification géométrique, Comm. Math. Phys., 1 (1966), 374-398. 
    [92] J.-M. Souriau, Structure des Systèmes Dynamiques, Dunod, Paris, 1970.
    [93] P. Stefan, Accessibility and foliations with singularities, Bull. Amer. Math. Soc., 80 (1974), 1142-1145.  doi: 10.1090/S0002-9904-1974-13648-7.
    [94] P. Stefan, Accessible sets, orbits, and foliations with singularities, Proc. London Math. Soc. (3), 29 (1974), 699-713. doi: 10.1112/plms/s3-29.4.699.
    [95] H. J. Sussmann, Orbits of families of vector fields and integrability of distributions, Trans. Amer. Math. Soc., 180 (1973), 171-188.  doi: 10.1090/S0002-9947-1973-0321133-2.
    [96] P. Tiňo, Asymptotic Fisher memory of randomized linear symmetric Echo State Networks, Neurocomputing, 298 (2018), 4-8.  doi: 10.1016/j.neucom.2017.11.076.
    [97] P. Tino and A. Rodan, Short term memory in input-driven linear dynamical systems, Neurocomputing, 112 (2013), 58-63.  doi: 10.1016/j.neucom.2012.12.041.
    [98] A. van der Schaft, Symmetries and conservation laws for Hamiltonian systems with inputs and outputs: A generalization of Noether's theorem, Systems Control Lett., 1 (1981/82), 108-115.  doi: 10.1016/S0167-6911(81)80046-1.
    [99] A. J. van der Schaft, Symmetries in optimal control, SIAM J. Control Optim., 25 (1987), 245-259.  doi: 10.1137/0325015.
    [100] P. Verzelli, C. Alippi and L. Livi, Echo state networks with self-normalizing activations on the hyper-sphere, Scientific Rep., 9 (2019). doi: 10.1038/s41598-019-50158-4.
    [101] O. L. White, D. D. Lee and H. Sompolinsky, Short-term memory in orthogonal neural networks, Phys. Rev. Lett., 92 (2004). doi: 10.1103/PhysRevLett.92.148102.
    [102] F. Wyffels and B. Schrauwen, A comparative study of Reservoir Computing strategies for monthly time series prediction, Neurocomputing, 73 (2010), 1958-1964.  doi: 10.1016/j.neucom.2010.01.016.
    [103] F. Wyffels, B. Schrauwen and D. Stroobandt, Using reservoir computing in a decomposition approach for time series prediction, 2008.
    [104] F. Xue, Q. Li and X. Li, The combination of circle topology and leaky integrator neurons remarkably improves the performance of echo state network on time series prediction., PLoS one, 12 (2017). doi: 10.1371/journal.pone.0181816.
    [105] K. I. Yared and N. R. Sandell, Maximum likelihood identification of state space models for linear dynamic systems, Electronic Systems Laboratory, Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, R-814. Available from: http://dspace.mit.edu/handle/1721.1/1297.
    [106] W. Zaremba, I. Sutskever and O. Vinyals, Recurrent neural network regularization, preprint, arXiv: 1409.2329.
  • 加载中

Article Metrics

HTML views(324) PDF downloads(238) Cited by(0)

Access History

Other Articles By Authors



    DownLoad:  Full-Size Img  PowerPoint