\`x^2+y_1+z_12^34\`
Advanced Search
Article Contents
Article Contents

Manifold learning for accelerating coarse-grained optimization

  • * Corresponding author

    * Corresponding author
Abstract Full Text(HTML) Figure(18) Related Papers Cited by
  • Algorithms proposed for solving high-dimensional optimization problems with no derivative information frequently encounter the "curse of dimensionality, " becoming ineffective as the dimension of the parameter space grows. One feature of a subclass of such problems that are effectively low-dimensional is that only a few parameters (or combinations thereof) are important for the optimization and must be explored in detail. Knowing these parameters/combinations in advance would greatly simplify the problem and its solution. We propose the data-driven construction of an effective (coarse-grained, "trend") optimizer, based on data obtained from ensembles of brief simulation bursts with an "inner" optimization algorithm, that has the potential to accelerate the exploration of the parameter space. The trajectories of this "effective optimizer" quickly become attracted onto a slow manifold parameterized by the few relevant parameter combinations. We obtain the parameterization of this low-dimensional, effective optimization manifold on the fly using data mining/manifold learning techniques on the results of simulation (inner optimizer iteration) burst ensembles and exploit it locally to "jump" forward along this manifold. As a result, we can bias the exploration of the parameter space towards the few, important directions and, through this "wrapper algorithm, " speed up the convergence of traditional optimization algorithms.

    Mathematics Subject Classification: Primary: 37N40; Secondary: 90C56.

    Citation:

    \begin{equation} \\ \end{equation}
  • 加载中
  • Figure 1.  Evolution in time of the probability density of current points using either the Langevin equation or SA at constant $T$. The objective function is $f(x) = 0.5x^2$; $10^4$ realizations are used with starting point $x = 1$, $T = 0.5$; and the time step is $dt = 10^{-3}$

    Figure 2.  Applying Diffusion Maps to the Swiss roll data set

    Figure 3.  One complete run of the algorithm that approaches the global maximum. Six total "coarse iterations" (shown in red) were performed. In addition, a single run of SA using the same number of function evaluations is shown in green. Our algorithm visibly approaches the maximum much faster

    Figure 4.  A comparison of the evolving maximum objective value for both methods. SA combined with DMaps needs only a fraction of the total function evaluations compared to simple SA

    Figure 5.  Diffusion Maps applied to bursts on a rectangular strip]{Diffusion maps applied to (A) the original 2D data with Euclidean distances, and (B) the transformed data with Mahalanobis distances. In both cases, the inherent dimensionality is recovered in the first two eigenvectors, as the coloring of the data by the leading Diffusion Map coordinates corresponding to these two eigenvectors show

    Figure 6.  Estimation of the first drift coefficient $ \theta_1 $. "Theoretical" are obtained numerically from (5), "Euclidean" via DMaps on the original data, and "Mahalanobis" from DMaps on the transformed data. The last subplot shows results from Euclidean DMaps on transformed data, which, as expected, yields incorrect estimates

    Figure 7.  Estimation of the second drift coefficient $ \theta_2 $. The subplots are analogous to those in Figure 6

    Figure 8.  Coarse-grained (two-dimensional) optimization on a cylindrical surface in three dimensions. One complete run of the algorithm, illustrating the short bursts of trajectories and new points after being lifted by geometric harmonics. Observe that, although the lifting does not perfectly locate the cylinder, the burst trajectories are quickly attracted back to it. The second plot is a top-down view of the first

    Figure 9.  Simulation of a short trajectory initialized at perturbed initial conditions $ (x_{ic}, \bf{y}_{ic}) $

    Figure 10.  The first diffusion coordinate parameterizes $ x $

    Figure 11.  Estimation of the drift coefficient in diffusion map space

    Figure 12.  Estimation of the diffusion coefficient in diffusion map space

    Figure 13.  Nonlinear transformation of the slow variable $ x $ onto a semicircle in the plane

    Figure 14.  Embeddings of the original and transformed data set

    Figure 15.  Estimation of the drift coefficient in Diffusion Map space

    Figure 16.  Estimation of the diffusion coefficient in Diffusion Map space

    Figure 17.  Estimation of the drift coefficient in diffusion map space

    Figure 18.  Estimation of the diffusion coefficient in diffusion map space

  • [1] Y. Aït-Sahalia, Maximum likelihood estimation of discretely sampled diffusions: A closed-form approximation approach, Econometrica, 70 (2002), 223-262.  doi: 10.1111/1468-0262.00274.
    [2] Y. Aït-Sahalia, Closed-form likelihood expansions for multivariate diffusions, Ann. Statist., 36 (2008), 906-937.  doi: 10.1214/009053607000000622.
    [3] R. Barton, Metamodeling: A state of the art review, Proc. Winter Simul. Conf., (1994), 237–244. doi: 10.1109/WSC.1994.717134.
    [4] K. ChanG. KarolyiF. Longstaff and A. Sanders, An empirical comparison of alternative models of the short-term interest rate, J. Finance, 47 (1992), 1209-1227.  doi: 10.1111/j.1540-6261.1992.tb04011.x.
    [5] E. ChiavazzoC. GearC. DsilvaN. Rabin and I. Kevrekidis, Reduced models in chemical kinetics via nonlinear data-mining, Processes, 2 (2014), 112-140.  doi: 10.3390/pr2010112.
    [6] R. R. Coifman and S. Lafon, Diffusion maps, Appl. Comput. Harmon. Anal., 21 (2006), 5-30.  doi: 10.1016/j.acha.2006.04.006.
    [7] R. R. Coifman and S. Lafon, Geometric harmonics: A novel tool for multiscale out-of-sample extension of empirical functions, Appl. Comput. Harmon. Anal., 21 (2006), 31-52.  doi: 10.1016/j.acha.2005.07.005.
    [8] R. CoifmanS. LafonA. LeeM. MaggioniB. NadlerF. Warner and S. Zucker, Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, Proc. Nat. Acad. Sci. USA, 102 (2005), 7426-7431.  doi: 10.1073/pnas.0500334102.
    [9] A. R. Conn, N. I. M. Gould and P. L. Toint, Trust-Region Methods, MPS/SIAM Series on Optimization, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000. doi: 10.1137/1.9780898719857.
    [10] A. R. Conn, K. Scheinberg and L. N. Vicente, Introduction to Derivative-Free Optimization, MPS/SIAM Series on Optimization, 8, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2009. doi: 10.1137/1.9780898718768.
    [11] C. J. DsilvaR. TalmonR. R. Coifman and I. G. Kevrekidis, Parsimonious representation of nonlinear dynamical systems through manifold learning: A chemotaxis case study, Appl. Comput. Harmon. Anal., 44 (2018), 759-773.  doi: 10.1016/j.acha.2015.06.008.
    [12] C. J. Dsilva, R. Talmon, N. Rabin, R. R. Coifman and I. G. Kevrekidis, Nonlinear intrinsic variables and state reconstruction in multiscale simulations, J. Chemical Phys., 139 (2013). doi: 10.1063/1.4828457.
    [13] A. Duncan, G. Pavliotis and K. Zygalakis, Nonreversible langevin samplers: Splitting schemes, analysis and implementation, preprint, arXiv: 1701.04247.
    [14] R. Eberhart and J. Kennedy, A new optimizer using particle swarm theory, Proc. Sixth Internat. Symposium Micro Machine Human Science, Nagoya, Japan, 1995, 39–43. doi: 10.1109/MHS.1995.494215.
    [15] M. Fathi and G. Stoltz, Improving dynamical properties of metropolized discretizations of overdamped Langevin dynamics, Numer. Math., 136 (2017), 545-602.  doi: 10.1007/s00211-016-0849-3.
    [16] C. W. GearD. Givon and I. G. Kevrekidis, Computing on virtual slow manifolds of fast stochastic systems, JNAIAM. J. Numer. Anal. Ind. Appl. Math., 5 (2010), 61-72. 
    [17] C. W. Gear and I. G. Kevrekidis, Projective methods for stiff differential equations: Problems with gaps in their eigenvalue spectrum, SIAM J. Sci. Comput., 24 (2003), 1091-1106.  doi: 10.1137/S1064827501388157.
    [18] S. Geman and C.-R. Hwang, Diffusions for global optimization, SIAM J. Control Optim., 24 (1986), 1031-1043.  doi: 10.1137/0324060.
    [19] B. Gidas, Global optimization via the langevin equation, 24th IEEE Conference on Decision and Control, Ft. Lauderdale, FL, 1985,774–778. doi: 10.1109/CDC.1985.268602.
    [20] J. Gradišek, S. Siegert, R. Friedrich and I. Grabec, Analysis of time series from stochastic processes, Phys. Rev. E, 62 (2000).
    [21] L. Hansen, Large sample properties of generalized method of moments estimators, Econometrica, 50 (1982), 1029-1054.  doi: 10.2307/1912775.
    [22] J. H. Holland, Adaptation in Natural and Artificial Systems. An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, University of Michigan Press, Ann Arbor, Mich., 1975.
    [23] R. Hooke and T. Jeeves, "Direct search" solution of numerical and statistical problems, J. ACM, 8 (1961), 212-229.  doi: 10.1145/321062.321069.
    [24] W. Huyer and A. Neumaier, Global optimization by multilevel coordinate search, J. Global Optim., 14 (1999), 331-355.  doi: 10.1023/A:1008382309369.
    [25] I. T. Jolliffe, Principal Component Analysis, Springer Series in Statistics. Springer-Verlag, New York, 1986. doi: 10.1007/978-1-4757-1904-8.
    [26] D. R. Jones, A taxonomy of global optimization methods based on response surfaces, J. Global Optim., 21 (2001), 345-383.  doi: 10.1023/A:1012771025575.
    [27] D. R. JonesC. D. Perttunen and B. E. Stuckman, Lipschitzian optimization without the Lipschitz constant, J. Optim. Theory Appl., 79 (1993), 157-181.  doi: 10.1007/BF00941892.
    [28] S. KalliadasisS. Krumscheid and G. A. Pavliotis, A new framework for extracting coarse-grained models from time series with multiscale structure, J. Comput. Phys., 296 (2015), 314-328.  doi: 10.1016/j.jcp.2015.05.002.
    [29] C. Kelley, Iterative Methods for Optimization, Frontiers in Applied Mathematics, 18, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1999. doi: 10.1137/1.9781611970920.
    [30] I. G. KevrekidisC. W. GearJ. M. HymanP. G. KevrekidisO. Runborg and C. Theodoropoulos, Equation-free, coarse-grained multiscale computation: Enabling mocroscopic simulators to perform system-level analysis, Commun. Math. Sci., 1 (2003), 715-762.  doi: 10.4310/CMS.2003.v1.n4.a5.
    [31] S. KirkpatrickC. D. Gelatt and M. P. Vecchi, Optimization by simulated annealing, Science, 220 (1983), 671-680.  doi: 10.1126/science.220.4598.671.
    [32] S. KrumscheidG. A. Pavliotis and S. Kalliadasis, Semiparametric drift and diffusion estimation for multiscale diffusions, Multiscale Model. Simul., 11 (2013), 442-473.  doi: 10.1137/110854485.
    [33] S. Krumscheid, M. Pradas, G. Pavliotis and S. Kalliadasis, Data-driven coarse graining in action: Modeling and prediction of complex systems, Phys. Rev. E, 92 (2015). doi: 10.1103/PhysRevE.92.042139.
    [34] S. LafonY. Keller and R. Coifman, Data fusion and multicue data matching by diffusion maps, IEEE Transac. Pattern Anal. Machine Intelligence, 28 (2006), 1784-1797.  doi: 10.1109/TPAMI.2006.223.
    [35] G. LiC. Rosenthal and H. Rabitz, High dimensional model representations, J. Phys. Chem. A, 105 (2001), 7765-7777.  doi: 10.1021/jp010450t.
    [36] M. Locatelli, Simulated annealing algorithms for continuous global optimization, in Handbook of Global Optimization, Nonconvex Optim. Appl., 62, Kluwer Acad. Publ., Dordrecht, 2002,179–229. doi: 10.1007/978-1-4757-5362-2_6.
    [37] I. Melbourne and A. M. Stuart, A note on diffusion limits of chaotic skew-product flows, Nonlinearity, 24 (2011), 1361-1367.  doi: 10.1088/0951-7715/24/4/018.
    [38] N. MetropolisA. RosenbluthM. RosenbluthA. Teller and E. Teller, Equation of state calculations by fast computing machines, J. Phys. Chem., 21 (1953), 1087-1092.  doi: 10.2172/4390578.
    [39] M. MontgomeryR. Meglen and N. Damrauer, General method for the dimension reduction of adaptive control experiments, J. Phys. Chem. A, 110 (2006), 6391-6394.  doi: 10.1021/jp061160l.
    [40] M. MontgomeryR. Meglen and N. Damrauer, General method for reducing adaptive laser pulse-shaping experiments to a single control variable, J. Phys. Chem. A, 111 (2007), 5126-5129.  doi: 10.1021/jp073132o.
    [41] J. A. Nelder and R. Mead, A simplex method for function minimization, Comput. J., 7 (1965), 308-313.  doi: 10.1093/comjnl/7.4.308.
    [42] B. Øksendal, Stochastic Differential Equations, Universitext, Springer-Verlag, Berlin, 2003. doi: 10.1007/978-3-642-14394-6.
    [43] A. Papavasiliou, Particle filters for multiscale diffusions, in Conference Oxford sur les méthodes de Monte Carlo séquentielles, ESAIM Proc., 19, EDP Sci., Les Ulis, 2007,108–114. doi: 10.1051/proc:071914.
    [44] G. A. Pavliotis, Stochastic Processes and Applications. Diffusion Processes, the Fokker-Planck and Langevin Equations, Texts in Applied Mathematics, 60, Springer, New York, 2014. doi: 10.1007/978-1-4939-1323-7.
    [45] G. A. Pavliotis and A. M. Stuart, Parameter estimation for multiscale diffusions, J. Stat. Phys., 127 (2007), 741-781.  doi: 10.1007/s10955-007-9300-6.
    [46] L. M. Rios and N. V. Sahinidis, Derivative-free optimization: A review of algorithms and comparison of software implementations, J. Global Optim., 56 (2013), 1247-1293.  doi: 10.1007/s10898-012-9951-y.
    [47] J. Roslund and H. Rabitz, Dynamic dimensionality identification for quantum control, Phys. Rev. Lett., 112 (2014). doi: 10.1103/PhysRevLett.112.143001.
    [48] C. Schillings and A. M. Stuart, Analysis of the ensemble Kalman filter for inverse problems, SIAM J. Numer. Anal., 55 (2017), 1264-1290.  doi: 10.1137/16M105959X.
    [49] S. Shan and G. G. Wang, Survey of modeling and optimization strategies to solve high-dimensional design problems with computationally-expensive black-box functions, Struct. Multidiscip. Optim., 41 (2010), 219-241.  doi: 10.1007/s00158-009-0420-2.
    [50] A. Singer and R. R. Coifman, Non-linear independent component analysis with diffusion maps, Appl. Comput. Harmon. Anal., 25 (2008), 226-239.  doi: 10.1016/j.acha.2007.11.001.
    [51] E. Vanden-Eijnden, Numerical techniques for multi-scale dynamical systems with stochastic effects, Commun. Math. Sci., 1 (2003), 385-391.  doi: 10.4310/CMS.2003.v1.n2.a11.
    [52] E. WeinanB. EngquistX. LiW. Ren and E. Vanden-Eijnden, Heterogeneous multiscale methods: A review, Commun. Comput. Phys., 2 (2007), 367-450. 
  • 加载中

Figures(18)

SHARE

Article Metrics

HTML views(813) PDF downloads(234) Cited by(0)

Access History

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return