\`x^2+y_1+z_12^34\`
Advanced Search
Article Contents
Article Contents

Synchronizing pretrained kernel regressors with applications to American option pricing

  • American option pricing is a well-studied, computationally heavy optimal stopping problem, which has motivated the need for light and accurate computational pipelines. Currently, the state-of-the-art light American option pricers are built on randomly generated kernel regressors such as random neural networks; however, a single random (or pre-trained) kernel can yield unstable performance on new tasks. With this motivation, we study the problem of fine-tuning several pre-trained kernel regressors, each trained on a distinct dataset, to a given novel dataset, with a downstream view toward light American option pricing. This paper addresses the meta-optimization problem of designing optimal transfer learning algorithms, which minimize a pathwise regret functional. Using techniques from optimal control, we construct the unique regret-optimal optimization algorithms for fine-tuning mixtures of pre-trained kernel regressors on $ \mathbb{R}^d $ sharing the same feature map. Our regret functional balances the objectives of predictive power on the new dataset against transfer learning from other datasets, under an algorithmic stability penalty. We show that an adversary which perturbs a proportion $ 0\leq q\leq 1 $ of training pairs by at most $ \varepsilon>0 $, across all training sets, cannot reduce the regret-optimal algorithm's regret by more than $ \mathcal{O}(\varepsilon (q \bar{N} )^{1/2} ) $, where $ \bar{N} $ is the aggregate number of training pairs. Further, we derive estimates of the computational complexity of our regret-optimal optimization algorithm. Our theoretical findings are used to improve the state of the art (SOTA) in computationally light American option pricing using random feature models, where we achieve SOTA performance compared to the benchmark light American option pricer of [32].

    Mathematics Subject Classification: 91-10, 49N90, 68T01.

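    The central object in the pipeline above is a kernel regressor whose feature map is randomly generated and then frozen, so that only a linear readout is fit to data. The following is a minimal Python sketch of such a random feature regressor under simple assumptions (ReLU features, Gaussian weights, plain ridge readout); all names, widths, and the toy target are illustrative and do not reproduce the authors' implementation.

```python
import numpy as np

def random_feature_map(d, width, seed=0):
    """Randomly drawn, then frozen, hidden layer: x -> ReLU(A x + b)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((width, d)) / np.sqrt(d)
    b = rng.standard_normal(width)
    return lambda X: np.maximum(X @ A.T + b, 0.0)

def fit_ridge_readout(phi, X, y, lam=1e-3):
    """Train only the linear readout w via ridge regression on phi(X)."""
    F = phi(X)  # (n, width) feature matrix; the random layer stays fixed
    return np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ y)

# Toy usage: fit a smooth function of three variables.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = np.sin(X).sum(axis=1)
phi = random_feature_map(d=3, width=512)
w = fit_ridge_readout(phi, X, y)
print("train MSE:", float(np.mean((phi(X) @ w - y) ** 2)))
```

    Fine-tuning, in the sense studied in the paper, then amounts to adjusting the readouts of several such regressors, pre-trained on distinct datasets but sharing one feature map, toward a new dataset.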
  • Figure 1.  Loss and energy $ {\mathcal L} $ for gradient descent (GD), the regret-optimal algorithm (RO), and the accelerated regret-optimal algorithm (ARO). Means and standard deviations over 10 runs are shown. Left: all iterations; middle: first 1000 iterations; right: iterations 500 to 1000 for our algorithms

    Figure 2.  Loss and energy $ {\mathcal L} $ for the regret-optimal algorithm (RO) and the accelerated regret-optimal algorithm (ARO) with means and standard deviations over $ 10 $ runs. Left: $ \lambda = 10^{-4} $; right: $ \lambda = 2 $
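    The quantity $ {\mathcal L} $ tracked in both figures couples the training loss with an "energy" term penalizing large parameter updates, so lower values reflect both a better fit and a more stable optimization path. As a purely illustrative assumption (a generic functional of this shape, not the paper's exact definition), one may picture

    $$ {\mathcal L}\big((\theta_t)_{t = 0}^{T}\big) \; = \; \sum_{t = 0}^{T-1}\Big(\ell(\theta_t) + \frac{\lambda}{2}\,\|\theta_{t+1}-\theta_t\|_2^2\Big) + \ell(\theta_T), $$

    where $ \ell $ is the empirical loss aggregated over the training sets and $ \lambda>0 $ weights the stability penalty; Figure 2 probes the effect of varying such a $ \lambda $.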

    Table 1.  Ablation Results: American Option Pricing. Relative performance (RP) and $ 95\% $ confidence intervals for different optimization methods. We compare the local optimizers on the different datasets (LO-$ n $) with the mean local optimizer (MLO), the joint optimizer (JO), and our regret-optimal method (RO). The oracle local optimizer on the main dataset with additional training samples (standard: $ 100 $ samples per dataset) is included. See Table 2 for additional details

    (a) Experiment 1 (compressed)

    method                         | RP             | $ 95\% $-CI
    LO-1                           | 1.000          | $ [0.991; 1.009] $
    LO-2, $ \dotsc $, LO-7         | $ \geq 0.966 $ | $ [\geq 0.957; \leq 0.994] $
    LO-8, $ \dotsc $, LO-13        | $ \leq 0.725 $ | $ [\geq 0.696; \leq 0.736] $
    MLO                            | 0.823          | $ [0.813; 0.833] $
    JO                             | 0.886          | $ [0.878; 0.894] $
    JSO-$ \{1, \dotsc, 7\} $       | 1.062          | $ [1.057; 1.067] $
    RO ($ \eta = 100 $)            | 1.090          | $ [1.086; 1.095] $
    LO-1 (700 tr. samp.)           | 1.100          | $ [1.097; 1.104] $
    LO-1 (50K tr. samp.)           | 1.194          | $ [1.192; 1.195] $

    (b) Experiment 2

    method                         | RP             | $ 95\% $-CI
    LO-1                           | 1.000          | $ [0.993; 1.007] $
    LO-2                           | 0.773          | $ [0.765; 0.782] $
    LO-3                           | 1.003          | $ [0.994; 1.012] $
    MLO                            | 0.932          | $ [0.920; 0.944] $
    JO                             | 0.763          | $ [0.754; 0.772] $
    RO ($ \eta = 10 $)             | 0.993          | $ [0.985; 1.000] $
    RO ($ \eta = 50 $)             | 1.040          | $ [1.034; 1.046] $
    RO ($ \eta = 100 $)            | 1.041          | $ [1.036; 1.047] $
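    The RP entries in Tables 1-2 are normalized so that the local optimizer on the main dataset (LO-1) scores $ 1.000 $; values above one indicate an improvement over that baseline. As a hedged sketch of how such a relative-performance figure and its $ 95\% $ confidence interval over $ 10 $ runs might be computed (a normal-approximation interval; the function and variable names are illustrative, and this is not the authors' evaluation code):

```python
import numpy as np

def relative_performance(method_scores, baseline_scores):
    """Mean score ratio vs. a baseline, with a 95% normal-approximation CI."""
    ratios = np.asarray(method_scores) / np.mean(baseline_scores)
    mean = ratios.mean()
    half_width = 1.96 * ratios.std(ddof=1) / np.sqrt(ratios.size)
    return mean, (mean - half_width, mean + half_width)

# Toy usage: 10 synthetic runs of a method vs. an LO-1-style baseline.
rng = np.random.default_rng(0)
baseline = 1.00 + 0.01 * rng.standard_normal(10)   # per-run baseline metric
method = 1.04 + 0.01 * rng.standard_normal(10)     # per-run method metric
rp, (lo, hi) = relative_performance(method, baseline)
print(f"RP = {rp:.3f}, 95%-CI = [{lo:.3f}; {hi:.3f}]")
```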

    Table 2.  Relative performance (RP) and $ 95\% $ confidence intervals for different optimization methods in Experiment 4.1. We compare the local optimizers on the different datasets (LO-$ n $) with the mean local optimizer (MLO), the joint optimizer (JO), and our regret-optimal method (RO). The oracle local optimizer on the main dataset with additional training samples (standard: $ 100 $ samples per dataset) is included. When the information sharing parameter $ \eta $ is set to $ 100 $, the regret-optimal (RO) algorithm outperforms all "non-oracle" baselines

    method                         | RP       | $ 95\% $-CI
    LO-1                           | 1.000    | $ [0.991; 1.009] $
    LO-2                           | 0.985    | $ [0.977; 0.994] $
    LO-3                           | 0.966    | $ [0.958; 0.973] $
    LO-4                           | 0.975    | $ [0.966; 0.984] $
    LO-5                           | 0.966    | $ [0.957; 0.975] $
    LO-6                           | 0.979    | $ [0.971; 0.988] $
    LO-7                           | 0.978    | $ [0.971; 0.986] $
    LO-8                           | 0.721    | $ [0.711; 0.730] $
    LO-9                           | 0.720    | $ [0.711; 0.729] $
    LO-10                          | 0.706    | $ [0.696; 0.716] $
    LO-11                          | 0.725    | $ [0.715; 0.736] $
    LO-12                          | 0.707    | $ [0.697; 0.716] $
    LO-13                          | 0.718    | $ [0.708; 0.727] $
    MLO                            | 0.823    | $ [0.813; 0.833] $
    JO                             | 0.886    | $ [0.878; 0.894] $
    JO (datasets 1-7)              | 1.062    | $ [1.057; 1.067] $
    RO ($ \eta = 10 $)             | 1.000    | $ [0.990; 1.010] $
    RO ($ \eta = 100 $)            | 1.091    | $ [1.087; 1.095] $
    RO ($ \eta = 500 $)            | 1.050    | $ [1.043; 1.057] $
    LO-1 (700 tr. samp.)           | 1.100    | $ [1.097; 1.104] $
    LO-1 (50K tr. samp.)           | 1.194    | $ [1.192; 1.195] $
  • [1] T. Ahrendt, Fast computations of the exponential function, in Christoph Meinel and Sophie Tison, editors, STACS 99, Berlin, Heidelberg, Springer Berlin Heidelberg, (1999), 302-312. doi: 10.1007/3-540-49116-3_28.
    [2] P. Alquier, User-friendly introduction to PAC-Bayes bounds, arXiv preprint, 2025. arXiv: 2110.11216.
    [3] A. A. Amini, Spectrally-truncated kernel ridge regression and its free lunch, Electron. J. Stat., 15 (2021), 3743-3761.  doi: 10.1214/21-EJS1873.
    [4] A. A. Amini, R. Baumgartner and D. Feng, Target alignment in truncated kernel ridge regression, arXiv preprint, 2022. arXiv: 2206.14255.
    [5] A. Argyriou, T. Evgeniou and M. Pontil, Multi-task feature learning, Advances in Neural Information Processing Systems, 19 (2006). doi: 10.7551/mitpress/7503.003.0010.
    [6] L. M. B. Arias and F. Roldán, Four-operator splitting via a forward-backward-half-forward algorithm with line search, J. Optim. Theory Appl., 195 (2022), 205-225.  doi: 10.1007/s10957-022-02074-3.
    [7] T. Başar and G. J. Olsder, Dynamic Noncooperative Game Theory, 2nd edition, Society for Industrial and Applied Mathematics, 1999.
    [8] J. Baxter, A model of inductive bias learning, J. Artificial Intelligence Res., 12 (2000) 149-198. doi: 10.1613/jair.731.
    [9] S. Becker, P. Cheridito and A. Jentzen, Deep optimal stopping, Journal of Machine Learning Research, 20 (2019), Paper No. 74, 25 pp.
    [10] S. Becker, P. Cheridito, A. Jentzen and T. Welti, Solving high-dimensional optimal stopping problems using deep learning, European Journal of Applied Mathematics, 32 (2021), 470-514. doi: 10.1017/S0956792521000073.
    [11] S. Becker, P. Cheridito, A. Jentzen and T. Welti, Solving high-dimensional optimal stopping problems using deep learning, European J. Appl. Math., 32 (2021), 470-514. doi: 10.1017/S0956792521000073.
    [12] D. P. Bertsekas, Convex Optimization Algorithms, Athena Scientific, 2015.
    [13] H. Cao, H. Gu, X. Guo and M. Rosenbaum, Transfer learning for portfolio optimization, arXiv preprint, 2023. arXiv: 2307.13546.
    [14] R. Carmona and F. Delarue, Probabilistic Theory of Mean Field Games with Applications I-II, Springer, 2018.
    [15] P. Casgrain, A latent variational framework for stochastic optimization, Advances in Neural Information Processing Systems, 32 (2019).
    [16] P. Casgrain, A latent variational framework for stochastic optimization, in H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, Curran Associates, Inc., 32 (2019).
    [17] P. Casgrain and A. Kratsios, Optimizing optimizers: Regret-optimal gradient descent algorithms, in Conference on Learning Theory, PMLR, (2021), 883-926.
    [18] T. S. Cheng, A. Lucchi, A. Kratsios, D. Belius and I. Dokmanić, A theoretical analysis of the test error of finite-rank kernel ridge regression, Neural Information Processing Systems, 2023.
    [19] P. L. Combettes and J.-C. Pesquet, Stochastic approximations and perturbations in forward-backward splitting for monotone operators, Pure Appl. Funct. Anal., 1 (2016), 13-37. 
    [20] E. M. Compagnoni, A. Scampicchio, L. Biggio, A. Orvieto, T. Hofmann and J. Teichmann, On the effectiveness of randomized signatures as reservoir for learning rough dynamics, in 2023 International Joint Conference on Neural Networks (IJCNN), IEEE, (2023), 1-8.
    [21] C. Cuchiero, L. Gonon, L. Grigoryeva, J.-P. Ortega and J. Teichmann, Discrete-time signatures and randomness in reservoir computing, IEEE Transactions on Neural Networks and Learning Systems, 33 (2022), 6321-6330. doi: 10.1109/TNNLS.2021.3076777.
    [22] J. Duchi, E. Hazan and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, 12 (2011), 2121-2159.
    [23] J. Fliege, A. I. F. Vaz and L. N. Vicente, Complexity of gradient descent for multiobjective optimization, Optimization Methods and Software, 34 (2019), 949-959. doi: 10.1080/10556788.2018.1510928.
    [24] F. Le Gall, Powers of tensors and fast matrix multiplication, in ISSAC 2014—Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ACM, New York, (2014), 296-303.
    [25] B. Ghorbani, S. Mei, T. Misiakiewicz and A. Montanari, Linearized two-layers neural networks in high dimension, The Annals of Statistics, 49 (2021), 1029-1054. doi: 10.1214/20-AOS1990.
    [26] L. Gonon, Random feature neural networks learn Black-Scholes type PDEs without curse of dimensionality, Journal of Machine Learning Research, 24 (2023), Paper No. 189, 51 pp.
    [27] L. Gonon, L. Grigoryeva and J.-P. Ortega, Risk bounds for reservoir computing, Journal of Machine Learning Research, 21 (2020), Paper No. 240, 61 pp.
    [28] L. Gonon, L. Grigoryeva and J.-P. Ortega, Approximation bounds for random neural networks and reservoir systems, Ann. Appl. Probab., 33 (2023), 28-69. doi: 10.1214/22-AAP1806.
    [29] L. Grigoryeva and J.-P. Ortega, Echo state networks are universal, Neural Networks, 108 (2018), 495-508.  doi: 10.1016/j.neunet.2018.08.025.
    [30] E. Hazan, Introduction to online convex optimization, Foundations and Trends in Optimization, 2 (2016), 157-325. doi: 10.1561/2400000013.
    [31] J. Heiss, J. Teichmann and H. Wutte, How infinitely wide neural networks benefit from multi-task learning – an exact macroscopic characterization, arXiv preprint, 2022.
    [32] C. Herrera, F. Krach, P. Ruyssen and J. Teichmann, Optimal stopping via randomized neural networks, Front. Math. Finance, 3 (2024), 31-77. doi: 10.3934/fmf.2023022.
    [33] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd edition, Cambridge University Press, 2013.
    [34] S. Hou, P. Kassraie, A. Kratsios, A. Krause and J. Rothfuss, Instance-dependent generalization bounds via optimal transport, Journal of Machine Learning Research, 24 (2023), 16815-16865.
    [35] J. Howard and S. Ruder, Universal language model fine-tuning for text classification, arXiv preprint, 2018. arXiv: 1801.06146.
    [36] R. Hu, Deep learning for ranking response surfaces with applications to optimal stopping problems, Quant. Finance, 20 (2020), 1567-1581.  doi: 10.1080/14697688.2020.1741669.
    [37] M. Huang and X. Yang, Linear quadratic mean field social optimization: Asymptotic solvability and decentralized control, Applied Mathematics & Optimization, 84 (2021), 1969-2010. 
    [38] G. Jeong and H. Y. Kim, Improving financial trading decisions using deep Q-learning: Predicting the number of shares, action strategies, and transfer learning, Expert Systems with Applications, 117 (2019), 125-138.  doi: 10.1016/j.eswa.2018.09.036.
    [39] Y. Jia, M. Johnson, W. Macherey, R. J. Weiss, Y. Cao, C.-C. Chiu, N. Ari, S. Laurenzo and Y. Wu, Leveraging weakly supervised data to improve end-to-end speech-to-text translation, in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, (2019), 7180-7184. doi: 10.1109/ICASSP.2019.8683343.
    [40] A. Khaled, K. Mishchenko and P. Richtarik, Tighter theory for local sgd on identical and heterogeneous data, in Silvia Chiappa and Roberto Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, PMLR, 108 (2020), 4519-4529.
    [41] G. Kimeldorf and G. Wahba, Some results on Tchebycheffian spline functions, J. Math. Anal. Appl., 33 (1971), 82-95.  doi: 10.1016/0022-247X(71)90184-3.
    [42] G. S. Kimeldorf and G. Wahba, A correspondence between Bayesian estimation on stochastic processes and smoothing by splines, Ann. Math. Statist., 41 (1970), 495-502.  doi: 10.1214/aoms/1177697089.
    [43] M. Kraus and S. Feuerriegel, Decision support from financial disclosures with deep neural networks and transfer learning, Decision Support Systems, 104 (2017), 38-48.  doi: 10.1016/j.dss.2017.10.001.
    [44] I. Kuzborskij and F. Orabona, Stability and hypothesis transfer learning, in Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, Proceedings of Machine Learning Research, Atlanta, Georgia, USA, PMLR, 28 (2013), 942-950.
    [45] D. Kwon, J. Park and S. Hong, Tighter regret analysis and optimization of online federated learning, arXiv preprint, 2023. arXiv: 2205.06491.
    [46] P. D. Lax, Linear Algebra and its Applications, Pure and Applied Mathematics (Hoboken), Wiley-Interscience, Hoboken, NJ, second edition, 2007.
    [47] Q. Li, L. Chen, C. Tai and W. E, Maximum principle based algorithms for deep learning, Journal of Machine Learning Research, 18 (2018), Paper No. 165, 29 pp.
    [48] Q. Li, C. Tai and W. E, Stochastic modified equations and adaptive stochastic gradient algorithms, in International Conference on Machine Learning, PMLR, 70 (2017), 2101-2110.
    [49] H. Lin and M. Reimherr, Smoothness adaptive hypothesis transfer learning, in International Conference on Machine Learning, PMLR, 235 (2024), 30286-30316.
    [50] M. Lukoševičius and H. Jaeger, Reservoir computing approaches to recurrent neural network training, Computer Science Review, 3 (2009), 127-149.  doi: 10.1016/j.cosrev.2009.03.005.
    [51] O. L. Mangasarian, Parallel gradient distribution in unconstrained optimization, SIAM J. Control Optim., 33 (1995), 1916-1925.  doi: 10.1137/S0363012993250220.
    [52] T. Manole and J. Niles-Weed, Sharp convergence rates for empirical optimal transport with smooth costs, Ann. Appl. Probab., 34 (2024), 1108-1135.  doi: 10.1214/23-AAP1986.
    [53] B. McMahan, E. Moore, D. Ramage, S. Hampson and B. A. Y. Arcas, Communication-efficient learning of deep networks from decentralized data, in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR, 54 (2017), 1273-1282.
    [54] S. Mei and A. Montanari, The generalization error of random features regression: Precise asymptotics and the double descent curve, Comm. Pure Appl. Math., 75 (2022), 667-766.  doi: 10.1002/cpa.22008.
    [55] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming, Society for Industrial and Applied Mathematics, 1994.
    [56] K. L. Pavasovic, J. Rothfuss and A. Krause, Mars: Meta-learning as score matching in the function space, arXiv preprint, 2022. arXiv: 2210.13319.
    [57] G. Peskir and A. Shiryaev, Optimal Stopping and Free-Boundary Problems, Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, 2006.
    [58] Z. Ren and Y. J. Lee, Cross-domain self-supervised multi-task feature learning using synthetic imagery, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2018), 762-771. doi: 10.1109/CVPR.2018.00086.
    [59] J. Rothfuss, V. Fortuin, M. Josifoski and A. Krause, PACOH: Bayes-optimal meta-learning with PAC-guarantees, in Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, PMLR, 139 (2021), 9116-9126.
    [60] J. Rothfuss, M. Josifoski, V. Fortuin and A. Krause, Scalable PAC-Bayesian meta-learning via the PAC-optimal hyper-posterior: From theory to practice, Journal of Machine Learning Research, 24 (2023), Paper No. 386, 62 pp.
    [61] O. Sener and V. Koltun, Multi-task learning as multi-objective optimization, in S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, Curran Associates, Inc., 31 (2018).
    [62] J. L. Snell, Applications of martingale system theorems, Trans. Amer. Math. Soc., 73 (1952), 293-312.  doi: 10.1090/S0002-9947-1952-0050209-9.
    [63] Y. Tian, Y. Gu and Y. Feng, Learning from similar linear representations: Adaptivity, minimaxity, and robustness, Journal of Machine Learning Research, 26 (2025), Paper No. 187, 125 pp.
    [64] C. Villani, Optimal Transport: Old and New, volume 338 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], Springer-Verlag, Berlin, 2009.
    [65] R. Wang, C. Hyndman and A. Kratsios, The entropic measure transform, Canad. J. Statist., 48 (2020), 97-129. doi: 10.1002/cjs.11537.
    [66] X. Wang, J. B. Oliva, J. Schneider and B. Póczos, Nonparametric risk and stability analysis for multi-task learning problems, IJCAI'16, AAAI Press, (2016), 2146-2152.
    [67] Y. Wei, F. Yang and M. J. Wainwright, Early stopping for kernel boosting algorithms: A general analysis with localized complexities, IEEE Trans. Inform. Theory, 65 (2019), 6685-6703.
    [68] K. Weiss, T. M. Khoshgoftaar and D. Wang, A survey of transfer learning, Journal of Big Data, 3 (2016), 1-40. doi: 10.1186/s40537-016-0043-6.
    [69] Y. Xue, X. Liao, L. Carin and B. Krishnapuram, Multi-task learning for classification with Dirichlet process priors, Journal of Machine Learning Research, 8 (2007), 35-63.
    [70] Y. Yao, L. Rosasco and A. Caponnetto, On early stopping in gradient descent learning, Constr. Approx., 26 (2007), 289-315. doi: 10.1007/s00365-006-0663-2.
    [71] J. Yosinski, J. Clune, Y. Bengio and H. Lipson, How transferable are features in deep neural networks?, Advances in Neural Information Processing Systems, 27 (2014).
    [72] M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13, Springer, (2014), 818-833. doi: 10.1007/978-3-319-10590-1_53.
    [73] T. Zhang and B. Yu, Boosting with early stopping: Convergence and consistency, Ann. Statist., 33 (2005), 1538-1579.  doi: 10.1214/009053605000000255.
    [74] J. Zhao, A. Lucchi, F. N. Proske, A. Orvieto and H. Kersting, Batch size selection by stochastic optimal control, in Has it Trained Yet? NeurIPS 2022 Workshop, 2022.
Open Access Under a Creative Commons license
