[1] T. Ahrendt, Fast computations of the exponential function, in C. Meinel and S. Tison, editors, STACS 99, Berlin, Heidelberg, Springer Berlin Heidelberg, (1999), 302-312.
doi: 10.1007/3-540-49116-3_28.

[2] P. Alquier, User-friendly introduction to PAC-Bayes bounds, arXiv preprint, 2025. arXiv: 2110.11216.

[3] A. A. Amini, Spectrally-truncated kernel ridge regression and its free lunch, Electron. J. Stat., 15 (2021), 3743-3761.
doi: 10.1214/21-EJS1873.

[4] A. A. Amini, R. Baumgartner and D. Feng, Target alignment in truncated kernel ridge regression, arXiv preprint, 2022. arXiv: 2206.14255.

[5] A. Argyriou, T. Evgeniou and M. Pontil, Multi-task feature learning, Advances in Neural Information Processing Systems, 19 (2006).
doi: 10.7551/mitpress/7503.003.0010.

[6] L. M. B. Arias and F. Roldán, Four-operator splitting via a forward-backward-half-forward algorithm with line search, J. Optim. Theory Appl., 195 (2022), 205-225.
doi: 10.1007/s10957-022-02074-3.

[7] T. Başar and G. J. Olsder, Dynamic Noncooperative Game Theory, 2nd edition, Society for Industrial and Applied Mathematics, 1999.

[8] J. Baxter, A model of inductive bias learning, J. Artificial Intelligence Res., 12 (2000), 149-198.
doi: 10.1613/jair.731.

[9] S. Becker, P. Cheridito and A. Jentzen, Deep optimal stopping, Journal of Machine Learning Research, 20 (2019), Paper No. 74, 25 pp.

[10] S. Becker, P. Cheridito, A. Jentzen and T. Welti, Solving high-dimensional optimal stopping problems using deep learning, European Journal of Applied Mathematics, 32 (2021), 470-514.
doi: 10.1017/S0956792521000073.

[11] S. Becker, P. Cheridito, A. Jentzen and T. Welti, Solving high-dimensional optimal stopping problems using deep learning, European J. Appl. Math., 32 (2021), 470-514.
doi: 10.1017/S0956792521000073.

[12] D. P. Bertsekas, Convex Optimization Algorithms, Athena Scientific, 2015.

[13] H. Cao, H. Gu, X. Guo and M. Rosenbaum, Transfer learning for portfolio optimization, arXiv preprint, 2023. arXiv: 2307.13546.

[14] R. Carmona and F. Delarue, Probabilistic Theory of Mean Field Games with Applications I-II, Springer, 2018.

[15] P. Casgrain, A latent variational framework for stochastic optimization, Advances in Neural Information Processing Systems, 32 (2019).

[16] P. Casgrain, A latent variational framework for stochastic optimization, in H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox and R. Garnett, editors, Advances in Neural Information Processing Systems, Curran Associates, Inc., 32 (2019).

[17] P. Casgrain and A. Kratsios, Optimizing optimizers: Regret-optimal gradient descent algorithms, in Conference on Learning Theory, PMLR, (2021), 883-926.

[18] T. S. Cheng, A. Lucchi, A. Kratsios, D. Belius and I. Dokmanić, A theoretical analysis of the test error of finite-rank kernel ridge regression, in Advances in Neural Information Processing Systems, 36 (2023).

[19] P. L. Combettes and J.-C. Pesquet, Stochastic approximations and perturbations in forward-backward splitting for monotone operators, Pure Appl. Funct. Anal., 1 (2016), 13-37.

[20] E. M. Compagnoni, A. Scampicchio, L. Biggio, A. Orvieto, T. Hofmann and J. Teichmann, On the effectiveness of randomized signatures as reservoir for learning rough dynamics, in 2023 International Joint Conference on Neural Networks (IJCNN), IEEE, (2023), 1-8.

[21] C. Cuchiero, L. Gonon, L. Grigoryeva, J.-P. Ortega and J. Teichmann, Discrete-time signatures and randomness in reservoir computing, IEEE Transactions on Neural Networks and Learning Systems, 33 (2022), 6321-6330.
doi: 10.1109/TNNLS.2021.3076777.

[22] J. Duchi, E. Hazan and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, 12 (2011), 2121-2159.

[23] J. Fliege, A. I. F. Vaz and L. N. Vicente, Complexity of gradient descent for multiobjective optimization, Optimization Methods and Software, 34 (2019), 949-959.
doi: 10.1080/10556788.2018.1510928.

[24] F. Le Gall, Powers of tensors and fast matrix multiplication, in ISSAC 2014—Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ACM, New York, (2014), 296-303.

[25] B. Ghorbani, S. Mei, T. Misiakiewicz and A. Montanari, Linearized two-layers neural networks in high dimension, The Annals of Statistics, 49 (2021), 1029-1054.
doi: 10.1214/20-AOS1990.

[26] L. Gonon, Random feature neural networks learn Black-Scholes type PDEs without curse of dimensionality, Journal of Machine Learning Research, 24 (2023), Paper No. 189, 51 pp.

[27] L. Gonon, L. Grigoryeva and J.-P. Ortega, Risk bounds for reservoir computing, Journal of Machine Learning Research, 21 (2020), Paper No. 240, 61 pp.

[28] L. Gonon, L. Grigoryeva and J.-P. Ortega, Approximation bounds for random neural networks and reservoir systems, Ann. Appl. Probab., 33 (2023), 28-69.
doi: 10.1214/22-AAP1806.

[29] L. Grigoryeva and J.-P. Ortega, Echo state networks are universal, Neural Networks, 108 (2018), 495-508.
doi: 10.1016/j.neunet.2018.08.025.

[30] E. Hazan, Introduction to online convex optimization, Foundations and Trends in Optimization, 2 (2016), 157-325.
doi: 10.1561/2400000013.

[31] J. Heiss, J. Teichmann and H. Wutte, How infinitely wide neural networks benefit from multi-task learning: An exact macroscopic characterization, preprint, 2022.

[32] C. Herrera, F. Krach, P. Ruyssen and J. Teichmann, Optimal stopping via randomized neural networks, Front. Math. Finance, 3 (2024), 31-77.
doi: 10.3934/fmf.2023022.

[33] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd edition, Cambridge University Press, 2013.

[34] S. Hou, P. Kassraie, A. Kratsios, A. Krause and J. Rothfuss, Instance-dependent generalization bounds via optimal transport, Journal of Machine Learning Research, 24 (2023), 16815-16865.

[35] J. Howard and S. Ruder, Universal language model fine-tuning for text classification, arXiv preprint, 2018. arXiv: 1801.06146.

[36] R. Hu, Deep learning for ranking response surfaces with applications to optimal stopping problems, Quant. Finance, 20 (2020), 1567-1581.
doi: 10.1080/14697688.2020.1741669.

[37] M. Huang and X. Yang, Linear quadratic mean field social optimization: Asymptotic solvability and decentralized control, Applied Mathematics & Optimization, 84 (2021), 1969-2010.

[38] G. Jeong and H. Y. Kim, Improving financial trading decisions using deep Q-learning: Predicting the number of shares, action strategies, and transfer learning, Expert Systems with Applications, 117 (2019), 125-138.
doi: 10.1016/j.eswa.2018.09.036.

[39] Y. Jia, M. Johnson, W. Macherey, R. J. Weiss, Y. Cao, C.-C. Chiu, N. Ari, S. Laurenzo and Y. Wu, Leveraging weakly supervised data to improve end-to-end speech-to-text translation, in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, (2019), 7180-7184.
doi: 10.1109/ICASSP.2019.8683343.

[40] A. Khaled, K. Mishchenko and P. Richtárik, Tighter theory for local SGD on identical and heterogeneous data, in S. Chiappa and R. Calandra, editors, Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, PMLR, 108 (2020), 4519-4529.

[41] G. Kimeldorf and G. Wahba, Some results on Tchebycheffian spline functions, J. Math. Anal. Appl., 33 (1971), 82-95.
doi: 10.1016/0022-247X(71)90184-3.

[42] G. S. Kimeldorf and G. Wahba, A correspondence between Bayesian estimation on stochastic processes and smoothing by splines, Ann. Math. Statist., 41 (1970), 495-502.
doi: 10.1214/aoms/1177697089.

[43] M. Kraus and S. Feuerriegel, Decision support from financial disclosures with deep neural networks and transfer learning, Decision Support Systems, 104 (2017), 38-48.
doi: 10.1016/j.dss.2017.10.001.

[44] I. Kuzborskij and F. Orabona, Stability and hypothesis transfer learning, in S. Dasgupta and D. McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, Proceedings of Machine Learning Research, Atlanta, Georgia, USA, PMLR, 28 (2013), 942-950.

[45] D. Kwon, J. Park and S. Hong, Tighter regret analysis and optimization of online federated learning, arXiv preprint, 2023. arXiv: 2205.06491.

[46] P. D. Lax, Linear Algebra and its Applications, Pure and Applied Mathematics (Hoboken), 2nd edition, Wiley-Interscience, Hoboken, NJ, 2007.

[47] Q. Li, L. Chen, C. Tai and W. E, Maximum principle based algorithms for deep learning, Journal of Machine Learning Research, 18 (2018), Paper No. 165, 29 pp.

[48] Q. Li, C. Tai and W. E, Stochastic modified equations and adaptive stochastic gradient algorithms, in International Conference on Machine Learning, PMLR, 70 (2017), 2101-2110.

[49] H. Lin and M. Reimherr, Smoothness adaptive hypothesis transfer learning, in International Conference on Machine Learning, PMLR, 235 (2024), 30286-30316.

[50] M. Lukoševičius and H. Jaeger, Reservoir computing approaches to recurrent neural network training, Computer Science Review, 3 (2009), 127-149.
doi: 10.1016/j.cosrev.2009.03.005.

[51] O. L. Mangasarian, Parallel gradient distribution in unconstrained optimization, SIAM J. Control Optim., 33 (1995), 1916-1925.
doi: 10.1137/S0363012993250220.

[52] T. Manole and J. Niles-Weed, Sharp convergence rates for empirical optimal transport with smooth costs, Ann. Appl. Probab., 34 (2024), 1108-1135.
doi: 10.1214/23-AAP1986.

[53] B. McMahan, E. Moore, D. Ramage, S. Hampson and B. Agüera y Arcas, Communication-efficient learning of deep networks from decentralized data, in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR, 54 (2017), 1273-1282.

[54] S. Mei and A. Montanari, The generalization error of random features regression: Precise asymptotics and the double descent curve, Comm. Pure Appl. Math., 75 (2022), 667-766.
doi: 10.1002/cpa.22008.

[55] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming, Society for Industrial and Applied Mathematics, 1994.

[56] K. L. Pavasovic, J. Rothfuss and A. Krause, MARS: Meta-learning as score matching in the function space, arXiv preprint, 2022. arXiv: 2210.13319.

[57] G. Peskir and A. Shiryaev, Optimal Stopping and Free-Boundary Problems, Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, 2006.

[58] Z. Ren and Y. J. Lee, Cross-domain self-supervised multi-task feature learning using synthetic imagery, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2018), 762-771.
doi: 10.1109/CVPR.2018.00086.

[59] J. Rothfuss, V. Fortuin, M. Josifoski and A. Krause, PACOH: Bayes-optimal meta-learning with PAC-guarantees, in M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, PMLR, 139 (2021), 9116-9126.

[60] J. Rothfuss, M. Josifoski, V. Fortuin and A. Krause, Scalable PAC-Bayesian meta-learning via the PAC-optimal hyper-posterior: From theory to practice, Journal of Machine Learning Research, 24 (2023), Paper No. 386, 62 pp.

[61] O. Sener and V. Koltun, Multi-task learning as multi-objective optimization, in S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi and R. Garnett, editors, Advances in Neural Information Processing Systems, Curran Associates, Inc., 31 (2018).

[62] J. L. Snell, Applications of martingale system theorems, Trans. Amer. Math. Soc., 73 (1952), 293-312.
doi: 10.1090/S0002-9947-1952-0050209-9.

[63] Y. Tian, Y. Gu and Y. Feng, Learning from similar linear representations: Adaptivity, minimaxity, and robustness, Journal of Machine Learning Research, 26 (2025), Paper No. 187, 125 pp.

[64] C. Villani, Optimal Transport: Old and New, volume 338 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], Springer-Verlag, Berlin, 2009.

[65] R. Wang, C. Hyndman and A. Kratsios, The entropic measure transform, Canad. J. Statist., 48 (2020), 97-129.
doi: 10.1002/cjs.11537.

[66] X. Wang, J. B. Oliva, J. Schneider and B. Póczos, Nonparametric risk and stability analysis for multi-task learning problems, in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI'16), AAAI Press, (2016), 2146-2152.

[67] Y. Wei, F. Yang and M. J. Wainwright, Early stopping for kernel boosting algorithms: A general analysis with localized complexities, IEEE Trans. Inform. Theory, 65 (2019), 6685-6703.

[68] K. Weiss, T. M. Khoshgoftaar and D. Wang, A survey of transfer learning, Journal of Big Data, 3 (2016), 1-40.
doi: 10.1186/s40537-016-0043-6.

[69] Y. Xue, X. Liao, L. Carin and B. Krishnapuram, Multi-task learning for classification with Dirichlet process priors, Journal of Machine Learning Research, 8 (2007), 35-63.

[70] Y. Yao, L. Rosasco and A. Caponnetto, On early stopping in gradient descent learning, Constr. Approx., 26 (2007), 289-315.
doi: 10.1007/s00365-006-0663-2.

[71] J. Yosinski, J. Clune, Y. Bengio and H. Lipson, How transferable are features in deep neural networks?, Advances in Neural Information Processing Systems, 27 (2014).

[72] M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13, Springer, (2014), 818-833.
doi: 10.1007/978-3-319-10590-1_53.

[73] T. Zhang and B. Yu, Boosting with early stopping: Convergence and consistency, Ann. Statist., 33 (2005), 1538-1579.
doi: 10.1214/009053605000000255.

[74] J. Zhao, A. Lucchi, F. N. Proske, A. Orvieto and H. Kersting, Batch size selection by stochastic optimal control, in Has it Trained Yet? NeurIPS 2022 Workshop, 2022.