
Flexible online multivariate regression with variational Bayes and the matrix-variate Dirichlet process

  • * Corresponding author: David J. Nott
  • Abstract: Flexible density regression methods, in which the whole distribution of a response vector changes with the covariates, are very useful in some applications. A recently developed technique of this kind uses the matrix-variate Dirichlet process as a prior for a mixing distribution on the coefficient matrix in a multivariate linear regression model. The method is attractive for the convenient way it allows borrowing of strength across the component regressions, and for its computational simplicity and tractability. The purpose of the present article is to develop fast online variational Bayes approaches to fitting this model, and to investigate how they perform compared to MCMC and batch variational methods in a number of scenarios.
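    The abstract describes fitting a Dirichlet process mixture model online, one observation at a time, rather than by batch MCMC. As a much simpler stand-in for the paper's matrix-variate regression setting, the greedy sequential flavour of such online fitting (in the spirit of SUGS-type algorithms) can be sketched for a one-dimensional Gaussian mixture. Everything below — the function name, the Gaussian likelihood, and the hyperparameter values `alpha`, `sigma`, `tau` — is an illustrative assumption, not the paper's algorithm.

    ```python
    import math

    def sequential_dp_assign(data, alpha=1.0, sigma=1.0, tau=3.0):
        """Greedy sequential assignment for a DP mixture of 1-D Gaussians.

        Each incoming point either joins the existing component with the
        highest predictive weight or opens a new component. `alpha` is the
        DP concentration, `sigma` the (known) observation standard
        deviation, and `tau` the prior standard deviation of a component
        mean centred at 0 (all illustrative choices).
        """
        counts, sums, labels = [], [], []
        for x in data:
            n = sum(counts)
            scores = []
            for c, s in zip(counts, sums):
                mean = s / c  # plug-in estimate of the component mean
                dens = math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (
                    sigma * math.sqrt(2 * math.pi))
                scores.append(c / (n + alpha) * dens)  # CRP-style weight
            # marginal density of x under a brand-new component
            v = sigma ** 2 + tau ** 2
            new_dens = math.exp(-0.5 * x ** 2 / v) / math.sqrt(2 * math.pi * v)
            scores.append(alpha / (n + alpha) * new_dens)
            k = max(range(len(scores)), key=scores.__getitem__)
            if k == len(counts):       # open a new component
                counts.append(1)
                sums.append(x)
            else:                      # update sufficient statistics in place
                counts[k] += 1
                sums[k] += x
            labels.append(k)
        return labels, counts
    ```

    A single pass over the data is all that is needed, which is what makes this family of methods attractive for streaming settings; the price, as the tables below illustrate for the real algorithms, is typically some loss of accuracy relative to MCMC.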

    Mathematics Subject Classification: Primary: 62F15, 62G08; Secondary: 62J99.

  • Figure 1.  (a) Estimated degree of weak informativity at $\gamma = 0.05$ for the bioassay example. (b) The grey and black points show the estimated distribution of the conflict p-value obtained with the VSUGS-adjusted approach and by direct simulation from the prior predictive distribution, respectively. Both estimates use the alternative prior with $\sigma_0 = 2$ and $\sigma_1 = 7.5$.

    Table 1.  In-sample accuracy of various approaches for the energy efficiency data

    Method                 RMSE $y_1$  RMSE $y_2$  RMSE Mean  MAPE $y_1$  MAPE $y_2$  MAPE Mean  Time (mins)
    Gibbs Sampler          0.0474      0.0450      0.0462     0.0605      0.0667      0.0636     141
    VB (Stick Breaking)    0.1853      0.2150      0.2001     0.2163      0.2469      0.2163     18
    VB (Pólya urn)         0.1579      0.1562      0.1570     0.1954      0.2806      0.2380     18
    Matrix VSUGS           0.3387      0.3160      0.3273     0.4527      0.5176      0.4851     3

    Table 2.  Prediction accuracy of various approaches on the test set for the energy efficiency data

    Method                 RMSE $y_1$  RMSE $y_2$  RMSE Mean  MAPE $y_1$  MAPE $y_2$  MAPE Mean
    Gibbs Sampler          0.5736      0.5881      0.5809     0.7427      0.8055      0.7741
    VB (Stick Breaking)    0.3930      0.4911      0.4421     0.5731      0.8347      0.7039
    VB (Pólya urn)         0.4005      0.5000      0.4503     0.4784      0.5038      0.4911
    Matrix VSUGS           0.4038      0.4882      0.4460     0.5364      0.6140      0.5752
    Adjusted VSUGS         0.2558      0.3329      0.2943     0.3399      0.4687      0.4043

    Table 3.  In-sample accuracy of various approaches for the robot arm data

    Method                   $y_1$   $y_2$   $y_3$   $y_4$   $y_5$   $y_6$   $y_7$   Mean
    RMSE
      Gibbs Sampler          0.1439  0.1338  0.1154  0.0854  0.1615  0.1615  0.1698  0.1304
      VB (Stick Breaking)    0.2058  0.1827  0.1492  0.1512  0.1917  0.2086  0.1521  0.1773
      VB (Pólya urn)         0.1887  0.1976  0.1704  0.1287  0.2042  0.2121  0.1403  0.1774
      Matrix VSUGS           0.5297  0.4665  0.3894  0.4125  0.4531  0.4630  0.4097  0.4463
    MAPE
      Gibbs Sampler          0.6156  0.4852  0.5554  0.5607  0.5902  0.8916  0.4822  0.5973
      VB (Stick Breaking)    0.8257  0.5441  0.5941  0.7593  0.6417  1.0223  0.5984  0.7122
      VB (Pólya urn)         0.6466  0.5814  0.6625  0.6585  0.6605  0.9861  0.6677  0.6948
      Matrix VSUGS           1.4872  1.0410  1.1798  1.5815  1.1536  1.5569  1.3610  1.3373

    Table 4.  Prediction accuracy of various approaches for the robot arm data

    Method                   $y_1$   $y_2$   $y_3$   $y_4$   $y_5$   $y_6$   $y_7$   Mean
    RMSE
      Gibbs Sampler          0.4099  0.3636  0.3404  0.3638  0.3598  0.3867  0.3395  0.3662
      VB (Stick Breaking)    0.5524  0.4909  0.4547  0.4225  0.5227  0.4653  0.4037  0.4732
      VB (Pólya urn)         0.4323  0.4195  0.3983  0.3851  0.4410  0.3916  0.3762  0.4063
      Matrix VSUGS           0.4198  0.3650  0.3375  0.3684  0.3737  0.3867  0.3490  0.3714
      Adjusted VSUGS         0.3513  0.3105  0.2869  0.2910  0.3249  0.3479  0.2741  0.3124
    MAPE
      Gibbs Sampler          1.0327  0.5987  1.5229  1.4137  1.0385  0.7330  1.2885  1.0897
      VB (Stick Breaking)    1.3711  0.7539  1.9556  1.0920  1.2136  1.1278  1.3825  1.2709
      VB (Pólya urn)         0.7587  0.6397  0.9774  0.9263  1.4217  1.1021  1.1947  1.0029
      Matrix VSUGS           0.9679  0.6600  1.8472  1.4600  1.2228  0.8428  1.4971  1.2140
      Adjusted VSUGS         0.7813  0.6097  1.7723  0.6627  1.1116  0.8206  1.0809  0.9770

    Table 5.  Prediction accuracy of various approaches using the full training set for the robot arm data

    Method                   $y_1$   $y_2$   $y_3$   $y_4$   $y_5$   $y_6$   $y_7$   Mean
    RMSE
      Matrix VSUGS           0.3930  0.3453  0.3065  0.3071  0.3402  0.3330  0.2923  0.3311
      Adjusted VSUGS         0.2505  0.2500  0.2315  0.1626  0.2541  0.2515  0.1785  0.2255
    MAPE
      Matrix VSUGS           0.8212  0.5288  1.9353  1.0970  0.9967  1.0329  0.7880  1.0285
      Adjusted VSUGS         0.7204  0.4575  1.8183  0.7178  0.8687  0.7910  0.7220  0.8708
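    The tables report RMSE and MAPE per response and averaged across responses. As a sketch of how such figures are computed (the paper's exact conventions, e.g. whether MAPE is reported as a fraction or a percentage, may differ), the two metrics for a single response are:

    ```python
    import math

    def rmse(y_true, y_pred):
        """Root mean squared error for one response variable."""
        return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                         / len(y_true))

    def mape(y_true, y_pred):
        """Mean absolute percentage error, expressed here as a fraction.

        Assumes no true value is exactly zero; real implementations
        need to guard against division by zero.
        """
        return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
    ```

    The "Mean" columns in the tables would then be the average of the per-response metric over $y_1, \dots, y_7$ (or $y_1, y_2$ for the energy data).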