# Approximate Bayesian inference for geostatistical generalised linear models

\* Corresponding author: Evangelos Evangelou
The aim of this paper is to bring together recent developments in Bayesian generalised linear mixed models and geostatistics. We focus on approximate methods in both areas. A technique known as full-scale approximation, proposed by Sang and Huang (2012) for reducing the computational burden of large geostatistical data sets, is incorporated into the integrated nested Laplace approximation (INLA) methodology, which is used for approximate Bayesian inference. We also discuss how INLA can be used for approximating the posterior distribution of transformations of parameters, which is useful in practical applications. Issues regarding the choice of the parameters of the approximation, such as the knots and the taper range, are also addressed. Emphasis is given to applications in disease mapping, and the methodology is illustrated by modelling the loa loa prevalence in Cameroon and malaria in the Gambia.
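The full-scale approximation mentioned above can be sketched in a few lines. The idea, following the general form in Sang and Huang (2012), is to replace the covariance matrix by a low-rank part built from a set of knots plus a tapered version of the residual. The sketch below is only an illustration under stated assumptions: the exponential covariance, the spherical taper, the knot grid, the parameter values, and the use of a relative Frobenius error are all choices made here and are not taken from the paper.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between the rows of a and the rows of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def exp_cov(a, b, sill=1.0, rho=0.3):
    """Exponential covariance (assumed model; sill and range are illustrative)."""
    return sill * np.exp(-pairwise_dist(a, b) / rho)

def spherical_taper(a, b, gamma):
    """Spherical taper with range gamma: equals 1 at distance 0 and 0 beyond gamma."""
    r = np.clip(pairwise_dist(a, b) / gamma, 0.0, 1.0)
    return 1.0 - 1.5 * r + 0.5 * r**3

def full_scale_cov(sites, knots, gamma):
    """Low-rank (knot-based) part plus tapered residual."""
    C = exp_cov(sites, sites)
    C_sk = exp_cov(sites, knots)
    C_kk = exp_cov(knots, knots)
    low_rank = C_sk @ np.linalg.solve(C_kk, C_sk.T)
    residual = C - low_rank
    return low_rank + residual * spherical_taper(sites, sites, gamma)

# Hypothetical setup: 200 random sites in the unit square, 4 x 4 knot grid.
rng = np.random.default_rng(0)
sites = rng.uniform(0.0, 1.0, size=(200, 2))
g = np.linspace(0.1, 0.9, 4)
knots = np.array([(x, y) for x in g for y in g])

C = exp_cov(sites, sites)
gammas = [0.05, 0.1, 0.2, 0.4]
# Relative Frobenius error of the approximation for each taper range.
errors = [
    np.linalg.norm(full_scale_cov(sites, knots, gm) - C) / np.linalg.norm(C)
    for gm in gammas
]
for gm, err in zip(gammas, errors):
    print(f"gamma = {gm:.2f}  relative Frobenius error = {err:.4f}")
```

A larger taper range keeps more of the residual and so cannot increase the error, but it also makes the tapered part less sparse. This is the trade-off between accuracy and computing time that the choice of knots and taper range has to balance.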

Mathematics Subject Classification: Primary: 62H11, 62F15; Secondary: 60G15.


Figure 1. Locations for the simulated example, indicated by $\cdot$, and grid for the full scale approximation, indicated by $\times$. Prediction is considered at a central site ($\circ$) and a far site ($\square$)

Figure 2. Scaled Frobenius norm ($\circ$) and computational time ($+$) against taper range $\gamma$

Figure 3. Posterior densities for (a) logarithm of sill, (b) range, and (c) intercept. The histogram shows the MCMC sample. The approximation using INLA and exact covariance matrix is shown by a solid line, the INLA with the full scale approximation is shown by a dashed line, and the INLA with the predictive process approximation is shown by a dotted line. The true parameter value is indicated by a triangle on the horizontal axis

Figure 4. Predictive distribution of the random field at a central site (left) and a far site (right). The histogram shows the MCMC sample. The approximation using INLA and exact covariance matrix is shown by a solid line, the INLA with the full scale approximation is shown by a dashed line, and the INLA with the predictive process approximation is shown by a dotted line

Figure 5. Posterior plots for the variance parameters. (a) Joint posterior of $\log(\sigma^2)$ and $\rho$ using exact INLA, (b) Marginal posterior of $\log(\sigma^2)$, (c) Marginal posterior of $\rho$. The histogram is for the MCMC sample, the exact INLA is shown by a solid line and the full-scale INLA by a dashed line

Figure 6. Posterior for the regressor coefficients. The histogram is for the MCMC sample, the exact INLA is shown by a solid line and the full-scale INLA by a dashed line

Figure 7. Predicted prevalence of the loa loa parasite (top), and prediction standard deviation (bottom)

Figure 8. Sampled locations for the Gambia data from [30]

Figure 9. Posterior densities for the parameters (a) $\tau^2$, (b) $\sigma^2$, and (c) $\rho$ of the Gambia malaria data

Figure 10. Prediction of spatial random field for the Gambia malaria data (top) and prediction standard deviation (bottom)

Table 1.  Parameter estimates for the loa loa prevalence in Cameroon using exact INLA, approximate INLA, and MCMC

| Parameter | Exact INLA estimate | Exact INLA 95% interval | Full-scale INLA estimate | Full-scale INLA 95% interval | MCMC estimate | MCMC 95% interval |
|---|---|---|---|---|---|---|
| Intercept $(\beta_0)$ | $-14.17$ | ($-18.58$, $-9.76$) | $-15.03$ | ($-19.28$, $-10.77$) | $-14.67$ | ($-19.16$, $-9.90$) |
| Elevation $0 - 0.65$ km $(\beta_1)$ | $2.28$ | ($1.07$, $3.49$) | $2.19$ | ($1.02$, $3.36$) | $2.35$ | ($1.15$, $3.60$) |
| Elevation $0.65 - 1$ km $(\beta_2)$ | $1.62$ | ($0.90$, $2.34$) | $1.60$ | ($0.91$, $2.29$) | $1.68$ | ($1.00$, $2.39$) |
| Elevation $1 - 1.3$ km $(\beta_3)$ | $0.81$ | ($0.17$, $1.45$) | $0.68$ | ($0.05$, $1.30$) | $0.83$ | ($0.18$, $1.46$) |
| Max(NDVI) $(\beta_4)$ | $14.09$ | ($8.00$, $20.17$) | $15.11$ | ($9.16$, $21.06$) | $14.66$ | ($8.19$, $20.82$) |
| Sd(NDVI) $(\beta_5)$ | $0.71$ | ($-9.68$, $11.10$) | $1.27$ | ($-8.87$, $11.42$) | $0.68$ | ($-9.04$, $11.22$) |
| Sill $(\sigma^2)$ | $0.72$ | ($0.50$, $1.02$) | $0.66$ | ($0.45$, $0.94$) | $0.70$ | ($0.51$, $0.99$) |
| Range $(\rho)$ | $0.55$ | ($0.25$, $1.08$) | $0.64$ | ($0.36$, $1.08$) | $0.48$ | ($0.28$, $0.92$) |

Table 2.  Parameter estimates of the Gambia malaria data

| Parameter | Estimate | 95% interval |
|---|---|---|
| Intercept ($\beta_0$) | $-0.07309$ | ($-2.95100$, $2.80483$) |
| Age ($\beta_1$) | $0.00066$ | ($0.00042$, $0.00090$) |
| Untreated bed net ($\beta_2$) | $-0.36216$ | ($-0.67639$, $-0.04793$) |
| Treated bed net ($\beta_3$) | $-0.68297$ | ($-1.07497$, $-0.29097$) |
| Greenness ($\beta_4$) | $-0.01334$ | ($-0.07507$, $0.04839$) |
| PHC ($\beta_5$) | $-0.32790$ | ($-0.77921$, $0.12340$) |
| Area 2 ($\beta_6$) | $-0.69385$ | ($-2.26728$, $0.87958$) |
| Area 3 ($\beta_7$) | $-0.78240$ | ($-2.44258$, $0.87778$) |
| Area 4 ($\beta_8$) | $0.65537$ | ($-1.12152$, $2.43226$) |
| Area 5 ($\beta_9$) | $0.97627$ | ($-0.80963$, $2.76217$) |
| Nugget ($\tau^2$) | $0.13209$ | ($0.00310$, $0.26136$) |
| Sill ($\sigma^2$) | $0.98459$ | ($0.34501$, $1.82461$) |
| Range ($\rho$) | $9.82025$ | ($0.54713$, $18.63800$) |
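To illustrate what a transformation of a parameter looks like in practice, the short sketch below maps the treated bed net coefficient from Table 2 to an odds ratio. It assumes a logit link, which this excerpt does not state, and it treats the interval endpoints as quantiles so that they can be mapped through the monotone function $\exp(\cdot)$. The variable names are hypothetical.

```python
import math

# Treated bed net coefficient and 95% interval, copied from Table 2.
estimate, lower, upper = -0.68297, -1.07497, -0.29097

# Under an assumed logit link, exp(beta) is the odds ratio for the covariate.
odds_ratio = math.exp(estimate)
# A monotone map preserves quantiles, so quantile-based endpoints transform directly.
interval = (math.exp(lower), math.exp(upper))

print(f"odds ratio = {odds_ratio:.3f}, interval = ({interval[0]:.3f}, {interval[1]:.3f})")
```

Transforming a point estimate in this way does not in general give the posterior mean of the transformed parameter, which is one reason for approximating the posterior distribution of the transformation itself.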
[1] S. Banerjee, A. E. Gelfand, A. O. Finley and H. Sang, Gaussian predictive process models for large spatial data sets, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70 (2008), 825-848. doi: 10.1111/j.1467-9868.2008.00663.x.
[2] O. E. Barndorff-Nielsen and D. R. Cox, Asymptotic Techniques for Use in Statistics, Chapman & Hall Ltd, 1989. doi: 10.1007/978-1-4899-3424-6.
[3] N. Breslow and D. Clayton, Approximate inference in generalized linear mixed models, Journal of the American Statistical Association, 88 (1993), 9-25.
[4] M. Cameletti, F. Lindgren, D. Simpson and H. Rue, Spatio-temporal modeling of particulate matter concentration through the SPDE approach, AStA Advances in Statistical Analysis, 97 (2013), 109-131. doi: 10.1007/s10182-012-0196-3.
[5] O. F. Christensen, J. Møller and R. Waagepetersen, Analysis of Spatial Data Using Generalized Linear Mixed Models and Langevin-type Markov chain Monte Carlo, Technical report, Department of Mathematical Sciences, Aalborg University, 2000.
[6] O. F. Christensen and R. Waagepetersen, Bayesian prediction of spatial count data using generalized linear mixed models, Biometrics, 58 (2002), 280-286. doi: 10.1111/j.0006-341X.2002.00280.x.
[7] O. F. Christensen, Monte Carlo maximum likelihood in model-based geostatistics, Journal of Computational and Graphical Statistics, 13 (2004), 702-718. doi: 10.1198/106186004X2525.
[8] O. F. Christensen and P. J. Ribeiro Jr, geoRglm: A package for generalised linear spatial models, R News, 2 (2002), 26-28.
[9] P. Diggle, R. Moyeed, B. Rowlingson and M. Thomson, Childhood malaria in the Gambia: A case-study in model-based geostatistics, Journal of the Royal Statistical Society: Series C (Applied Statistics), 51 (2002), 493-506. doi: 10.1111/1467-9876.00283.
[10] P. J. Diggle, J. A. Tawn and R. A. Moyeed, Model-based geostatistics, Journal of the Royal Statistical Society: Series C (Applied Statistics), 47 (1998), 299-350. doi: 10.1111/1467-9876.00113.
[11] P. J. Diggle, M. C. Thomson, O. F. Christensen, B. Rowlingson, V. Obsomer, J. Gardon, S. Wanji, I. Takougang, P. Enyong, J. Kamgno, J. H. Remme, M. Boussinesq and D. H. Molyneux, Spatial modelling and the prediction of loa loa risk: decision making under uncertainty, Annals of Tropical Medicine and Parasitology, 101 (2007), 499-509.
[12] J. Eidsvik, S. Martino and H. Rue, Approximate Bayesian inference in spatial generalized linear mixed models, Scandinavian Journal of Statistics, 36 (2009), 1-22. doi: 10.1111/j.1467-9469.2008.00621.x.
[13] J. Eidsvik, A. O. Finley, S. Banerjee and H. Rue, Approximate Bayesian inference for large spatial datasets using predictive process models, Computational Statistics & Data Analysis, 56 (2012), 1362-1380. doi: 10.1016/j.csda.2011.10.022.
[14] E. Evangelou, Z. Zhu and R. L. Smith, Estimation and prediction for spatial generalized linear mixed models using high order Laplace approximation, Journal of Statistical Planning and Inference, 141 (2011), 3564-3577. doi: 10.1016/j.jspi.2011.05.008.
[15] E. Evangelou and V. Maroulas, Sequential empirical Bayes method for filtering dynamic spatiotemporal processes, Spatial Statistics, 21 (2017), 114-129. doi: 10.1016/j.spasta.2017.06.006.
[16] E. Evangelou and Z. Zhu, Optimal predictive design augmentation for spatial generalised linear mixed models, Journal of Statistical Planning and Inference, 142 (2012), 3242-3253. doi: 10.1016/j.jspi.2012.05.008.
[17] A. O. Finley, H. Sang, S. Banerjee and A. E. Gelfand, Improving the performance of predictive process modeling for large datasets, Computational Statistics & Data Analysis, 53 (2009), 2873-2884. doi: 10.1016/j.csda.2008.09.008.
[18] R. Furrer, M. G. Genton and D. Nychka, Covariance tapering for interpolation of large spatial datasets, Journal of Computational and Graphical Statistics, 15 (2006), 502-523. doi: 10.1198/106186006X132178.
[19] T. Gneiting, Compactly supported correlation functions, Journal of Multivariate Analysis, 83 (2002), 493-508. doi: 10.1006/jmva.2001.2056.
[20] F. Hosseini, J. Eidsvik and M. Mohammadzadeh, Approximate Bayesian inference in spatial GLMM with skew normal latent variables, Computational Statistics & Data Analysis, 55 (2011), 1791-1806. doi: 10.1016/j.csda.2010.11.011.
[21] J. B. Illian, S. H. Sørbye and H. Rue, A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation (INLA), The Annals of Applied Statistics, 6 (2012), 1499-1530. doi: 10.1214/11-AOAS530.
[22] C. G. Kaufman, M. J. Schervish and D. W. Nychka, Covariance tapering for likelihood-based estimation in large spatial data sets, Journal of the American Statistical Association, 103 (2008), 1545-1555. doi: 10.1198/016214508000000959.
[23] R. Langrock, Some applications of nonlinear and non-Gaussian state–space modelling by means of hidden Markov models, Journal of Applied Statistics, 38 (2011), 2955-2970. doi: 10.1080/02664763.2011.573543.
[24] F. Lindgren, H. Rue and J. Lindström, An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73 (2011), 423-498. doi: 10.1111/j.1467-9868.2011.00777.x.
[25] S. Martino, R. Akerkar and H. Rue, Approximate Bayesian inference for survival models, Scandinavian Journal of Statistics, 38 (2011), 514-528. doi: 10.1111/j.1467-9469.2010.00715.x.
[26] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman & Hall/CRC, 1999. doi: 10.1007/978-1-4899-3242-6.
[27] W. Müller, Collecting Spatial Data: Optimum Design of Experiments for Random Fields, Springer Verlag, 2007.
[28] M. Paul, A. Riebler, L. M. Bachmann, H. Rue and L. Held, Bayesian bivariate meta-analysis of diagnostic test studies using integrated nested Laplace approximations, Statistics in Medicine, 29 (2010), 1325-1339. doi: 10.1002/sim.3858.
[29] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2018, URL https://www.R-project.org/.
[30] P. J. Ribeiro Jr and P. J. Diggle, geoR: A package for geostatistical analysis, R News, 1 (2001), 15-18.
[31] H. Rue and L. Held, Gaussian Markov Random Fields: Theory and Applications, Monographs on Statistics and Applied Probability, Chapman & Hall/CRC, 2005. doi: 10.1201/9780203492024.
[32] H. Rue, S. Martino and N. Chopin, Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71 (2009), 319-392. doi: 10.1111/j.1467-9868.2008.00700.x.
[33] H. Sang and J. Z. Huang, A full scale approximation of covariance functions for large spatial data sets, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74 (2012), 111-132. doi: 10.1111/j.1467-9868.2011.01007.x.
[34] H. Sang, M. Jun and J. Huang, Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors, The Annals of Applied Statistics, 5 (2011), 2519-2548. doi: 10.1214/11-AOAS478.
[35] B. Schrödle and L. Held, A primer on disease mapping and ecological regression using INLA, Computational Statistics, 26 (2011), 241-258. doi: 10.1007/s00180-010-0208-2.
[36] Z. Shun and P. McCullagh, Laplace approximation of high dimensional integrals, Journal of the Royal Statistical Society: Series B (Methodological), 57 (1995), 749-760. doi: 10.1111/j.2517-6161.1995.tb02060.x.
[37] D. Simpson, F. Lindgren and H. Rue, In order to make spatial statistics computationally feasible, we need to forget about the covariance function, Environmetrics, 23 (2012), 65-74. doi: 10.1002/env.1137.
[38] M. L. Stein, Interpolation of Spatial Data: Some Theory for Kriging, Springer-Verlag Inc, 1999. doi: 10.1007/978-1-4612-1494-6.
[39] B. M. Taylor and P. J. Diggle, INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes, Journal of Statistical Computation and Simulation, 84 (2014), 2266-2284. doi: 10.1080/00949655.2013.788653.
[40] H. Zhang, On estimation and prediction for spatial generalized linear mixed models, Biometrics, 58 (2002), 129-136. doi: 10.1111/j.0006-341X.2002.00129.x.
