Approximate Bayesian Inference for Geostatistical Generalised Linear Models

Abstract: The aim of this paper is to bring together recent developments in Bayesian generalised linear mixed models and geostatistics. We focus on approximate methods in both areas. A technique known as the full-scale approximation, proposed by Sang and Huang (2012) for alleviating the computational burden of large geostatistical data sets, is incorporated into the INLA methodology for approximate Bayesian inference. We also discuss how INLA can be used to approximate the posterior distribution of transformations of parameters, which is useful in practical applications. Issues regarding the choice of the parameters of the approximation, such as the knots and the taper range, are also addressed. Emphasis is given to applications in the context of disease mapping, by illustrating the methodology for modelling loa loa prevalence in Cameroon and malaria in the Gambia.


Introduction
Since the paper by Diggle et al. (1998), Bayesian methods for fitting spatial geostatistical models have become the norm. The models are fitted using Markov chain Monte Carlo (MCMC) methods, or other simulation-based methods (e.g. Zhang, 2004), but with the drawback of being computationally intensive if good samples are to be obtained. Furthermore, large geostatistical data sets are impossible to handle on an ordinary computer. Recently, a number of papers have appeared in the literature that aim to overcome these computational disadvantages of Bayesian spatial models.

Date: February 11, 2019.

This paper aims to illustrate the application of some of these methods to the geostatistical generalised linear model (GGLM) (the term "geostatistical generalised linear model", as opposed to the more general "spatial generalised linear model", is used here to emphasise the geostatistical feature of our model), as well as to address issues related to the implementation of these methods. Rue et al. (2009) introduced the method of integrated nested Laplace approximation (INLA) for fitting Bayesian mixed models. The general setting assumes that one can write down the conditional likelihood of the observations given the value of the random effect. The random effect is assumed to be Gaussian with variance depending on a small number of parameters. Moreover, the fixed effects are assumed to have a Gaussian or uniform prior. Inference is performed by approximating the unconditional distribution of the observations using Laplace's method (Barndorff-Nielsen and Cox, 1989), and emphasis is given to the case where the variance-covariance matrix of the random effects has a Markov random field structure (Rue and Held, 2005). Despite some criticism (Taylor and Diggle, 2014), the method has been shown to be relatively accurate in several settings (Eidsvik et al., 2009, Paul et al., 2010, Martino et al., 2011, Schrödle and Held, 2011, Illian et al., 2012).
One advantage of the Markov random field structure, compared to the geostatistical random field, is computational. In the former, the precision matrix can be written explicitly in terms of the variance parameters and is sparse. Therefore, calculations involving the Gaussian likelihood are performed in short computational time using routines for sparse matrices. On the other hand, spatial data over a continuous area are naturally represented by a geostatistical random field.
Nevertheless, the assumptions made by INLA are satisfied for geostatistical models and, to that end, Eidsvik et al. (2009) illustrated its application to the GGLM successfully. One point that was only briefly discussed in their paper is the application of the method when the number of sampling locations is large. In this case, the inverse of a large dense matrix is required at every iteration of the Laplace approximation, which can have detrimental effects on the speed and accuracy of the method. In a subsequent paper, Eidsvik et al. (2012) account for this by considering the predictive process approximation to the covariance (Banerjee et al., 2008), but found that the approximation is sensitive to the number of knots.
Motivated by the fast calculations for sparse matrices and the success of INLA in the Markov random field case, Lindgren et al. (2011) developed a theory for approximating spatial random fields by Markov random fields, taking advantage of the computational benefits of the latter. This method has been illustrated in subsequent papers by Simpson et al. (2012) and Cameletti et al. (2013), among others. On the other hand, this method applies only to certain covariance functions, namely a subset of the Matérn family.
An alternative approach, termed the full-scale approximation, has been suggested by Sang and Huang (2012), building on the idea of the predictive process model combined with covariance tapering (Furrer et al., 2006). The idea is to replace the dense covariance matrix of the random field, Σ say, by a matrix of the form Σ_S + ΓΣ_L^{-1}Γ^T, where Σ_S is sparse and Σ_L is of low dimension. The benefit is that the new matrix is easier to handle computationally and is a good approximation to the original matrix. Sang et al. (2011) and Sang and Huang (2012) apply this approximation to Gaussian models. An interesting question answered in this paper is how this approximation can be incorporated into INLA.
There are cases, e.g. in the context of disease mapping, where interest lies in predicting a nonlinear transformation of the spatial random effects. For MCMC this is straightforward, by applying the transformation to the MCMC sample, but for INLA, unless the distribution of the transformed variable is known in closed form, it is not so clear. This question is answered in the present paper by first approximating the predictive distribution of the spatial random effects by a mixture of normal distributions, and then using that to approximate the mean of the transformation via a weighted average.
The remainder of the paper is organised as follows. The GGLM is introduced in Section 2. Section 3 describes the INLA methodology and discusses its application to the GGLM. Three examples are presented in Section 4, illustrating the proposed techniques. Finally, Section 5 presents the conclusions of this article. Some technical details are given in the Appendix.

Model formulation
We assume a continuous domain of interest S over which a Gaussian random field Z is defined.
That is, for every finite subset S = {s_1, . . ., s_k} ⊂ S, the value of Z on S follows the k-dimensional normal distribution with mean 0 and variance-covariance matrix Σ. The (i, j) element of the k × k matrix Σ has the form

Σ_ij = σ² r(d_ij; ρ),

where σ² > 0 is the so-called sill parameter, interpreted as the variance of the random field at each location, r is a known function of the distance d_ij between locations s_i and s_j, called the correlogram, giving the correlation between components of the random field a distance d_ij apart, and ρ > 0 is a parameter of the correlogram called the range. Two forms of the correlogram that we will use in the examples of Section 4 are the exponential, r(d; ρ) = exp(−d/ρ), and the spherical, r(d; ρ) = 1 − 3d/(2ρ) + d³/(2ρ³) for d < ρ and 0 otherwise. We assume that a total of N observations y = {y_ij, i = 1, . . ., k, j = 1, . . ., n_i} are taken within the spatial domain S, with the (i, j)th observation corresponding to location s_i ∈ S, where n_i denotes the number of observations at location i, with Σ_i n_i = N. The n_i's may, for example, denote the number of individuals sampled or the number of observations from a single individual. Finally, each y_ij corresponds to the total of m_ij replications of the (i, j) experiment; e.g. for binary data m_ij corresponds to the number of trials of the experiment, and for Poisson data to the length of unit time over which sampling takes place.
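As a concrete illustration of the covariance structure just described, the matrix Σ can be assembled from pairwise distances and a correlogram as follows. This is a minimal numpy sketch of our own (the function names are ours, not from any package used in the paper), shown for the exponential correlogram:

```python
import numpy as np

def exp_correlogram(d, rho):
    """Exponential correlogram r(d; rho) = exp(-d / rho)."""
    return np.exp(-d / rho)

def covariance_matrix(coords, sigma2, rho, r=exp_correlogram):
    """Build Sigma with elements Sigma_ij = sigma^2 * r(d_ij; rho),
    where d_ij is the Euclidean distance between locations s_i and s_j."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * r(d, rho)

# five locations in the unit square, sill 0.5 and range 0.4
coords = np.random.default_rng(0).uniform(size=(5, 2))
Sigma = covariance_matrix(coords, sigma2=0.5, rho=0.4)
```

The diagonal of Σ equals the sill σ², since r(0; ρ) = 1 for any valid correlogram.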
We denote by S the collection of the k sampled locations, and by z the value of Z on S. The observations are modelled by a GGLM; that is, y_ij is independent of y_i′j′, for i ≠ i′ or j ≠ j′, conditioned on the linear predictor w, with distribution from the exponential family,

f(y_ij | w_ij) = exp {y_ij w_ij − m_ij κ(w_ij)} a(y_ij),

where κ(·) is known as the cumulant function (McCullagh and Nelder, 1999).
The (i, j) component of the N-dimensional linear predictor w, w_ij, is assumed to have the form

w_ij = x_ij^T β + z_i + ε_ij,

where x_ij denotes a vector of p explanatory variables corresponding to the jth sample at location s_i, β corresponds to a p-dimensional vector of regressor coefficients, z_i denotes the value of the Gaussian random field at location s_i, and ε_ij is a zero-mean random component, independent of the z_i's, corresponding to other imponderable effects associated with the (i, j)th individual.
Let X be the N × p matrix with rows the x_ij's, and let A be the N × k matrix of 0's and 1's, with the 1 in each row indicating which component of the random field z is present in the (i, j) element of the linear predictor. In other words, we write

w = Xβ + Az + ε.

Thus the linear predictor w follows the N-dimensional normal distribution with mean Xβ and variance-covariance matrix T = AΣA^T + D, where D denotes the variance-covariance matrix of ε. Let θ = (σ², ρ, θ_ε) denote the parameters in T, which we will refer to as the variance parameters.
Our objective is, given the data y, to predict the value of the random field at locations Q = {q_1, . . ., q_m} ⊂ S, and to estimate the parameters β and θ. We adopt a Bayesian approach in which we assume prior distributions for the parameters and seek their posterior distributions conditioned on the data, as well as the conditional distribution of the random field.
We use the generic symbol f(·) to denote the probability density/mass function of the expression in the parentheses. In the absence of any information about the explanatory variables, it is common to assume flat improper priors for the regressor coefficients β, i.e. f(β) ∝ 1 (Christensen et al., 2000, Christensen and Waagepetersen, 2002). Another popular prior considered in the literature is the p-dimensional normal prior with mean, say, ξ_0 and precision Φ_0, i.e. β ~ N_p(ξ_0, Φ_0^{-1}).
For the moment we will not specify any priors for the variance parameters θ but note that for the range parameter, an improper prior leads to an improper posterior (Christensen et al., 2000).
Methodology: The INLA approach

General
Suppose we observe y and wish to predict the N-dimensional random variable w. The distribution of either y or w may depend on an unknown parameter θ. Furthermore, suppose that the distribution of w|θ is Gaussian. If θ were known, then prediction would be performed by evaluating the predictive distribution of w given y,

f(w|y; θ) = f(y, w|θ) / f(y|θ), (1)

where f(y, w|θ) = f(y|w; θ) f(w|θ). In most applications of generalised linear mixed models the normalising constant

f(y|θ) = ∫ f(y, w|θ) dw (2)

is not available in closed form and exact evaluation of (1) is impossible (see, for example, Breslow and Clayton, 1993, and Rue et al., 2009). The idea of the Gaussian approximation (Rue et al., 2009) is to approximate (1) by a Gaussian density centred around

ŵ = argmax_w log f(y, w|θ)

and with precision

Ĥ = −∇²_w log f(y, w|θ) |_{w=ŵ}.

The values of ŵ and Ĥ are implicitly functions of (y, θ). Thus, the approximation to (1), obtained by a second-order Taylor expansion, is

f̃(w|y; θ) = N(w; ŵ, Ĥ^{-1}). (3)

Since the distribution of w|θ is Gaussian, the Gaussian approximation to the predictive distribution is not unreasonable. Alternatively, Hosseini et al. (2011) proposed the use of the skew normal as a better approximation when skewness is apparent and the Gaussian approximation is inaccurate; however, this approach will not be pursued in this paper.
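To make the construction of ŵ and Ĥ concrete, the sketch below finds them by Newton–Raphson for a binomial likelihood with logit link and a Gaussian prior w ~ N(µ, Q⁻¹). This is our own illustrative implementation of the Gaussian approximation, not code from the paper:

```python
import numpy as np

def gaussian_approx(y, m, mu, Q, iters=50):
    """Newton-Raphson search for the mode w_hat of f(y, w | theta) and the
    negative Hessian H_hat at the mode, for a binomial-logit likelihood
    with Gaussian prior w ~ N(mu, Q^{-1})."""
    w = mu.astype(float).copy()
    H = Q.copy()
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-w))          # success probabilities
        grad = y - m * p - Q @ (w - mu)       # gradient of log f(y, w)
        H = Q + np.diag(m * p * (1.0 - p))    # minus the Hessian (precision)
        w = w + np.linalg.solve(H, grad)      # Newton step
    return w, H

# usage: two locations with 10 Bernoulli trials each
y = np.array([3.0, 7.0])
m = np.array([10.0, 10.0])
w_hat, H_hat = gaussian_approx(y, m, mu=np.zeros(2), Q=np.eye(2))
```

At convergence the gradient of log f(y, w|θ) vanishes at ŵ, so the returned pair (ŵ, Ĥ) defines the Gaussian approximation (3).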
The normalising constant (2) is approximated using the Laplace approximation (Barndorff-Nielsen and Cox, 1989) by

f̃(y|θ) = f(y, w|θ) / f̃(w|y; θ) |_{w=ŵ}. (4)

Now suppose θ is unknown and assign a prior θ ~ f(θ). Then the posterior density for θ is f(θ|y) ∝ f(y|θ) f(θ). From (4) we can approximate

f̃(θ|y) ∝ f(y, w|θ) f(θ) / f̃(w|y; θ) |_{w=ŵ}. (5)

Similarly, for prediction, (1) needs to be integrated with respect to f(θ|y), i.e. we need to evaluate

f(w|y) = ∫ f(w|y; θ) f(θ|y) dθ. (6)

Using (3) and (5) in (6) we arrive at the following approximation,

f̃(w|y) = ∫ f̃(w|y; θ) f̃(θ|y) dθ. (7)

On the other hand, as the distribution f(θ) is in general non-Gaussian, application of the Laplace approximation to the integral in (7) would be inefficient. Instead, we can view (7) as a "continuous mixture" of multivariate normal densities, f̃(w|y; θ), indexed by θ, with weights given by f̃(θ|y). As such it can be approximated by a finite mixture of size a at selected points {θ_1, . . ., θ_a} (see Appendix A on how to obtain such values), giving rise to the multivariate normal mixture approximation,

f̃(w|y) ≈ Σ_{j=1}^a W_j f̃(w|y; θ_j), (8)

where W_j = f̃(θ_j|y) / Σ_{l=1}^a f̃(θ_l|y) are the mixture weights.
Using properties of mixture distributions, the mean of (8) is

E(w|y) ≈ Σ_{j=1}^a W_j ŵ_j, (9)

and is the approximate best predictor for w, with variance, obtained similarly,

Var(w|y) ≈ Σ_{j=1}^a W_j (Ĥ_j^{-1} + ŵ_j ŵ_j^T) − {Σ_{j=1}^a W_j ŵ_j}{Σ_{j=1}^a W_j ŵ_j}^T.

In the above, we denote by ŵ_j and Ĥ_j the mean and precision respectively of the Gaussian approximation (3) at θ = θ_j, j = 1, . . ., a.
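The mixture moments above can be computed directly from the weights, means and precisions. The following small numpy helper is our own illustration, assuming the W_j, ŵ_j and Ĥ_j have already been obtained:

```python
import numpy as np

def mixture_moments(weights, means, precisions):
    """Mean and covariance of the finite normal mixture
    sum_j W_j N(w_hat_j, H_hat_j^{-1}), following (9) and the
    companion variance formula."""
    W = np.asarray(weights, dtype=float)
    W = W / W.sum()                                   # normalise the weights
    mean = sum(w * m for w, m in zip(W, means))
    second = sum(w * (np.linalg.inv(H) + np.outer(m, m))
                 for w, m, H in zip(W, means, precisions))
    return mean, second - np.outer(mean, mean)

# usage: a two-component mixture in two dimensions
mu, V = mixture_moments(weights=[0.6, 0.4],
                        means=[np.zeros(2), np.array([1.0, 2.0])],
                        precisions=[np.eye(2), 4.0 * np.eye(2)])
```

With a single component the function simply returns that component's mean and covariance, which provides a quick sanity check.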
Let u be another random quantity of dimension n; for example, u may represent components of the random field or the fixed effects. Its distribution is assumed to be independent of y conditioned on w, with conditional distribution

u | w; θ ~ N(Λw + λ, Ω), (10)

for conformable matrices Λ and Ω and vector λ, dependent on θ, i.e. normal with mean E(u|w; θ) = Λw + λ and variance Ω. Then, from (3), the approximation to the conditional density of u|y; θ, f(u|y; θ), is normal with mean and variance respectively

E(u|y; θ) ≈ Λŵ + λ, Var(u|y; θ) ≈ ΛĤ^{-1}Λ^T + Ω. (11)

The predictive density, f(u|y) = ∫ f(u|y; θ) f(θ|y) dθ, is then approximated by a multivariate normal mixture,

f̃(u|y) = Σ_{j=1}^a W_j N(u; Λ_j ŵ_j + λ_j, Λ_j Ĥ_j^{-1} Λ_j^T + Ω_j), (12)

similarly to (8). Prediction intervals for the components of u are then computed using the appropriate quantiles of the predictive distribution (12).
Suppose further that we are interested in a function g(u), where u is a single element of u.
For example, in the context of disease mapping, when u is the linear predictor at some location, g(u) = e^u/(1 + e^u) denotes the prevalence of the disease at that location, and interest lies in creating a prediction map for g(u). For given data y the prediction is

E(g(u)|y) = ∫ g(u) f(u|y) du. (13)

If g(·) is a non-linear function of u, simply replacing u by its expectation leads to biased predictions.
Of course, if g(·) is monotone, a prediction interval can be obtained by transforming the prediction interval from (12), but for the purpose of creating a prediction map, a point prediction is needed.
The solution we propose is to evaluate (13) numerically using the ideas of Langrock (2011). To that end, let u_0 < u_1 < · · · < u_b be an ordered grid of values of u. These values are chosen to contain most of the mass of the marginal of u from approximation (12). Then

E(g(u)|y) ≈ Σ_{i=1}^b g((u_{i-1} + u_i)/2) ∫_{u_{i-1}}^{u_i} f̃(u|y) du, (14)

where each integral involves only univariate normal distribution functions and is straightforward to compute. An approximation to the prediction variance is obtained similarly.
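The grid evaluation of (14) is easy to implement when the marginal of u is a univariate normal mixture. The sketch below is our own illustration (not the paper's code), using cell midpoints and normal CDF differences for the per-cell mass:

```python
import numpy as np
from scipy.stats import norm

def grid_expectation(g, weights, means, sds, b=200):
    """Approximate E{g(u) | y} when the predictive density of u is the
    univariate normal mixture sum_j W_j N(mean_j, sd_j^2), following (14)."""
    lo = min(m - 6.0 * s for m, s in zip(means, sds))
    hi = max(m + 6.0 * s for m, s in zip(means, sds))
    u = np.linspace(lo, hi, b + 1)                       # grid u_0, ..., u_b
    mid = 0.5 * (u[:-1] + u[1:])                         # cell midpoints
    mass = sum(w * np.diff(norm.cdf(u, loc=m, scale=s))  # P(u_{i-1} < u <= u_i)
               for w, m, s in zip(weights, means, sds))
    return float(np.sum(g(mid) * mass))

# usage: predicted prevalence g(u) = e^u / (1 + e^u) under a
# hypothetical two-component mixture for the linear predictor
prev = grid_expectation(lambda u: 1.0 / (1.0 + np.exp(-u)),
                        weights=[0.7, 0.3], means=[-2.0, -1.0], sds=[0.5, 0.8])
```

Because the per-cell mass is exact under the mixture, the only approximation error comes from replacing g within each cell by its midpoint value.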

Application to GGLM
The distribution of the linear predictor is

w | β; θ ~ N(Xβ, T), (15)

i.e. normal with mean Xβ and variance-covariance matrix T = AΣA^T + D.
We will distinguish two cases for the prior distribution of β, which are the most common found in the literature: the normal and the flat improper prior. These are, respectively,

β ~ N_p(ξ_0, Φ_0^{-1}), for given values ξ_0 and Φ_0, and f(β) ∝ 1.

In each of these two cases we can in fact integrate β out of (15) and obtain the distribution of the linear predictor without conditioning on the regressor coefficients. In both cases this distribution is normal, but with different mean and variance each time. We will denote by λ_N and λ_U the means, and by Υ_N and Υ_U the precision matrices, when the prior for β is respectively normal or uniform. Then

λ_N = Xξ_0, Υ_N = (T + XΦ_0^{-1}X^T)^{-1}, λ_U = 0, Υ_U = T^{-1} − T^{-1}X(X^T T^{-1}X)^{-1}X^T T^{-1}. (16)

The distribution of β|w; θ is also normal. Writing ξ_N, ξ_U for the means and Φ_N, Φ_U for the precision matrices in the case of the normal and uniform priors, we have

ξ_N = Φ_N^{-1}(Φ_0 ξ_0 + X^T T^{-1}w), Φ_N = Φ_0 + X^T T^{-1}X, ξ_U = (X^T T^{-1}X)^{-1}X^T T^{-1}w, Φ_U = X^T T^{-1}X. (17)

For prediction we compute the conditional mean and precision of the spatial random field given (w; θ). Let µ and P denote this mean and precision matrix. Then

µ = ΣA^T Υ(w − λ), P^{-1} = Σ − ΣA^T ΥAΣ, (18)

with λ and Υ chosen accordingly from (16). Moreover, let z_0 be the value of the random field at the locations Q, having mean 0 and variance-covariance matrix Σ_0, and let C denote the covariance between z and z_0. Then its conditional mean and precision given (w; θ) are µ_0 and P_0, where

µ_0 = C^T A^T Υ(w − λ), P_0^{-1} = Σ_0 − C^T A^T ΥAC. (19)

Equations (17), (18) and (19) are of the form (10), so, given the posterior distribution for θ, their predictive distribution is obtained from (11) and (12). To that end, write

f(w; θ) ∝ |Υ|^{1/2} exp{−(w − λ)^T Υ(w − λ)/2}

for the general form of the distribution of w|θ, with λ and Υ chosen from (16), and define ŵ = argmax_w f(y|w) f(w; θ).
Also let

Ĥ = Υ + diag{m_ij κ″(ŵ_ij)}.

Then, by the Laplace approximation, the conditional distribution of w|y; θ is approximately N(ŵ, Ĥ^{-1}), and the posterior for θ becomes

f̃(θ|y) ∝ f(y|ŵ) f(ŵ; θ) f(θ) |Ĥ|^{-1/2}. (20)

A note on the asymptotics
Central to the INLA methodology is the application of the Taylor expansion in (1), which results in approximations (3) and (4). A regularity condition for the Taylor approximation, and hence the Laplace approximation, to be valid is that the dimension of the domain of the function being approximated is constant. Unfortunately, this is not the case for the GGLM, where the two common asymptotic regimes, increasing-domain and infill asymptotics, assume that the number of sampling locations, k, increases to infinity (Stein, 1999, Section 3.3). Shun and McCullagh (1995) studied the performance of the Laplace approximation under nonstandard asymptotic settings. The case of the GGLM was discussed further by Evangelou et al. (2011).
In short, the validity of the Laplace approximation for the GGLM requires increasing-domain asymptotics and the assumption that (m_ij n_i)/k → ∞. Binary data, for example, where there is one observation per location, do not satisfy these requirements, and application of the methodology in this case may lead to inconsistent results. Further research in this area is necessary for a methodology that includes these cases as well.

Full-scale approximation to the covariance
Application of the INLA methodology described in the earlier sections requires computing the inverse and the determinant of k × k matrices. For example, for computing the inverse of the N × N matrix T = D + AΣA^T, we may use the formula

T^{-1} = D^{-1} − D^{-1}A(Σ^{-1} + A^T D^{-1}A)^{-1}A^T D^{-1}. (22)

The computations become intractable if the dimension k is large and Σ is dense, and much of the spatial statistics literature is concerned with tackling this issue.
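The identity above is a Sherman–Morrison–Woodbury formula: it trades the N × N inverse for a k × k one. A quick numerical check, written purely as an illustration with small arbitrary matrices:

```python
import numpy as np

# Numerical check of T^{-1} = D^{-1} - D^{-1} A (Sigma^{-1} + A^T D^{-1} A)^{-1} A^T D^{-1}
# for T = D + A Sigma A^T; only a k x k matrix is inverted (here k = 3, N = 8).
rng = np.random.default_rng(0)
N, k = 8, 3
A = rng.standard_normal((N, k))
D = np.diag(rng.uniform(0.5, 1.5, size=N))       # diagonal error variance
S = rng.standard_normal((k, k))
Sigma = S @ S.T + k * np.eye(k)                  # dense k x k covariance

T = D + A @ Sigma @ A.T
Dinv = np.diag(1.0 / np.diag(D))                 # trivial since D is diagonal
inner = np.linalg.inv(Sigma) + A.T @ Dinv @ A    # the only k x k inverse needed
T_inv = Dinv - Dinv @ A @ np.linalg.inv(inner) @ A.T @ Dinv
```

When D is diagonal, as here, D⁻¹ costs O(N), so the dominant cost is the k × k solve.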
Recently, Sang and Huang (2012) proposed the idea of a full-scale approximation to the covariance matrix. The approximation starts by specifying a set of k_L knot locations S_L, where k_L is much smaller than k, which cover the region of interest sufficiently well. The predictive process approximation to Σ is then (Banerjee et al., 2008)

ΓΣ_L^{-1}Γ^T,

where Σ_L is the covariance matrix at the locations S_L and Γ denotes the covariance of the random field between S and S_L. The predictive process approximation is efficient in capturing the large-scale dependence structure of the random field but may not perform well at short distances. The approximation can be improved by adding to it the sparse matrix

Σ_S = (Σ − ΓΣ_L^{-1}Γ^T) ⊙ E_γ,

a technique known as tapering (Furrer et al., 2006), where ⊙ denotes the elementwise product between two matrices and E_γ is a sparse correlation matrix on S, arising from a compactly supported correlation function (Gneiting, 2002), e.g. the spherical correlation, with range γ. Then, the full-scale approximation to the matrix Σ becomes

Σ̃ = ΓΣ_L^{-1}Γ^T + (Σ − ΓΣ_L^{-1}Γ^T) ⊙ E_γ.

Once the full-scale approximation is available, it can be incorporated into the INLA methodology as outlined in Appendix B.
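The construction of Σ̃ can be sketched in a few lines. The following is our own illustration for an exponential covariance with a spherical taper (the function names are ours):

```python
import numpy as np

def spherical_taper(d, gamma):
    """Compactly supported spherical correlation with range gamma."""
    h = np.minimum(d / gamma, 1.0)
    return (1.0 - 1.5 * h + 0.5 * h ** 3) * (d < gamma)

def full_scale(coords, knots, sigma2, rho, gamma):
    """Full-scale approximation
    Sigma_fs = Gamma Sigma_L^{-1} Gamma^T + (Sigma - Gamma Sigma_L^{-1} Gamma^T) * E_gamma,
    for an exponential covariance with sill sigma2 and range rho."""
    def cov(a, b):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return sigma2 * np.exp(-d / rho)
    Sigma = cov(coords, coords)                       # exact (dense) covariance
    Sigma_L = cov(knots, knots)                       # covariance at the knots
    Gamma = cov(coords, knots)                        # cross-covariance
    pp = Gamma @ np.linalg.solve(Sigma_L, Gamma.T)    # predictive process part
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return pp + (Sigma - pp) * spherical_taper(d, gamma), Sigma

rng = np.random.default_rng(1)
coords = rng.uniform(size=(50, 2))
knots = np.array([[0.25, 0.25], [0.25, 0.75], [0.75, 0.25], [0.75, 0.75]])
approx, exact = full_scale(coords, knots, sigma2=0.5, rho=0.4, gamma=0.2)
```

Note that the taper equals 1 at distance zero, so Σ̃ matches Σ exactly on the diagonal; the error is confined to the off-diagonal entries.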
Working with Σ̃ is much easier computationally than working with Σ, as discussed by Sang and Huang (2012).
In particular, the computational cost of using the original k × k covariance matrix Σ is typically of the order O(k³), while the computational cost of using the approximation Σ̃ depends only on k_L, the number of knots, and k_T, the average number of non-zero elements in the rows of Σ_S, which are both much smaller than k.
A number of important questions arise at this point: (a) what is the optimal choice of knot locations? and (b) what is the optimal choice of the taper range γ? Finley et al. (2009) discuss the first question in the context of Gaussian data by putting it in the framework of spatial design (Müller, 2007). They concluded that a sequential design, i.e. selecting the knot locations one at a time, gives better results than a space-filling design. As a design criterion they used the average prediction variance. However, it is not straightforward to compute the prediction variance for non-Gaussian responses. On the other hand, Evangelou and Zhu (2012) showed that, if the sample size at each location is large, the optimal design for the GGLM comes close to the optimal design for the Gaussian geostatistical model. With this in mind, the sequential updating scheme proposed by Finley et al. (2009) remains a good choice for the GGLM as well.
In terms of choosing the taper range γ, Kaufman et al. (2008) suggest beginning with a low value and then increasing it sequentially. As γ increases, the approximation error becomes smaller but the computational time increases, and a choice that balances accuracy against computational speed should be made. In fact, for very large values of γ there is very little improvement in terms of accuracy, if any at all. In this paper we measure accuracy in terms of the Frobenius distance, defined as

‖Σ − Σ̃‖_F = {Σ_{i,j} (Σ_ij − Σ̃_ij)²}^{1/2},

where Σ denotes the exact covariance matrix for the spatial random field z, and Σ̃ its approximation.
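The taper-range selection just described can be sketched numerically. The following illustration is our own code, not the authors': for simplicity it tapers the exact covariance directly rather than building the full Σ̃, and it uses the number of non-zero taper entries as a simple stand-in for the sparse-Cholesky timing; both curves are then scaled to [0, 1] as in the text:

```python
import numpy as np

rng = np.random.default_rng(2)
coords = rng.uniform(size=(200, 2))
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
Sigma = 0.5 * np.exp(-d / 0.5)                  # exact exponential covariance

def scale01(x):
    """Scale a sequence linearly so that it ranges from 0 to 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

gammas = np.linspace(0.1, 1.0, 10)
frob, cost = [], []
for g in gammas:
    h = np.minimum(d / g, 1.0)
    taper = (1.0 - 1.5 * h + 0.5 * h ** 3) * (d < g)   # spherical taper
    frob.append(np.linalg.norm(Sigma - Sigma * taper, "fro"))
    cost.append(np.count_nonzero(taper))               # proxy for sparse cost
frob_s, cost_s = scale01(frob), scale01(cost)
# plotting frob_s and cost_s against gammas gives a trade-off plot of the
# kind used in the text to pick the taper range
```

As γ grows, the scaled error curve falls while the scaled cost curve rises; the crossing region suggests a reasonable taper range.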
The advantage of using the Frobenius norm, as opposed to the Kullback-Leibler divergence, is that it is faster to compute when the matrices are of large dimension. In terms of computational time, we suggest using as a proxy the execution time of the Cholesky decomposition of Σ̃. The Frobenius norm and the computational time are measured for different values of the taper range and then scaled from 0 to 1. These values are then plotted together against the taper range to produce a plot similar to the one shown in Figure 2. Examination of the figure allows us to assess the trade-off between computing time and accuracy.

Examples

A simulated example

In this example the performance of INLA with and without the approximation to the covariance function is compared to MCMC. A total of k = 200 locations were chosen within the region (0, 1) × (0, 1), as shown in Figure 1, from where a Gaussian random field was sampled. The observations follow a binomial GGLM with logit link and exponential covariance function, and m = 100 observations were drawn at each location. The linear predictor at location x = (x_1, x_2), w_x, was set to

w_x = β_0 + β_1 x_1 + β_2 x_2 + z_x + ε_x,

with covariance parameters σ² = 0.5, ρ = 0.4, and ε_x ~ N(0, τ²) i.i.d. with τ² = 0.1σ².
The INLA method was used to estimate the parameters of the model and the random field at the observed locations. We ran the INLA method with three different covariance functions: the exact exponential covariance; its full-scale approximation, with knots corresponding to a triangular grid, shown by × in Figure 1, and using the spherical correlation for tapering; and the predictive process approximation using the same knots. In order to choose an appropriate value for the taper range, the approximation was computed for ρ = 0.1, 0.2, . . ., 1.5 and for γ = 0.100, 0.145, . . ., 1.000 (the maximum distance between locations is 1.078). The Frobenius norm between the exact and approximate covariance matrices, Σ and Σ̃ respectively, was computed, as well as the computational time for performing the Cholesky decomposition of Σ̃. The norm and computational time were scaled linearly to range from 0 to 1 and plotted against the different values of the taper range. Figure 2 shows such a plot for the case ρ = 0.5 (other values of ρ produce similar plots). Based on this plot, we choose the taper range γ = 0.2, as this gives an approximation error of about 40% with an increase of about 40% in computing time.
The R package geoRglm (R Core Team, 2018, Christensen and Ribeiro Jr, 2002) was used for the MCMC. The length of the chain was 1,000,100, with the first 10,000 iterations discarded as burn-in and the remainder thinned by 100, thus retaining 9,901 samples from the posterior distribution of the parameters and the random field.
The posterior densities for the two covariance parameters and the intercept using the different methods are shown in Figure 3 (the plots for the other two regressor coefficients are similar and are not shown). We observe that the INLA with the exact covariance agrees with the MCMC sample, and the method using the full-scale approximation matches the exact one closely. The predictive process approximation may have a small bias when used to estimate the covariance parameters, but it still works well for the estimation of the regressor coefficients.
We also considered prediction at two sites: a central site, indicated by a circle in Figure 1, and a far site, indicated by a square. The predictive distributions using each method are shown in Figure 4. The plots show that the exact and full-scale approximations agree with the MCMC sample, but the predictive process approximation exhibits a small bias. In summary, the example shows that INLA provides a good approximation to the predictive distributions of the parameters and the random field, and the full-scale approximation can be used as a computationally cheaper alternative.

Loa loa prevalence in Cameroon
Loa loa is a worm-like parasite that infects humans by travelling through their tissue. The young larvae develop in flies, commonly found in Africa and India, which infect humans by biting them.
The main methods of diagnosis include the presence of the parasite in the blood or in the eye, and the presence of skin swellings. When found, the worm is removed by surgery. Diggle et al. (2007) discuss the importance of estimating the prevalence of loa loa for precautionary reasons.
The data consist of samples from k = 197 villages in Cameroon and its neighbouring country Nigeria, as shown in Figure 7. For the ith village, the number of infected subjects y_i and the number of tested subjects m_i were recorded. We also have information about the altitude and a measure of vegetation, the normalised-difference vegetation index (NDVI), for each month of the year 1996.
Note that the sampling units in this example are the villages.
The p = 6 explanatory variables are as in Diggle et al. (2007), although our values are slightly different from theirs. They consist of an intercept (β_0), a piecewise linear effect of altitude with the regression coefficient changing at 0.65 km and 1 km (β_1, β_2, β_3), a linear effect of the maximum NDVI when it is below the 0.8 level (β_4), and a linear effect of the standard deviation of NDVI (β_5).
We assign flat improper priors for all regressor coefficients.
Based on the above information, an appropriate model for the linear predictor is

w_i = x_i^T β + z_i + ε_i.

The number of infected individuals in the ith village conditioned on the linear predictor is modelled as a binomial GGLM with logit link, i.e.

y_i | w_i ~ Binomial(m_i, e^{w_i}/(1 + e^{w_i})).
The random field z is assumed to have an exponential covariance structure. In addition, the relative nugget parameter is fixed at τ²/σ² = 0.4, as in Diggle et al. (2007), and we assume a uniform improper prior for σ², i.e. σ² ~ U(0, ∞), while for the range parameter we assign ρ ~ U(0.1, 1.4).
We consider model fitting using INLA with the exact and the approximate covariance matrix, with 36 knots forming a square grid. Figure 5 shows the posterior density of (log σ², ρ) | y, evaluated at 21 × 21 equidistant points for each parameter using the two methods. The two methods agree in terms of log σ², but some discrepancies exist when estimating ρ. There are also small discrepancies for some of the regressor coefficients, as can be seen in Figure 6, which suggests that a larger number of knots is required. A 95% confidence interval for the variance parameters is obtained by spline approximation to the marginal posterior of each parameter. For the regressor coefficients, the confidence interval is obtained assuming normality of the estimator. These results are summarised in Table 1. Besides the standard deviation of NDVI, all other coefficients are found to be significant by both methods.
A probability map of the prevalence of loa loa is of interest. Let w = x^T β + z + ε denote the linear predictor at any given location. The prevalence is then given by

prevalence = e^w / (1 + e^w),

and is estimated using (14). Figure 7 shows the predicted prevalence and the prediction standard deviation for the region of interest. The map closely resembles the one reported by Diggle et al. (2007). We also note that the uncertainty of the prediction is higher where there are no sampled locations.

Childhood malaria in the Gambia
In this study, presented by Diggle et al. (2002), the data were obtained by sampling children in k = 65 villages in the Gambia, as shown in Figure 8. These villages are classified into 5 areas (respectively, south-west, north-west, centre, north-east, and south-east), and 42 of them belong to the primary health care structure (PHC) of the Ministry of Health, while the remaining 23 do not.
For the jth child in the ith village, an indicator y_ij of the presence of malaria in a blood sample was recorded, along with the child's age, whether or not the child regularly slept under a bed net, and, if so, whether the net was treated with insecticide.
The p = 10 explanatory variables for each individual consist of the intercept, age (in years), a three-level effect for bed net (no net, untreated, treated), the amount of greenness at the corresponding village, an indicator of whether the village belongs to the PHC, and a five-level effect for the area. Note that some of these variables correspond to the individual within the village, so the response variable is the indicator y_ij of whether the jth child in the ith village is infected or not.
Let x_ij denote the set of explanatory variables for the (i, j) observation. The linear predictor is then given by

w_ij = x_ij^T β + z_i + ε_ij.

We adopt the binomial GGLM with logit link, i.e.

y_ij | w_ij ~ Binomial(1, e^{w_ij}/(1 + e^{w_ij})).

For the covariance we use the exponential model. The prior distributions for the covariance parameters are chosen to be τ² ~ IG(0.01, 0.01), σ² ~ IG(0.01, 0.01), ρ ~ U(1.5, 21.5), i.e. inverse gamma, inverse gamma, and uniform respectively.
The posterior densities for these parameters are shown in Figure 9. Table 2 displays a summary of the parameter estimates and a 95% confidence interval for each parameter.
We consider prediction of the spatial random field in the region, which is useful for identifying high-risk areas. The prediction and its standard deviation are shown in Figure 10. The sampled locations are also indicated. Prediction far from the sampled sites is associated with a high degree of uncertainty, so it is not attempted there. The pattern is similar to the one reported by Diggle et al. (2002).

Conclusion
In this paper a methodology for approximate inference in Bayesian geostatistical models is presented. The application of the full-scale approximation to the spatial covariance matrix is discussed and its implementation within INLA is demonstrated. Issues regarding the choice of the parameters of the full-scale approximation are addressed. In terms of choosing the knots for the predictive process approximation we use spatial design for the GGLM, while for choosing the taper range a graphical method is presented. Furthermore, we discuss prediction of transformed parameters using the normal mixture approximation to the predictive distribution. The proposed method is compared against MCMC in a simulated example and found to have good performance. The only caveat is that INLA with the full-scale approximation may exhibit bias when estimating the posterior distribution of the range parameter, so more research in this direction is needed. Despite this, it is evident from the two examples that our method can be a useful tool for geostatistical inference in the GGLM.
The proposed model is especially useful for disease mapping as demonstrated by the examples.
Extending it to allow for autoregressive errors, in order to account for temporal effects, is left for future research.
Appendix A: Selecting grid values for the variance parameters

In this section we explain how we obtain a grid of values {θ_1, . . ., θ_a} for the discrete approximation to the posterior f(θ|y) when the number a is pre-specified. Since the elements of θ are positive, in practice we work with the logarithmic transformation of its elements; however, in the following we do not make a distinction about the scale used.
The suggestion of Rue et al. (2009) is to create a grid within a confidence interval (of some high level, say 99%) obtained by maximising (20). In our case, the objective function for the maximisation is the logarithm of (20), i.e.

ℓ(θ) = log f(y|ŵ) + log f(ŵ; θ) + log f(θ) − (1/2) log |Ĥ|,

and the grid is constructed using the curvature of ℓ at its maximum θ̂, with elements −∂_i ∂_j ℓ(θ̂), where ∂_j = ∂/∂θ_j.
The matrix inside the curly brackets is of low dimension, and its inverse is readily available. The matrix D + AΣ_S A^T is N-dimensional and, even though it may be sparse, computing its inverse directly may not be the best approach due to its high dimension. Using the key formula (22) for the matrix inverse, and noting that the matrix in parentheses is now k-dimensional and sparse, since it is a sum of two sparse matrices, can significantly ease computations.
Notable members of the exponential family include the binomial and Poisson distributions. The binomial model with logit link has distribution f(y|w) = exp {y w − m log(1 + e^w)} (m choose y), and the Poisson model with logarithmic link has f(y|w) = exp {y w − m exp(w)} / y!. These distributions have the general form f(y|w) = exp {y w − m κ(w)} a(y).

Figure 1 :
Figure 1: Locations for the simulated example, indicated by •, and grid for the full-scale approximation, indicated by ×. Prediction is considered at a central site (•) and a far site (□).

Figure 3:
Figure 3: Posterior densities for (a) the logarithm of the sill, (b) the range, and (c) the intercept. The histogram shows the MCMC sample. The approximation using INLA and the exact covariance matrix is shown by a solid line, INLA with the full-scale approximation by a dashed line, and INLA with the predictive process approximation by a dotted line. The true parameter value is indicated by a triangle on the horizontal axis.

Figure 4:
Figure 4: Predictive distributions at the central site and the far site of Figure 1 using each method.

Figure 5 :
Figure 5: Posterior plots for the variance parameters. (a) Joint posterior of log(σ²) and ρ using exact INLA, (b) marginal posterior of log(σ²), (c) marginal posterior of ρ. The histogram is for the MCMC sample; the exact INLA is shown by a solid line and the full-scale INLA by a dashed line.

Figure 6 :
Figure 6: Posterior for the regressor coefficients.The histogram is for the MCMC sample, the exact INLA is shown by a solid line and the full-scale INLA by a dashed line.

Figure 8 :
Figure 8: Sampled locations for the Gambia data from Ribeiro Jr and Diggle (2001).

Figure 10 :
Figure 10: Prediction of spatial random field for the Gambia malaria data (top) and prediction standard deviation (bottom).

Table 1 :
Parameter estimates for the loa loa prevalence in Cameroon using exact and approximate INLA.

Table 2 :
Parameter estimates of the Gambia malaria data.