Spatial Location-Scale Model • bamlss

This example is taken from the R2BayesX package (Umlauf et al. 2015) and is about undernutrition of new born children in Zambia. The data is loaded with

data("ZambiaNutrition", package = "R2BayesX")
head(ZambiaNutrition)

##        stunting  mbmi agechild district memployment meducation urban gender
## 1893 -0.6817840 22.33       29       12         yes         no   yes   male
## 1894 -1.5876900 22.33       57       12         yes         no   yes female
## 1895  0.3110890 18.66       16       12         yes         no   yes female
## 1896  0.0067044 18.66       46       12         yes         no   yes   male
## 1897  1.8765000 24.00       28       12         yes    primary   yes female
## 1898  0.1009190 24.22        9       12         yes    primary   yes   male

Here, the primary interest is to model the dependence of stunting of newborn children, with an age ranging from 0 to 5 years, on covariates such as the body mass index of the mother, the age of the child and others. Moreover, we apply a full distributional regression model with \[ \texttt{stunting} \sim \mathcal{N}(\mu = \eta_{\mu}, \log(\sigma) = \eta_{\sigma})), \] where the predictors \(\eta_{\mu}\) and \(\eta_{\sigma}\) are specified by the following formula

f <- list(
  stunting ~ memployment + urban + gender + meducation +
    s(mbmi) + s(agechild) + s(district, bs = "mrf", xt = list("penalty" = K)) +
    s(district, bs = "re"),
  sigma  ~ memployment + urban + gender + meducation +
    s(mbmi) + s(agechild) + s(district, bs = "mrf", xt = list("penalty" = K)) +
    s(district, bs = "re")
)

Note that for setting up the Markov random field smooth term a penalty matrix K needs to be provided. The penalty matrix forces penalization for neighboring regions of the districts in Zambia. To compute the K matrix, we need the spatial information about the regions in Zambia, which is shipped as a "bnd" object in the R2BayesX package and can be loaded with

data("ZambiaBnd", package = "R2BayesX")

The K matrix can then be computed using function neighbormatrix()

K <- neighbormatrix(ZambiaBnd)
head(K)

##    87 76 67 92 97 94 86 85 83 71 45 44 41 32 96 95 91 74 63 65 62 61 53 52 42
## 87  1  0  0  0  0 -1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 76  0  3  0  0  0  0  0 -1 -1 -1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 67  0  0  2  0  0  0  0  0  0 -1 -1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 92  0  0  0  3 -1 -1  0  0  0  0  0  0  0  0 -1  0  0  0  0  0  0  0  0  0  0
## 97  0  0  0 -1  5 -1  0  0  0  0  0  0  0  0 -1 -1 -1  0  0  0  0  0  0  0  0
## 94 -1  0  0 -1 -1  6 -1  0  0  0  0  0  0  0  0  0 -1 -1  0  0  0  0  0  0  0
##    34 99 93 98 84 75 88 81 43 89 73 72 68 69 66 36 35 26 23 14 12 13 11 33 28
## 87  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 76  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 67  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 92  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 97  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 94  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##    22 21 15 31 25 24 27
## 87  0  0  0  0  0  0  0
## 76  0  0  0  0  0  0  0
## 67  0  0  0  0  0  0  0
## 92  0  0  0  0  0  0  0
## 97  0  0  0  0  0  0  0
## 94  0  0  0  0  0  0  0

## Also need to transform to factor for
## setting up the MRF smooth.
ZambiaNutrition$district <- as.factor(ZambiaNutrition$district)

## Now note that not all regions are observed,
## therefore we need to remove those regions
## from the penalty matrix
rn <- rownames(K)
lv <- levels(ZambiaNutrition$district)
i <- rn %in% lv
K <- K[i, i]

Then, the model can be estimated with

set.seed(321)
b <- bamlss(f, data = ZambiaNutrition, family = "gaussian")

and the estimated univariate effects are plotted with

plot(b)

The plot indicates that only the effect of variable mbmi on the standard deviation is not significant according the 95% credible intervals and basically follows the zero horizontal line.

To visualize the structured and unstructured spatial effects we predict using the district information

## First, note that we have the structured id = 'mrf1' and unstructured
## spatial effect id = 're2', also indicated in the model summary
summary(b)

## 
## Call:
## bamlss(formula = f, family = "gaussian", data = ZambiaNutrition)
## ---
## Family: gaussian 
## Link function: mu = identity, sigma = log
## *---
## Formula mu:
## ---
## stunting ~ memployment + urban + gender + meducation + s(mbmi) + 
##     s(agechild) + s(district, bs = "mrf", xt = list(penalty = K)) + 
##     s(district, bs = "re")
## -
## Parametric coefficients:
##                        Mean      2.5%       50%     97.5% parameters
## (Intercept)        0.112453  0.054741  0.112688  0.164649      0.111
## memploymentno     -0.009098 -0.033846 -0.009121  0.017945     -0.010
## urbanno           -0.088765 -0.133416 -0.089598 -0.046433     -0.090
## genderfemale       0.059348  0.034577  0.059318  0.083425      0.059
## meducationno      -0.180022 -0.236665 -0.180923 -0.124206     -0.176
## meducationprimary -0.054656 -0.106017 -0.055213 -0.003093     -0.055
## -
## Acceptance probability:
##       Mean 2.5% 50% 97.5%
## alpha    1    1   1     1
## -
## Smooth terms:
##                                  Mean      2.5%       50%     97.5% parameters
## s(mbmi).tau21               2.084e-02 6.699e-05 1.876e-03 1.555e-01      0.000
## s(mbmi).alpha               1.000e+00 1.000e+00 1.000e+00 1.000e+00         NA
## s(mbmi).edf                 1.371e+00 1.004e+00 1.101e+00 2.995e+00      0.968
## s(agechild).tau21           1.781e+00 3.113e-01 1.137e+00 7.115e+00      5.961
## s(agechild).alpha           1.000e+00 1.000e+00 1.000e+00 1.000e+00         NA
## s(agechild).edf             6.351e+00 4.844e+00 6.292e+00 8.056e+00      7.956
## s(district,id='mrf1').tau21 3.519e-03 9.780e-04 3.370e-03 6.699e-03      0.006
## s(district,id='mrf1').alpha 1.000e+00 1.000e+00 1.000e+00 1.000e+00         NA
## s(district,id='mrf1').edf   3.008e+01 1.814e+01 3.066e+01 3.737e+01     35.955
## s(district,id='re2').tau21  4.176e-03 6.933e-05 1.175e-03 2.426e-02      0.000
## s(district,id='re2').alpha  1.000e+00 1.000e+00 1.000e+00 1.000e+00         NA
## s(district,id='re2').edf    1.104e+01 4.409e-01 6.224e+00 3.601e+01      0.000
## ---
## Formula sigma:
## ---
## sigma ~ memployment + urban + gender + meducation + s(mbmi) + 
##     s(agechild) + s(district, bs = "mrf", xt = list(penalty = K)) + 
##     s(district, bs = "re")
## -
## Parametric coefficients:
##                        Mean      2.5%       50%     97.5% parameters
## (Intercept)       -0.136367 -0.177225 -0.136755 -0.092489     -0.165
## memploymentno     -0.015910 -0.038422 -0.015772  0.004915     -0.017
## urbanno            0.036322 -0.002975  0.035881  0.071793      0.022
## genderfemale       0.026088  0.006519  0.025961  0.047295      0.026
## meducationno       0.031518 -0.017891  0.031465  0.075792      0.032
## meducationprimary -0.030673 -0.073803 -0.029815  0.009465     -0.029
## -
## Acceptance probability:
##         Mean   2.5%    50% 97.5%
## alpha 0.9278 0.6580 0.9852     1
## -
## Smooth terms:
##                                  Mean      2.5%       50%     97.5% parameters
## s(mbmi).tau21               1.850e-01 8.480e-05 9.515e-03 1.154e+00     10.289
## s(mbmi).alpha               9.611e-01 7.032e-01 9.979e-01 1.000e+00         NA
## s(mbmi).edf                 2.111e+00 1.007e+00 1.551e+00 5.379e+00      7.754
## s(agechild).tau21           9.199e-02 3.791e-03 3.750e-02 5.667e-01      0.018
## s(agechild).alpha           9.666e-01 8.027e-01 9.965e-01 1.000e+00         NA
## s(agechild).edf             3.394e+00 1.836e+00 3.218e+00 5.938e+00      2.691
## s(district,id='mrf1').tau21 1.548e-03 5.780e-04 1.437e-03 3.226e-03      0.000
## s(district,id='mrf1').alpha 7.536e-01 1.863e-01 8.365e-01 1.000e+00         NA
## s(district,id='mrf1').edf   2.588e+01 1.697e+01 2.607e+01 3.447e+01      0.001
## s(district,id='re2').tau21  4.062e-03 9.555e-05 3.259e-03 1.379e-02      0.017
## s(district,id='re2').alpha  8.480e-01 3.013e-01 9.636e-01 1.000e+00         NA
## s(district,id='re2').edf    1.737e+01 8.979e-01 1.755e+01 3.454e+01     36.606
## ---
## Sampler summary:
## -
## DIC = 12616.75 logLik = -6262.522 pd = 91.7017
## runtime = 150.96
## ---
## Optimizer summary:
## -
## AICc = 12617.72 edf = 103.9308 logLik = -6202.629
## logPost = -35164.45 nobs = 4847 runtime = 19.623

## Now, to predict the spatial effects we set up new data.
nd <- data.frame("district" = unique(ZambiaNutrition$district))

## Predict for the structured spatial effects.
p_str <- predict(b, newdata = nd, term = "s(district,id='mrf1')", intercept = FALSE)

## And the unstructured spatial effect.
p_unstr <- predict(b, newdata = nd, term = "s(district,id='re2')", intercept = FALSE)

Now, to visualize the effects we plot all maps using the same range for the color legends

r_mu <- range(c(p_str$mu, p_unstr$mu))
r_mu <- c(-1 * max(abs(r_mu)), max(abs(r_mu)))

r_sigma <- range(c(p_str$sigma, p_unstr$sigma))
r_sigma <- c(-1 * max(abs(r_sigma)), max(abs(r_sigma)))

and plot the effects using a diverging color legend.

## MRF smooth effect.
plotmap(ZambiaBnd, x = p_str$mu, id = nd$district, color = diverge_hcl, range = r_mu,
  main = expression(mu), shift = 0.1, title = "MRF", mdensity = 20)
plotmap(ZambiaBnd, x = p_str$sigma, id = nd$district, color = diverge_hcl, range = r_sigma,
  main = expression(sigma), shift = 0.1, title = "MRF", mdensity = 20)

## Random effects.
plotmap(ZambiaBnd, x = p_unstr$mu, id = nd$district, color = diverge_hcl, range = r_mu,
  shift = 0.1, title = "Random effect", mdensity = 20)
plotmap(ZambiaBnd, x = p_unstr$sigma, id = nd$district, color = diverge_hcl, range = r_sigma,
  shift = 0.1, title = "Random effect", mdensity = 20)

The maps clearly show that the unstructured spatial effect seems to very small, if existent at all when looking the 95% credible intervals:

## Again predict, but now additionally compute 95% credible intervals
## using function c95().
p_unstr <- predict(b, newdata = nd, term = "s(district,id='re2')", intercept = FALSE, FUN = c95)

## Test if all effects contain zero, i.e., are not significant
## according the 95% credible intervals.
all(p_unstr$mu[["2.5%"]] < 0 & p_unstr$mu[["97.5%"]] > 0)

## [1] TRUE

all(p_unstr$sigma[["2.5%"]] < 0 & p_unstr$sigma[["97.5%"]] > 0)

## [1] TRUE

References

Umlauf, Nikolaus, Daniel Adler, Thomas Kneib, Stefan Lang, and Achim Zeileis. 2015. “Structured Additive Regression Models: An R Interface to BayesX.” Journal of Statistical Software 63 (1): 1–46. https://doi.org/10.18637/jss.v063.i21.

Umlauf, Nikolaus, Nadja Klein, Achim Zeileis, and Thorsten Simon. 2024. bamlss: Bayesian Additive Models for Location Scale and Shape (and Beyond). https://CRAN.R-project.org/package=bamlss.