This vignette is based on Yee (2010). The data is about the marital status of white male in New Zealand in the early 1990s. The aim of this analysis is to explore how the marital status varies with age. The data can be loaded with

data("marital.nz", package = "VGAM")
head(marital.nz)
##   age ethnicity            mstatus
## 1  29  European             Single
## 2  55  European  Married/Partnered
## 3  44  European  Married/Partnered
## 4  53  European Divorced/Separated
## 5  45  European  Married/Partnered
## 7  30  European             Single
levels(marital.nz$mstatus)
## [1] "Divorced/Separated" "Married/Partnered"  "Single"            
## [4] "Widowed"

The response mstatus has 4 levels. We use a multinomial logit model to estimate the age effect, therefore, one category needs to be specified as a reference category. The model can be estimated with

## Model formula, each category may
## have different model terms.
f <- list(
  mstatus ~ s(age),
          ~ s(age),
          ~ s(age)
)

## Set the seed for reproducibility.
set.seed(123)

## Estimate.
b <- bamlss(f, family = "multinomial", data = marital.nz,
  reference = "Married/Partnered")

The model summary gives

## 
## Call:
## bamlss(formula = f, family = "multinomial", data = marital.nz, 
##     reference = "Married/Partnered")
## ---
## Family: multinomial 
## Link function: DivorcedSeparated = log, Single = log, Widowed = log
## *---
## Formula DivorcedSeparated:
## ---
## DivorcedSeparated ~ s(age)
## -
## Parametric coefficients:
##               Mean   2.5%    50%  97.5% parameters
## (Intercept) -2.677 -2.808 -2.676 -2.555     -2.672
## -
## Acceptance probabilty:
##         Mean   2.5%    50% 97.5%
## alpha 0.9779 0.7894 1.0000     1
## -
## Smooth terms:
##                   Mean      2.5%       50%     97.5% parameters
## s(age).tau21 3.063e+00 3.532e-04 1.403e+00 1.669e+01      0.740
## s(age).alpha 8.895e-01 2.588e-01 9.878e-01 1.000e+00         NA
## s(age).edf   2.857e+00 1.003e+00 2.861e+00 5.043e+00      2.452
## ---
## Formula Single:
## ---
## Single ~ s(age)
## -
## Parametric coefficients:
##               Mean   2.5%    50%  97.5% parameters
## (Intercept) -2.499 -2.625 -2.500 -2.368     -2.483
## -
## Acceptance probabilty:
##         Mean   2.5%    50% 97.5%
## alpha 0.9891 0.8937 1.0000     1
## -
## Smooth terms:
##                   Mean      2.5%       50%     97.5% parameters
## s(age).tau21  57.96347  11.32286  43.68286 195.92876     38.101
## s(age).alpha   0.68755   0.01782   0.79914   1.00000         NA
## s(age).edf     6.26760   4.89733   6.26682   7.67148      6.152
## ---
## Formula Widowed:
## ---
## Widowed ~ s(age)
## -
## Parametric coefficients:
##               Mean   2.5%    50%  97.5% parameters
## (Intercept) -5.031 -5.451 -5.024 -4.663     -5.007
## -
## Acceptance probabilty:
##         Mean   2.5%    50% 97.5%
## alpha 0.9693 0.7280 1.0000     1
## -
## Smooth terms:
##                   Mean      2.5%       50%     97.5% parameters
## s(age).tau21 1.722e+00 1.205e-04 3.222e-02 2.267e+01      0.000
## s(age).alpha 9.475e-01 5.801e-01 9.992e-01 1.000e+00         NA
## s(age).edf   1.506e+00 1.000e+00 1.091e+00 4.177e+00      0.999
## ---
## Sampler summary:
## -
## DIC = 6548.443 logLik = -3266.665 pd = 15.1126
## runtime = 49.234
## ---
## Optimizer summary:
## -
## AICc = 6545.377 converged = 1 edf = 12.6029
## logLik = -3260.057 logPost = -3282.814 nobs = 6053
## runtime = 4.771
## ---

and suggests reasonable acceptance rates. However, for a final model run it is recommend to increase the number of iteration, the burn-in phase as well as the thinning parameter of the sampling engine function GMCMC(). The estimated effects can be plotted with

par(mfrow = c(1, 3), mar = c(4.1, 4.1, 0.1, 1.1))
plot(b)

To calculate estimated age dependent probabilities for each category, we use the predict.bamlss() method.

## Create a data frame for prediction.
nd <- data.frame("age" = seq(20, 90, length = 100))

## Predict for the three levels.
p <- predict(b, newdata = nd, type = "parameter")

Now, note that the specification of the predictors in the multinomial_bamlss() family is based on a logarithmic link, therefore, to compute the probabilities we run the following code:

probs <- list()
for(j in names(p))
  probs[[j]] <- p[[j]] / (1 + rowSums(do.call("cbind", p)))
probs <- as.data.frame(probs)
probs[["MarriedPartnered"]] <- 1 - rowSums(probs)

The estimated probabilities can then be visualized with:

par(mar = c(4.1, 4.1, 1.1, 1.1))
plot2d(probs ~ age, data = nd, col.lines = rainbow_hcl(4),
  lwd = 2, scheme = 1, ylab = "Fitted probabilities", ylim = c(0, 1))
legend("center", names(probs), lty = 1, lwd = 2, col = rainbow_hcl(4))

References

Umlauf, Nikolaus, Nadja Klein, Achim Zeileis, and Thorsten Simon. 2019. bamlss: Bayesian Additive Models for Location Scale and Shape (and Beyond). https://CRAN.R-project.org/package=bamlss.

Yee, Thomas W. 2010. “The VGAM Package for Categorical Data Analysis.” Journal of Statistical Software 32 (10): 1–34. doi:10.18637/jss.v032.i10.