Performs stability selection based on gradient boosting.

stabsel(formula, data, family = "gaussian",
q, maxit, B = 100, thr = .9, fraction = 0.5, seed = NULL, ...)
## Plot selection frequencies.
# S3 method for stabsel
plot(x, show = NULL,
pal = function(n) gray.colors(n, start = 0.9, end = 0.3), ...)

## Arguments

formula |
A formula or extended formula. |

data |
A `data.frame`. |

family |
A `bamlss.family` object. |

q |
An integer specifying how many terms to select in each boosting run. |

maxit |
An integer specifying the maximum number of boosting iterations.
See `boost`. Choose either `q` or `maxit` as the hyper-parameter
for regularization. |

B |
An integer. The boosting is run `B` times. |

thr |
Cut-off threshold of relative frequencies (between 0 and 1) for selection. |

fraction |
Numeric between 0 and 1. The fraction of data to be used in each
boosting run. |

seed |
A seed to be set before the stability selection. |

x |
An object of class `stabsel`. |

show |
Number of terms to be shown. |

pal |
Color palette for different model terms. |

… |
Not used yet in `stabsel`. |

## Value

An object of class `stabsel`.

## Details

`stabsel` performs stability selection based on gradient boosting (`boost`): the boosting algorithm is run `B` times, each time on a randomly drawn `fraction` of the `data`. Each boosting run is stopped either when `q` terms have been selected or when `maxit` iterations have been performed, i.e., either `q` or `maxit` can be used to tune the regularization of the boosting. After the `B` runs, the relative selection frequency of each term is evaluated. Terms with a relative selection frequency larger than `thr` are suggested for a final regression model.
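The resampling logic described above can be sketched in base R. This is only an illustration of the subsample, select, count and threshold steps, not the bamlss implementation: `select_terms()` and `stabsel_sketch()` are hypothetical names, and the boosting run is replaced by a simple stand-in that picks the `q` covariates most strongly correlated with the response.

```r
## Sketch of the resampling logic only; this is NOT the bamlss implementation.
## select_terms() is a hypothetical stand-in for one boosting run: it picks
## the q covariates with the largest absolute correlation with the response.
select_terms <- function(data, response, q) {
  covs <- setdiff(names(data), response)
  score <- sapply(covs, function(v) abs(cor(data[[v]], data[[response]])))
  names(sort(score, decreasing = TRUE))[seq_len(q)]
}

stabsel_sketch <- function(data, response, q, B = 100, thr = 0.9,
                           fraction = 0.5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(data)
  covs <- setdiff(names(data), response)
  counts <- setNames(integer(length(covs)), covs)
  for (b in seq_len(B)) {
    ## Draw a random subset containing `fraction` of the rows.
    idx <- sample(n, size = floor(fraction * n))
    sel <- select_terms(data[idx, , drop = FALSE], response, q)
    ## Count how often each term is selected.
    counts[sel] <- counts[sel] + 1L
  }
  ## Relative selection frequencies and the terms exceeding the threshold.
  freq <- counts / B
  list(freq = sort(freq, decreasing = TRUE),
       selected = names(freq)[freq > thr])
}

## Toy data: y depends on x1 and x2; x3 to x6 are pure noise.
set.seed(1)
n <- 500
toy <- as.data.frame(matrix(rnorm(n * 6), n, 6))
names(toy) <- paste0("x", 1:6)
toy$y <- 2 * toy$x1 - 2 * toy$x2 + rnorm(n)

res <- stabsel_sketch(toy, "y", q = 2, B = 50, seed = 123)
print(res$freq)
print(res$selected)
```

Because every run selects exactly `q` terms in this sketch, the relative frequencies sum to `q`; a term is only suggested if it is picked in more than a share `thr` of the runs.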

If neither `q` nor `maxit` has been specified, `q` will be set to the square root of the number of columns in `data`.
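As a small worked example of this default, the arithmetic can be reproduced directly. The data frame `d16` is made up for illustration, and its column count is chosen as a perfect square because this page does not say how a non-integer square root would be rounded.

```r
## Illustration only: a made-up data frame with 16 columns.
d16 <- as.data.frame(matrix(0, nrow = 5, ncol = 16))

## Default described in the text: square root of the number of columns.
q_default <- sqrt(ncol(d16))
print(q_default)
```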

Gradient boosting does not depend on random numbers. Thus, the
individual boosting runs differ only in the subset of data which
is used.
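Since the subsets are the only random component, fixing the seed fixes the subsets. The following base R sketch illustrates this; `draw_subsets()` is a hypothetical helper, and the assumption that the rows are drawn with `sample()` after a single `set.seed()` call is made for illustration rather than taken from the package source.

```r
## Hypothetical helper: row indices of the subsets used in B runs.
draw_subsets <- function(n, B, fraction = 0.5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  lapply(seq_len(B), function(b) sample(n, size = floor(fraction * n)))
}

s1 <- draw_subsets(n = 100, B = 3, seed = 42)
s2 <- draw_subsets(n = 100, B = 3, seed = 42)
s3 <- draw_subsets(n = 100, B = 3, seed = 43)

## Compare the subsets drawn with the same seed and with a different seed.
print(identical(s1, s2))
print(identical(s1, s3))
```

With a deterministic fitting step, identical subsets lead to identical selection frequencies, which is why passing `seed` makes a stability selection reproducible.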

## Examples

# NOT RUN {
## Load the package providing GAMart(), stabsel() and bamlss().
library("bamlss")
## Simulate some data.
set.seed(111)
d <- GAMart()
n <- nrow(d)
## Add some noise variables.
for(i in 4:9)
  d[[paste0("x", i)]] <- rnorm(n)
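## Build a list of two formulas with the same candidate terms:
## the first has the response num, the second is one-sided.
f <- paste0("~ ", paste("s(x", 1:9, ")", collapse = "+", sep = ""))
f <- paste(f, "+ te(lon,lat)")
f <- as.formula(f)
f <- list(update(f, num ~ .), f)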
## Run stability selection.
sel <- stabsel(f, data = d, q = 6, B = 10)
plot(sel)
## Estimate selected model.
nf <- formula(sel)
b <- bamlss(nf, data = d)
plot(b)
# }