Simulates longitudinal data with normal error and (Cox-type) survival times using the inversion method. The function simJM() is a wrapper specifying all predictors and the resulting data sets. The wrapper calls rJM() to sample the survival times, a modified version of rSurvtime() from the R package CoxFlexBoost.

simJM(nsub = 300, times = seq(0, 120, 1), probmiss = 0.75,
  long_setting = "functional",
  alpha_setting = if(nonlinear) "linear" else "nonlinear",
  dalpha_setting = "zero", sigma = 0.3, long_df = 6, tmax = NULL,
  seed = NULL, full = FALSE, file = NULL, nonlinear = FALSE,
  fac = FALSE)
  
rJM(hazard, censoring, x, r, 
  subdivisions = 1000, tmin = 0, tmax,
  file = NULL, ...)

Arguments

nsub

number of individuals for which longitudinal data and survival times should be simulated.

times

vector of time points at which longitudinal measurements are "sampled".

probmiss

proportion of longitudinal measurements to be set to missing. Used to induce sparsity in the longitudinal measurements.

long_setting

Specification of the longitudinal trajectories of the sampled subjects. Preset specifications are "linear", "nonlinear" and "functional". See Details.

alpha_setting

specification of the association between survival and longitudinal. Preset specifications are "simple", "linear", "nonlinear" and "nonlinear2". See Details.

dalpha_setting

specification of the association between survival and the derivative of the longitudinal. Work in progress.

sigma

standard deviation of the normal error around the true longitudinal measurements.

long_df

number of basis functions from which functional random intercepts are sampled.

tmax

For function simJM(), longest possible survival time, observations are censored after that timepoint. Defaults to max(times) and should not be specified longer than max(times) for longitudinal setting "functional". For function rJM(), latest time point to sample a survival time.

seed

numeric scalar setting the random seed.

full

logical indicating if only the longitudinal data set should be returned (FALSE) or additionally also the data for the survival part evaluated on a regular time grid and the longitudinal data set without longitudinal missings (TRUE).

file

name of the data file the generated data set should be stored into (e.g., "simdata.RData") or NULL if the dataset should directly be returned in R.

nonlinear

If set to TRUE, a nonlinear interaction between alpha and mu is simulated.

fac

If set to TRUE, a smooth interaction that varies by a factor is simulated.

hazard

complete hazard function to specify the joint model. Time must be the first argument.

censoring

function to compute (random) censoring.

x

matrix of sampled covariate values.

r

matrix of sampled random coefficients.

subdivisions

the maximum number of subintervals for the integration.

tmin

earliest time point to sample a survival time.

...

further arguments to be passed to hazard or censoring.

Details

The function simulates longitudinal data basing on the given specification at given times. The full hazard is built from all joint model predictors \(\eta_{\mu}\), \(\eta_{\sigma}\), \(\eta_{\lambda}\), \(\eta_{\gamma}\), \(\eta_{\alpha}\) as presented in Koehler, Umlauf, and Greven (2016), see also jm_bamlss. Survival times are sampled using the inversion method (cf. Bender, Augustin, & Blettner, 2005). Additional censoring and missingness is introduced. The longitudinal information is censored according to the survival information. The user can also specify own predictors and use only rJM to simulate survival times accordingly.

Pre-specified functions for \(\eta_{\mu}\) in long_setting are for linear $$\eta_{\mu i}(t) = 1.25 + r_{1i} + 0.6 \sin(x_{2i}) + (-0.01) t + 0.02 r_{2i} t$$, for nonlinear $$\eta_{\mu i}(t) = 0.5 + r_{1i} + 0.6 \sin(x_{2i}) + 0.1 (t+1) \exp(-0.075 t)$$ and for functional $$\eta_{\mu i}(t) = 0.5 + r_{1i} + 0.6 \sin(x_{2i}) + 0.1 (t+1) \exp(-0.075 t) + \sum_k \beta_{ki} B(t)$$, where \(B(.)\) denotes a B-spline basis function and \(\beta_{ki}\) are the sampled penalized coefficients from gen_b per person.

Prespecified functions for \(\eta_{\alpha}\) in alpha_setting are for constant $$\eta_{\alpha}(t) = 1$$, for linear $$\eta_{\alpha}(t) = 1 - 0.015 t$$, for nonlinear $$\eta_{\alpha}(t) = \cos((time-20)/20)$$, and for nonlinear $$\eta_{\alpha}(t) = \cos((time-33)/33)$$.

Additionally the fixed functions for \(\eta_{\lambda} = 0.1(t+2)\exp(-0.075t)\) and \(\eta_{\lambda} = 0.1(t+2)\exp(-0.075t)\) are employed.

Value

For full = TRUE a list of the three data.frames is returned:

data

Simulated dataset in long format including all longitudinal and survival covariates.

data_grid

Dataset of the time-varying survival predictors which are not subject specific, evaluated at a grid of fixed time points.

data_full

Simulated data set prior to generating longitudinal missings. Useful to assess the longitudinal fit.

For full = FALSE only the first dataset is returned.

Covariates within these datasets include a subject identifier id, the sampled survival times survtime, the event indicator event, the time points of longitudinally "observed" measurements obstime, the longitudinal response y, the cumulative hazard at the survival time cumhaz, as well as covariates x1, x2, random effects

r1, r2, b1, ..., and the true predictors alpha, lambda, gamma, mu, sigma.

References

Hofner, B (2016). CoxFlexBoost: Boosting Flexible Cox Models (with Time-Varying Effects). R package version 0.7-0.

Bender, R., Augustin, T., and Blettner, M. (2005). Generating Survival Times to Simulate Cox Proportional Hazards Models. Statistics in Medicine, 24, 1713-1723.

Koehler N, Umlauf N, Beyerlein, A., Winkler, C., Ziegler, A., and Greven S (2016). Flexible Bayesian Additive Joint Models with an Application to Type 1 Diabetes Research. (submitted)

See also

Examples

if (FALSE) ## Simulate survival data
## with functional random intercepts and a nonlinear effect 
## of time, time-varying association alpha.
d <- simJM(nsub = 300)
head(d)
#> Error in eval(expr, envir, enclos): object 'd' not found

## Simulate survival data
## with random intercepts/slopes and a linear effect of time,
## constant association alpha.
d <- simJM(nsub = 200, long_setting = "linear", 
  alpha_setting = "constant")
head(d)
#>      id survtime event        x1        x2 x3         r1       r2        b1
#> 1     1 67.63766     0 0.5462693 -2.379859  1 0.04729362 0.318103 -1.298252
#> 1.8   1 67.63766     0 0.5462693 -2.379859  1 0.04729362 0.318103 -1.298252
#> 1.12  1 67.63766     0 0.5462693 -2.379859  1 0.04729362 0.318103 -1.298252
#> 1.13  1 67.63766     0 0.5462693 -2.379859  1 0.04729362 0.318103 -1.298252
#> 1.24  1 67.63766     0 0.5462693 -2.379859  1 0.04729362 0.318103 -1.298252
#> 1.34  1 67.63766     0 0.5462693 -2.379859  1 0.04729362 0.318103 -1.298252
#>             b2        b3         b4        b5       b6   cumhaz obstime dalpha
#> 1    -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407       0      0
#> 1.8  -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407       8      0
#> 1.12 -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407      12      0
#> 1.13 -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407      13      0
#> 1.24 -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407      24      0
#> 1.34 -0.768504 -0.396014 -0.2152782 0.2827463 0.439171 1.125407      34      0
#>             mu    lambda alpha     gamma          dmu     sigma         y
#> 1    0.8831874 0.3802428     1 -4.938723 -0.003637939 -1.203973 0.5016701
#> 1.8  0.8540839 0.3802428     1 -4.938723 -0.003637939 -1.203973 1.1900108
#> 1.12 0.8395321 0.3802428     1 -4.938723 -0.003637939 -1.203973 1.3382066
#> 1.13 0.8358942 0.3802428     1 -4.938723 -0.003637939 -1.203973 0.6504030
#> 1.24 0.7958768 0.3802428     1 -4.938723 -0.003637939 -1.203973 0.6490512
#> 1.34 0.7594974 0.3802428     1 -4.938723 -0.003637939 -1.203973 0.8872736