--- title: "Getting Started with wnpmle" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with wnpmle} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 5, fig.height = 4.5 ) ``` ## Overview The `wnpmle` package implements the **weighted nonparametric maximum likelihood estimator (wNPMLE)** for the marginal mean of recurrent events in the presence of a competing terminal event. Competing events are commonly modeled as independent right-censorings, which leads to overestimation of the marginal mean of a recurrent event. `wnpmle` provides valid estimation and prediction via a weighted nonparametric maximum likelihood estimation for two broad classes of semiparametric transformation models. | Models | Link function G(x) | Special case at param = 1 | |-------|---------------------|---------------------------| | Box-Cox transformation models (`model = "boxcox"`) | $((1 + x)^\rho - 1)/\rho$ | Ghosh–Lin ($\rho = 1$) | | Logarithmic transformation models (`model = "log"`) | $\log(1 + r x)/r$ | Proportional odds ($r = 1$) | Both are estimated via automatic differentiation using TMB, which provides exact gradients and fast convergence. --- ## Installation ```{r, eval = FALSE} install.packages("wnpmle") ``` Or from GitHub: ```{r, eval = FALSE} # install.packages("remotes") remotes::install_github("abellach/wnpmle") ``` --- ## Quick start: bladder cancer data The package ships with a pre-processed version of the `bladder` dataset from the `survival` package, accessible via `bladder_prep()`. ```{r load} library(wnpmle) bdata <- bladder_prep() bdata_clean <- bdata[, c("id", "time", "status", "treat", "num", "size")] head(bdata_clean) ``` --- ## Fitting the models ### Ghosh-Lin model (Box-Cox, rho = 1) ```{r fit-bc} fit_bc <- wnpmle_fit( Surv(time, status) ~ treat + num + size, data = bdata_clean, id = "id", model = "boxcox", rho = 1, tau = 59, se = "sandwich_adj" ) summary(fit_bc) bl_bc <- baseline(fit_bc) plot(bl_bc$time, bl_bc$Lambda, type = "s", lwd = 2, xlab = "Time (months)", ylab = expression(hat(Lambda)(t)), main = "Cumulative baseline mean (Ghosh-Lin)") lines(bl_bc$time, bl_bc$lower, lty = 2, col = "grey50") lines(bl_bc$time, bl_bc$upper, lty = 2, col = "grey50") AIC(fit_bc) BIC(fit_bc) ``` ### Proportional odds model (logarithmic, r = 1) ```{r fit-log} fit_log <- wnpmle_fit( Surv(time, status) ~ treat + num + size, data = bdata_clean, id = "id", model = "log", rho = 1, tau = 59, se = "sandwich_adj" ) summary(fit_log) bl_log <- baseline(fit_log) plot(bl_log$time, bl_log$Lambda, type = "s", lwd = 2, xlab = "Time (months)", ylab = expression(hat(Lambda)(t)), main = "Cumulative baseline mean (proportional odds)") lines(bl_log$time, bl_log$lower, lty = 2, col = "grey50") lines(bl_log$time, bl_log$upper, lty = 2, col = "grey50") AIC(fit_log) BIC(fit_log) ``` --- ## Prediction `predict()` evaluates the estimated marginal mean at new covariate values. Here we compare the mean number of recurrences over time for a treated vs. placebo patient, holding the number of initial tumours and tumour size fixed at 1. ```{r predict} newdat <- data.frame(treat = c(0, 1), num = c(1, 1), size = c(1, 1)) pred <- predict(fit_bc, newdata = newdat, times = seq(1, 50, by = 1)) plot(pred$time, pred$mu_1, type = "s", lwd = 2, xlab = "Time (months)", ylab = "Marginal mean number of recurrences", ylim = range(pred[, -1])) lines(pred$time, pred$mu_2, lwd = 2, lty = 2, col = "firebrick") legend("topleft", legend = c("Placebo", "Thiotepa"), lty = c(1, 2), col = c("black", "firebrick"), bty = "n") ``` --- ## Choosing the transformation parameter `plot_loglik()` sweeps over a grid of parameter values and plots the profile log-likelihood for both models on a single axis. The logarithmic parameter r is shown on the left (negative axis) and the Box-Cox parameter rho on the right (positive axis), meeting at zero where the two models coincide. The grids and the follow-up horizon tau are user-controlled. The function fits `length(rho_grid) + length(r_grid)` models in total, so computation time scales with grid size. ```{r loglik-plot, eval = FALSE} plot_loglik( Surv(time, status) ~ treat + num + size, data = bdata_clean, id = "id", tau = 59, rho_grid = seq(0.01, 1.2, by = 0.01), r_grid = seq(0.01, 1.2, by = 0.01) ) ``` The filled circle marks rho = 1 (Ghosh-Lin) and the open circle marks r = 1 (proportional odds). The optimal parameter values are printed to the console. --- ## Standard errors The adjusted sandwich variance estimator accounts for the estimation of the inverse probability of censoring weights. Set `se = "none"` to skip SE computation, which is useful when profiling over a grid of transformation parameters. | Value | Description | |-------|-------------| | `"sandwich_adj"` | Sandwich variance estimator (default) | | `"sandwich"` | Sandwich variance estimator without adjustment (approximation) | | `"fisher"` | Inverse fisher information | | `"none"` | No standard errors, faster, useful for profiling | --- ## S3 methods | Method | Description | |--------|-------------| | `print(fit)` | Compact coefficient table with z-values and p-values | | `summary(fit)` | Adds cumulative baseline at tau/4, tau/2, tau | | `coef(fit)` | Named coefficient vector | | `vcov(fit)` | Full variance-covariance matrix for (beta, Lambda) | | `logLik(fit)` | Log-likelihood (for use with AIC / BIC) | | `AIC(fit)` | Akaike information criterion | | `BIC(fit)` | Bayesian information criterion | | `baseline(fit)` | Cumulative baseline mean data frame with CIs | | `predict(fit, newdata)` | Marginal mean at new covariate values | --- ## References Bellach, A. and Kosorok, M.R. (2026). Weighted NPMLE for the marginal mean of recurrent events with a competing terminal event. *arXiv preprint* [arXiv:2605.25934](https://arxiv.org/abs/2605.25934). Bellach, A., Kosorok, M.R., Rüschendorf, L. and Fine, J.P. (2019). Weighted NPMLE for the subdistribution of a competing risk. *Journal of the American Statistical Association*, 114(525), 259–270. [doi:10.1080/01621459.2017.1401540](https://doi.org/10.1080/01621459.2017.1401540). Ghosh, D. and Lin, D.Y. (2002). Marginal regression models for recurrent and terminal events. *Statistica Sinica*, 12, 663–688. [doi:10.17615/pt0g-y207](https://doi.org/10.17615/pt0g-y207). Zeng, D. and Lin, D.Y. (2006). Semiparametric transformation models with random effects for recurrent events. *Biometrika*, 93(3), 627–640. [doi:10.1093/biomet/93.3.627](https://doi.org/10.1093/biomet/93.3.627).