---
title: "Getting Started with wnpmle"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with wnpmle}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.width = 5,
  fig.height = 4.5
)
```

## Overview

The `wnpmle` package implements the **weighted nonparametric maximum likelihood
estimator (wNPMLE)** for the marginal mean of recurrent events in the presence
of a competing terminal event.

Competing events are commonly modeled as independent right-censorings, which
leads to overestimation of the marginal mean of a recurrent event. `wnpmle`
provides valid estimation and prediction via a weighted nonparametric maximum
likelihood estimation for two broad classes of semiparametric transformation
models.

| Models | Link function G(x) | Special case at param = 1 |
|-------|---------------------|---------------------------|
| Box-Cox transformation models (`model = "boxcox"`) | $((1 + x)^\rho - 1)/\rho$ | Ghosh–Lin ($\rho = 1$) |
| Logarithmic transformation models (`model = "log"`) | $\log(1 + r x)/r$ | Proportional odds ($r = 1$) |

Both are estimated via automatic differentiation using TMB, which provides
exact gradients and fast convergence.

---

## Installation

```{r, eval = FALSE}
install.packages("wnpmle")
```

Or from GitHub:

```{r, eval = FALSE}
# install.packages("remotes")
remotes::install_github("abellach/wnpmle")
```

---

## Quick start: bladder cancer data

The package ships with a pre-processed version of the `bladder` dataset from
the `survival` package, accessible via `bladder_prep()`.

```{r load}
library(wnpmle)

bdata       <- bladder_prep()
bdata_clean <- bdata[, c("id", "time", "status", "treat", "num", "size")]
head(bdata_clean)
```

---

## Fitting the models

### Ghosh-Lin model (Box-Cox, rho = 1)

```{r fit-bc}
fit_bc <- wnpmle_fit(
  Surv(time, status) ~ treat + num + size,
  data  = bdata_clean,
  id    = "id",
  model = "boxcox",
  rho   = 1,
  tau   = 59,
  se    = "sandwich_adj"
)
summary(fit_bc)
bl_bc <- baseline(fit_bc)
plot(bl_bc$time, bl_bc$Lambda, type = "s", lwd = 2,
     xlab = "Time (months)", ylab = expression(hat(Lambda)(t)),
     main = "Cumulative baseline mean (Ghosh-Lin)")
lines(bl_bc$time, bl_bc$lower, lty = 2, col = "grey50")
lines(bl_bc$time, bl_bc$upper, lty = 2, col = "grey50")
AIC(fit_bc)
BIC(fit_bc)
```

### Proportional odds model (logarithmic, r = 1)

```{r fit-log}
fit_log <- wnpmle_fit(
  Surv(time, status) ~ treat + num + size,
  data  = bdata_clean,
  id    = "id",
  model = "log",
  rho   = 1,
  tau   = 59,
  se    = "sandwich_adj"
)
summary(fit_log)
bl_log <- baseline(fit_log)
plot(bl_log$time, bl_log$Lambda, type = "s", lwd = 2,
     xlab = "Time (months)", ylab = expression(hat(Lambda)(t)),
     main = "Cumulative baseline mean (proportional odds)")
lines(bl_log$time, bl_log$lower, lty = 2, col = "grey50")
lines(bl_log$time, bl_log$upper, lty = 2, col = "grey50")
AIC(fit_log)
BIC(fit_log)
```

---

## Prediction

`predict()` evaluates the estimated marginal mean at new covariate values.
Here we compare the mean number of recurrences over time for a treated
vs. placebo patient, holding the number of initial tumours and tumour size
fixed at 1.

```{r predict}
newdat <- data.frame(treat = c(0, 1), num = c(1, 1), size = c(1, 1))

pred <- predict(fit_bc, newdata = newdat,
                times = seq(1, 50, by = 1))

plot(pred$time, pred$mu_1, type = "s", lwd = 2,
     xlab = "Time (months)", ylab = "Marginal mean number of recurrences",
     ylim = range(pred[, -1]))
lines(pred$time, pred$mu_2, lwd = 2, lty = 2, col = "firebrick")
legend("topleft", legend = c("Placebo", "Thiotepa"),
       lty = c(1, 2), col = c("black", "firebrick"), bty = "n")
```

---

## Choosing the transformation parameter

`plot_loglik()` sweeps over a grid of parameter values and plots the profile
log-likelihood for both models on a single axis. The logarithmic parameter r
is shown on the left (negative axis) and the Box-Cox parameter rho on the
right (positive axis), meeting at zero where the two models coincide.

The grids and the follow-up horizon tau are user-controlled. The function
fits `length(rho_grid) + length(r_grid)` models in total, so computation
time scales with grid size.

```{r loglik-plot, eval = FALSE}
plot_loglik(
  Surv(time, status) ~ treat + num + size,
  data     = bdata_clean,
  id       = "id",
  tau      = 59,
  rho_grid = seq(0.01, 1.2, by = 0.01),
  r_grid   = seq(0.01, 1.2, by = 0.01)
)
```

The filled circle marks rho = 1 (Ghosh-Lin) and the open circle marks
r = 1 (proportional odds). The optimal parameter values are printed to the
console.

---

## Standard errors

The adjusted sandwich variance estimator accounts for the estimation of the inverse
probability of censoring weights. Set `se = "none"` to skip SE computation,
which is useful when profiling over a grid of transformation parameters.

| Value | Description |
|-------|-------------|
| `"sandwich_adj"` | Sandwich variance estimator (default) |
| `"sandwich"` | Sandwich variance estimator without adjustment (approximation) |
| `"fisher"` | Inverse fisher information |
| `"none"` | No standard errors, faster, useful for profiling |

---

## S3 methods

| Method | Description |
|--------|-------------|
| `print(fit)` | Compact coefficient table with z-values and p-values |
| `summary(fit)` | Adds cumulative baseline at tau/4, tau/2, tau |
| `coef(fit)` | Named coefficient vector |
| `vcov(fit)` | Full variance-covariance matrix for (beta, Lambda) |
| `logLik(fit)` | Log-likelihood (for use with AIC / BIC) |
| `AIC(fit)` | Akaike information criterion |
| `BIC(fit)` | Bayesian information criterion |
| `baseline(fit)` | Cumulative baseline mean data frame with CIs |
| `predict(fit, newdata)` | Marginal mean at new covariate values |

---

## References

Bellach, A. and Kosorok, M.R. (2026). Weighted NPMLE for the marginal mean of recurrent events with a competing terminal event. *arXiv preprint* [arXiv:2605.25934](https://arxiv.org/abs/2605.25934).

Bellach, A., Kosorok, M.R., Rüschendorf, L. and Fine, J.P. (2019). Weighted NPMLE for the subdistribution of a competing risk. *Journal of the American Statistical Association*, 114(525), 259–270. [doi:10.1080/01621459.2017.1401540](https://doi.org/10.1080/01621459.2017.1401540).

Ghosh, D. and Lin, D.Y. (2002). Marginal regression models for recurrent and terminal events. *Statistica Sinica*, 12, 663–688. [doi:10.17615/pt0g-y207](https://doi.org/10.17615/pt0g-y207).

Zeng, D. and Lin, D.Y. (2006). Semiparametric transformation models with random effects for recurrent events. *Biometrika*, 93(3), 627–640. [doi:10.1093/biomet/93.3.627](https://doi.org/10.1093/biomet/93.3.627).