University of Pennsylvania Press

We present the lmtp R package for causal inference from longitudinal observational or randomized studies. This package implements the estimators of Díaz et al. (2021) for estimating general non-parametric causal effects based on modified treatment policies. Modified treatment policies generalize static and dynamic interventions, making lmtp and all-purpose package for non-parametric causal inference in observational studies. The methods provided can be applied to both point-treatment and longitudinal settings, and can account for time-varying exposure, covariates, and right censoring thereby providing a very general tool for causal inference. Additionally, two of the provided estimators are based on flexible machine learning regression algorithms, and avoid bias due to parametric model misspecification while maintaining valid statistical inference.


Causal inference, longitudinal data, non-parametric, modified treatment policies

1. Introduction

Modified treatment policies (MTPs) are a recently developed class of interventions that generalize many interesting and widely used parameters in causal inference. They generalize static and dynamic interventions for categorical exposure variables, but also provide a framework to define interesting parameters for continuous or multivariate exposures. The fundamental ideas date back to Robins et al. (2004). In a series of recent articles, these methods have been further developed to allow for more flexible data types and estimation algorithms (Díaz and van der Laan, 2012; Haneuse and Rotnitzky, 2013; Young et al., 2014). Specifically, two recent papers (Díaz et al., 2021, 2022) have generalized the MTP framework to account for (1) time-varying treatment and covariates, (2) informative right-censoring, (3) competing risks, (4) machine learning estimation of nuisance parameters.

In their general formulation, MTPs are hypothetical interventions where the post-intervention value of exposure can depend on the actual observed exposure level and the [End Page 103] unit’s history. As such, MTPs are useful to assess the effect of continuous treatments. For example, Haneuse and Rotnitzky (2013) assess the effect of reducing surgery time by a predetermined amount (e.g., 5 minutes) for lung cancer patients, where the reduction is carried out only for those patients for whom the intervention is feasible. Furthermore, as explained above, MTPs generalize many important causal inference estimands, such as the effect of a dynamic treatment rule in which the treatment level is assigned as a function of a unit’s history. For example, dynamic treatment rules, a particular case of MTPs, may be used to estimate the effect of policies such as switching HIV treatment once the CD4 T-cell count goes below a predetermined threshold (Petersen et al., 2014). MTPs also generalize many interesting causal effects such as the average treatment effects, the causal risk ratio, and causal odds ratio. In this article we describe how lmtp can be used for estimating the causal effects of MTPs, and present examples on the use of the software for several of the above cases.

Flexible and general software for estimation of causal effects in longitudinal studies is currently limited. The ipw and gfoRmula (van der Wal and Geskus, 2011; McGrath et al., 2020) packages provide routines for estimating causal effects using inverse probability weighting (IPW) and the parametric g-formula respectively. The validity of these methods, however, requires making unrealistic parametric assumptions of the underlying data generating distributions and are not doubly-robust. The ltmle and survtmle (Lendle et al., 2017; Benkeser and Hejazi, 2019) packages implement a doubly-robust method for estimating causal effects from longitudinal data, but these packages do not support continuous valued exposures. The txshift (Hejazi and Benkeser, 2020) package provides a method for estimating effects of continuous, point-treatment exposures using MTPs defined as additive shifts. In contrast, lmtp can be used for estimating causal effects for binary, categorical, and continuous valued exposures from longitudinal data using state-of-the-art methodologies that may incorporate machine learning and avoid parametric modeling assumptions.

The goal of this paper is to introduce the lmtp package and to illustrate its capabilities. The package implements four methods for estimating the effects of MTPs. The first two estimators are simple estimation strategies based on g-computation and inverse probability weighting. The sampling distributions of these estimators under machine learning of the nuisance parameters are unknown, and therefore we do not recommend their use in practice. The other two estimators, a targeted minimum-loss based estimator (TMLE)(Laan and Rose, 2011; Laan and Rubin, 2006) and a sequentially doubly-robust estimator (SDR) (Buckley and James, 1979; Fan and Gijbels, 1994; van der Laan and Dudoit, 2003; Rotnitzky et al., 2006; Rubin and Laan, 2006; Kennedy et al., 2017), are multiply-robust and asymptotically normal under conditions, even when the nuisance parameters are fitted with machine learning. TMLE and SDR are implemented using cross-fitting to allow for the use of flexible machine learning regression methodology (Díaz et al., 2021). The package may be download from CRAN at

2. Notation and modified treatment policies

2.1 Data structure

We start the paper with a brief introduction to MTPs. The interested reader is encouraged to consult Díaz et al. (2021) and Díaz et al. (2022) for more technical details and theoretical [End Page 104] results. We will use the notation of Díaz et al. (2021) with only slight modifications. Let i be the index of an observation from a data set with n total units and t be the index of time for a total number of time points τ . The observed data for observation i may be denoted as

inline graphic

where W denotes baseline covariates, Lt denotes time-varying covariates, At denotes a vector of exposure and/or censoring variables and Y denotes an outcome measured at the end of study follow-up. We observe n i.i.d. copies of Z with distribution P. Upper case reference (At) to a variable indicates a random variable while lower case reference (at) indicates a realization of that random variable. If right-censoring exists, At can be adapted so that At = (A1,t, A2,t) where A1,t equals one if an observation is still in the study at time t and zero otherwise, and A2,t denotes the exposure at time t. We use an overbar to indicate the history of a variable up until time t. We then use Ht = (L̄t,Āt−1) to denote the history of all variables up until just before At.

2.2 Modified treatment policies

We use the potential outcomes framework to define the causal effect of interest using our established data structure. We consider a hypothetical policy where Ā is set to a regime d defined as Adt = dt(At, Hdt), where Hdt = (L̄t,Ādt−1), for a set of user-given regimes dt : t ∈ {1, ..., τ }. The defining characteristic that makes regime dt a modified treatment policy is that it depends on the natural value of treatment At, that is, the value that the treatment would have taken under no intervention. However, when the function dt only depends on Ht, the LMTP reduces to the dynamic treatment regimes studied in the literature. Furthermore, when dt is a constant and does not depend on either At or Ht, then LMTPs reduce to the conventional static rules studied in the causal inference literature (e.g., Bang and Robins, 2005; van der Laan and Gruber, 2011). Below we present examples of all these interventions.

First, consider a study of the effect of physical activity on mortality in the elderly. Assume that each patient is monitored at several time points, and that a measure of physical activity such as the metabolic equivalent of task (MET) (Mendes et al., 2018) is measured together with a number of lifestyle, health status, and demographic variables. In this setup, a natural question to ask would be “what is the effect on mortality of an intervention that increases physical activity by δ units for patients whose socioeconomic and health status allows it?” Formally, consider a longitudinal study with loss-to-follow-up. Let At = (A1,t, A2,t) where A1,t equals one if an observation is still in the study at time t and zero otherwise, and A2,t denote a continuous exposure at time t that can be changed through some intervention. A modified treatment policy that increases A2,t, whenever it is feasible to do so, can be defined as

inline graphic

where ut(ht) defines the maximum level of physical activity allowed for a patient with characteristics ht. Note that we also consider an intervention on A1,t because we are interested [End Page 105] in a hypothetical world where there is no loss-to-follow-up. In this case the hypothetical exposure after intervention, Adt depends on the actually observed exposure, At. This is in contrast to a deterministic intervention where Adt would be set to some pre-specified value with probability one.

For dynamic treatment rules, consider a hypothetical longitudinal study where two different antiviral treatments are administered to HIV positive patients. Sometimes an antiviral drug works at first, until the virus develops resistance, at which point it is necessary to change the treatment regime. Assume we are interested in assessing a policy with two treatments encoded as At {0, 1}, and we want to assess the effect of a regime that would switch the antiviral treatment as soon as the CD4 T cell count drops bellow 300. Let At = (A1,t, A2,t) where A1,t equals one if an observation is still in the study at time t and zero otherwise, and A2,t denotes the treatment arm at time t. Let Lt denote the CD4 T cell count at time t. In this case, one may decide to assess the effect of the rule

inline graphic

In contrast to the previous rule (2), the dynamic treatment rule (3) does not depend on the natural value of treatment at time t, it only depends on the history. The software and methods presented here handle both cases seamlessly.

In the case of a single time point setting where the data structure is Z = (W, A, Y ), it follows trivially from the above definitions that the average treatment effect from a cross-sectional study, defined as E[Y (1) − Y (0)], can be estimated using MTPs by simply letting τ = 1 and contrasting two MTPs d(A) = 1 and d(A) = 0. The lmtp package presented in this article allows the contrast of different MTPs using differences, ratios, and odds ratios. We provide examples of these contrasts in §4 below.

In what follows we focus on estimating the the causal effect of MTP d on outcome Y, using lmtp, through the causal parameter

inline graphic

where Y (Ād) is the counterfactual outcome in a world, where possibly contrary to fact, each entry ofĀ was modified according to the MTP d. When Y is continuous, θ is the mean population value of Y under MTP d; when Y is dichotomous, θ is the population proportion of event Y under MTP d. Similarly, when Y is the indicator of an event by end of the study, θ is defined as the cumulative incidence of Y under MTP d.

2.3 Identification

The ability to estimate θ depends on the ability to identify an expression for θ as a function of the data generating distribution P using only the observed data Z and not counterfactual variables Yd. A full review of these identification assumptions is outside the scope of this article but is presented in our technical paper (Díaz et al., 2021). Briefly, the following standard assumptions must hold

Assumption 1 (Consistency)Ā =ā = Y = Y (ā) for allā ∈ supp Ā [End Page 106]

Assumption 2 (Exchangeability) AtY (ā)|Ht for allā ∈ suppĀ and t ∈ {1, ..., τ }

Assumption 3 (Positivity) If (at, ht) supp{At, Ht} then (d(at, ht), ht) supp{At, Ht} for t ∈ {1, ..., τ }

The consistency assumption states that the potential outcome for an observation under their observed exposure is the value of the outcome that we did actually observe. The exchangeability assumption is often also referred to as the no-unmeasured confounding assumption; it is satisfied if all common causes of the exposure and outcome are measured and adjusted for.

Of particular importance to this article is the positivity assumption which states that the distribution of the exposure under the MTP is supported in the data. Concretely, in a study with a continuous exposure and loss-to-follow-up, the positivity assumption states that if an observation with covariate history ht and exposure at who was not lost-to-follow-up at time t exists then there is also an observation with covariate history ht who was not lost-to-follow-up at time t but whose exposure was observed as d(at, ht) that also exists. This assumption is often an issue when working with continuous exposures and/or multiple time points in the context of static or dynamic interventions. Note that, if the treatment is binary and dt(at, ht) = 1, the positivity assumption reduces to “if ht supp{Ht} then (1, ht) supp{At, Ht}”, which is the standard positivity assumption requiring that gt(1 | ht) > 0 whenever p(ht) > 0.

One strength of MTPs is that if the support of the exposure variable is known, then they may be formulated to avoid violations of the positivity assumption. For example,Haneuse and Rotnitzky (2013) consider the effect of an MTP reducing surgery operating times on post-surgery complications. There are real world constraints–ethical considerations, surgery characteristics, limits of the surgeon–that imply a lower bound on surgery operation time and therefore would imply positivity violations for certain regimes. Taking these constraints into consideration, the authors were able to formulate interventions that were in the support of the data. Another example and further discussion is given in Rudolph et al. (2022). Furthermore, the MTP methods presented here can be used to solve positivity problems in other interesting ways, such as defining incremental propensity score interventions that shift the propensity score in a risk-ratio scale Wen et al. (2021); Díaz et al. (2022).

Under these identification assumptions, the causal parameter θ is identified as follows. Set mτ+1 = Y . For t = τ, . . ., 1, recursively define

inline graphic

Then θ is identified as E[m1(Ad1, L1)]. To illustrate this recursive definition, consider an example when τ = 2. Then we have

inline graphic

[End Page 107]

which easily generalizes to many time points. In addition to the function mt, the description of the estimation methods below will benefit from the definition of the ratio between the density of the post-intervention exposure, Adt, and the density of the observed exposure, At. That is, define gdt(at | ht) as the density of Adt conditional on Ht = ht evaluated at ht. Likewise define gt(at | ht) as the conditional density of At. Our estimation procedure will make use of the density ratio defined as:

inline graphic

This density ratio reduces to the well-known inverse probability weights used in estimation of the ATE in the single-time point setting where Z = (W, A, Y ) and A is binary.

3. Estimating modified treatment policy effects

The literature on estimation of the effects of dynamic treatment regimes dt(ht) that do not depend on at is rich. Compared to this literature, estimation of the effect of modified treatment policies d(at, ht) that also depend on at introduces certain technical complexities that make the generalization of the methods non-trivial. Specifically, the parameter functional identifying the average effect of a treatment regime dt(ht) does not depend on the treatment mechanism gt(at | ht). This means that for estimation purposes, the treatment mechanism can often be treated as a nuisance parameter, which leads to important and useful properties such as doubly-robust estimation, etc. When the treatment policies are d(at, ht) and do depend on at, the functional identifying the counterfactual average does depend on gt(at | ht). This means that the treatment mechanism cannot be treated as a nuisance parameter, and it means that it is unclear whether doubly-robust estimators can be constructed. In this paper and in our companion technical paper Díaz et al. (2021), we have made an assumption called piecewise smooth invertibility on the treatment policy dt(at, ht), and have shown that this assumption is sufficient to allow the construction of doubly-robust estimators. The methods implemented in the package require that the analyst specifies treatment policies that satisfy this assumption. Examples of treatment policies d(at, ht) where doubly-robust estimation is unachievable are given, e.g., in Díaz and van der Laan (2013); Kennedy (2019); Díaz et al. (2022).

3.1 Estimation methods

The lmtp package implements four estimation methods: a targeted minimum-loss based estimator (TMLE), a sequential doubly-robust estimator (SDR), an estimator based on the parametric G-formula, and an inverse probability weighted (IPW) estimator. We will only describe the use of TMLE, lmtp_tmle, and SDR, lmtp_sdr, as their use is strongly encouraged over the others based on their advantageous theoretical properties which allow for machine learning while maintaining the ability to compute valid confidence intervals and p-values.

Targeted minimum-loss based estimation is a general framework for constructing asymptotically linear estimators leveraging machine learning, with an optimal bias-variance tradeoff for the target causal parameter (van der Laan and Rose, 2011, 2018). In general, TMLE is constructed from a factorization of observed data likelihood into an outcome regression [End Page 108] and an intervention mechanism. Using the outcome regression, an initial estimate of the target parameter is constructed and then de-biased by a fluctuation that depends on a function of the intervention mechanism. The sequential doubly-robust estimator is based on a unbiased transformation of the efficient influence function of the target estimand. For a thorough discussion of TMLE and SDR for static, dynamic, and modified treatment policies, we refer the reader to van der Laan and Gruber (2011); Luedtke et al. (2017); Rotnitzky et al. (2017); Díaz et al. (2021).

TMLE and SDR require estimation of two parameters at each time point: an outcome mechanism and an intervention mechanism. Both TMLE and SDR are multiply-robust in that they allow certain configurations of nuisance parameters to be inconsistently estimated. Specifically, TMLE is considered τ + 1-multiply robust in that it allows for inconsistent estimation of all the intervention mechanisms prior to any time point t, as long as all outcome mechanisms after time t are consistently estimated. SDR is 2τ -robust in that at each time point, estimation of at most either the intervention mechanism or outcome mechanism is allowed to be inconsistent. Both TMLE and SDR are efficient when all the treatment mechanism and outcome regression are consistently estimated at a given consistency rate, but the SDR has better protection against model misspecification (see Luedtke et al., 2017; Rotnitzky et al., 2017; Díaz et al., 2021, for more details).

It is important to note that the SDR estimator can produce an estimate θˆ outside of the bounds of the parameter space (e.g., probability estimates outside [0, 1]), while the TMLE guarantees that the estimate is within bounds of the parameter space. With this in mind and because for a single time-point TMLE and SDR are equally robust, we recommend use of TMLE for the case of a single time-point, while we recommend use of SDR for the longitudinal setting.

3.2 Required data structure

Data is passed to lmtp estimators through the data argument. Data should be in wide format with one column per variable per time point under study (i.e., there should be one column for every variable in Z). These columns do not have to be in any specific order and the data set may contain variables that are not used in estimation. We refer the reader to Hoffman et al. (2022) for a complete example of preprocessing data for lmtp. The names of treatment variables, censoring variables, baseline covariates, and time-varying covariates are specified using the trt, cens, baseline, and time_vary arguments respectively. The trt, cens, and baseline arguments accept character vectors and the trt and cens arguments should be ordered according to the time-ordering of the data generating mechanism. The time_vary argument accepts an unnamed list sorted according to the time-ordering of the model with each index containing the name of the time-varying covariates for the given time. The outcome variable is specified through the outcome argument.

Estimators are compatible with continuous, dichotomous and survival outcomes. In the case of a dichotomous or continuous outcome, only a single variable name should be passed to the outcome argument. For survival outcomes, a vector containing the names of the intermediate outcome and final outcome variables, ordered according to time, should be specified with the outcome argument. Dichotomous and survival outcomes should be coded using zero’s and one’s where one indicates the occurrence of an event and zero otherwise. [End Page 109]

Figure 1. Example of how a R data.frame with time-varying treatment, time-varying covariates, and loss-to-followup may be structured when using lmtp. Adapted from .
Click for larger view
View full resolution
Figure 1.

Example of how a R data.frame with time-varying treatment, time-varying covariates, and loss-to-followup may be structured when using lmtp. Adapted from Hoffman et al. (2022).

If working with a survival outcome, once an observation experiences an outcome, all future outcome variables should also be coded with a one. The outcome_type argument should be set to “continuous” for continuous outcomes, “binomial” for dichotomous outcomes, and “survival” for time-to-event outcomes. If the study is subject to loss-to-follow-up, the cens argument must be provided. Censoring indicators should be coded using zero’s and one’s where one indicates an observation is observed at the next time and zero indicates loss-to-follow-up. Once an observation’s censoring status is switched to zero it cannot change back to one. Missing data before an observation is lost-to-follow-up is not allowed. While multiple solutions exist for the problem of missing data, we recommend the use of multiple imputation pending future work on incorporating missingness into the causal model. Assuming the missingness only depends on past variables, multiple imputation is an appropriate approach.

The k argument controls a Markov assumption on the data generating mechanism. When k = Inf, the history Ht will be constructed using all previous time-point variables while setting k to any other value will restrict Ht to time-varying covariates from time tk − 1 until t−1. Baseline confounders are always included in Ht. Making a Markov assumption is particularly helpful when the sample size is smaller than the number of variables in the the full covariate history. The create_node_list function may be used to inspect how variables will be used for estimation. It is specified with the same trt, baseline, time_vary, and k arguments as lmtp estimators and is used internally to create a “node list” that encodes [End Page 110] which variables should be used at each time point of estimation. For example, consider a study with the observed data structure

inline graphic

We can translate this data structure to R with

inline graphic

A list of lists is returned with the names of the variables in Ht to be used for estimation of the outcome regression and the treatment mechanism at every time t. Notice that variables A1 and A2 are included in the list of variables used for estimation of the treatment mechamism (trt). This is due to the fact that the nuisance parameter for the treatment mechanism is the density ratio rt, which is a function of A1 and A2.

The density ratio is estimated based on a classification trick using an auxiliary variable Λ as a pseudo outcome and the treatment as a predictor. We now briefly describe how this density ratio estimation is done; the process is fully automated and hidden from the software user. Specifically, the TMLE and SDR estimation methods require estimation of the ratio of the densities of Adt and At, conditional on the history Ht, defined as rt above. This is achieved through computing the odds in a classification problem in an augmented dataset with 2n observations where the outcome is the auxiliary variable Λ (defined below) and the predictors are the variables At and Ht. In the 2n augmented data set, the data structure at time t is redefined as

inline graphic

where Λλ,i = λi indexes duplicate values. For all duplicated observations λ ∈ {0, 1} with the same i, Hλ,i,t is the same. For λ = 0, Aλ,i,t equals the observed exposure values Ai,t, whereas for λ = 1, Aλ,i,t equals the exposure values under the MTP d, namely Adt. The [End Page 111] classification approach to density ratio estimation proceeds by estimating the conditional probability that ∆ = 1 in this dataset, and dividing it by the corresponding estimate of the conditional probability that ∆ = 0. Specifically, denoting pλ the distribution of the data in the augmented dataset, we have:

inline graphic

Further details on this algorithm may be found in our technical paper (Díaz et al., 2021).

3.3 Specifying Interventions

The treatment rule dt can be specified using one of two arguments: shift or shifted. If using shift, the analyst provides a two-argument function where the first argument should correspond to a dataset containing the variables in Z and the second argument should accept a string corresponding to the variable At in Z. This function should return a size n vector of At modified according to dt. We provide examples of these functions in §4 below. Alternatively, the analyst can supply a modified version of data to shifted where At are replaced with Adt.

3.4 Machine Learnning

An attractive property of multiply-robust estimators is that they can incorporate flexible machine-learning algorithms for the estimation of nuisance parameters mt and rt while remaining inline graphic-consistent. The super learner algorithm is an ensemble learner than incorporates a set of candidate models through a weighted convex-combination based on cross-validation (van der Laan et al., 2007). Asymptotically, this weighted combination of models, called the meta-learner, will outperform any single one of its components.

Our package uses the implementation of the super learner provided by the SuperLearner package (Polley et al., 2021). The algorithms to be used in the super learner with lmtp_tmle and lmtp_sdr calls are specified with the lrnrs_trt and lrnrs_outcome arguments. The outcome variable type should guide users on selecting the appropriate candidate learners for use with the lrnrs_outcome argument. Regardless of whether an exposure is continuous, dichotomous, or categorical, the exposure mechanism is estimated using classification as discussed above, users should thus only include candidate learners capable of binary classification with the lrnrs_trt argument. If the data is high-dimensional, the analyst should restrict the SuperLearner library to candidate algorithms suitable for high-dimensional data.

Candidate learners that rely on cross-validation for the tuning of hyper-parameters should support grouped data if used with lrnrs_trt. Because estimation of the treatment mechanism relies on the augmented 2n duplicated data set, duplicated observations must be put into the same fold during sample-splitting. This is done automatically by the package.

3.5 Additional arguments

Sample-splitting and cross-fitting is used with all methods to avoid certain technical conditions that may not hold for machine learning estimators (Zheng and van der Laan, 2011; [End Page 112] Chernozhukov et al., 2018). Specifically, the data are split in V volds and each nuisance parameter is estimated V times, excluding data in one fold each time. The number of folds V can be set with the folds argument. If data has a hierarchical structure, the id argument is used to indicate the name of a variable in the data set indicating unique groups. These identifiers will be used for generation of cross-validation folds and will be accounted for in standard error calculations. If a continuous outcome has known bounds, these bounds may be specified using the bounds argument with a length two numeric vector where the first index is the lower bound and the second index is the upper bound. Parallel processing is supported using the future package Bengtsson (2021). The analyst can set a seed for reproducibility using the typical set.seed() approach (the L’Ecuyer CMRG algorithm, an algorithm that’s suitable for random number generation with parallel processing is used by future).

4. Examples

Example 1 Longitudinal MTP with no loss-to-follow-up

We have simulated data on n = 5000 observations over a 5-month period. Each observation has a continuous exposure (A_1, A_2, A_3, A_4) and covariate (L_1, L_2, L_3, L_4) recorded at months one through four and a dichotomous outcome (Y) at month five. We assume no loss-to-follow-up and no Markov assumption. This data set is installed with the package and is stored in the object sim_t4.

For this example, we are interested in the effect of a longitudinal MTP where at each month an observation’s exposure decreases by one only if their observed exposure wouldn’t be less than one if modified. Our data structure has no baseline confounders and we will use only GLMs for estimation so the only objects we must specify are the treatment variables, the time-varying covariates, the outcome variable, and the MTP shift function. Because we are only using parametric models for estimation, cross-fitting is not necessary and we may set folds = 1.

inline graphic

Example 2 Longitudinal MTP with right censoring

For this example, we have a simulated dataset of n = 1000 observations. Data was simulated for three time points with a continuous time-varying exposure at times t ∈ {1, 2} (A1, A2), a dichotomous time-varying covariate at times t ∈ {1, 2} (L1, L2), and a dichotomous outcome (Y) at time τ + 1 = 3. Loss-to-follow-up is present after time t = 1 so the data set contains censoring indicators (C1, C2). This data is installed with the package and is stored in the object sim_cens.

Suppose we are interested in the additive effect of an MTP where exposure is increased by 0.5 at every time point for all observations. Instead of using a linear model, we will estimate the outcome regression and treatment mechanism using a super learner composed of a generalized linear model, a random forest (Wright and Ziegler, 2017), and multivariate adaptive regression splines (Milborrow, 2019).

inline graphic

If loss-to-follow-up exists, we can estimate the population mean outcome under the observed exposures adjusting for informative loss-to-follow-up by specifying shift = NULL.

inline graphic

Example 3 Survival analysis and deterministic effects

The lmtp package may also be used to estimate deterministic causal effects, such as the causal relative risk. Suppose we have time-to-event data on n = 2000 observations with a time-invariant dichotomous exposure followed for a period of seven days. We wish to estimate the causal relative risk of experiencing the event by day seven. This data is installed with the package and is stored in the object sim_point_surv.

inline graphic

Example 4 Dynamic treatment regimes

Dynamic treatment regimes are treatment rules where treatment is applied based on a fixed rule that depends on covariate history. The lmtp package is capable of estimating the effects of deterministic dynamic treatment rules and modified treatment policies that depend on covariate history.

For this example we will use the same data from example 1. However, we will extend the longitudinal MTP used in that example so that a shift in the exposure also depends on covariate history and time. Specifically, if t = 1 exposure decreases by one if an observation’s realized exposure wouldn’t be less than one if modified. If t > 1 and the value of the time-varying covariate at that time-point equals one then exposure should decrease by one if an observation’s realized exposure wouldn’t be less than one if modified; otherwise exposure shouldn’t be modified. We also make a Markov assumption and set k = 1, restricting the time-varying covariate history to only variables at t − 1.

inline graphic

Example 5 The shifted argument

In some scenarios, shift functions may be very complex and it may be simpler for the analyst to create a copy of the data with the treatment variables modified by dt themselves. The modified copy of the data can then be supplied as an argument to the shifted function in-lieu of a shift function to the shift argument. Only the variables in trt can be different between the data in data and shifted.

As an example, consider the same analysis from example 4. The example 4 shift function required: identifying the current time-point using regular expressions, checking the value of a time-varying covariate, and defining a function inside of itself. Modifying the data according to the dt prior to calling lmtp_sdr greatly simplifies the code.

inline graphic

5. Summary

The lmtp package provides a unified implementation of four estimators for general parameters in a causal inference setting with longitudinal data. The package is very flexible: it accommodates continuous, multivariate, and categorical exposures, continuous, binary, time-to-event outcomes, and right-censoring. Two estimators in the package allow the use of machine learning model ensembles that mitigate the risk of model misspecification bias, providing confidence intervals and standard errors. We demonstrated the ease of use and flexibility of the package in four examples with data structures of varying complexity. The package provides further flexibility and allows the user to contrast the average of the counterfactual outcomes in any desirable scale: difference, risk ratio, or odds ratio. [End Page 118]

Nicholas Williams
Department of Epidemiology
Mailman School of Public Health
Columbia University
New York, NY 10032, USA
Iván Díaz
Division of Biostatistics
Department of Population Health
New York University Grossman School of Medicine
New York, NY 10016, USA
Submitted Sep 7 2022; Published Mar 2023


Heejung Bang and James M Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4), 2005.
Henrik Bengtsson. A unifying framework for parallel and distributed processing in r using futures. The R Journal, 13(2):208–227, 2021. doi: 10.32614/RJ-2021-048. URL
David Benkeser and Nima Hejazi. survtmle: Compute Targeted Minimum Loss-Based Estimates in Right-Censored Survival Settings, 2019. URL R package version 1.1.1.
Jonathan Buckley and Ian James. Linear Regression with Censored Data. Biometrika, 66 (3):429–436, 1979. doi: 10.2307/2335161.
Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), 2018. doi: 10.1111/ectj.12097.
Iván Díaz and Mark van der Laan. Population intervention causal effects based on stochastic interventions. Biometrics, 68(2):541–549, 2012. doi: 10.1111/j.1541-0420.2011.01685.x.
Iván Díaz and Mark J van der Laan. Assessing the causal effect of policies: an example using stochastic interventions. The international journal of biostatistics, 9(2):161–174, 2013.
Iván Díaz, Nicholas Williams, Katherine L. Hoffman, and Edward J. Schenck. Nonpara-metric causal effects based on longitudinal modified treatment policies. Journal of the American Statistical Association, 2021. doi: 10.1080/01621459.2021.1955691.
Iván Díaz, Katherine L Hoffman, and Nima S Hejazi. Causal survival analysis under competing risks using longitudinal modified treatment policies. arXiv preprint arXiv:2202.03513, 2022.
Jianqing Fan and Irene Gijbels. Censored Regression: Local Linear Approximations and Their Applications. Journal of the American Statistical Association, 89(426):560–570, 1994. doi: 10.2307/2290859.
S. Haneuse and A. Rotnitzky. Estimation of the effect of interventions that modify the received treatment. Statistics in Medicine, 32(30):5260–5277, 2013. doi: 10.1002/sim.5907.
Nima S Hejazi and David Benkeser. txshift: Efficient estimation of the causal effects of stochastic interventions in R. Journal of Open Source Software, 2020. doi: 10.21105/joss.02447. URL
Katherine L. Hoffman, Edward J. Schenck, Michael J. Satlin, William Whalen, Di Pan, Nicholas Williams, and Iván Díaz. Corticosteroids in covid-19: Optimizing observational research through target trial emulations. medRxiv, 2022. doi: 10.1101/2022.05.27.22275037.
Edward H. Kennedy. Nonparametric Causal Effects Based on Incremental Propensity Score Interventions. Journal of the American Statistical Association, 114(526), 2019. doi: 10.1080/01621459.2017.1422737.
Edward H. Kennedy, Zongming Ma, Matthew D. McHugh, and Dylan S. Small. Nonpara-metric methods for doubly robust estimation of continuous treatment effects. Journal of the Royal Statistical Society. Series B, Statistical methodology, 79(4):1229–1245, 2017. doi: 10.1111/rssb.12212.
Mark J. van der Laan and Sherri Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. Springer-Verlag, New York, 2011. ISBN 978-1-4419-9781-4. doi: 10.1007/978-1-4419-9782-1.
Mark J. van der Laan and Daniel Rubin. Targeted Maximum Likelihood Learning. The International Journal of Biostatistics, 2(1), 2006. doi: 10.2202/1557-4679.1043.
Samuel D. Lendle, Joshua Schwab, Maya L. Petersen, and Mark J. van der Laan. ltmle: An r package implementing targeted minimum loss-based estimation for longitudinal data. Journal of Statistical Software, 81(1):1–21, 2017. doi: 10.18637/jss.v081.i01.
Alexander R Luedtke, Oleg Sofrygin, Mark J van der Laan, and Marco Carone. Sequential double robustness in right-censored longitudinal models. arXiv preprint arXiv:1705.02459, 2017.
Sean McGrath, Victoria Lin, Zulu Zhang, Lucia C Petito, Roger W. Logan, Miguel A. Hernán, and Jessica G. Young. gformula: An r package for estimating the effects of sustained treatment strategies via the parametric g-formula. Patterns, 1(3), 2020. doi: 10.1016/j.patter.2020.100008.
Márcio de Almeida Mendes, Inácio da Silva, Virgilio Ramires, Felipe Reichert, Rafaela Martins, Rodrigo Ferreira, and Elaine Tomasi. Metabolic equivalent of task (mets) thresholds as an indicator of physical activity intensity. PloS one, 13(7), 2018.
Stephen Milborrow. earth: Multivariate Adaptive Regression Splines, 2019. R package version 5.1.2.
Maya L Petersen, Linh Tran, Elvin H Geng, Steven J Reynolds, Andrew Kambugu, Robin Wood, David R Bangsberg, Constantin T Yiannoutsos, Steven G Deeks, and Jeffrey N Martin. Delayed switch of antiretroviral therapy after virologic failure associated with elevated mortality among hiv-infected adults in africa. AIDS (London, England), 28(14), 2014.
Eric Polley, Erin LeDell, Chris Kennedy, and Mark van der Laan. SuperLearner: Super Learner Prediction, 2021. URL R package version 2.0-28.
James M Robins, Miguel A Hernán, and UWE SiEBERT. Effects of multiple interventions. Comparative quantification of health risks: global and regional burden of disease attributable to selected major risk factors, 1:2191–2230, 2004.
Andrea Rotnitzky, David Faraggi, and Enrique Schisterman. Doubly Robust Estimation of the Area Under the Receiver-Operating Characteristic Curve in the Presence of Verification Bias. Journal of the American Statistical Association, 101(475):1276–1288, 2006. doi: 10.1198/016214505000001339.
Andrea Rotnitzky, James Robins, and Lucia Babino. On the multiply robust estimation of the mean of the g-functional. arXiv preprint arXiv:1705.08582, 2017.
Daniel Rubin and Mark van der Laan. Doubly Robust Censoring Unbiased Transformations. U.C. Berkeley Division of Biostatistics Working Paper Series, June 2006.
Kara E Rudolph, Catherine Gimbrone, Ellicott C Matthay, Ivan Diaz, Corey S Davis, John R Pamplin II, Katherine Keyes, and Magdalena Cerda. When effects cannot be estimated: redefining estimands to understand the effects of naloxone access laws. Epidemiology, 33:689–698, 2022.
Mark van der Laan and Sandrine Dudoit. Unified Cross-Validation Methodology For Selection Among Estimators and a General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities and Examples. U.C. Berkeley Division of Biostatistics Working Paper Series, 2003.
Mark J van der Laan and Susan Gruber. Targeted minimum loss based estimation of an intervention specific mean outcome. 2011.
Mark J van der Laan and Sherri Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, New York, 2011.
Mark J van der Laan and Sherri Rose. Targeted Learning in Data Science: Causal Inference for Complex longitudinal Studies. Springer, New York, 2018.
Mark J. van der Laan, Eric C. Polley, and Alan E. Hubbard. Super Learner. Statistical Applications in Genetics and Molecular Biology, 6(1), 2007. doi: 10.2202/1544-6115.1309.
Willem M. van der Wal and Ronald B. Geskus. ipw: An r package for inverse probability weighting. Journal of Statistical Software, 43(13):1–23, 2011. doi: 10.18637/jss.v043.i13.
Lan Wen, Julia Marcus, and Jessica Young. Intervention treatment distributions that depend on the observed treatment process and model double robustness in causal survival analysis. arXiv preprint arXiv:2112.00807, 2021.
Marvin N. Wright and Andreas Ziegler. ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1):1–17, 2017. doi: 10.18637/jss.v077.i01.
Jessica G Young, Miguel A Hernán, and James M Robins. Identification, estimation and approximation of risk under interventions that depend on the natural value of treatment using observational data. Epidemiologic methods, 3(1):1–19, 2014.
Wenjing Zheng and Mark J. van der Laan. Cross-Validated Targeted Minimum-Loss-Based Estimation. In Mark J. van der Laan and Sherri Rose, editors, Targeted Learning: Causal Inference for Observational and Experimental Data, Springer Series in Statistics, pages 459–474. Springer, New York, NY, 2011. ISBN 978-1-4419-9782-1. doi: 10.1007/978-1-4419-9782-1_27.