University of Pennsylvania Press

About forty years ago, in a now-seminal contribution, Rosenbaum and Rubin (1983) introduced a critical characterization of the propensity score as a central quantity for drawing causal inferences in observational study settings. In the decades since, much progress has been made across several research frontiers in causal inference, notably including the re-weighting and matching paradigms. Focusing on the former and specifically on its intersection with machine learning and semiparametric efficiency theory, we re-examine the role of the propensity score in modern methodological developments. As the contribution of Rosenbaum and Rubin (1983) spurred a focus on the balancing property of the propensity score, we examine the degree to which, and how, this property figures in the development of asymptotically efficient estimators of causal effects; moreover, we discuss a connection between the balancing property and efficient estimation in the form of score equations and propose a score test for evaluating whether an estimator achieves empirical balance.


Propensity Score, Balancing Score, Semiparametric Efficiency, Score Test, Sieve Estimation, Undersmoothing, Machine Learning, Philosophy

1. The Propensity Score’s Central Role: A Retrospective

In observational studies that aim to evaluate the causal effects of well-defined interventions (e.g., Hernán and Taubman, 2008; Pearl, 2010), a key inferential obstacle arises in the form of potential confounding of the treatment–response relationship by baseline (pre-treatment) covariates. Unlike observational studies, randomized controlled trials (RCTs) feature a built-in safeguard against this form of confounding: specifically, since the treatment's allocation among study units is, on average, balanced across strata defined by the baseline covariates, confounding of the treatment–response relationship is theoretically expected to be a non-issue. It is this inferential safeguard that is in part responsible for RCTs being considered "gold standard" tools for generating evidence in biomedical and health research. As observational studies lack any such built-in protective measure, significant care is required, in both the design and analysis stages, to mitigate confounding by addressing systematic differences between treatment and control groups. Such considerations come in the form of assumptions like that of strong ignorability (an observational analog to the randomization assumption) and adjustment for baseline covariates when estimating nuisance parameters critical for the construction of estimators of the causal effects of interest. Rosenbaum and Rubin (1983) introduced the propensity score (the probability of receiving treatment conditional on baseline covariates) and related this quantity to the issue of balance between treatment groups. In their contribution, these authors outline the propensity score's membership in a class of balancing scores, which satisfy conditional independence between treatment assignment and baseline covariates.
Through their analysis, these authors relate this form of conditional independence to the (untestable) strong ignorability assumption and show that the propensity score plays a critical role in causal inference, with uses in matching study units, in stratification for weighting-based adjustment, and in covariance adjustment in analyses centered on the general linear model. On account of the propensity score's close correspondence with the notion of balance, a sharp focus has been placed on this particular property in the decades since the contribution of Rosenbaum and Rubin (1983), yet most ideas have been restricted to viewing balance in terms of the properties of parametric modeling-based estimators of the propensity score. In this case, the balancing property of a propensity score estimator is the finite-sample analogue of the conditional independence assumption that theoretically defines a balancing score. Taking a view rooted outside the traditional culture of parametric modeling, we define a notion of balance for an estimator of the propensity score with respect to particular functions of baseline covariates. We show that this function-specific balancing property corresponds with solving function-specific score equations, connecting this balancing property with formal criteria for asymptotically efficient estimation.

To formalize our arguments, consider an observational study that collects data on n units, O1, . . . , On, sampled independently and identically from an (unknown) distribution P0, that is, O ~ P0 ∈ ℳ, where ℳ is a realistic (nonparametric) statistical model placing only minimal and plausible restrictions on the form of P0. The data available on the ith unit Oi may be partitioned by time-ordering as (Xi, Zi, Ri), for a vector of baseline covariates X, a binary treatment Z ∈ {0, 1}, and a response R (where possible, we borrow the notation of Rosenbaum and Rubin (1983) in homage). For any unit O, R(0) and R(1) are the potential outcomes (Neyman, 1938; Rubin, 1978, 2005) of R, mutually unobservable quantities that arise when the treatment Z takes the values Z = 0 and Z = 1, respectively. The population average treatment effect (ATE) is τ0 = 𝔼0[R(1) − R(0)], a difference of counterfactual means; the naught subscript refers to the true distribution P0. Under standard assumptions of consistency (R(1) ≡ R | Z = 1) (Pearl, 2010), lack of interference (Cox, 1958), positivity of treatment assignment (δ < ℙ(Z = 1 | X) < 1 − δ for some δ > 0), and strong ignorability of treatment assignment (the observational analog of randomized treatment assignment in RCTs), τ0 is identifiable; furthermore, it is estimable by either substitution (plug-in) or inverse probability weighted (IPW) estimators, which arise from differing identification strategies. A substitution estimator takes the form τ_n^SUB = Pn{Q̅n(1, X) − Q̅n(0, X)}, where Q̅n(Z, X) := 𝔼n(R | Z, X) is an estimator of the true response mechanism's conditional mean Q̅0(Z, X) := 𝔼0(R | Z, X), while an IPW estimator takes the form τ_n^IPW = Pn{ZR/en(X) − (1 − Z)R/(1 − en(X))}, where en(X) := ℙn(Z = 1 | X) is an estimator of the true propensity score e0(X) := ℙ0(Z = 1 | X). In both cases, expectations w.r.t. X are simply computed via an empirical mean.
Throughout, as with the naught subscript, the subscript n denotes estimates based on the empirical distribution Pn of the sampled units O1, . . . , On. On occasion, we will rely upon standard notation from empirical process theory; specifically, we will let Pf := ∫ f(O) dP and Pnf := n⁻¹ ∑_{i=1}^{n} f(Oi) whenever convenient.
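To make the two estimation strategies concrete, the following minimal simulation sketch (in Python; the data-generating mechanism and all variable names are our own illustrative assumptions) contrasts the substitution and IPW estimators, here given access to the true nuisance functions Q̅0 and e0 purely for illustration, against the confounded difference in group means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulated observational study in which X confounds the Z -> R relationship.
X = rng.uniform(-1.0, 1.0, n)
e0 = 1.0 / (1.0 + np.exp(-1.2 * X))      # true propensity score e0(X)
Z = rng.binomial(1, e0)

def Qbar0(z, x):
    """True response mechanism Q-bar_0(z, x); the true ATE is exactly 1."""
    return 0.5 + z + x

R = Qbar0(Z, X) + rng.normal(0.0, 1.0, n)

# Substitution (plug-in) estimator, using the true Q-bar_0 in place of Q-bar_n.
tau_sub = np.mean(Qbar0(1, X) - Qbar0(0, X))

# IPW estimator, using the true e0 in place of en.
tau_ipw = np.mean(Z * R / e0 - (1 - Z) * R / (1 - e0))

# The naive difference of group means remains confounded by X.
tau_naive = R[Z == 1].mean() - R[Z == 0].mean()
print(tau_sub, tau_ipw, tau_naive)
```

Both principled estimators recover the true effect of 1, while the unadjusted contrast is biased upward because units with larger X are both more likely to be treated and have larger responses.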

Salient to arguments about the balancing property, IPW estimators may be viewed as achieving balance across treatment conditions by upweighting (or downweighting) units from strata of X that are underrepresented (or overrepresented) in each of the treatment groups, resulting in an artificially constructed pseudo-population (Horvitz and Thompson, 1952; Hernán and Robins, 2023) in which treatment assignment Z is empirically marginally independent of (or "balanced on") X. That is, inverse probability weighting by treatment propensity, using e0(X), creates a hypothetical pseudo-population in which empirically X ⫫ Z (essentially by creating copies of underrepresented units), allowing for inference on the effect of Z on R without confounding by X. This form of balance corresponds with the (conditional) independence condition of Rosenbaum and Rubin (1983) (X ⫫ Z | e0(X)), which clarifies that the propensity score e0(X) can be used to mitigate confounding of the Z–R relationship by X, by enforcing conditional independence of X and Z given e0(X). Rosenbaum and Rubin (1983) use this conditional independence relationship to theoretically characterize balancing scores, defining a balancing score b0(X) as satisfying the condition X ⫫ Z | b0(X). The propensity score is itself a balancing score and, in fact, is the "coarsest" member of this class, in the sense that any balancing score b0(X) satisfies e0(X) = f(b0(X)) for some function f. We emphasize that the empirical analogue of this balancing property applied to an estimator en(X) of e0(X) drives the efficiency of estimators of the causal effect of interest; for example, IPW estimators using the true e0(X) are consistent and asymptotically linear but very inefficient (e.g., van der Laan and Robins, 2003). While the theoretical characterization of balance intuitively explains how this property is central to constructing consistent estimators of causal effects, it ignores a property generally desirable in any estimator: efficiency.
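The pseudo-population intuition can be checked numerically. In a sketch like the following (simulated data; names are illustrative), weighting each unit by the inverse of its probability of receiving its observed treatment approximately removes the covariate imbalance between arms:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.uniform(-1.0, 1.0, n)
e0 = 1.0 / (1.0 + np.exp(-1.5 * X))   # true propensity score
Z = rng.binomial(1, e0)

# Unweighted covariate means differ across arms: X confounds Z.
raw_gap = X[Z == 1].mean() - X[Z == 0].mean()

# Inverse-probability weights construct a pseudo-population in which Z
# is (empirically, approximately) independent of X.
w = Z / e0 + (1 - Z) / (1 - e0)
m1 = np.average(X[Z == 1], weights=w[Z == 1])
m0 = np.average(X[Z == 0], weights=w[Z == 0])
weighted_gap = m1 - m0
print(raw_gap, weighted_gap)
```

The raw gap in mean X between arms is substantial, while the weighted gap is near zero, reflecting the marginal balance of the pseudo-population.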

2. The Empirical Balance–Efficiency Tradeoff

Although the propensity score's use in inducing empirical balance between treatment groups has intuitive appeal, classical estimators relying solely on this property for a specific parametric model, as traditionally formulated, generally fail to achieve asymptotic efficiency, even when the parametric model for the propensity score is correctly specified. In fact, it has been established that neither the IPW estimator nor the propensity score stratification-based estimator (proposed by Rosenbaum and Rubin (1983)) is generally asymptotically efficient. Drawing upon semiparametric efficiency theory, we note that the efficient influence function (EIF) arises as a central object, uniquely represented as the canonical gradient of the pathwise derivative of the target parameter (τ0) at a distribution P in the model ℳ (see, e.g., Bickel et al., 1993; van der Vaart, 2000; van der Laan and Robins, 2003; Tsiatis, 2007; Kennedy, 2016; Hines et al., 2022). The EIF's importance comes from the fact that it characterizes the best possible asymptotic variance, or nonparametric efficiency bound, among all regular asymptotically linear estimators of a target parameter. For this reason, the EIF is commonly used as a critical ingredient in strategies for the construction of efficient estimators. For example, efficient estimation frameworks popular in modern practice, such as one-step estimation (Pfanzagl and Wefelmeyer, 1982; Bickel et al., 1993), estimating equations (van der Laan and Robins, 2003; Bang and Robins, 2005), and targeted minimum loss estimation (van der Laan and Rubin, 2006; van der Laan and Rose, 2011), construct estimators by unique updating procedures that each reference the form of the EIF in different ways, resulting in candidate estimators with desirable asymptotic behavior. Under standard regularity conditions, an estimator τn of the target parameter τ0 is asymptotically linear when

\[
\tau_n - \tau_0 = \frac{1}{n} \sum_{i=1}^{n} D(P_0)(O_i) + o_P(n^{-1/2}), \tag{1}
\]

where D(P0) is the EIF at the true data-generating distribution P0. An asymptotically linear estimator τn is generally (asymptotically) efficient when it solves the EIF estimating equation, i.e., PnD(P0) ≈ 0, in which case √n(τn − τ0) has limit distribution N(0, P0{D(P0)}²), with asymptotic variance matching the variance of the EIF D(P0). As such, representations of the EIF play a key role in constructing efficient estimators.

2.1 Score Equations Characterize Asymptotic Efficiency

In causal inference problems, the EIF is indexed by at least two nuisance quantities, among which the propensity score invariably appears. It is in this way that e0(X) plays a key role in the construction of efficient estimators. Going forward, to reduce notational burden, we focus on a single component of the ATE, the counterfactual mean of the response under treatment Z = 1, i.e., τ0 := 𝔼0R(1). The EIF D(P0) of τ0 at the data-generating distribution P0 ∈ ℳ may be expressed as

\[
D(P_0)(O) = \frac{\mathbb{1}(Z = 1)}{e_0(X)} \{R - \bar{Q}_0(Z, X)\} + \bar{Q}_0(1, X) - \tau_0. \tag{2}
\]

The form of expression (2) for the EIF of τ0 is rather instructive, revealing that the propensity score enters into a score term for the response mechanism h(X)(R − Q̅0(Z, X)), with h(X) = 𝟙(Z = 1)/e0(X), as an inverse weight applied to a residual for the conditional mean of the response given (Z, X). This expression for the EIF places emphasis on estimation of the response mechanism Q̅0(Z, X), highlighting that an efficient estimator must incorporate a nuisance estimator Q̅n(Z, X) that suitably solves the EIF estimating equation, PnD(Q̅n, en) ≈ 0. Owing to this emphasis on Q̅0(Z, X), such expressions for the EIF are most amenable to the construction of efficient estimators relying upon the substitution formula, such as in targeted minimum loss estimation (van der Laan and Rose, 2011), which updates initial estimates Q̅n via a one-dimensional parametric update step that depends on the EIF's form.

Unlike their substitution-based counterparts, IPW estimators rely solely on the propensity score. Saliently, an alternative expression for the EIF—the Augmented IPW (AIPW) representation of Robins and Rotnitzky (1992, 1995)—is more suitable for their characterization:

\[
D(P_0)(O) = \underbrace{\frac{ZR}{e_0(X)} - \tau_0}_{D_{\mathrm{CAR}}\text{-free term } D_{\mathrm{IPW}}} \; - \; \underbrace{\frac{\bar{Q}_0(1, X)}{e_0(X)} \{Z - e_0(X)\}}_{D_{\mathrm{CAR}}}. \tag{3}
\]

The AIPW representation (3) of the EIF can be shown to be equivalent to expression (2) but stresses instead the importance of the propensity score through the score term for the treatment mechanism h(X)(Z − e0(X)), with h(X) = Q̅0(1, X)/e0(X). As indicated, the form of (3) admits a decomposition: the first term D_IPW is the IPW estimating function (the mapping defining these Z-estimators), while the second term D_CAR is a projection (of D_IPW onto the space of all functions of (Z, X) that are mean-zero conditional on X) satisfying coarsening-at-random (CAR) (van der Laan and Robins, 2003). Since IPW estimators are defined as solutions to PnD_IPW(en) ≈ 0, this term is trivially solved by construction of τn; then, expression (3) states that an efficient IPW-type estimator must be a solution to PnD_CAR(en, Q̅n) ≈ 0, principally through a suitable estimator of the propensity score en(X), suggesting that one should prioritize an estimator en(X) that satisfies this criterion when seeking an efficient estimator τn of τ0. Again, note that, even when e0(X) is known, using e0(X) directly would generally fail to solve relevant score equations, including PnD_CAR(en, Q̅n) ≈ 0. Together, expressions (2) and (3) provide criteria for the construction of efficient estimators; notably, both depend on proper estimation of the propensity score en(X) in the sense that it solves key score equations.
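The practical consequence of the decomposition in (3) can be seen in a small Monte Carlo sketch (simulated data, with the true nuisance functions assumed known purely for illustration): subtracting the empirical D_CAR term from the plain IPW estimator leaves its consistency intact while sharply reducing its variance.

```python
import numpy as np

rng = np.random.default_rng(2)

def draw(n):
    """Simulated study; Q-bar_0(z, x) = 1 + x + z, so tau0 = E0 R(1) = 2."""
    X = rng.uniform(-1.0, 1.0, n)
    e0 = 1.0 / (1.0 + np.exp(-X))
    Z = rng.binomial(1, e0)
    R = 1.0 + X + Z + rng.normal(0.0, 0.25, n)
    return X, e0, Z, R

ipw, aipw = [], []
for _ in range(400):
    X, e0, Z, R = draw(2000)
    Q1 = 2.0 + X                                  # true Q-bar_0(1, X)
    tau_hat = np.mean(Z * R / e0)                 # plain IPW estimate of tau0
    correction = np.mean((Q1 / e0) * (Z - e0))    # empirical D_CAR term
    ipw.append(tau_hat)
    aipw.append(tau_hat - correction)             # AIPW: subtract the projection

print(np.var(ipw), np.var(aipw))   # AIPW exhibits the markedly smaller variance
```

Both Monte Carlo means center on the truth (τ0 = 2), but the AIPW form, which (approximately) solves the D_CAR score equation, attains a far smaller sampling variance, illustrating why efficiency demands attention to D_CAR rather than to D_IPW alone.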

2.2 Score Equations Characterize Directional Empirical Balance

From expressions (2) and (3), we have seen that score terms of the form h(X)(R − Q̅0(Z, X)) and h(X)(Z − e0(X)), for particular choices of the weighting function h(X), play critical roles in criteria for asymptotic efficiency. We now argue that such score terms also characterize the empirical balancing property, though this has not, to the best of our knowledge, been extensively explored to date. Recall that Rosenbaum and Rubin (1983) characterize the theoretical balancing property in terms of a statistical conditional independence condition X ⫫ Z | e0(X). This view of the balancing property has motivated the development of a host of diagnostic procedures to evaluate the sample-level balance provided by candidate estimators en(X), with graphical procedures (e.g., the "Love plot") enjoying much popularity and software implementations often accruing many thousands of downloads (e.g., Greifer (2022)'s cobalt R package). Yet, the popularity of such approaches belies their limited scientific and statistical value: these diagnostic techniques can reveal only whether an estimator en(X) induces empirical balance with respect to the statistical model underlying the estimator. For example, logistic regression remains an exceedingly popular candidate estimator of the propensity score but intrinsically assumes the logit of the conditional mean of Z given X to be adequately described as a linear function of X. Of course, this amounts to a significant (and often unrealistic) restriction on the statistical model ℳ. Much worse, this assumption is usually imposed only for the sake of mathematical convenience, hardly ever motivated by domain knowledge.
Even when the parametric model for the propensity score is correct, these techniques check only the degree to which a maximum likelihood estimator en(X) satisfies empirical balance within restrictive (small) statistical models, thereby achieving empirical balance Z ⫫ f(X), given en(X), for only a very limited set of functions f(X). In so doing, the balancing property w.r.t. the chosen parametric model is emphasized over such fundamental concerns as the efficiency of the estimator τn, ignoring even the fact that, under model misspecification, the resulting estimator will generally fail to achieve consistency.

Fortunately, focusing on univariate or multivariate balance under parametric modeling assumptions, convenient as it may be, is hardly the only option. When the propensity score estimator en(X) is selected as a solution to score equations of the form Pns(en; f) ≈ 0, for s(Z, X; f) = f(X)(Z − e0(X)), an f-specific form of conditional independence is satisfied: Z ⫫ f(X) | en(X). Lemma 1 summarizes this.

Lemma 1 (Score-based Balance) Let ℱ contain a rich class of functions and, for an arbitrary f ∈ ℱ, define scores of the form s(Z, X; f) = f(X)(Z − e0(X)). When a corresponding score equation Pns(en; f) ≈ 0 is solved for a given f, the null hypothesis H0(f) : 𝔼0(Z | f(X), en(X)) = 𝔼0(Z | en(X)) holds, as the data provide no signal against H0(f); moreover, no valid test of H0(f) will reject this null hypothesis. When H0(f) holds, f(X) contains no information, beyond that captured by en(X), useful for predicting treatment status Z from covariates X; that is, the empirical balance induced by en(X) cannot be improved upon by f(X).

Lemma 1 provides a score-based criterion for characterizing the empirical balancing property and frames its evaluation in terms of a class of hypothesis tests. When a sequence of such tests uniformly fails to reject a family of f-specific null hypotheses {H0(f) : f ∈ ℱ}, there is no empirical evidence to contradict the equality under the null for the family of f ∈ ℱ; then, no such f(X) contains information about Z not already captured by en(X), implying that en(X) enforces balance in this f-specific sense. Since Pns(en; f) may be viewed as a measure of the degree to which Z is independent of f(X), given en(X), this f-specific empirical balance may be enforced, via score tests for H0(f), by selecting en(X) so as to ensure Pns(en; f) ≈ 0.

To employ such a hypothesis testing strategy, consider a model logit[𝔼0(Z | X)] = logit[en(X)] + βf(X) for f ∈ ℱ, in which en(X) is taken as an offset. Under this assumption, the null hypothesis may be reframed, for fixed f ∈ ℱ, as H0(f) : β = 0 against the alternative H1(f) : β ≠ 0, so that a hypothesis test of this form corresponds to testing the null of independence. Basing such a hypothesis test on the score of the empirical log-likelihood, given by Pnf(X)(Z − en(X)) at β = 0, leads to a score test that simply evaluates the magnitude of this score as a test statistic, rejecting the null hypothesis when it moves appreciably far from zero. When the estimator en(X) solves the score equation Pnf(X)(Z − en(X)) ≈ 0, there cannot be evidence against H0(f). As noted above, when ℱ is a rich class and this type of score equation is satisfied for many f(X), then this score test fails to reject the null hypothesis H0(f) : β = 0 across the class ℱ, implying that 𝔼0(Z | f(X), en(X)) = 𝔼0(Z | en(X)), that is, X ⫫ Z | en(X). Finally, when ℱ is rich enough to include weighting functions appearing in the EIF, e.g., Q̅n(1, X)/en(X) in expression (3), then en(X) will both enforce balance in X over ℱ and yield a resultant estimator τn that is asymptotically efficient, on account of satisfying the EIF estimating equation PnD_CAR(en, Q̅n) ≈ 0. Of course, satisfying this score-based balancing criterion for many f ∈ ℱ, providing balance over X w.r.t. the linear span of these f, may also automatically solve the score equation for the f(X) appearing in the relevant EIF estimating equation.
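A minimal implementation of this score test might look as follows; the data-generating mechanism, the direction f(X) = X², and the stand-in estimators are all illustrative assumptions, with the test statistic formed by standardizing the empirical score Pnf(X)(Z − en(X)).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
X = rng.uniform(-1.0, 1.0, n)
e0 = 1.0 / (1.0 + np.exp(-(X + 0.7 * X**2)))  # true propensity score
Z = rng.binomial(1, e0)

def score_test(en, f):
    """Score test of H0(f): beta = 0 in logit E(Z | X) = logit(en) + beta f(X).

    The score at beta = 0 is Pn f(X)(Z - en(X)); it is standardized by a
    plug-in estimate of its null standard deviation and referred to N(0, 1).
    """
    score = np.mean(f * (Z - en))
    sd = np.sqrt(np.mean(f**2 * en * (1 - en)))
    return np.sqrt(n) * score / sd

# A stand-in for a misspecified (linear-logit) fit leaves signal in the
# direction f(X) = X^2 ...
en_bad = 1.0 / (1.0 + np.exp(-X))
t_bad = score_test(en_bad, X**2)

# ... while the true e0, which solves this f-specific score equation in
# expectation, does not.
t_good = score_test(e0, X**2)
print(abs(t_bad), abs(t_good))
```

Comparing |t| to a standard normal critical value (e.g., 1.96) rejects H0(f) for the misspecified stand-in and fails to reject for the true propensity score, operationalizing the f-specific balance diagnostic described above.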

3. Statistical Techniques for Solving Score Equations

To this point we have reviewed the critical role that score equations play in asymptotic efficiency, through their appearance in the EIF estimating equation, and outlined their connection to the notion of empirical balance proposed by Rosenbaum and Rubin (1983) via Lemma 1 and our proposed score test. Owing to their critical role in constructing efficient estimators, several classes of techniques for solving score equations have been proposed at the interface of causal machine learning and semiparametric efficiency theory. We next selectively review two successful frameworks that have been applied to this end: targeted minimum loss estimation (van der Laan and Rubin, 2006) and nonparametric sieve estimation.

3.1 Targeted Updating of the Propensity Score Estimator

The targeted minimum loss estimation (or targeted learning) framework focuses on constructing efficient substitution estimators and, as such, features a targeting (or updating) step usually applied only to initial estimates of the response mechanism Q̅n(Z, X). For example, a TML estimator τ⋆n of the counterfactual mean τ0 is constructed in two steps: first, generating an initial estimate Q̅n(Z, X) (for which flexible machine learning strategies (e.g., van der Laan et al., 2007) are recommended) and, second, perturbing the initial estimate using a univariate parametric model logit(Q̅⋆n(Z, X)) = logit(Q̅n(Z, X)) + ϵh(X), where h(X) = 𝟙(Z = 1)/en(X) as in expression (2). This correction ensures that Q̅⋆n(Z, X) is free of "plug-in bias," allowing the TML estimator τ⋆n to achieve asymptotic linearity by acting as an approximate solution to the EIF estimating equation, which renders the TML estimator τ⋆n asymptotically efficient. In principle, this same perturbation strategy may be applied to updating propensity score estimators en(X) so as to enforce balance (or other such desiderata) while still ensuring that the estimators remain solutions to the EIF estimating equation. We note that the targeted learning framework has been, and continues to be, the subject of fervent research (van der Laan and Rose, 2011, 2018), and that there are many variations of TML estimators suited to different goals, including even those adjusting specifically for balancing scores (Lendle et al., 2015).
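As a sketch of the targeting step for τ0 = 𝔼0R(1) (binary response, simulated data, and the true propensity score used in the clever covariate purely for simplicity; all names are ours), the fluctuation parameter ϵ can be fit by Newton's method so that the updated estimate solves the relevant score equation essentially exactly:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

rng = np.random.default_rng(4)
n = 20_000
X = rng.uniform(-1.0, 1.0, n)
e0 = expit(X)                          # true propensity score
Z = rng.binomial(1, e0)
# Binary response with Q-bar_0(1, X) = expit(0.5 + X).
R = rng.binomial(1, np.where(Z == 1, expit(0.5 + X), expit(-0.5 + X)))

Qn = expit(0.2 + 0.5 * X)              # deliberately crude initial Q-bar_n(1, X)
H = 1.0 / e0                           # clever covariate evaluated at Z = 1

# Targeting step: fit eps in logit(Q*) = logit(Qn) + eps * H by Newton's
# method; only treated units (Z = 1) contribute to the score in eps.
eps = 0.0
for _ in range(50):
    Qeps = expit(logit(Qn) + eps * H)
    grad = np.mean(Z * H * (R - Qeps))            # score Pn (Z/e0)(R - Q*)
    hess = -np.mean(Z * H**2 * Qeps * (1 - Qeps)) # derivative of the score
    eps -= grad / hess

Qstar = expit(logit(Qn) + eps * H)
score = np.mean(Z * H * (R - Qstar))              # ~ 0 by construction
tau_tmle = Qstar.mean()                           # substitution estimate of E0 R(1)
print(score, tau_tmle)
```

Despite the crude initial Q̅n, the targeted substitution estimator recovers τ0 (about 0.614 under this simulation design) because the fluctuation forces the EIF's residual score term to zero.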

In considering alternative applications of the targeted updating approach, van der Laan (2014, see Theorem 1) proposed a procedure for the construction of targeted IPW estimators. Such IPW estimators are based on applying the one-dimensional parametric update logit(e⋆n(X)) = logit(en(X)) + ϵh(X) to map an initial propensity score estimator en(X) into an updated version e⋆n(X) so as to satisfy asymptotic linearity, though asymptotic efficiency remained elusive. As IPW estimation avoids explicit modeling of the response mechanism, constructing efficient estimators is both philosophically and technically challenging, as such estimators must solve the D_CAR component of the EIF, which includes the response mechanism in the numerator of the inverse weight of the score term, i.e., h(X) = Q̅n(1, X)/en(X) (as in expression (3)). One may, for example, take an approach based in universal least favorable update models (van der Laan and Gruber, 2016), tracking updates to the estimator en(X) locally along a path in a one-dimensional parametric model formulated to solve the score equation Pnh(X)(Z − en(X)) ≈ 0. To circumvent modeling the response mechanism, van der Laan (2014) proposed instead replacing Q̅n(1, X) with an estimator of an artificial nuisance parameter r0(1, X) := 𝔼0(R | Z, ēn(X)), in which ēn(X) is a fixed summary measure of the covariates X. Constructing such artificial nuisance quantities requires mathematically sophisticated dimension reduction efforts that, while difficult to generalize, avoid the logical contradiction associated with directly modeling the response R. Theoretical investigations revealed these targeted IPW estimators to satisfy asymptotic linearity but fail to satisfy standard regularity conditions, sharply limiting their practical use. These targeted IPW estimators were implemented and are available in the drtmle R package (Benkeser and Hejazi, 2022).

3.2 Undersmoothing of the Propensity Score Estimator

As we have seen, asymptotically efficient estimators can only be constructed as solutions to score equations (e.g., PnD_CAR(en, Q̅n) ≈ 0) rooted in careful study of the EIF. In the context of IPW estimation, these scores take the general form s(Z, X; h) = h(X)(Z − en(X)), where h(X) is an appropriate weighting function, e.g., h(X) = Q̅n(1, X)/en(X) when τ0 is the counterfactual mean under treatment. When en(X) is constructed by way of a sieve MLE (or an NP-MLE, when such exists), undersmoothing may be applied to this initial estimator so as to ensure that it solves relevant score equations, i.e., Pns(en; h) ≈ 0. For such efforts to succeed, not only must en(X) converge to e0(X) at a suitably fast rate (i.e., rate-consistency), but en(X) must also act as an MLE over the sequence of statistical models over which the sieve is applied. In recent work, Ertefaie et al. (2022) demonstrated, both theoretically and practically, that undersmoothing of a highly adaptive lasso (HAL) estimator of en(X) allows for the construction of asymptotically linear and efficient IPW estimators of τ0. Relatedly, van der Laan et al. (2022) proved that substitution estimators using undersmoothed HAL are efficient as well, which, together with Ertefaie et al. (2022), demonstrates the utility of HAL across two important and popular classes of estimators. The properties of these two HAL-based estimators of causal effects rely on key properties of the HAL estimator itself, first investigated by van der Laan (2015, 2017), who showed HAL to be an MLE over a model ℳλ0 indexed by the true sectional variation norm λ0 of the HAL representation of a target functional (e.g., e0,λ0(X)). The HAL estimator assumes the target functional, here e0(X), to be càdlàg (i.e., RCLL) and of bounded sectional variation norm (λ0 < ∞); this estimator is available in the open source hal9001 R package (Coyle et al., 2022; Hejazi et al., 2020). Using a HAL-MLE for en,λn(X), Ertefaie et al. (2022) outlined conditions and formulated selection procedures for appropriately undersmoothing en,λn(X) such that the resultant IPW estimator τ_{n,λn}^IPW would attain the nonparametric efficiency bound, solving the EIF estimating equation by acting as a solution to PnD_CAR(en,λn, Q̅n) ≈ 0. Contemporaneously, Hejazi et al. (2022) also investigated novel undersmoothing selection criteria, agnostic to the EIF's form, for HAL-based estimators of the generalized propensity score (Imbens, 2000; Hirano and Imbens, 2004).
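The role of undersmoothing can be illustrated with a deliberately simple sieve: a histogram (regressogram) estimator of e0(X) stands in for HAL (an illustrative substitution, with all names and the simulation design our own), since bin-wise means of Z constitute the MLE over the corresponding histogram model. As the sieve grows, the estimator comes ever closer to solving score equations in directions f outside its own basis.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
X = rng.uniform(0.0, 1.0, n)
e0 = 1.0 / (1.0 + np.exp(-(2.0 * X - 1.0)))   # true propensity score
Z = rng.binomial(1, e0)

def sieve_propensity(J):
    """Histogram-sieve MLE of e0: within each of J bins, en is the bin mean
    of Z (assumes every bin is nonempty, which holds here for large n)."""
    bins = np.minimum((X * J).astype(int), J - 1)
    bin_mean = (np.bincount(bins, weights=Z, minlength=J)
                / np.bincount(bins, minlength=J))
    return bin_mean[bins]

# With less smoothing (more bins), the f-specific empirical score
# |Pn f(X)(Z - en(X))| shrinks for a direction f(X) = X outside the sieve's
# own indicator basis.
f = X
gaps = {}
for J in (2, 8, 32):
    en = sieve_propensity(J)
    gaps[J] = abs(np.mean(f * (Z - en)))
    print(J, gaps[J])
```

Within each bin, the fitted mean exactly solves the score equation for that bin's indicator function; refining the partition extends this exact solving, approximately, to richer directions f, mirroring how undersmoothing HAL drives Pns(en; h) toward zero for the EIF-relevant weight h.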

Prior work has considered the undersmoothing of propensity score estimators, notably including Hirano et al. (2003), who proposed the use of a logistic series estimator for en(X). While these authors showed undersmoothing of their en(X) estimator to be capable of generating an efficient IPW estimator of the ATE, their approach is practically limited by requiring the propensity score e0(X) to be a (highly smooth) k-times differentiable function of X. What's more, this and related approaches fail to appropriately emphasize that efficiency of the estimator τn is a direct result of en(X) solving the score equation arising from the EIF. By contrast, the more recent developments of Ertefaie et al. (2022) highlight this point; furthermore, based on our Lemma 1, a natural extension of the latter approach would be to focus on solving a broad range of score equations (of the form Pnh(X)(Z − en,λn(X)) ≈ 0), ultimately resulting in both the downstream estimator τn attaining asymptotic efficiency and the propensity score estimator en(X) inducing empirical balance without reliance on brittle modeling assumptions. Of course, our proposed score test implies that solving a range of score equations yields an estimator satisfying the empirical balancing property, positioning undersmoothing as a potentially critical tool for achieving empirical balance and, thereby, asymptotic efficiency. Nevertheless, in finite samples, beyond using HAL, one may wish to enforce additional empirical balance in particular f-specific directions, in which case undersmoothing of HAL may be paired with the score-preserving properties of targeted updating to ensure efficiency while accommodating balance w.r.t. user-specified functions of X.

4. The Propensity Score’s Role in Modern Causal Inference

The propensity score holds an integral place in causal inference, playing important roles in terms of both theoretical and empirical balance and asymptotic efficiency as outlined by developments in semiparametric theory. We have argued that this empirical balancing property corresponds to solving particular types of score equations, generally of the form Pnh(X)(Z − en(X)) ≈ 0, and we have proposed a score test for evaluating the degree to which a candidate propensity score estimator en(X) achieves balance. By characterizing the empirical balancing property in terms of the solving of particular score equations, we circumvent the diagnostic strategies popular today, which focus on checking empirical balance across only a subset of (necessarily discrete) covariates X or are derived from and reliant upon the brittle assumptions underlying parametric estimators of e0(X), e.g., logistic regression. This form of empirical balance corresponds with approximately solving a limited set of score equations derived from focusing upon particular subsets of covariates.

This characterization bridges the gap between satisfaction of the empirical balancing property and well-studied strategies for asymptotically efficient estimation based on the solving of efficient score equations (e.g., the D_CAR component of the EIF). By demonstrating that there need not be a disconnect between the dual desiderata of empirical balance and asymptotic efficiency, we reduce enforcing the empirical balancing property (and diagnosing deviations from it) to the solving of score equations—achievable within well-studied frameworks like targeted learning or nonparametric sieve estimation. We argue that modern propensity score-based estimation strategies in causal inference should prioritize solving efficient score equations and, secondarily, aim to solve a variety of other score equations so as to enforce the empirical balancing property for a large class of functions of the baseline covariates. Conveniently, satisfaction of the latter criterion may lead to automatically satisfying the former. This unified focus on efficiency and empirical balance in terms of score equations of the form Pnh(X)(Z − en(X)) ≈ 0 allows for both criteria to be satisfied without reliance upon historically popular but unrealistic modeling assumptions whose utility is severely limited in the complex observational studies common in today's biomedical, health, and social sciences research.

Nima S. Hejazi
Department of Biostatistics, T.H. Chan School of Public Health, Harvard University
Boston, MA 02115
Mark J. van der Laan
Division of Biostatistics, School of Public Health, University of California, Berkeley
Berkeley, CA 94720


We thank an anonymous associate editor and two peer referees of Biometrics who inspired our careful consideration of the interplay between the balancing property and asymptotically efficient estimation in modern causal inference through their reviews of Ertefaie et al. (2022). MJvdL was partially supported by a grant from the National Institute of Allergy and Infectious Diseases (award no. R01 AI074345).


Heejung Bang and James M Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4):962–973, 2005.
David Benkeser and Nima S Hejazi. Doubly-robust inference in R using drtmle. Under review at Observational Studies, 2022.
Peter J Bickel, Chris AJ Klaassen, Ya'acov Ritov, and Jon A Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, 1993.
David R Cox. Planning of Experiments. Wiley, 1958.
Jeremy R Coyle, Nima S Hejazi, Rachael V Phillips, Lars WP van der Laan, and Mark J van der Laan. hal9001: The scalable highly adaptive lasso, 2022. R package version 0.4.3.
Ashkan Ertefaie, Nima S Hejazi, and Mark J van der Laan. Nonparametric inverse-probability-weighted estimators based on the highly adaptive lasso. Biometrics (in press), 2022.
Noah Greifer. cobalt: Covariate Balance Tables and Plots, 2022. R package version 4.3.2.
Nima S Hejazi, Jeremy R Coyle, and Mark J van der Laan. hal9001: Scalable highly adaptive lasso regression in R. Journal of Open Source Software, 5(53):2526, 2020.
Nima S Hejazi, David Benkeser, Iván Díaz, and Mark J van der Laan. Efficient estimation of modified treatment policy effects based on the generalized propensity score. 2022.
Miguel A Hernán and James M Robins. Causal Inference: What If. CRC Press, 2023.
Miguel A Hernán and Sarah L Taubman. Does obesity shorten life? The importance of well-defined interventions to answer causal questions. International Journal of Obesity, 32(S3):S8, 2008.
Oliver Hines, Oliver Dukes, Karla Diaz-Ordaz, and Stijn Vansteelandt. Demystifying statistical learning based on efficient influence functions. The American Statistician, 76:292–304, 2022.
Keisuke Hirano and Guido W Imbens. The propensity score with continuous treatments. Applied Bayesian Modeling and Causal Inference from Incomplete-data Perspectives, 226164: 73–84, 2004.
Keisuke Hirano, Guido W Imbens, and Geert Ridder. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71(4):1161–1189, 2003.
Daniel G Horvitz and Donovan J Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260): 663–685, 1952.
Guido W Imbens. The role of the propensity score in estimating dose-response functions. Biometrika, 87(3):706–710, 2000.
Edward H Kennedy. Semiparametric theory and empirical processes in causal inference. In Statistical Causal Inferences and Their Applications in Public Health Research, pages 141–167. Springer, 2016.
Samuel D Lendle, Bruce Fireman, and Mark J van der Laan. Balancing score adjusted targeted minimum loss-based estimation. Journal of Causal Inference, 3(2):139–155, 2015.
Jerzy Neyman. Contribution to the theory of sampling human populations. Journal of the American Statistical Association, 33(201):101–116, 1938.
Judea Pearl. Brief report: On the consistency rule in causal inference: “Axiom, definition, assumption, or theorem?”. Epidemiology, pages 872–875, 2010.
Johann Pfanzagl and Wolfgang Wefelmeyer. Contributions to a General Asymptotic Statistical Theory. Springer, 1982.
James M Robins and Andrea Rotnitzky. Recovery of information and adjustment for dependent censoring using surrogate markers. In AIDS Epidemiology, pages 297–331. Springer, 1992.
James M Robins and Andrea Rotnitzky. Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90(429):122–129, 1995.
Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
Donald B Rubin. Bayesian inference for causal effects: The role of randomization. Annals of Statistics, pages 34–58, 1978.
Donald B Rubin. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331, 2005.
Anastasios Tsiatis. Semiparametric Theory and Missing Data. Springer, 2007.
Mark J van der Laan. Targeted estimation of nuisance parameters to obtain valid statistical inference. International Journal of Biostatistics, 10(1):29–57, 2014.
Mark J van der Laan. A generally efficient targeted minimum loss based estimator. Technical Report 343, UC Berkeley Division of Biostatistics Working Paper Series, University of California, Berkeley, December 2015.
Mark J van der Laan. A generally efficient targeted minimum loss based estimator based on the highly adaptive lasso. International Journal of Biostatistics, 13(2), 2017.
Mark J van der Laan and Susan Gruber. One-step targeted minimum loss-based estimation based on universal least favorable one-dimensional submodels. International Journal of Biostatistics, 12(1):351–378, 2016.
Mark J van der Laan and James M Robins. Unified Methods for Censored Longitudinal Data and Causality. Springer, 2003.
Mark J van der Laan and Sherri Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, 2011.
Mark J van der Laan and Sherri Rose. Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies. Springer, 2018.
Mark J van der Laan and Daniel Rubin. Targeted maximum likelihood learning. International Journal of Biostatistics, 2(1), 2006.
Mark J van der Laan, Eric C Polley, and Alan E Hubbard. Super Learner. Statistical Applications in Genetics and Molecular Biology, 6(1), 2007.
Mark J van der Laan, David Benkeser, and Weixin Cai. Efficient estimation of pathwise differentiable target parameters with the undersmoothed highly adaptive lasso. International Journal of Biostatistics, (in press), 2022.
Aad W van der Vaart. Asymptotic Statistics. Cambridge University Press, 2000.