
Revisiting the Propensity Score’s Central Role: Towards Bridging Balance and Efficiency in the Era of Causal Machine Learning
About forty years ago, in a now–seminal contribution, Rosenbaum and Rubin (1983) introduced a critical characterization of the propensity score as a central quantity for drawing causal inferences in observational study settings. In the decades since, much progress has been made across several research frontiers in causal inference, notably including the reweighting and matching paradigms. Focusing on the former and specifically on its intersection with machine learning and semiparametric efficiency theory, we reexamine the role of the propensity score in modern methodological developments. As Rosenbaum and Rubin (1983)’s contribution spurred a focus on the balancing property of the propensity score, we reexamine the degree to which and how this property plays a role in the development of asymptotically efficient estimators of causal effects; moreover, we discuss a connection between the balancing property and efficient estimation in the form of score equations and propose a score test for evaluating whether an estimator achieves empirical balance.
Keywords: Propensity Score, Balancing Score, Semiparametric Efficiency, Score Test, Sieve Estimation, Undersmoothing, Machine Learning, Philosophy
1. The Propensity Score’s Central Role: A Retrospective
In observational studies that aim to evaluate the causal effects of well-defined interventions (e.g., Hernán and Taubman, 2008; Pearl, 2010), a key inferential obstacle arises in the form of potential confounding of the treatment–response relationship by baseline (pretreatment) covariates. Unlike observational studies, randomized controlled trials (RCTs) feature a built-in safeguard against this form of confounding: since the treatment’s allocation among study units is, on average, balanced across strata defined by the baseline covariates, confounding of the treatment–response relationship is theoretically expected to be a non-issue. It is this inferential safeguard that is in part responsible for RCTs being considered “gold standard” tools for generating evidence in biomedical and health research. As observational studies lack any such built-in protective measure, significant care is required, in both the design and analysis stages, to mitigate confounding by addressing systematic differences between treatment and control groups. Such considerations come in the form of assumptions like that of strong ignorability (an observational analog to the randomization assumption) and adjustment for baseline covariates when estimating nuisance parameters critical for the construction of estimators of the causal effects of interest. Rosenbaum and Rubin (1983) introduced the propensity score (the probability of receiving treatment conditional on baseline covariates) and related this quantity to the issue of balance between treatment groups. In their contribution, these authors outline the propensity score’s membership in a class of balancing scores, which satisfy conditional independence between treatment assignment and baseline covariates.
Through their analysis, these authors relate this form of conditional independence to the (untestable) strong ignorability assumption and show that the propensity score plays a critical role in causal inference, with uses in matching study units, in stratification for weighting-based adjustment, and in covariance adjustment in analyses centered on the general linear model. On account of the propensity score’s close correspondence with the notion of balance, a sharp focus has been placed on this particular property in the decades since Rosenbaum and Rubin (1983)’s contribution, yet most ideas have been restricted to viewing balance in terms of the properties of parametric modeling-based estimators of the propensity score. In this case, the balancing property of a propensity score estimator is the finite-sample analogue of the conditional independence assumption that theoretically defines a balancing score. Taking a view rooted outside the traditional culture of parametric modeling, we define a notion of balance for an estimator of the propensity score with respect to particular functions of baseline covariates. We show that this function-specific balancing property corresponds with solving function-specific score equations, which connects the balancing property with formal criteria for asymptotically efficient estimation.
To formalize our arguments, consider an observational study that collects data on n units, O_{1}, . . . , O_{n}, sampled independently and identically from an (unknown) distribution P_{0}, that is, O ~ P_{0} ∈ ℳ, where ℳ is a realistic (nonparametric) statistical model placing only minimal and plausible restrictions on the form of P_{0}. The data available on the i^{th} unit O_{i} may be partitioned by time-ordering as (X_{i}, Z_{i}, R_{i}), for a vector of baseline covariates X, a binary treatment Z ∈ {0, 1}, and a response R (where possible, we borrow the notation of Rosenbaum and Rubin (1983) in homage). For any unit O, R(0) and R(1) are the potential outcomes (Neyman, 1938; Rubin, 1978, 2005) of R, mutually unobservable quantities that arise when the treatment Z takes the values Z = 0 and Z = 1, respectively. The population average treatment effect (ATE) is τ_{0} = 𝔼_{0}[R(1) − R(0)], a difference of counterfactual means; the naught subscript refers to the true distribution P_{0}. Under standard assumptions of consistency (R(1) ≡ R | Z = 1) (Pearl, 2010), lack of interference (Cox, 1958), positivity of treatment assignment (δ < ℙ(Z = 1 | X) < 1 − δ for δ > 0), and strong ignorability of treatment assignment (i.e., randomization of treatment assignment in RCTs), τ_{0} is identifiable; furthermore, it is estimable by either substitution (plug-in) or inverse probability weighted (IPW) estimators, which arise from differing identification strategies. A substitution estimator takes the form τ^{SUB}_{n} = 𝔼_{n,X}{Q̅_{n}(1, X) − Q̅_{n}(0, X)}, where Q̅_{n}(Z, X) := 𝔼_{n}(R | Z, X) is an estimator of the true response mechanism’s conditional mean Q̅_{0}(Z, X) := 𝔼_{0}(R | Z, X), while an IPW estimator is τ^{IPW}_{n} = 𝔼_{n}{[𝕀(Z = 1)/e_{n}(X) − 𝕀(Z = 0)/(1 − e_{n}(X))]R}, where e_{n}(X) := ℙ_{n}(Z = 1 | X) is an estimator of the true propensity score e_{0}(X) := ℙ_{0}(Z = 1 | X). In both cases, the outer expectations are simply computed via an empirical mean.
Throughout, as with the naught subscript, the subscript n denotes estimates based on the empirical distribution P_{n} of the sampled units O_{1}, . . . , O_{n}. On occasion, we will rely upon standard notation from empirical process theory; specifically, we will let Pf := ∫ f(O) dP and P_{n}f := n^{−1} ∑_{i=1}^{n} f(O_{i}) whenever convenient.
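To make the two estimators concrete, the following minimal sketch computes a substitution estimate and a Horvitz–Thompson-type IPW estimate of the ATE on simulated data. Everything in it is an illustrative assumption rather than part of the original analysis: the data-generating values, the single binary covariate, the function names, and the use of saturated (stratum-mean) nuisance estimators. With saturated nuisance estimators on a discrete covariate the two estimates coincide up to floating-point error, which makes the example easy to check.

```python
import random

def simulate(n, seed=0):
    """Draw (X, Z, R) with binary X and a true ATE of 2 (all values illustrative)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        e0 = 0.7 if x == 1 else 0.3              # true propensity score e_0(x)
        z = 1 if rng.random() < e0 else 0
        r = 1.0 + 2.0 * z + 3.0 * x + rng.gauss(0.0, 1.0)
        data.append((x, z, r))
    return data

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def fit_nuisances(data):
    """Saturated (stratum-mean) estimates of e_n(x) and Qbar_n(z, x)."""
    e_n = {x: mean(z for xi, z, _ in data if xi == x) for x in (0, 1)}
    q_n = {(z, x): mean(r for xi, zi, r in data if xi == x and zi == z)
           for z in (0, 1) for x in (0, 1)}
    return e_n, q_n

def substitution_ate(data, q_n):
    """Empirical mean over X of Qbar_n(1, X) - Qbar_n(0, X)."""
    return mean(q_n[(1, x)] - q_n[(0, x)] for x, _, _ in data)

def ipw_ate(data, e_n):
    """Empirical mean of [I(Z=1)/e_n(X) - I(Z=0)/(1 - e_n(X))] * R."""
    return mean((z / e_n[x] - (1 - z) / (1 - e_n[x])) * r for x, z, r in data)

data = simulate(5000)
e_n, q_n = fit_nuisances(data)
tau_sub = substitution_ate(data, q_n)
tau_ipw = ipw_ate(data, e_n)
print(f"substitution: {tau_sub:.3f}, IPW: {tau_ipw:.3f}")
```

With continuous or high-dimensional X, the stratum means would be replaced by flexible regression estimators, and the two estimates would no longer agree exactly.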
Salient to arguments about the balancing property, IPW estimators may be viewed as achieving balance across treatment conditions by upweighting (or downweighting) units from strata of X that are underrepresented (or overrepresented) in each of the treatment groups, resulting in an artificially constructed pseudopopulation (Horvitz and Thompson, 1952; Hernán and Robins, 2023) in which treatment assignment Z is empirically marginally independent of (or “balanced on”) X. That is, inverse probability weighting by treatment propensity, using e_{0}(X), creates a hypothetical pseudopopulation in which empirically X ╨ Z (essentially by creating copies of underrepresented units), allowing for inference on the effect of Z on R without confounding by X. This form of balance corresponds with the (conditional) independence condition of Rosenbaum and Rubin (1983) (X ╨ Z | e_{0}(X)), which clarifies that the propensity score e_{0}(X) can be used to mitigate confounding of the Z–R relationship by X, by enforcing conditional independence of X and Z given e_{0}(X). Rosenbaum and Rubin (1983) use this conditional independence relationship to theoretically characterize balancing scores, defining a balancing score b_{0}(X) as satisfying the condition X ╨ Z | b_{0}(X). The propensity score is itself a balancing score, e_{0}(X) = f(b_{0}(X)); in fact, it is the “coarsest” among this class of scores in the sense of minimally satisfying this form of independence. We emphasize that the empirical analogue of this balancing property applied to an estimator e_{n}(X) of e_{0}(X) drives the efficiency of estimators of the causal effect of interest; for example, IPW estimators using e_{0}(X) are consistent and asymptotically linear but very inefficient (e.g., van der Laan and Robins, 2003).
While the theoretical characterization of balance intuitively explains how this property is central to constructing consistent estimators of causal effects, it ignores a property generally desirable in any estimator—efficiency.
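The pseudopopulation argument can be illustrated numerically. The sketch below is an illustration under assumed values (a single uniform covariate, a linear true propensity score, hypothetical function names): it compares the mean of X between treatment groups before and after weighting each unit by the inverse of the probability of the treatment it actually received, with weights normalized within each group.

```python
import random

def simulate(n, seed=1):
    """Draw (X, Z) with X ~ Uniform(0, 1) and e_0(x) = 0.2 + 0.6 x (illustrative)."""
    rng = random.Random(seed)
    xs, zs, e0s = [], [], []
    for _ in range(n):
        x = rng.random()
        e0 = 0.2 + 0.6 * x
        xs.append(x)
        e0s.append(e0)
        zs.append(1 if rng.random() < e0 else 0)
    return xs, zs, e0s

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def mean_difference(xs, zs, weights):
    """Weighted mean of X among treated minus weighted mean of X among controls."""
    treated = [(x, w) for x, z, w in zip(xs, zs, weights) if z == 1]
    control = [(x, w) for x, z, w in zip(xs, zs, weights) if z == 0]
    return weighted_mean(*zip(*treated)) - weighted_mean(*zip(*control))

xs, zs, e0s = simulate(5000)
unit_weights = [1.0] * len(xs)
# Inverse probability of the treatment actually received, using the true e_0(X).
ipw_weights = [1.0 / e if z == 1 else 1.0 / (1.0 - e) for z, e in zip(zs, e0s)]

raw_gap = mean_difference(xs, zs, unit_weights)
ipw_gap = mean_difference(xs, zs, ipw_weights)
print(f"raw gap in mean of X: {raw_gap:.3f}, weighted gap: {ipw_gap:.3f}")
```

In this design the unweighted gap is sizeable because treated units tend to have larger X, while the weighted gap is close to zero. This checks balance only for the identity function of X, which is exactly the kind of function-specific balance discussed in later sections.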
2. The Empirical Balance–Efficiency Tradeoff
Although the propensity score’s use in inducing empirical balance between treatment groups has intuitive appeal, classical estimators relying solely on this property for a specific parametric model, as traditionally formulated, generally fail to achieve asymptotic efficiency, even when the parametric model for the propensity score is correctly specified. In fact, it has been established that neither the IPW estimator nor the propensity score stratification-based estimator (proposed by Rosenbaum and Rubin (1983)) is generally asymptotically efficient. Drawing upon semiparametric efficiency theory, we note that the efficient influence function (EIF) arises as a central object, uniquely represented as the canonical gradient of the pathwise derivative of the target parameter (τ_{0}) at a distribution P in the model ℳ (see, e.g., Bickel et al., 1993; van der Vaart, 2000; van der Laan and Robins, 2003; Tsiatis, 2007; Kennedy, 2016; Hines et al., 2022). The EIF’s importance comes from the fact that it characterizes the best possible asymptotic variance, or nonparametric efficiency bound, among all regular asymptotically linear estimators of a target parameter. For this reason, the EIF is commonly used as a critical ingredient in strategies for the construction of efficient estimators. For example, efficient estimation frameworks popular in modern practice, such as one-step estimation (Pfanzagl and Wefelmeyer, 1982; Bickel et al., 1993), estimating equations (van der Laan and Robins, 2003; Bang and Robins, 2005), and targeted minimum loss estimation (van der Laan and Rubin, 2006; van der Laan and Rose, 2011), construct estimators by unique updating procedures that each reference the form of the EIF in different ways, resulting in candidate estimators with desirable asymptotic behavior. Under standard regularity conditions, an estimator τ_{n} of the target parameter τ_{0} is asymptotically linear when
τ_{n} − τ_{0} = P_{n}D^{⋆}(P_{0}) + o_{P}(n^{−1/2}),   (1)
where D^{⋆}(P_{0}) is the EIF at the true data-generating distribution P_{0}. An asymptotically linear estimator τ_{n} is generally (asymptotically) efficient when it solves the EIF estimating equation, i.e., P_{n}D^{⋆}(P_{0}) ≈ 0, in which case √n(τ_{n} − τ_{0}) has limit distribution N(0, P_{0}{D^{⋆}(P_{0})}^{2}), with asymptotic variance matching the variance of the EIF D^{⋆}(P_{0}). As such, representations of the EIF play a key role in constructing efficient estimators.
2.1 Score Equations Characterize Asymptotic Efficiency
In causal inference problems, the EIF is indexed by at least two nuisance quantities, among which the propensity score invariably appears. It is in this way that e_{0}(X) plays a key role in the construction of efficient estimators. Going forward, to simplify notational burden, we focus on a single component of the ATE, the counterfactual mean of the response under treatment Z = 1, i.e., τ_{0} := 𝔼_{0}R(1). The EIF D^{⋆}(P_{0}) of τ_{0} at the data-generating distribution P_{0} ∈ ℳ may be expressed
D^{⋆}(P_{0})(O) = [𝕀(Z = 1)/e_{0}(X)](R − Q̅_{0}(Z, X)) + Q̅_{0}(1, X) − τ_{0}.   (2)
The form of expression (2) for the EIF of τ_{0} is rather instructive, revealing that the propensity score enters into a score term for the response mechanism h(X)(R − Q̅_{0}(Z, X)), with h(X) = 𝕀(Z = 1)/e_{0}(X), as an inverse weight applied to a residual for the conditional mean of the response given (Z, X). This expression for the EIF places emphasis on estimation of the response mechanism Q̅_{0}(Z, X), highlighting that an efficient estimator must incorporate a nuisance estimator Q̅_{n}(Z, X) that suitably solves the EIF estimating equation, P_{n}D^{⋆}(Q̅_{n}, e_{n}) ≈ 0. Owing to this emphasis on Q̅_{0}(Z, X), such expressions for the EIF are most amenable to the construction of efficient estimators relying upon the substitution formula, such as in targeted minimum loss estimation (van der Laan and Rose, 2011), which updates initial estimates of Q̅_{n} via a one-dimensional parametric update step that depends on the EIF’s form.
Unlike their substitution-based counterparts, IPW estimators rely solely on the propensity score. Saliently, an alternative expression for the EIF, the Augmented IPW (AIPW) representation of Robins and Rotnitzky (1992, 1995), is more suitable for their characterization:
D^{⋆}(P_{0})(O) = {[𝕀(Z = 1)/e_{0}(X)]R − τ_{0}} − {[Q̅_{0}(1, X)/e_{0}(X)](Z − e_{0}(X))} = D_{IPW} − D_{CAR}.   (3)
The AIPW representation (3) of the EIF can be shown to be equivalent to expression (2) but stresses instead the importance of the propensity score through the score term for the treatment mechanism h(X)(Z − e_{0}(X)), with h(X) = Q̅_{0}(1, X)/e_{0}(X). As indicated, the form of (3) admits a decomposition: the first term D_{IPW} is the IPW estimating equation (the mapping defining these Z-estimators) while the second term D_{CAR} is a projection (of D_{IPW} onto the space of all functions of (Z, X) that are mean-zero conditional on X) satisfying coarsening-at-random (CAR) (van der Laan and Robins, 2003). Since IPW estimators are defined as solutions to P_{n}D_{IPW}(e_{n}) ≈ 0, this term is trivially solved by construction of τ_{n}; then, expression (3) states that an efficient IPW-type estimator must be a solution to P_{n}D_{CAR}(e_{n}, Q̅_{n}) ≈ 0 principally through a suitable estimator of the propensity score e_{n}(X), suggesting that one should prioritize an estimator e_{n}(X) that satisfies this criterion when seeking an efficient estimator τ_{n} of τ_{0}. Again, note that, even when e_{0}(X) is known, using e_{0}(X) would fail to solve relevant score equations, including P_{n}D_{CAR}(e_{n}, Q̅_{n}) ≈ 0. Together, expressions (2) and (3) provide criteria for the construction of efficient estimators; notably, both depend on proper estimation of the propensity score e_{n}(X) in the sense that it solves key score equations.
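The equivalence of expressions (2) and (3) can be verified in two lines. The derivation below is a sketch that assumes the standard forms of both expressions for the counterfactual mean under treatment, with the sign convention D^{⋆} = D_{IPW} − D_{CAR}; it uses only the facts that 𝕀(Z = 1)Q̅_{0}(Z, X) = 𝕀(Z = 1)Q̅_{0}(1, X) and that 𝕀(Z = 1) = Z for binary Z.

```latex
\begin{align*}
D^{\star}(P_0)(O)
  &= \frac{\mathbb{I}(Z = 1)}{e_0(X)} \{R - \bar{Q}_0(Z, X)\}
     + \bar{Q}_0(1, X) - \tau_0 \\
  &= \frac{\mathbb{I}(Z = 1)}{e_0(X)} R - \tau_0
     - \bar{Q}_0(1, X) \left\{ \frac{\mathbb{I}(Z = 1)}{e_0(X)} - 1 \right\} \\
  &= \underbrace{\frac{\mathbb{I}(Z = 1)}{e_0(X)} R - \tau_0}_{D_{IPW}}
     - \underbrace{\frac{\bar{Q}_0(1, X)}{e_0(X)} \{Z - e_0(X)\}}_{D_{CAR}}
\end{align*}
```

The last line makes the weighting function h(X) = Q̅_{0}(1, X)/e_{0}(X) of the treatment-mechanism score explicit.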
2.2 Score Equations Characterize Directional Empirical Balance
From expressions (2) and (3), we have seen that score terms of the form h(X)(R − Q̅_{0}(Z, X)) and h(X)(Z − e_{0}(X)), for particular choices of the weighting function h(X), play critical roles in criteria for asymptotic efficiency. We now argue that such score terms also characterize the empirical balancing property, though this has not been, to the best of our knowledge, extensively explored to date. Recall that Rosenbaum and Rubin (1983) characterize the theoretical balancing property in terms of a statistical conditional independence condition X ╨ Z | e_{0}(X). This view on the balancing property has motivated the development of a host of diagnostic procedures to evaluate the sample-level balance provided by candidate estimators e_{n}(X), with graphical procedures (e.g., the “Love plot”) enjoying much popularity and software implementations often accruing many thousands of downloads (e.g., Greifer (2022)’s cobalt R package). Yet, the popularity of such approaches overstates their scientific and statistical value: these diagnostic techniques are limited to revealing only whether an estimator e_{n}(X) induces empirical balance with respect to the statistical model underlying the estimator. For example, logistic regression remains an exceedingly popular candidate estimator of the propensity score but intrinsically assumes the logit of the conditional mean of Z given X to be adequately described as a linear function of X. Of course, this amounts to a significant (and often unrealistic) restriction on the statistical model ℳ. Much worse though, this assumption is usually imposed only for the sake of mathematical convenience, hardly ever motivated by domain knowledge.
Even when the parametric model for the propensity score is correct, these techniques check only the degree to which a maximum likelihood estimator e_{n}(X) satisfies empirical balance within restrictive (small) statistical models, thereby only achieving empirical balance Z ╨ f(X), given e_{n}(X), for a very limited set of functions f(X). In so doing, the balancing property w.r.t. the chosen parametric model is emphasized over such fundamental concerns as the efficiency of the estimator τ_{n}, ignoring even the fact that the resulting estimator will generally fail to achieve consistency.
Fortunately, focusing on univariate or multivariate balance under parametric modeling assumptions (convenient as it may be) is hardly the only option. When the propensity score estimator e_{n}(X) is selected as a solution to score equations of the form P_{n}s(e_{n}; f) ≈ 0, for s(Z, X; f) = f(X)(Z − e_{0}(X)), an f-specific form of conditional independence is satisfied, Z ╨ f(X) | e_{n}(X). Lemma 1 summarizes this.
Lemma 1 (Score-based Balance) Let ℱ contain a rich class of functions and, for an arbitrary f ∈ ℱ, define scores of the form s(Z, X; f) = f(X)(Z − e_{0}(X)). When a corresponding score equation P_{n}s(e_{n}; f) ≈ 0 is solved for a given f, the null hypothesis H_{0}(f) : 𝔼_{0}(Z | f(X), e_{n}(X)) = 𝔼_{0}(Z | e_{n}(X)) holds, as the data provide no signal against H_{0}(f); moreover, no valid test of H_{0}(f) will reject this null hypothesis. When H_{0}(f) holds, f(X) contains no information, beyond that captured by e_{n}(X), useful for predicting treatment status Z from covariates X; that is, the empirical balance induced by e_{n}(X) cannot be improved by f(X).
Lemma 1 provides a score-based criterion for characterizing the empirical balancing property and frames its evaluation in terms of a class of hypothesis tests. When a sequence of such tests uniformly fails to reject a family of f-specific null hypotheses {H_{0}(f) : f ∈ ℱ}, there is no empirical evidence to contradict the equality under the null for the family of f ∈ ℱ; then, no such f(X) contains information about Z not already captured by e_{n}(X), implying that e_{n}(X) enforces balance in this f-specific sense. Since P_{n}s(e_{n}; f) may be viewed as a measure of the degree to which Z is independent of f(X), given e_{n}(X), this f-specific empirical balance may be enforced via score tests for H_{0}(f) by selecting e_{n}(X) so as to ensure P_{n}s(e_{n}; f) ≈ 0.
To employ such a hypothesis testing strategy, consider a model logit[𝔼_{0}(Z | X)] = logit[e_{n}(X)] + βf(X) for f ∈ ℱ, in which e_{n}(X) is taken as an offset. Under this assumption, the null hypothesis may be reframed, for fixed f ∈ ℱ, as H_{0}(f) : β = 0 against the alternative H_{1}(f) : β ≠ 0, so that a hypothesis test of this form corresponds to testing the null of independence. Basing such a hypothesis test on the score of the empirical log-likelihood, given by 𝔼_{n}f(X)(Z − e_{n}(X)), at β = 0 leads to a score test that simply evaluates the magnitude of this score as a test statistic, rejecting the null hypothesis when it moves appreciably far from zero. When the estimator e_{n}(X) solves the score equation 𝔼_{n}f(X)(Z − e_{n}(X)) ≈ 0, there cannot be evidence against H_{0}(f). As noted above, when ℱ is a rich class, and this type of score equation is satisfied for many f(X), then this score test fails to reject the null hypothesis H_{0}(f) : β = 0 in the class ℱ, implying that 𝔼_{0}(Z | f(X), e_{n}(X)) = 𝔼_{0}(Z | e_{n}(X)), that is, X ╨ Z | e_{n}(X). Finally, when ℱ is rich enough to include weighting functions appearing in the EIF, e.g., Q̅_{n}(1, X)/e_{n}(X) in expression (3), then e_{n}(X) will both enforce balance in X over ℱ and the resultant estimator τ_{n} will be asymptotically efficient on account of satisfying the EIF estimating equation P_{n}D_{CAR}(e_{n}, Q̅_{n}) ≈ 0. Of course, satisfying this score-based balancing criterion for many f ∈ ℱ, providing balance over X w.r.t. the linear span of these f, may also automatically solve the score equation for the f(X) appearing in the relevant EIF estimating equation.
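A minimal sketch of this score test follows. It rests on several assumptions that are not part of the original text: the simulated design, a main-terms logistic regression (fitted by a hand-written Newton–Raphson routine) as the candidate e_{n}(X), the helper names, and the choice to treat the fitted e_{n}(X) as a fixed offset so that the statistic is the squared score divided by its null Fisher information, compared to a chi-square distribution with one degree of freedom.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def simulate(n, seed=2):
    """X ~ N(0, 1); the true logit of e_0 is quadratic in X (illustrative values)."""
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        xs.append(x)
        zs.append(1 if rng.random() < expit(-0.5 + x + 0.8 * x * x) else 0)
    return xs, zs

def fit_logistic(xs, zs, iters=25):
    """Newton-Raphson MLE for logit e(x) = b0 + b1 * x (misspecified on purpose)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, z in zip(xs, zs):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += z - p
            g1 += (z - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def score_test(f_vals, zs, e_n):
    """Score test of beta = 0 in logit E(Z | X) = logit e_n(X) + beta * f(X)."""
    score = sum(f * (z - e) for f, z, e in zip(f_vals, zs, e_n))
    info = sum(f * f * e * (1.0 - e) for f, e in zip(f_vals, e_n))
    stat = score * score / info
    p_value = math.erfc(math.sqrt(stat / 2.0))   # upper tail of chi-square(1)
    return stat, p_value

xs, zs = simulate(2000)
b0, b1 = fit_logistic(xs, zs)
e_n = [expit(b0 + b1 * x) for x in xs]

stat_lin, p_lin = score_test(xs, zs, e_n)                    # f(X) = X
stat_sq, p_sq = score_test([x * x for x in xs], zs, e_n)     # f(X) = X^2
print(f"f(X)=X:   stat={stat_lin:.3g}, p={p_lin:.3g}")
print(f"f(X)=X^2: stat={stat_sq:.3g}, p={p_sq:.3g}")
```

Because the logistic MLE solves the score equation for f(X) = X by construction, the first test cannot reject; the second flags the imbalance in X^2 that the main-terms model leaves unaddressed. Treating e_{n}(X) as fixed ignores its estimation and tends to make the test conservative.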
3. Statistical Techniques for Solving Score Equations
To this point we have reviewed the critical role that score equations play in asymptotic efficiency, through their appearance in the EIF estimating equation, and outlined their connection to the notion of empirical balance proposed by Rosenbaum and Rubin (1983) via Lemma 1 and our proposed score test. Owing to their critical role in constructing efficient estimators, several classes of techniques for solving score equations have been proposed at the interface of causal machine learning and semiparametric efficiency theory. We next selectively review two successful frameworks that have been applied to this end: targeted minimum loss estimation (van der Laan and Rubin, 2006) and nonparametric sieve estimation.
3.1 Targeted Updating of the Propensity Score Estimator
The targeted minimum loss estimation (or targeted learning) framework focuses on constructing efficient substitution estimators, and, as such, features a targeting (or updating) step usually applied only to initial estimates of the response mechanism Q̅_{n}(Z, X). For example, a TML estimator τ^{⋆}_{n} of the counterfactual mean τ_{0} is constructed in two steps, by, first, generating an initial estimate Q̅_{n}(Z, X) (for which flexible machine learning strategies (e.g., van der Laan et al., 2007) are recommended) and, second, perturbing the initial estimate using a univariate parametric model logit(Q̅^{⋆}_{n}(Z, X)) = logit(Q̅_{n}(Z, X)) + ϵh(X), where h(X) = 𝕀(Z = 1)/e_{n}(X) as in expression (2). This correction ensures that Q̅^{⋆}_{n}(Z, X) is free of “plug-in bias,” allowing the TML estimator τ^{⋆}_{n} to achieve asymptotic linearity by acting as an approximate solution to the EIF estimating equation, making the TML estimator τ^{⋆}_{n} asymptotically efficient. In principle, this same perturbation strategy may be applied to updating propensity score estimators e_{n}(X) so as to enforce balance (or other such desiderata) while still ensuring that the estimators remain as solutions to the EIF estimating equation. We note that the targeted learning framework has been, and continues to be, the subject of fervent research (van der Laan and Rose, 2011, 2018), and that there are many variations of TML estimators suited to different goals, even including those adjusting specifically for balancing scores (Lendle et al., 2015).
In considering alternative applications of the targeted updating approach, van der Laan (2014, see Theorem 1) proposed a procedure for the construction of targeted IPW estimators. Such IPW estimators are based on applying the one-dimensional parametric update logit(e^{⋆}_{n}(X)) = logit(e_{n}(X)) + ϵh(X) to map an initial propensity score estimator e_{n}(X) into an updated version e^{⋆}_{n}(X) so as to satisfy asymptotic linearity, though asymptotic efficiency remained elusive. As IPW estimation avoids explicit modeling of the response mechanism, constructing efficient estimators is both philosophically and technically challenging, as such estimators must solve the D_{CAR} component of the EIF, which includes the response mechanism in the numerator of the inverse weight of the score term, i.e., h(X) = Q̅_{n}(1, X)/e_{n}(X) (as in expression (3)). One may, for example, take an approach based in universal least favorable update models (van der Laan and Gruber, 2016), tracking updates to the estimator e^{⋆}_{n}(X) locally along a path in a one-dimensional parametric model formulated to solve the score equation P_{n}h(X)(Z − e_{n}(X)) ≈ 0. To circumvent modeling the response mechanism, van der Laan (2014) proposed instead replacing Q̅_{n}(1, X) with an estimator of an artificial nuisance parameter Q̅^{r}_{0}(1, X) := 𝔼_{0}(R | Z, ē_{n}(X)), in which ē_{n}(X) is a fixed summary measure of the covariates X. Constructing such artificial nuisance quantities requires mathematically sophisticated dimension reduction efforts, which, while difficult to generalize, avoid the logical contradiction associated with directly modeling the response R. Theoretical investigations revealed these targeted IPW estimators to satisfy asymptotic linearity but fail to satisfy standard regularity conditions, sharply limiting their practical use. These targeted IPW estimators were implemented and are available in the drtmle R package (Benkeser and Hejazi, 2022).
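The mechanics of the one-dimensional update logit(e^{⋆}_{n}(X)) = logit(e_{n}(X)) + ϵh(X) can be sketched as follows. This is an illustration only and not the procedure of van der Laan (2014) or the drtmle implementation: the simulated data, the deliberately crude constant initial estimator, the hypothetical stand-in `qbar1` for Q̅_{n}(1, X), and the choice to hold h(X) fixed at its initial value (rather than iterating or following a universal least favorable path) are all assumptions made for brevity.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

def mean_score(h_vals, zs, e_vals):
    """Empirical score P_n h(X)(Z - e(X))."""
    return sum(h * (z - e) for h, z, e in zip(h_vals, zs, e_vals)) / len(zs)

def targeted_update(e_init, h_vals, zs, iters=50):
    """Newton's method for epsilon in logit(e*) = logit(e_init) + epsilon * h."""
    offsets = [logit(e) for e in e_init]
    eps = 0.0
    for _ in range(iters):
        e_star = [expit(o + eps * h) for o, h in zip(offsets, h_vals)]
        score = sum(h * (z - e) for h, z, e in zip(h_vals, zs, e_star))
        info = sum(h * h * e * (1.0 - e) for h, e in zip(h_vals, e_star))
        eps += score / info
    return eps, [expit(o + eps * h) for o, h in zip(offsets, h_vals)]

def qbar1(x):
    """Hypothetical stand-in for an estimate of Qbar_n(1, x)."""
    return 1.0 + 3.0 * x

rng = random.Random(3)
n = 2000
xs = [rng.random() for _ in range(n)]
zs = [1 if rng.random() < 0.2 + 0.6 * x else 0 for x in xs]

z_bar = sum(zs) / n
e_init = [z_bar] * n                       # crude initial estimator: marginal mean of Z
h_vals = [qbar1(x) / z_bar for x in xs]    # h(X) = Qbar_n(1, X) / e_n(X), held fixed

score_before = mean_score(h_vals, zs, e_init)
eps, e_star = targeted_update(e_init, h_vals, zs)
score_after = mean_score(h_vals, zs, e_star)
print(f"epsilon={eps:.4f}, score before={score_before:.4f}, after={score_after:.2e}")
```

The update is the maximum likelihood fit of a logistic model with offset logit(e_{n}(X)) and single covariate h(X), so the h-specific score equation is solved at convergence. It says nothing about scores in other directions, which is why a poor initial estimator remains poor after targeting.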
3.2 Undersmoothing of the Propensity Score Estimator
As we have seen, asymptotically efficient estimators can only be constructed as solutions to score equations (e.g., P_{n}D_{CAR}(e_{n}, Q̅_{n}) ≈ 0) rooted in careful study of the EIF. In the context of IPW estimation, these scores take the general form s(Z, X; h) = h(X)(Z − e_{n}(X)), where h(X) is an appropriate weighting function, e.g., h(X) = Q̅_{n}(1, X)/e_{n}(X) when τ_{0} is the counterfactual mean under treatment. When e_{n}(X) is constructed by way of a sieve MLE (or NPMLE, when such exists), undersmoothing may be applied to this initial estimator so as to ensure it solves relevant score equations, i.e., P_{n}s(e_{n}; h) ≈ 0. For such efforts to succeed, not only must e_{n}(X) converge to e_{0}(X) at a suitably fast rate (i.e., rate-consistency), but e_{n}(X) must act as an MLE over the set of statistical models over which the sieve is applied. In recent work, Ertefaie et al. (2022) demonstrated, both theoretically and practically, that undersmoothing of a highly adaptive lasso (HAL) estimator e_{n}(X) of the propensity score allows for the construction of asymptotically linear and efficient IPW estimators of τ_{0}. Relatedly, van der Laan et al. (2022) proved that substitution estimators using undersmoothed HAL are efficient as well, which, together with Ertefaie et al. (2022), demonstrates the utility of HAL across two important and popular classes of estimators. The properties of these two HAL-based estimators of causal effects rely on key properties of the HAL estimator itself, first investigated by van der Laan (2015, 2017), who showed HAL to be an MLE for a model ℳ_{λ_{0}} indexed by the true sectional variation norm λ_{0} of the HAL representation of a target functional (e.g., e_{0,λ_{0}}(X)). The HAL estimator assumes the target functional, e_{0,λ_{0}}(X) in this context, to be càdlàg (i.e., right-continuous with left limits, RCLL) and of bounded sectional variation norm (λ_{0} < ∞); this estimator is available in the open source hal9001 R package (Coyle et al., 2022; Hejazi et al., 2020).
Using a HAL-MLE e_{n,λ_{n}}(X), Ertefaie et al. (2022) outlined conditions and formulated selection procedures for appropriately undersmoothing e_{n,λ_{n}}(X) such that the resultant IPW estimator τ^{IPW}_{n,λ_{n}} would attain the nonparametric efficiency bound, solving the EIF estimating equation by acting as a solution to P_{n}D_{CAR}(e_{n,λ_{n}}, Q̅_{n}) ≈ 0. Contemporaneously, Hejazi et al. (2022) also investigated novel undersmoothing selection criteria agnostic to the EIF’s form for HAL-based estimators of the generalized propensity score (Imbens, 2000; Hirano and Imbens, 2004).
Prior work has considered the undersmoothing of propensity score estimators, notably including Hirano et al. (2003), who proposed the use of a logistic series estimator for e_{n}(X). While these authors showed undersmoothing of their e_{n}(X) estimator to be capable of generating an efficient IPW estimator of the ATE, their approach is practically limited by requiring the propensity score e_{0}(X) to be a (highly smooth) k-times differentiable function of X. What’s more, this and related approaches fail to appropriately emphasize that efficiency of the estimator τ_{n} is a direct result of e_{n}(X) solving the score equation arising from the EIF. By contrast, the more recent developments of Ertefaie et al. (2022) highlight this point; furthermore, based on our Lemma 1, a natural extension of the latter approach would be to focus on solving a broad range of score equations (of the form P_{n}h(X)(Z − e_{n,λ_{n}}(X)) ≈ 0), ultimately resulting in both the downstream estimator τ_{n} attaining asymptotic efficiency and the propensity score estimator e_{n}(X) inducing empirical balance without reliance on brittle modeling assumptions. Of course, our proposed score test implies that solving a range of score equations yields an estimator satisfying the empirical balancing property, positioning undersmoothing as a potentially critical tool for achieving empirical balance and, thereby, asymptotic efficiency. Nevertheless, in finite samples, beyond using HAL, one may wish to enforce additional empirical balance in particular f-specific directions, in which case undersmoothing of HAL may be paired with the score-preserving properties of targeted updating to ensure efficiency while accommodating balance w.r.t. user-specified functions of X.
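The way a richer sieve drives score equations toward zero can be seen in a toy setting. The sketch below deliberately swaps HAL for a much simpler histogram (piecewise-constant) sieve on a single covariate, whose within-bin means are the MLE over each sieve model; it is not an implementation of the undersmoothing selectors of Ertefaie et al. (2022). The simulated design and the fixed weighting function h(X) = 1 + 3X, a hypothetical stand-in for an EIF-type weight, are assumptions.

```python
import random

def simulate(n, seed=4):
    """Draw (X, Z) with X ~ Uniform(0, 1) and e_0(x) = 0.2 + 0.6 x (illustrative)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    zs = [1 if rng.random() < 0.2 + 0.6 * x else 0 for x in xs]
    return xs, zs

def histogram_propensity(xs, zs, n_bins):
    """Piecewise-constant sieve MLE: e_n(x) is the mean of Z within the bin of x."""
    bins = [min(int(x * n_bins), n_bins - 1) for x in xs]
    totals, counts = {}, {}
    for b, z in zip(bins, zs):
        totals[b] = totals.get(b, 0) + z
        counts[b] = counts.get(b, 0) + 1
    return [totals[b] / counts[b] for b in bins]

def mean_score(h_vals, zs, e_vals):
    """Empirical score P_n h(X)(Z - e_n(X))."""
    return sum(h * (z - e) for h, z, e in zip(h_vals, zs, e_vals)) / len(zs)

xs, zs = simulate(5000)
h_vals = [1.0 + 3.0 * x for x in xs]   # hypothetical weighting function h(X)

# Finer partitions (less smoothing) solve the h-specific score equation more closely.
scores = {}
for n_bins in (1, 2, 5, 10, 50):
    e_n = histogram_propensity(xs, zs, n_bins)
    scores[n_bins] = abs(mean_score(h_vals, zs, e_n))
    print(f"bins={n_bins:3d}  |P_n h(X)(Z - e_n(X))| = {scores[n_bins]:.5f}")
```

Each histogram fit solves the score equations exactly for every h that is constant within bins, so the residual score for a smooth h shrinks as the partition is refined. The price is higher variance in e_{n}(X), which is the tradeoff that undersmoothing criteria are designed to manage.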
4. The Propensity Score’s Role in Modern Causal Inference
The propensity score holds an integral place in causal inference, playing important roles in terms of both theoretical and empirical balance and asymptotic efficiency as outlined by developments in semiparametric theory. We have argued that this empirical balancing property corresponds to solving particular types of score equations, generally of the form P_{n}h(X)(Z − e_{n}(X)) ≈ 0, and we have proposed a score test for evaluating the degree to which a candidate propensity score estimator e_{n}(X) achieves balance. By characterizing the empirical balancing property in terms of the solving of particular score equations, we circumvent the diagnostic strategies popular today, which focus on checking empirical balance across only a subset of (necessarily discrete) covariates X or are derived from and reliant upon the brittle assumptions underlying parametric estimators of e_{0}(X), e.g., logistic regression. This form of empirical balance corresponds with approximately solving a limited set of score equations derived from focusing upon particular subsets of covariates.
This characterization bridges the gap between satisfaction of the empirical balancing property and well-studied strategies for asymptotically efficient estimation based on the solving of efficient score equations (e.g., the D_{CAR} component of the EIF). By demonstrating that there need not be a disconnect between the dual desiderata of empirical balance and asymptotic efficiency, we reduce enforcing the empirical balancing property (and diagnosing deviations from it) to the solving of score equations, which is achievable within well-studied frameworks like targeted learning or nonparametric sieve estimation. We argue that modern propensity score-based estimation strategies in causal inference should prioritize solving efficient score equations and, secondarily, aim to solve a variety of other score equations so as to enforce the empirical balancing property for a large class of functions of the baseline covariates. Conveniently, satisfaction of the latter criterion may lead to automatically satisfying the former. This unified focus on efficiency and empirical balance in terms of score equations of the form P_{n}h(X)(Z − e_{n}(X)) ≈ 0 allows for both criteria to be satisfied without reliance upon historically popular but unrealistic modeling assumptions whose utility is severely limited in the complex observational studies common in today’s biomedical, health, and social sciences research.
Boston, MA 02115
nhejazi@hsph.harvard.edu
laan@berkeley.edu
Acknowledgments
We thank an anonymous associate editor and two peer referees of Biometrics who inspired our careful consideration of the interplay between the balancing property and asymptotically efficient estimation in modern causal inference through their reviews of Ertefaie et al. (2022). MJvdL was partially supported by a grant from the National Institute of Allergy and Infectious Diseases (award no. R01 AI074345).