Agnostic Notes on Regression Adjustments to Experimental data: Reexamining Freedman’s Critique
In the usual set up of estimating average causal effect in a randomized experiment, Freedman criticized using OLS coefficient of treatment as an estimate when regressing observed outcome on treatment and covariates. To reiterate, based on Neyman’s model, we assume there are n subjects (finite population of interest), assignment (Z) is the only source of randomness, covariates (X), potential outcomes Y(1) and Y(0) are fixed, observed outcomes (Y) are random because they depend on assignment.
Lin overall agrees that just because you have random assignment does not mean you will automatically get necessarily better estimates if you regress with covariates but shows that including a full interaction term will make the estimate at least as good as the two popular ones, ATEunadf (simple difference in means, not considering covariates) and ATEadf (regressing outcome on covariates all at once). In short, Lin’s proposal is to add the interaction term: under the same regularity conditions, OLS adjustment will not 'worsen’ efficiency (as contended by Freedman), Hubert White error estimate is still consistent and asymptotically normal (regardless of whether the interaction term is included) and finally the bias of ATEinteract tends to zero with large enough sample sizes. He creates an intuitive explanation by drawing an analogy between survey sampling and randomised experiments. To summarise, Lin’s main point is that little needs to be changed in the conventional finite population inference to address freedman’s criticism.
Lin’s argument as to why we should include covariates is by analogising the estimation of a population mean. Intuitively, suppose you want to predict the mean of a population Y, but you only have a sample, say, Ys. But you have the whole population of some covariate, say X, where X is something that may or may not have correlation with Y. Mean(YS) will be unbiased but will ignore this correlation. A 'better’ estimate would incorporate this association (or lack thereof): estimating the population mean by including the sample mean, but also the covariate X, whose association with Y we can estimate from the sample.
So the respective argument for randomized experiment is that we have data (covariates) for the entire population (N) so including it to estimate ATE will not worsen the estimate. If there is no association between Y and X, the estimate will just boil down to ATEadj. If there is, ATEinteract will have lower variance because it 'adjusts’ the estimate by giving more importance to the group with fewer units, because this group will have a mean estimate that more biased than the group with more units. Note that if the groups have equal units, then even under ATEadj, the 'overestimation’ and 'underestimation’ will cancel out, leading to the same estimate as ATEinteract.
This naturally explains Lin’s argument for separate regression because for the treatment group, we consider total population to be n and to get average treatment effect we consider it to be a sample mean of n1 units - for control we consider a sample mean of n0 units. We estimate the means of Y(1) and Y(0) separately, and then estimate ATE. If the covariates have exact opposite correlation, ATEinteract reduces to ATEunadj - (the differences cancel out), if not, ATEinteract has lower variance. Note that ATEunadj is a specific case of ATEinteract, if we consider ATEinteract to be the difference of two 'fixed slope’ regression estimators, in which case ATEunadj places a value of zero in the scaling factor.
Lin further argues and demonstrates that Freedman’s contention that the OLS error estimate is inconsistent can be addressed by using the Hubert White error instead: This estimate is also agnostic, meaning that it shows nice asymptotic properties even if the regression model is incorrect. Intuitively, ATEinteract has standard error which includes the variance in Y(1) and Y(0) unexplained by the covariates but 'inversely’ scaled. This also leads to a discussion of an intrinsic parallel between Lin’s estimator and a weighted OLS of Y on X and Z with these 'inverse’ scales as weight.
The difference between variances of ATEinteract and ATEadj can be thought of as the 'paired variance' - 'how better the OLS coefficients are when regression is done separately' - whereas the difference between variances of ATEinteract and ATEunadj can be thought of as 'pooled variance’ - 'how much of Y the covariates are able to explain’.
Lastly, it is also notable that in case the covariate is categorical, ATEinteract gives the same estimate as ATEstra, that is, post-stratification ATE.