4.5 Inference
Bias correction, estimating equations, and TMLE all give efficient estimators that are asymptotically equivalent. Precisely, $\hat\psi - \psi = \mathbb P_n \phi + o_P(n^{-1/2})$, where $\phi$ is the efficient influence function.
In large samples, the CLT applied to the above gives $\sqrt n\,(\hat\psi-\psi) \rightsquigarrow \mathcal N(0, \sigma^2)$ with $\sigma^2 = V[\phi]$. We can therefore do inference by computing $\hat\sigma^2 = \hat V[\hat\phi] = \mathbb P_n\hat\phi^2 \approx \sigma^2$, which justifies building 95% confidence intervals of the form $\hat\psi \pm 1.96\,\hat\sigma/\sqrt n$. Since $\sqrt{n}\,\hat\psi/\hat\sigma$ is approximately $\mathcal N(0,1)$ under the null hypothesis $\psi = 0$, a Z-test on this statistic is appropriate to obtain a p-value.
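As a concrete illustration (not code from these notes), here is a minimal Python sketch of this Wald-style procedure. It assumes you already have `psi_hat` and an array `phi_hat` of the estimated influence function evaluated at each observation; both names are placeholders for the output of whatever bias-corrected, estimating-equation, or TMLE routine was used.

```python
# Minimal sketch: Wald-type inference from the estimated efficient influence function.
# `psi_hat` (scalar estimate) and `phi_hat` (length-n array of estimated influence
# function values) are assumed inputs, not defined in these notes.
import numpy as np
from scipy import stats

def wald_inference(psi_hat, phi_hat, alpha=0.05):
    """Confidence interval and Z-test p-value using sigma_hat^2 = P_n phi_hat^2."""
    n = len(phi_hat)
    sigma_hat = np.sqrt(np.mean(phi_hat ** 2))   # sigma_hat^2 = P_n phi_hat^2
    se = sigma_hat / np.sqrt(n)                  # standard error of psi_hat
    z = stats.norm.ppf(1 - alpha / 2)            # 1.96 when alpha = 0.05
    ci = (psi_hat - z * se, psi_hat + z * se)
    z_stat = psi_hat / se                        # sqrt(n) * psi_hat / sigma_hat
    p_value = 2 * stats.norm.sf(abs(z_stat))     # two-sided test of psi = 0
    return ci, p_value
```

The only inputs are the point estimate and the influence function values, which is what makes this recipe generic across the three estimators above.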
A brief proof that $\mathbb P_n \hat\phi^2 \overset{P}{\to} P \phi^2$ (i.e. that this variance estimate is consistent) under the same assumptions we've been operating under is given below; it relies on techniques similar to those discussed in the section on naive plug-in estimation.
Proof: $\mathbb P_n \hat\phi^2 \overset{P}{\to} P \phi^2$
We are trying to show that $\mathbb P_n \hat\phi^2 - P \phi^2 = o_P(1)$. Decompose the difference as
$$\mathbb P_n \hat\phi^2 - P \phi^2 = (\mathbb P_n - P)(\hat\phi^2 - \phi^2) + (\mathbb P_n - P)\phi^2 + P(\hat\phi^2 - \phi^2)$$
The first term is an empirical process term that we know is $o_P(n^{-1/2})$, and therefore $o_P(1)$, as long as $\hat\phi^2$ is $\mathcal L_2$-consistent and we have either used sample splitting or satisfied some Donsker conditions. The $\mathcal L_2$-consistency of $\hat\phi^2$ is indeed guaranteed by the presumed $\mathcal L_2$-consistency of $\hat\phi$ as long as $\hat\phi$ is not permitted to blow up (e.g. we bound $\hat\pi_a$ away from zero, as previously required for efficiency of $\hat\psi$). The proof of this is an application of the bounded convergence theorem.
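For illustration, if $|\hat\phi|$ and $|\phi|$ are both bounded by some constant $C$ (an assumption added here only to make the claim concrete), one can also see this directly from the factorization

$$\|\hat\phi^2 - \phi^2\| = \|(\hat\phi - \phi)(\hat\phi + \phi)\| \le 2C\,\|\hat\phi - \phi\| \overset{P}{\to} 0.$$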
The second term times $\sqrt n$ converges in distribution to $\mathcal N(0, V[\phi^2])$ by the central limit theorem and is thus $O_P(n^{-1/2})$ and therefore $o_P(1)$ (think: if this term, "blown up" by $\sqrt n$, stabilizes to a normal with finite variance, then if we take away the $\sqrt n$ it must collapse down to its mean, which is 0).
The last term is bounded by $\|\hat\phi^2 - \phi^2\|$ according to Cauchy-Schwarz (again assuming $\hat\phi$ doesn't blow up) and is therefore also $o_P(1)$ by the $\mathcal L_2$-consistency of $\hat\phi^2$.
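Spelled out, one way to read that Cauchy-Schwarz step is as a pairing with the constant function $1$:

$$|P(\hat\phi^2 - \phi^2)| = |P[(\hat\phi^2 - \phi^2)\cdot 1]| \le \|\hat\phi^2 - \phi^2\|\,\|1\| = \|\hat\phi^2 - \phi^2\|.$$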
Since all three terms are $o_P(1)$, their sum is as well, which is what we wanted.
It's also possible to construct alternative estimators for the variance (e.g. the bootstrap), or even targeted, efficient estimators of the variance, using the same methodologies used to estimate $\psi$ in the first place! However, for the most part, the sample variance of the estimated influence function ($\mathbb P_n \hat\phi^2$) suffices.
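If one did want the bootstrap route, a rough sketch looks like the following. Here `estimate_psi` is a hypothetical stand-in for the full estimation pipeline (nuisance fitting plus correction or targeting), which these notes do not spell out as code, and `data` is assumed to be a NumPy array of observations.

```python
# Hedged sketch of a nonparametric bootstrap variance estimate.
# `estimate_psi(data) -> float` is a placeholder for the full estimator; it is
# re-run on each resampled dataset, which is what makes this approach expensive.
import numpy as np

def bootstrap_se(data, estimate_psi, n_boot=500, seed=None):
    """Bootstrap standard error of psi_hat by resampling rows of `data`."""
    rng = np.random.default_rng(seed)
    n = len(data)
    psi_stars = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample observations with replacement
        psi_stars.append(estimate_psi(data[idx]))  # re-run the full estimator on the resample
    return np.std(psi_stars, ddof=1)               # bootstrap estimate of se(psi_hat)
```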