Bias correction, estimating equations, and TMLE all give efficient estimators that are asymptotically equivalent. Precisely, each satisfies $\hat\psi - \psi = P_n[\phi] + o_P(1/\sqrt{n})$, where $\phi$ is the efficient influence function and $P_n[\phi] = \frac{1}{n}\sum_{i=1}^n \phi(Z_i)$ denotes the empirical average.
In large samples, the CLT applied to the expansion above gives $\sqrt{n}(\hat\psi - \psi) \rightsquigarrow N(0, \sigma^2)$, where $\sigma^2 = \text{var}[\phi(Z)] = P[\phi^2]$ (recall the influence function has mean zero). We can therefore do inference by computing $\hat\sigma^2 = P_n[\hat\phi^2]$, which justifies building 95% confidence intervals of the form $\hat\psi \pm 1.96\, \hat\sigma/\sqrt{n}$. Since $\sqrt{n}(\hat\psi - \psi)/\hat\sigma$ is approximately $N(0,1)$, a Z-test on this statistic is appropriate to obtain a p-value.
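To make this concrete, here is a minimal sketch of Wald-type inference from estimated influence function values. The inputs `psi_hat` and `phi_hat` are hypothetical names for the point estimate and an array holding $\hat\phi(Z_i)$ for each observation:

```python
import numpy as np
from scipy import stats

def wald_inference(psi_hat, phi_hat, psi_null=0.0, alpha=0.05):
    """Wald-type inference based on estimated influence function values.

    psi_hat:  efficient point estimate (float)
    phi_hat:  array of estimated influence function values, phi_hat[i] ~ phi(Z_i)
    psi_null: null value of psi for the Z-test
    """
    n = len(phi_hat)
    sigma_hat = np.sqrt(np.mean(phi_hat**2))   # sigma_hat^2 = P_n[phi_hat^2]
    se = sigma_hat / np.sqrt(n)                # standard error of psi_hat
    z_crit = stats.norm.ppf(1 - alpha / 2)     # 1.96 for alpha = 0.05
    ci = (psi_hat - z_crit * se, psi_hat + z_crit * se)
    z_stat = (psi_hat - psi_null) / se         # approximately N(0,1) under the null
    p_value = 2 * stats.norm.sf(abs(z_stat))
    return ci, p_value
```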
A brief proof that $\hat\sigma^2 \to_P \sigma^2$ (i.e. that this estimate is consistent) under the same assumptions we've been operating under is given below; it relies on techniques similar to those discussed in the section on naive plug-in estimation.
Proof:
We are trying to show that $P_n[\hat\phi^2] - P[\phi^2] = o_P(1)$. Decompose the difference as

$$
P_n[\hat\phi^2] - P[\phi^2] = (P_n - P)[\hat\phi^2 - \phi^2] + (P_n - P)[\phi^2] + P[\hat\phi^2 - \phi^2].
$$
The first term is an empirical process term that we know is $o_P(1/\sqrt{n})$, and therefore $o_P(1)$, as long as $\hat\phi^2$ is $L_2(P)$-consistent for $\phi^2$ and we have either used sample splitting or satisfied some Donsker conditions. The $L_2(P)$-consistency of $\hat\phi^2$ is indeed guaranteed by the presumed $L_2(P)$-consistency of $\hat\phi$ as long as $\hat\phi$ is not permitted to blow up (e.g. we bound $|\hat\phi| \le C < \infty$, as previously required for efficiency of $\hat\psi$). The proof of this is an application of the bounded convergence theorem.
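As an aside on the sample-splitting route, here is a minimal cross-fitting sketch. The callables `fit_nuisance` and `eval_phi` are hypothetical placeholders for the problem at hand; the point is only that each $\hat\phi(Z_i)$ is evaluated using nuisance models fit on the *other* folds, which controls the empirical process term without Donsker conditions:

```python
import numpy as np

def crossfit_phi(Z, fit_nuisance, eval_phi, n_folds=2, seed=0):
    """Cross-fit influence function values.

    fit_nuisance(Z_train) -> fitted nuisance models
    eval_phi(models, Z_eval) -> influence function values on Z_eval
    """
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z)
    n = len(Z)
    fold = rng.permutation(n) % n_folds           # random fold assignment
    phi_hat = np.empty(n)
    for k in range(n_folds):
        train, evl = fold != k, fold == k
        models = fit_nuisance(Z[train])           # fit nuisances off-fold...
        phi_hat[evl] = eval_phi(models, Z[evl])   # ...evaluate on the held-out fold
    return phi_hat
```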
The second term times $\sqrt{n}$ converges in distribution to $N(0, \text{var}[\phi^2(Z)])$ by the central limit theorem, and is thus $O_P(1/\sqrt{n})$ and hence $o_P(1)$ (think: if this term, "blown up" by $\sqrt{n}$, stabilizes to a normal with finite variance, then once we take away the $\sqrt{n}$ it must collapse down to its mean, which is 0).
The last term is bounded as $P[\hat\phi^2 - \phi^2] = P[(\hat\phi - \phi)(\hat\phi + \phi)] \le \|\hat\phi - \phi\|_{P,2}\, \|\hat\phi + \phi\|_{P,2}$ according to Cauchy-Schwarz (again assuming $\hat\phi$ doesn't blow up, so that $\|\hat\phi + \phi\|_{P,2} = O_P(1)$), and therefore by the $L_2(P)$-consistency of $\hat\phi$ it is also $o_P(1)$.
Since all three terms are $o_P(1)$, their sum is as well, which is what we wanted.
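As a quick numerical sanity check (not part of the proof), consider the toy case where $\psi$ is a simple mean, so that $\phi(z) = z - \psi$ and $\hat\phi(z) = z - \hat\psi$; the distribution below is an arbitrary choice for illustration:

```python
import numpy as np

# For psi = E[Z], the efficient influence function is phi(z) = z - psi,
# estimated by phi_hat(z) = z - psi_hat. Then P_n[phi_hat^2] should
# approach sigma^2 = Var(Z) as n grows.
rng = np.random.default_rng(0)
for n in [100, 10_000, 1_000_000]:
    Z = rng.exponential(scale=2.0, size=n)    # Var(Z) = 4
    psi_hat = Z.mean()
    sigma2_hat = np.mean((Z - psi_hat) ** 2)  # P_n[phi_hat^2]
    print(n, sigma2_hat)                      # converges toward 4
```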
It's also possible to construct alternative estimators of the variance (e.g. the bootstrap) or even targeted, efficient estimators of $\sigma^2$ using the same methodologies we used to estimate $\psi$ in the first place! For the most part, however, the sample variance of the estimated influence function ($\hat\sigma^2 = P_n[\hat\phi^2]$) suffices.
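For completeness, here is a minimal sketch of the bootstrap alternative, assuming a hypothetical callable `estimator` that recomputes the point estimate on a resampled dataset (whether the bootstrap is valid when the nuisances are machine-learned needs to be checked case by case):

```python
import numpy as np

def bootstrap_se(estimator, Z, n_boot=1000, seed=0):
    """Nonparametric bootstrap standard error for estimator(Z)."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z)
    n = len(Z)
    # Recompute the estimate on n_boot resamples drawn with replacement.
    reps = [estimator(Z[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return np.std(reps, ddof=1)
```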