Importance Sampling: Estimators, Properties, Optimal Proposals, and ESS
Context
You want to estimate an expectation under a target distribution p over X:
mu = E_p[f(X)] = ∫ f(x) p(x) dx.
Direct sampling from p may be hard, but you can sample from an easier proposal q whose support covers that of p.
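For reference, the change-of-measure step underlying the tasks below can be sketched as follows (this is the identity Task 1 asks you to derive in full):

```latex
\mu = \int f(x)\, p(x)\, dx
    = \int f(x)\, \frac{p(x)}{q(x)}\, q(x)\, dx
    = \mathbb{E}_q\!\left[ f(X)\, \frac{p(X)}{q(X)} \right],
\qquad X \sim q,
```

which is valid whenever q(x) > 0 at every x where f(x) p(x) ≠ 0.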
Tasks
- Derive the importance sampling identity and the resulting estimator of mu when sampling X_i ~ q.
- Show both:
  - The unnormalized estimator, using weights w(x) = p(x)/q(x).
  - The self-normalized estimator.
- Discuss the bias and variance of each estimator, and the conditions under which the variance is finite.
- Derive how the variance depends on q, and why the ideal proposal is proportional to |f(x)| p(x).
- Define and interpret the effective sample size (ESS).
- Provide a concrete numerical example (e.g., a Gaussian target with a different Gaussian proposal), with pseudocode.
- Discuss pitfalls: weight degeneracy, proposals with tails lighter than the target's, and high-variance tails. Optionally, relate importance sampling to off-policy evaluation in reinforcement learning and to resampling in particle filters.
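One possible realization of the numerical-example and ESS tasks is sketched below. The specific choices here are illustrative assumptions, not fixed by the prompt: target p = N(0, 1), proposal q = N(0, 2^2) (wider tails, so the weights p/q are bounded), and f(x) = x^2, for which the true value is mu = E_p[X^2] = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not fixed by the prompt):
# target p = N(0, 1), proposal q = N(0, 2^2), f(x) = x^2, so mu = 1.
def log_p(x):
    # log density of N(0, 1), up to the constant -0.5*log(2*pi)
    return -0.5 * x**2

def log_q(x):
    # log density of N(0, 4), up to the same constant (it cancels in the ratio)
    return -0.5 * (x / 2.0) ** 2 - np.log(2.0)

n = 100_000
x = rng.normal(0.0, 2.0, size=n)           # X_i ~ q
w = np.exp(log_p(x) - log_q(x))            # w_i = p(X_i) / q(X_i)

mu_unnorm = np.mean(w * x**2)              # unnormalized estimator (unbiased)
mu_snis = np.sum(w * x**2) / np.sum(w)     # self-normalized estimator (consistent, biased)
ess = np.sum(w) ** 2 / np.sum(w**2)        # effective sample size, between 1 and n

print(f"unnormalized: {mu_unnorm:.4f}")
print(f"self-normalized: {mu_snis:.4f}")
print(f"ESS: {ess:.0f} out of {n}")
```

Both estimates should land close to 1, with the ESS noticeably below n, quantifying the efficiency lost to weight variability. Computing weights on the log scale, as above, is the standard way to avoid underflow when the densities are small.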