Will Rosenbaum (Amherst College)
Joint with: Talya Eden, Jakob Hauen, Shyam Narayanan, and Jakub Tetek
Population of $N$ voters deciding a proposition (yes/no)
Goal. Estimate the number of “yes” votes.
Typical Assumption (unfounded?). We can sample voters uniformly at random.
Procedure:
Sample $m$ voters independently and uniformly at random (with replacement); let $X_i = 1$ if the $i$th sampled voter votes "yes" and $X_i = 0$ otherwise.
return
\[\bar\mu = \frac{1}{m} \sum_{i = 1}^{m} N X_i.\]
Guarantee: if $m = O(\varepsilon^{-2})$, then $\vert\bar\mu - \mu\vert \leq \varepsilon N$ with probability at least $2/3$ (by a standard Chernoff/Hoeffding bound).
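A minimal simulation of this procedure (the population and the names below are illustrative, not from the talk):

```python
import random

def estimate_yes_votes(votes, m, rng=random):
    """Uniform-sampling estimator: draw m voters uniformly with replacement
    and return bar_mu = (1/m) * sum_i N * X_i."""
    N = len(votes)
    sampled = [votes[rng.randrange(N)] for _ in range(m)]
    return N * sum(sampled) / m

# N = 108 voters, 58 of whom vote "yes" (mu = 58), as in the example below.
votes = [1] * 58 + [0] * 50
print(estimate_yes_votes(votes, m=20))
```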
$N = 108$, $m = 20$, $\mu = 58$
$\bar\mu = \frac 1 m \sum_i N X_i = 59.4$
“Yes” predicted to win!
Samples are not exactly uniform
$N = 108$, $m = 20$, $\mu = 58$
“Yes” voters 10% more likely to be sampled
$\bar\mu = \frac 1 m \sum_i N X_i = 64.8$
“Yes” predicted to win!
$N = 108$, $m = 20$, $\mu = 58$
“No” voters 10% more likely to be sampled
$\bar\mu = \frac 1 m \sum_i N X_i = 48.6$
“Yes” predicted to lose!
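A quick simulation of this skew (a sketch; the 10% bias and the vote counts come from the example above, the rest is illustrative):

```python
import random

def biased_estimate(votes, m, yes_weight, rng=random):
    """Run the 'uniform sampling' estimator while the sampler secretly favors
    'yes' voters by a factor of yes_weight (e.g. 1.1 = 10% more likely)."""
    N = len(votes)
    weights = [yes_weight if v == 1 else 1.0 for v in votes]
    sampled = rng.choices(range(N), weights=weights, k=m)
    return N * sum(votes[i] for i in sampled) / m

votes = [1] * 58 + [0] * 50   # N = 108, mu = 58
trials = 5000
for w in (1.1, 1 / 1.1):      # "yes" voters 10% more / less likely to be sampled
    avg = sum(biased_estimate(votes, 20, w) for _ in range(trials)) / trials
    print(w, round(avg, 1))   # systematically above / below mu = 58
```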
Question. Can we estimate $\mu = \sum_i x_i$ to within additive error, say, $0.02N$ using a sub-linear number of samples?
Our Results (special case):
In many contexts, it is assumed that samples can be generated exactly from some distribution:
General Questions
$\mathcal{Q}$ is pointwise $\gamma$-close to $\mathcal{P}$:
\[(1 - \gamma) \mathcal{P}(i) \leq \mathcal{Q}(i) \leq (1 + \gamma) \mathcal{P}(i) \qquad \forall i\]
Goal. Estimate $\mu = \sum_i x_i$ to within additive error $\varepsilon_1 \mu_+ + \varepsilon_2$, where
Hansen-Hurwitz Estimator
Sample $m$ elements $s_1, \ldots, s_m$ with replacement (drawn according to $\mathcal{Q}$) and return
\[\mu_{HH} = \frac{1}{m} \sum_{j = 1}^{m} \frac{x_{s_j}}{\mathcal{P}(s_j)}\]
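A sketch of this estimator in the present setting, where samples come from the unknown $\mathcal{Q}$ but are reweighted by the assumed $\mathcal{P}$ (the example distributions are illustrative):

```python
import random

def hansen_hurwitz(x, P, Q, m, rng=random):
    """Hansen-Hurwitz estimate of sum(x): draw m indices with replacement from Q
    (the true sampling distribution) and average x[s] / P[s] (the assumed one)."""
    samples = rng.choices(range(len(x)), weights=Q, k=m)
    return sum(x[s] / P[s] for s in samples) / m

# P uniform; Q over-samples the '1' items by 5%.  E[mu_HH] = 1.05 * 500 = 525,
# so the estimate concentrates near 525 rather than the true mu = 500.
N = 1000
x = [1] * 500 + [0] * 500
P = [1 / N] * N
Q = [1.05 / N] * 500 + [0.95 / N] * 500
print(hansen_hurwitz(x, P, Q, m=20000))
```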
Apply Hansen-Hurwitz to $h$-wise collision statistics
Get $h$-wise collision estimators $\xi_h$ for $h = 1, \ldots, k$.
Note. $\xi_1 = \mu_{HH}$.
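A sketch of one concrete form of these statistics; the normalization below is an assumption, chosen so that $\xi_1 = \mu_{HH}$ and $\mathbb{E}[\xi_h] = \sum_i \bigl(\mathcal{Q}(i)/\mathcal{P}(i)\bigr)^h x_i$:

```python
from collections import Counter
from math import comb

def collision_estimator(samples, x, P, h):
    """h-wise collision estimator xi_h (assumed form): the average, over all
    h-subsets of the m samples, of 1[all h samples hit the same item i] * x[i] / P[i]**h.
    An item drawn c times contributes C(c, h) colliding h-subsets, so the sum
    can be computed from per-item multiplicities."""
    m = len(samples)
    counts = Counter(samples)
    total = sum(comb(c, h) * x[i] / P[i] ** h for i, c in counts.items())
    return total / comb(m, h)
```

For $h = 1$ this reduces to the Hansen-Hurwitz estimator; for $h \geq 2$ it only gains information from samples that actually collide, which is why the number of samples must grow with $h$.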
$h$-wise collision estimators alone are no better than Hansen-Hurwitz…
…but an appropriate linear combination of these estimators causes the lower-order error terms (in $\gamma$) to cancel out…
We define the bias-reducing estimator $\zeta_k$, an appropriate linear combination of $\xi_1, \ldots, \xi_k$.
Main Result. $\zeta_k$ has bias at most $\gamma^k \mu_+$.
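A sketch of where the $\gamma^k$ bound comes from, assuming $\mathbb{E}[\xi_h] = \sum_i (1 + \alpha_i)^h x_i$ (writing $\mathcal{Q}(i) = (1 + \alpha_i)\mathcal{P}(i)$ with $\vert\alpha_i\vert \leq \gamma$), taking signed binomial coefficients as one natural choice of linear combination, and reading $\mu_+ = \sum_i \vert x_i \vert$:
\[\zeta_k = \sum_{h=1}^{k} (-1)^{h+1} \binom{k}{h} \xi_h \quad\Longrightarrow\quad \mathbb{E}[\zeta_k] = \sum_i x_i \sum_{h=1}^{k} (-1)^{h+1} \binom{k}{h} (1 + \alpha_i)^h = \sum_i x_i \bigl(1 - (-\alpha_i)^k\bigr),\]
since $\sum_{h=0}^{k} (-1)^{h} \binom{k}{h} (1 + \alpha_i)^h = \bigl(1 - (1 + \alpha_i)\bigr)^k = (-\alpha_i)^k$ by the binomial theorem. Hence
\[\bigl\vert \mathbb{E}[\zeta_k] - \mu \bigr\vert = \Bigl\vert \sum_i (-\alpha_i)^k x_i \Bigr\vert \leq \gamma^k \sum_i \vert x_i \vert = \gamma^k \mu_+.\]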
Special Case: $\mathcal{P}$ is uniform, all values $x_i \in \{0, 1\}$
Take
\[m = O\left(\sqrt[k]{N^{k-1} \varepsilon_2^{-2} \mathrm{Var}_{HH}}\right)\]
Then $m$ samples suffice to estimate $\mu$ to within additive error $\gamma^k \mu + \varepsilon_2$ with probability $2/3$.
In the original example:
$\implies m = O(\sqrt{N})$ samples are sufficient
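For concreteness (and under the assumption that $\mathrm{Var}_{HH}$ denotes the variance of a single Hansen-Hurwitz sample, which here is at most $N^2$ since each sample contributes either $0$ or $N$): with $k = 2$ and target additive error $\varepsilon_2 = 0.02N$ as in the polling question,
\[m = O\!\left(\sqrt{N \cdot \varepsilon_2^{-2} \cdot \mathrm{Var}_{HH}}\right) = O\!\left(\sqrt{N \cdot (0.02 N)^{-2} \cdot N^2}\right) = O\!\left(\sqrt{N}\right).\]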
Main Result (lower bound). Suppose algorithm $A$ guarantees:
Then: for every positive integer $k$, there exists a constant $c_k < 1$ such that for $\varepsilon \leq c_k \gamma^k$, $A$ requires $\Omega(N^{1 - 1 / (k + 1)})$ samples.
Tighten values of $c_k$
Sample correction: can we reduce the bias of individual samples? Can we do this in an amortized fashion?
Can other algorithms/estimation tasks be made robust to sample bias?
General case:
Case $k = 2$:
Suppose $\mathcal{Q}(i) = (1 + \alpha_i) / N$, $\vert\alpha_i\vert \leq \gamma$
Then: compute $\mathbb{E}[\xi_1]$ and $\mathbb{E}[\xi_2]$ in terms of the $\alpha_i$.
So: the first-order terms in $\alpha_i$ cancel in $\zeta_2$, leaving bias at most $\gamma^2 \mu_+$.
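A sketch of the computation, assuming as above that $\mathbb{E}[\xi_h] = \sum_i (1 + \alpha_i)^h x_i$ and that $\zeta_2 = 2\xi_1 - \xi_2$:
\[\mathbb{E}[\xi_1] = \sum_i (1 + \alpha_i)\, x_i = \mu + \sum_i \alpha_i x_i, \qquad \mathbb{E}[\xi_2] = \sum_i (1 + \alpha_i)^2 x_i = \mu + 2 \sum_i \alpha_i x_i + \sum_i \alpha_i^2 x_i,\]
so that
\[\mathbb{E}[2\xi_1 - \xi_2] = \mu - \sum_i \alpha_i^2 x_i \qquad\Longrightarrow\qquad \bigl\vert \mathbb{E}[\zeta_2] - \mu \bigr\vert \leq \gamma^2 \sum_i \vert x_i \vert = \gamma^2 \mu_+.\]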
Theorem (Raskhodnikova et al. 2009). If two distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ have the same $p$th frequency moments for $p = 1, 2, \ldots, k$, then $\Omega(N^{1 - 1/(k+1)})$ samples are required to distinguish $\mathcal{D}_1$ from $\mathcal{D}_2$.
We Construct. Distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ with:
Given $\mathcal{D}_1$ and $\mathcal{D}_2$, set $N = n_1 + n_2$, $\mathcal{Q} = \frac 1 2 \mathcal{D}_1 + \frac 1 2 \mathcal{D}_2$
Scenario 1:
\[x_i = \begin{cases} 1 & i \in \mathrm{supp}(\mathcal{D}_1)\\ 0 & i \in \mathrm{supp}(\mathcal{D}_2) \end{cases}\]
Scenario 2:
\[x_i = \begin{cases} 0 & i \in \mathrm{supp}(\mathcal{D}_1)\\ 1 & i \in \mathrm{supp}(\mathcal{D}_2) \end{cases}\]
Then estimating $\sum_i x_i$ with error $O(\gamma^k)$
$\implies$ distinguishing scenarios 1 and 2
$\implies$ distinguishing $\mathcal{D}_1$ and $\mathcal{D}_2$