Talya Eden and **Will Rosenbaum**

Tel Aviv University

Suppose $G = (V, E)$ is a very large graph

Access to $G$ is provided via queries:

- *vertex query*: sample a (uniformly random) vertex $v \in V$
- *degree query*: return the degree of a vertex, $d(v)$
- *neighbor query*: return the $i$-th neighbor of $v$
- *pair query*: return whether or not $(u, v) \in E$
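These queries can be simulated over in-memory adjacency lists; a minimal sketch (the class and method names are illustrative, not from the talk):

```python
import random

class GraphOracle:
    """Simulates query access to a graph stored as adjacency lists."""

    def __init__(self, adj):
        self.adj = adj                    # adj[v] = list of v's neighbors
        self.vertices = list(adj)

    def sample_vertex(self):
        """Sample a uniformly random vertex v in V."""
        return random.choice(self.vertices)

    def degree(self, v):
        """Degree query: d(v)."""
        return len(self.adj[v])

    def neighbor(self, v, i):
        """Neighbor query: the i-th neighbor of v (0-indexed)."""
        return self.adj[v][i]

    def pair(self, u, v):
        """Pair query: is (u, v) in E?"""
        return v in self.adj[u]
```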

**Main Question:** How many such queries are necessary and sufficient to sample an edge $e \in E$ from a(n almost) uniform distribution?

**Query Access Model(s)**

- introduced in the context of property testing
- standard for sublinear (time) graph algorithms

**Random Edge Samples**

- used in many sublinear graph algorithms (testing bipartiteness, counting triangles & cliques, estimating moments of the degree distribution)
- used as a basic query by Aliakbarpour et al. (2016) to estimate number of star subgraphs

**Theorem 1.** There exists an algorithm $\mathcal{A}$ that, given query access to any graph $G = (V, E)$ with $n = |V|$ vertices and $m = |E|$ edges, returns a random edge $e \in E$ such that:

- each edge is returned with probability $(1 \pm \varepsilon) / m$
- $\mathcal{A}$ uses $O\big(n \big/ \sqrt{\varepsilon m}\big)$ degree and neighbor queries in expectation.

**Theorem 2.** Any algorithm that samples edges from an almost-uniform distribution requires $\Omega\big(n \big/ \sqrt{m}\big)$ degree, neighbor, or pair queries.

**Note.** In this talk, we'll focus on proving Theorem 1.

Suppose the max degree of $G$ is bounded by $\Delta$

1. pick a vertex $v \in V$ uniformly at random
2. pick $i \in \{1, 2, \ldots, \Delta\}$ uniformly at random
3. if $u$ is the $i$-th neighbor of $v$, return $(v, u)$
4. otherwise, repeat steps 1–3

Notes:

- each $e$ is returned with probability $2/(\Delta n)$
- the success probability of a single trial is $2m/(\Delta n)$
- the worst-case (expected) number of queries is $\Theta(n^2 / m)$, since $\Delta$ can be as large as $\Theta(n)$
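Steps 1–3 above can be sketched as a rejection-sampling loop (a simulation over adjacency lists; `adj` and the function name are illustrative):

```python
import random

def sample_edge_bounded(adj, max_degree):
    """Rejection sampling with a degree bound: each edge {u, v} is returned
    (in one orientation or the other) with probability 2/(max_degree * n).

    adj: dict mapping each vertex to its neighbor list (simulating the oracle).
    """
    vertices = list(adj)
    while True:
        v = random.choice(vertices)         # step 1: uniform vertex
        i = random.randrange(max_degree)    # step 2: uniform index in [max_degree]
        if i < len(adj[v]):                 # step 3: v's i-th neighbor exists
            return (v, adj[v][i])
        # otherwise reject and repeat
```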

- Partition vertices (and edges) into sets of *heavy* and *light* elements according to degree.
- Sample edges from light vertices using the simple (but inefficient) protocol with $\Delta = \theta$.
- To sample edges from heavy vertices:
- pick a random edge $(u, v)$ where $u$ is light,
- if $v$ is heavy, pick a random neighbor $w$ of $v$,
- return $(v, w)$.
- Invoke subroutines for sampling light and heavy edges with equal probability. Repeat until an edge is returned.

replace edges with directed edges

Replace each edge $\{u, v\} \in E$ with a pair of directed edges $(u, v), (v, u)$

Set threshold $\theta = \sqrt{m / \varepsilon}$:

- $u$ is *light* if $d(u) \leq \theta$ and *heavy* otherwise
- $(u, v)$ is light (resp. heavy) if $u$ is light (resp. heavy)
- $E_L$ (resp. $E_H$) is the set of light (resp. heavy) edges

**Threshold Lemma.** If $u$ is heavy, then at most an $\varepsilon$-fraction of $u$'s neighbors are heavy.

**Proof.** There can be at most $\sqrt{\varepsilon m} = \varepsilon \theta$ nodes with degree greater than $\theta = \sqrt{m / \varepsilon}$. Since a heavy vertex $u$ has $d(u) > \theta$ neighbors, at most an $\varepsilon \theta / \theta = \varepsilon$ fraction of its neighbors can be heavy. $\Box$

$\implies$ at least $(1 - \varepsilon) d(u)$ of $u$'s neighbors are light.

**SLE algorithm**

1. pick $u \in V$ uniformly at random
2. if $u$ is light, pick $i \in [\theta]$ uniformly (else, fail)
3. if $v$ is $u$'s $i$-th neighbor, return $(u, v)$ (else, fail)

**SLE Lemma.** SLE succeeds (i.e., returns an edge) with probability $|E_L| \big/ n \theta$. Further, for every light edge $e$, we have
$$
\Pr(\text{SLE returns } e) = \frac{1}{n \theta}
$$
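A single SLE trial can be sketched as follows (a simulation over adjacency lists; `adj` and `sle_trial` are illustrative names):

```python
import random

def sle_trial(adj, theta):
    """One SLE trial: return a light (directed) edge, or None on failure.

    adj: dict mapping each vertex to its neighbor list (simulating the oracle).
    """
    u = random.choice(list(adj))      # uniform vertex sample
    if len(adj[u]) > theta:           # u is heavy: fail
        return None
    i = random.randrange(theta)       # uniform i in [theta]
    if i < len(adj[u]):
        return (u, adj[u][i])         # each light edge hit w.p. exactly 1/(n*theta)
    return None                       # fail
```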

**SHE algorithm**

1. use SLE to sample an edge $(u, v)$ (or fail)
2. if $v$ is heavy, pick $i \in [d(v)]$ uniformly (else, fail)
3. return $(v, w)$ where $w$ is $v$'s $i$-th neighbor

**SHE Lemma.** SHE succeeds with probability $p$ satisfying
$$
(1 - \varepsilon) \frac{|E_H|}{n \theta} \leq p \leq \frac{|E_H|}{n \theta}.
$$
Further, for every heavy edge $e$ we have
$$
(1 - \varepsilon) \frac{1}{n \theta} \leq \Pr(\text{SHE returns } e) \leq \frac{1}{n \theta}.
$$

**Proof.** Suppose $v$ is a heavy vertex, and let $d_L(v)$ denote the number of its light neighbors.

By Threshold Lemma, $d_L(v) \geq (1 - \varepsilon) d(v)$

By the SLE Lemma, the probability that some edge $(u, v)$ is returned in Step 1 of SHE is $$ \frac{d_L(v)}{n \theta} \geq (1 - \varepsilon) \frac{d(v)}{n \theta}, $$ in which case SHE succeeds.

The probability that SHE returns $(v, w)$ is then $$ \frac{d_L(v)}{n \theta} \cdot \frac{1}{d(v)} \geq (1 - \varepsilon) \frac{1}{n \theta}. \qquad\qquad \Box $$
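Analogously, one SHE trial can be sketched with Step 1 (the SLE call) inlined, so that the snippet is self-contained (names are illustrative):

```python
import random

def she_trial(adj, theta):
    """One SHE trial: return a heavy (directed) edge, or None on failure.

    adj: dict mapping each vertex to its neighbor list (simulating the oracle).
    """
    # Step 1 (inlined SLE trial): sample a light edge (u, v), or fail.
    u = random.choice(list(adj))
    if len(adj[u]) > theta:            # u is heavy: SLE fails
        return None
    i = random.randrange(theta)
    if i >= len(adj[u]):               # SLE fails
        return None
    v = adj[u][i]
    # Step 2: if v is heavy, step to a uniformly random neighbor of v.
    if len(adj[v]) <= theta:           # v is light: fail
        return None
    return (v, random.choice(adj[v]))
```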

**Sample Edge (SE).** Repeat until an edge $e$ is returned:

- with probability $1/2$ invoke SLE
- otherwise invoke SHE
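Putting the pieces together, a self-contained sketch of SE (it inlines the SLE and SHE trials, and computes $m$ and $\theta$ directly from the adjacency lists for the demonstration; in the query model, $m$ is known or estimated):

```python
import math
import random

def sample_edge(adj, eps):
    """SE: return a (directed) edge sampled almost uniformly.

    adj: dict mapping each vertex to its neighbor list (simulating the oracle).
    """
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    theta = max(1, math.isqrt(max(1, int(m / eps))))   # threshold ~ sqrt(m/eps)
    vertices = list(adj)

    def sle():
        # Sample-Light-Edge: uniform vertex, then uniform slot in [theta].
        u = random.choice(vertices)
        if len(adj[u]) > theta:
            return None
        i = random.randrange(theta)
        return (u, adj[u][i]) if i < len(adj[u]) else None

    def she():
        # Sample-Heavy-Edge: an SLE trial, then a step to a uniform
        # neighbor if the sampled edge points at a heavy vertex.
        edge = sle()
        if edge is None:
            return None
        _, v = edge
        if len(adj[v]) <= theta:
            return None
        return (v, random.choice(adj[v]))

    while True:  # flip a fair coin between the two subroutines
        edge = sle() if random.random() < 0.5 else she()
        if edge is not None:
            return edge
```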

Analysis follows from SLE/SHE lemmas:

- probability that SE succeeds is at least $$ \frac{1}{2} \cdot \frac{|E_L|}{n\theta} + \frac{1}{2} \cdot (1 - \varepsilon) \frac{|E_H|}{n \theta} = \Theta\left(\frac{m}{n \theta}\right) $$
- the expected number of queries until success is $O(n \theta / m) = O(n \big/ \sqrt{\varepsilon m})$
- each edge is returned with roughly equal probability

- If $m$ is not known in advance, it can be estimated using $O(n \big / \sqrt{m})$ queries [Feige 2005, Goldreich & Ron 2008]
- Optimality of SE algorithm (Theorem 2):
  - start with any graph on $n$ vertices and $m$ edges
  - add a clique with $m$ edges ($\Theta(\sqrt{m})$ nodes)
  - a clique edge must be returned with probability $1/2 - \varepsilon$, but the probability of "hitting" the clique is only $\Theta(\sqrt{m} / n)$

- Lower bound proof can be formalized simply using communication complexity [Eden & Rosenbaum 2018]
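The numbers in the lower-bound construction can be made concrete; the values of $n$ and $m$ below are assumed purely for illustration:

```python
import math

# Assumed illustrative sizes: a graph with n vertices and m edges.
n, m = 10**6, 10**6

# Smallest clique holding at least m edges: k(k-1)/2 >= m, so k = Theta(sqrt(m)).
k = math.ceil((1 + math.sqrt(1 + 8 * m)) / 2)

# A uniform vertex sample lands in the clique with probability ~ k/n.
hit_probability = k / n

print(k, hit_probability)   # the clique is tiny, yet holds half the edge mass
```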