On Sampling Edges Almost Uniformly

Talya Eden and Will Rosenbaum

Tel Aviv University

Introduction

Suppose $G = (V, E)$ is a very large graph

Access to $G$ is provided via queries:

  • sample a (uniformly random) vertex $v \in V$
  • degree query: return the degree of vertex, $d(v)$
  • neighbor query: return the $i$-th neighbor of $v$
  • pair query: return whether or not $(u, v) \in E$

Main Question: How many such queries are necessary and sufficient to sample an edge $e \in E$ from a(n almost) uniform distribution?

Motivation

Query Access Model(s)

  • introduced in the context of property testing
  • standard for sublinear (time) graph algorithms

Random Edge Samples

  • used in many sublinear graph algorithms (testing bipartiteness, counting triangles & cliques, estimating moments of the degree distribution)
  • used as a basic query by Aliakbarpour et al. (2016) to estimate number of star subgraphs

Main Results

Theorem 1. There exists an algorithm $\mathcal{A}$ that given query access to any graph $G = (V, E)$ with $n = |V|$ and $m = |E|$ returns a random edge $e \in E$ such that:

  1. each edge is returned with probability $(1 \pm \varepsilon) / m$
  2. $\mathcal{A}$ uses $O{\big(}n\, {\big /} \sqrt{\varepsilon m}\big)$ degree and neighbor queries in expectation.

Thoerem 2. Any algorithm that samples edges from an almost-uniform distribution requires $\Omega\big(n\, \big / \sqrt{m}\big)$ degree, neighbor, or pair queries.

Note. In this talk, we'll focus on proving Theorem 1.

A simple (but inefficient) protocol

Suppose the max degree of $G$ is bounded by $\Delta$

  1. pick vertex $v \in V$ uniformly at random
  2. pick $i \in \{1, 2, \ldots, \Delta\}$ at random
  3. if $u$ is $i$th neighbor of $v$, return $(v, u)$
  4. otherwise, repeat 1–3

Notes:

  • each $e$ is returned with probability $2 \big{/} \Delta\, n$
  • success probability is $2 m \big{/} \Delta\,n$
  • worst-case (expected) number of queries is $\Theta(n^2 / m)$

Algorithm Idea

  • Partition vertices (and edges) into sets of heavy and light elements according to degree.
  • Sample edges from light vertices by using simple (but inefficient) protocol, $\Delta = \theta$
  • To sample edges from heavy vertices:
    1. pick a random edge $(u, v)$ where $u$ is light,
    2. if $v$ is heavy, pick a random neighbor $w$ of $v$,
    3. return $(v, w)$.
  • Invoke subroutines for sampling light and heavy edges with equal probability. Repeat until an edge is returned.

An Example Graph

demo A graph demo!

replace edges with directed edges

Formalities

Replace each edge $\{u, v\} \in E$ with a pair of directed edges $(u, v), (v, u)$

Set threshold $\theta = \sqrt{m / \varepsilon}$:

  • $u$ is light if $d(u) \leq \theta$ and heavy otherwise
  • $(u, v)$ is light (resp. heavy) if $u$ is light (resp. heavy)
  • $E_L$ (resp. $E_H$) is the set of light (resp. heavy) edges

Threshold Lemma. If $u$ is heavy, then at most an $\varepsilon$—fraction of $u$'s neighbors are heavy.

Proof. There can be at most $\sqrt{\varepsilon m} = \varepsilon \theta$ nodes with degree at least $\sqrt{m / \varepsilon}$. $\Box$

$\implies$ at least $(1 - \varepsilon) d(u)$ of $u$'s neighbors are light.

Sample Light Edge (SLE)

SLE algorithm

  1. pick $u \in V$ uniformly at random
  2. if $u$ is light, pick $i \in [\theta]$ uniformly (else, fail)
  3. if $v$ is $u$'s $i$-th neighbor, return $(u, v)$ (else, fail)

SLE Lemma. SLE succeeds (i.e., returns an edge) with probability $|E_L| \big/ n \theta$. Further, for every light edge $e$, we have $$ \Pr(\text{SLE returns } e) = \frac{1}{n \theta} $$

Sample Heavy Edge (SHE)

SHE algorithm

  1. use SLE to sample edge $(u, v)$ (or fail)
  2. if $v$ is heavy, pick $i \in [d(v)]$ uniformly
  3. return $(v, w)$ where $w$ is $v$'s $i$th neighbor

SHE Lemma. SHE succeeds with probability $p$ satsifying $$ (1 - \varepsilon) \frac{|E_H|}{n \theta} \leq p \leq \frac{|E_H|}{n \theta}. $$ Further, for every heavy edge $e$ we have $$ (1 - \varepsilon) \frac{1}{n \theta} \leq \Pr(\text{SHE returns } e) \leq \frac{1}{n \theta}. $$

Proof of SHE Lemma

Suppose $v$ is a heavy vertex, $d_L(v)$ the number of its light neighbors.

By Threshold Lemma, $d_L(v) \geq (1 - \varepsilon) d(v)$

By SLE Lemma the probability that some $(u, v)$ is returned in Step 1 of SHE is $$ \frac{d_L(v)}{n \theta} \geq (1 - \varepsilon) \frac{d(v)}{n \theta}, $$ in which case SHE succeeds.

The probability that SHE returns $(v, w)$ is then $$ \frac{d_L(v)}{n \theta} \cdot \frac{1}{d(v)} \geq (1 - \varepsilon) \frac{1}{n \theta}. \qquad\qquad \Box $$

Main Algorithm

Sample Edge (SE) Repeat until an edge $e$ is returned

  • with probability $1/2$ invoke SLE
  • otherwise invoke SHE

Analysis follows from SLE/SHE lemmas:

  • probability that SE succeeds is at least $$ \frac{1}{2} \cdot \frac{|E_L|}{n\theta} + \frac{1}{2} \cdot (1 - \varepsilon) \frac{|E_H|}{n \theta} = \Theta\left(\frac{m}{n \theta}\right) $$
  • the expected number of queries until success is $O(n \theta / m) = O(n \big/ \sqrt{\varepsilon m})$
  • each edge is returned with roughly equal probability

Final Remarks

  1. If $m$ is not known in advance, it can be estimated using $O(n \big / \sqrt{m})$ queries [Feige 2005, Goldreich & Ron 2008]
  2. Optimality of SE algorithm (Theorem 2)
    • start with any graph on $n$ vertices, $m$ edges
    • add a clique with $m$ edges ($\Theta(\sqrt{m})$ nodes)
    • clique edge must be returned with probability $1/2 - \varepsilon$, but the probability of "hitting" the clique is $\Theta(\sqrt{m} / n)$
  3. Lower bound proof can be formalized simply using communication complexity [Eden & Rosenbaum 2018]

Thank You!