Randomized near-neighbor graphs, giant components and applications in data science

Ariel Jaffe*, Yuval Kluger, George C. Linderman, Gal Mishne, Stefan Steinerberger

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $c_d \log n$ nearest neighbors, where $d \geq 2$ is the dimension and $c_d$ is a constant depending on the dimension, then it is well known that the graph is connected with high probability. We prove that it suffices to connect every point to $c_{d,1} \log\log n$ points chosen randomly among its $c_{d,2} \log n$ nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph, with $\sim n \log\log n$ instead of $\sim n \log n$ edges, that has comparable connectivity properties. This result has non-trivial implications for problems in data science where an affinity matrix is constructed: instead of connecting each point to its $k$ nearest neighbors, one can often pick $k' \ll k$ random points out of the $k$ nearest neighbors and connect only to those without sacrificing the quality of results. This approach can simplify and accelerate computation; we illustrate this with experimental results in spectral clustering of large-scale datasets.
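The construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes a neighborhood size on the order of log n and a sample size on the order of log log n, with both constants set to 2 arbitrarily (the abstract does not give their values). The function names `sparse_knn_edges` and `largest_component_size` are hypothetical, and the brute-force neighbor search is chosen for clarity, not speed.

```python
import math
import random


def sparse_knn_edges(points, k, k_prime, rng):
    """Connect each point to k_prime points sampled from its k nearest neighbors."""
    n = len(points)
    edges = set()
    for i in range(n):
        # Brute-force search: sort all other points by distance to point i.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        nearest = others[:k]
        # Keep only k_prime of the k nearest, chosen uniformly at random.
        for j in rng.sample(nearest, k_prime):
            edges.add((min(i, j), max(i, j)))  # store as an undirected edge
    return edges


def largest_component_size(n, edges):
    """Size of the largest connected component, computed with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


rng = random.Random(0)
n, d = 400, 2
points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]

# Assumed constants: the factor 2 below is arbitrary, not taken from the paper.
k = max(1, round(2 * math.log(n)))
k_prime = min(k, max(1, round(2 * math.log(math.log(n)))))

edges = sparse_knn_edges(points, k, k_prime, rng)
giant = largest_component_size(n, edges)
print(f"k={k}, k'={k_prime}, edges={len(edges)}, largest component={giant}/{n}")
```

Comparing the reported largest component against the total number of points, for different values of the two assumed constants, gives a rough empirical feel for the giant-component statement; it does not substitute for the proof.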

Original language: American English
Pages (from-to): 458-476
Number of pages: 19
Journal: Journal of Applied Probability
Volume: 57
Issue number: 2
State: Published - 1 Jun 2020
Externally published: Yes

Keywords

  • k-nn graph
  • connectivity
  • random graph
  • sparsification
