How to approximate a set without knowing its size in advance

Rasmus Pagh, Gil Segev, Udi Wieder

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

19 Scopus citations

Abstract

The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an on-line fashion, supporting membership queries without false negatives and with a false positive rate at most ε. That is, the membership algorithm must be correct on each x ∈ S, and may err with probability at most ε on each x ∉ S. We study a well-motivated, yet insufficiently explored, variant of this problem where the size n of the set is not known in advance. Existing optimal approximate membership data structures require that the size is known in advance, but in many practical scenarios this is not a realistic assumption. Moreover, even if the eventual size n of the set is known in advance, it is desirable to have the smallest possible space usage also when the current number of inserted elements is smaller than n. Our contribution consists of the following results:

• We show a super-linear gap between the space complexity when the size is known in advance and the space complexity when the size is not known in advance. When the size is known in advance, it is well-known that Θ(n log(1/ε)) bits of space are necessary and sufficient (Bloom '70, Carter et al. '78). However, when the size is not known in advance, we prove that at least (1-o(1)) n log(1/ε) + Ω(n log log n) bits of space must be used. In particular, the average number of bits per element must depend on the size of the set.

• We show that our space lower bound is tight, and can even be matched by a highly efficient data structure. We present a data structure that uses (1+o(1)) n log(1/ε) + O(n log log n) bits of space for approximating any set of any size n, without having to know n in advance. Our data structure supports membership queries in constant time in the worst case with high probability, and supports insertions in expected amortized constant time. Moreover, it can be "de-amortized" to support also insertions in constant time in the worst case with high probability by only increasing its space usage to O(n log(1/ε) + n log log n) bits.
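To give a feel for where a log log n term per element can come from, the following is a minimal Python sketch of a simple chaining approach: a sequence of fixed-size Bloom filters with doubling capacities, where level i is given false positive rate ε·6/(π²i²) so that the rates sum to at most ε. This is not the data structure from the paper. It is an illustrative stand-in, and the names `BloomFilter`, `GrowingFilter`, `initial_capacity` and `bits_used` are hypothetical. Level i costs about log(1/ε) + 2 log i + O(1) bits per element, and with roughly log n levels the extra cost is O(log log n) bits per element. Unlike the structure described in the abstract, this sketch pays the usual Bloom filter constant factor (about 1.44) in space and answers queries in time proportional to the number of levels rather than in constant time.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-capacity Bloom filter sized for `capacity` items at target rate `eps`."""

    def __init__(self, capacity, eps):
        self.capacity = capacity
        # Standard sizing: m = n * log2(1/eps) / ln 2 bits, k = log2(1/eps) hashes.
        self.k = max(1, math.ceil(math.log2(1 / eps)))
        self.m = max(1, math.ceil(capacity * math.log2(1 / eps) / math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)
        self.count = 0

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1

    def __contains__(self, item):
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


class GrowingFilter:
    """Chain of Bloom filters; no bound on the final set size is needed up front."""

    def __init__(self, eps, initial_capacity=16):
        self.eps = eps
        self.initial_capacity = initial_capacity
        self.levels = []

    def _new_level(self):
        i = len(self.levels) + 1
        # The weights 6 / (pi^2 * i^2) sum to 1 over all i >= 1, so by a union
        # bound the target rates of all levels add up to at most eps.
        level_eps = self.eps * 6 / (math.pi ** 2 * i ** 2)
        capacity = self.initial_capacity * 2 ** (i - 1)
        self.levels.append(BloomFilter(capacity, level_eps))

    def add(self, item):
        # Open a new, twice-as-large level once the newest one is full.
        if not self.levels or self.levels[-1].count >= self.levels[-1].capacity:
            self._new_level()
        self.levels[-1].add(item)

    def __contains__(self, item):
        # An inserted item is always found in the level it was added to,
        # so there are no false negatives.
        return any(item in level for level in self.levels)

    def bits_used(self):
        return sum(level.m for level in self.levels)
```

A caller would construct `GrowingFilter(eps=0.01)`, call `add` for each arriving element, and test membership with `in`. Comparing `bits_used()` against the number of inserted elements at different set sizes shows the per-element cost growing slowly as more levels are opened.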

Original language: English
Title of host publication: Proceedings - 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, FOCS 2013
Pages: 80-89
Number of pages: 10
State: Published - 2013
Event: 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, FOCS 2013 - Berkeley, CA, United States
Duration: 27 Oct 2013 to 29 Oct 2013

Publication series

Name: Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
ISSN (Print): 0272-5428

Conference

Conference: 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, FOCS 2013
Country/Territory: United States
City: Berkeley, CA
Period: 27/10/13 to 29/10/13
