TY - GEN
T1 - How to approximate a set without knowing its size in advance
AU - Pagh, Rasmus
AU - Segev, Gil
AU - Wieder, Udi
PY - 2013
Y1 - 2013
N2 - The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an on-line fashion, supporting membership queries without false negatives and with a false positive rate at most ε. That is, the membership algorithm must be correct on each x ∈ S, and may err with probability at most ε on each x /∈ S. We study a well-motivated, yet insufficiently explored, variant of this problem where the size n of the set is not known in advance. Existing optimal approximate membership data structures require that the size is known in advance, but in many practical scenarios this is not a realistic assumption. Moreover, even if the eventual size n of the set is known in advance, it is desirable to have the smallest possible space usage also when the current number of inserted elements is smaller than n. Our contribution consists of the following results: • We show a super-linear gap between the space complexity when the size is known in advance and the space complexity when the size is not known in advance. When the size is known in advance, it is well-known that Θ(n log(1/ε)) bits of space are necessary and sufficient (Bloom '70, Carter et al. '78). However, when the size is not known in advance, we prove that at least (1-o(1))n log(1/ε)+Ω(n log log n) bits of space must be used. In particular, the average number of bits per element must depend on the size of the set. • We show that our space lower bound is tight, and can even be matched by a highly efficient data structure. We present a data structure that uses (1+o(1))n log(1/ε)+O(n log log n) bits of space for approximating any set of any size n, without having to know n in advance. Our data structure supports membership queries in constant time in the worst case with high probability, and supports insertions in expected amortized constant time. Moreover, it can be "de-amortized" to support also insertions in constant time in the worst case with high probability by only increasing its space usage to O(n log(1/ε) + n log log n) bits.
AB - The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an on-line fashion, supporting membership queries without false negatives and with a false positive rate at most ε. That is, the membership algorithm must be correct on each x ∈ S, and may err with probability at most ε on each x /∈ S. We study a well-motivated, yet insufficiently explored, variant of this problem where the size n of the set is not known in advance. Existing optimal approximate membership data structures require that the size is known in advance, but in many practical scenarios this is not a realistic assumption. Moreover, even if the eventual size n of the set is known in advance, it is desirable to have the smallest possible space usage also when the current number of inserted elements is smaller than n. Our contribution consists of the following results: • We show a super-linear gap between the space complexity when the size is known in advance and the space complexity when the size is not known in advance. When the size is known in advance, it is well-known that Θ(n log(1/ε)) bits of space are necessary and sufficient (Bloom '70, Carter et al. '78). However, when the size is not known in advance, we prove that at least (1-o(1))n log(1/ε)+Ω(n log log n) bits of space must be used. In particular, the average number of bits per element must depend on the size of the set. • We show that our space lower bound is tight, and can even be matched by a highly efficient data structure. We present a data structure that uses (1+o(1))n log(1/ε)+O(n log log n) bits of space for approximating any set of any size n, without having to know n in advance. Our data structure supports membership queries in constant time in the worst case with high probability, and supports insertions in expected amortized constant time. Moreover, it can be "de-amortized" to support also insertions in constant time in the worst case with high probability by only increasing its space usage to O(n log(1/ε) + n log log n) bits.
UR - http://www.scopus.com/inward/record.url?scp=84893441780&partnerID=8YFLogxK
U2 - 10.1109/FOCS.2013.17
DO - 10.1109/FOCS.2013.17
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:84893441780
SN - 9780769551357
T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
SP - 80
EP - 89
BT - Proceedings - 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, FOCS 2013
T2 - 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, FOCS 2013
Y2 - 27 October 2013 through 29 October 2013
ER -