TY - JOUR

T1 - Balls and bins

T2 - Smaller hash families and faster evaluation

AU - Celis, L. Elisa

AU - Reingold, Omer

AU - Segev, Gil

AU - Wieder, Udi

PY - 2013

Y1 - 2013

N2 - A fundamental fact in the analysis of randomized algorithms is that when n balls are hashed into n bins independently and uniformly at random, with high probability each bin contains at most O(log n/log log n) balls. In various applications, however, the assumption that a truly random hash function is available is not always valid, and explicit functions are required. In this paper we study the size of families (or, equivalently, the description length of their functions) that guarantee a maximal load of O(log n/log log n) with high probability, as well as the evaluation time of their functions. Whereas such functions must be described using Ω(log n) bits, the best upper bound was formerly O(log^2 n/log log n) bits, which is attained by O(log n/log log n)-wise independent functions. Traditional constructions of O(log n/log log n)-wise independent functions offer an evaluation time of O(log n/log log n), which according to Siegel's lower bound [A. Siegel, Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, 1989, pp. 20-25] can be reduced only at the cost of significantly increasing the description length. We construct two families that guarantee a maximal load of O(log n/log log n) with high probability. Our constructions are based on two different approaches and exhibit different trade-offs between the description length and the evaluation time. The first construction shows that O(log n/log log n)-wise independence can in fact be replaced by "gradually increasing independence," resulting in functions that are described using O(log n log log n) bits and evaluated in time O(log n log log n). The second construction is based on derandomization techniques for space-bounded computations combined with a tailored construction of a pseudorandom generator, resulting in functions that are described using O(log^{3/2} n) bits and evaluated in time O(√(log n)). Our second construction can be compared to Siegel's lower bound stating that O(log n/log log n)-wise independent functions that are evaluated in time O(√(log n)) must be described using Ω(2^{√(log n)}) bits.

AB - A fundamental fact in the analysis of randomized algorithms is that when n balls are hashed into n bins independently and uniformly at random, with high probability each bin contains at most O(log n/log log n) balls. In various applications, however, the assumption that a truly random hash function is available is not always valid, and explicit functions are required. In this paper we study the size of families (or, equivalently, the description length of their functions) that guarantee a maximal load of O(log n/log log n) with high probability, as well as the evaluation time of their functions. Whereas such functions must be described using Ω(log n) bits, the best upper bound was formerly O(log^2 n/log log n) bits, which is attained by O(log n/log log n)-wise independent functions. Traditional constructions of O(log n/log log n)-wise independent functions offer an evaluation time of O(log n/log log n), which according to Siegel's lower bound [A. Siegel, Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science, 1989, pp. 20-25] can be reduced only at the cost of significantly increasing the description length. We construct two families that guarantee a maximal load of O(log n/log log n) with high probability. Our constructions are based on two different approaches and exhibit different trade-offs between the description length and the evaluation time. The first construction shows that O(log n/log log n)-wise independence can in fact be replaced by "gradually increasing independence," resulting in functions that are described using O(log n log log n) bits and evaluated in time O(log n log log n). The second construction is based on derandomization techniques for space-bounded computations combined with a tailored construction of a pseudorandom generator, resulting in functions that are described using O(log^{3/2} n) bits and evaluated in time O(√(log n)). Our second construction can be compared to Siegel's lower bound stating that O(log n/log log n)-wise independent functions that are evaluated in time O(√(log n)) must be described using Ω(2^{√(log n)}) bits.

KW - Balls and bins

KW - Hash functions

KW - Pseudorandom generators

UR - http://www.scopus.com/inward/record.url?scp=84882977697&partnerID=8YFLogxK

U2 - 10.1137/120871626

DO - 10.1137/120871626

M3 - Article

AN - SCOPUS:84882977697

SN - 0097-5397

VL - 42

SP - 1030

EP - 1050

JO - SIAM Journal on Computing

JF - SIAM Journal on Computing

IS - 3

ER -