TY - GEN
T1 - Sketching in adversarial environments
AU - Mironov, Ilya
AU - Naor, Moni
AU - Segev, Gil
PY - 2008
Y1 - 2008
N2 - We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch model, unifies the well-studied sketch and data stream models together with a cryptographic flavor that considers the execution of protocols in "hostile environments", and provides a framework for studying the complexity of many tasks involving massive data sets. The adversarial sketch model consists of several participating parties: honest parties, whose goal is to compute a pre-determined function of their inputs, and an adversarial party. Computation in this model proceeds in two phases, in the first phase, the adversarial party chooses the inputs of the honest parties. These inputs are sets of elements taken from a large universe, and provided to the honest parties in an on-line manner in the form of a sequence of insert and delete operations. Once an operation horn the sequence has been processed it is discarded and cannot be retrieved unless explicitly stored. During this phase the honest parties are not allowed to communicate. Moreover, they do not share any secret information and any public information they share is known to the adversary in advance, in the second phase, the honest parties engage in a protocol in order to compute a pre-determined function of their inputs. In this paper we settle the complexity (up to logarithmic factors) of two fundamental problems in this model: testing whether two massive data sets are equal, and approximating the size of their symmetric difference. We construct explicit and efficient protocols with sublinear sketches of essentially optimal size, poly-logarithmic update time during the first phase, and poly-logarithmic communication and computation during the second phase. Our main technical contribution is an explicit and deterministic encoding scheme that enjoys two seemingly conflicting properties: incrementality and high distance, which may be of independent interest.
AB - We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch model, unifies the well-studied sketch and data stream models together with a cryptographic flavor that considers the execution of protocols in "hostile environments", and provides a framework for studying the complexity of many tasks involving massive data sets. The adversarial sketch model consists of several participating parties: honest parties, whose goal is to compute a pre-determined function of their inputs, and an adversarial party. Computation in this model proceeds in two phases, in the first phase, the adversarial party chooses the inputs of the honest parties. These inputs are sets of elements taken from a large universe, and provided to the honest parties in an on-line manner in the form of a sequence of insert and delete operations. Once an operation horn the sequence has been processed it is discarded and cannot be retrieved unless explicitly stored. During this phase the honest parties are not allowed to communicate. Moreover, they do not share any secret information and any public information they share is known to the adversary in advance, in the second phase, the honest parties engage in a protocol in order to compute a pre-determined function of their inputs. In this paper we settle the complexity (up to logarithmic factors) of two fundamental problems in this model: testing whether two massive data sets are equal, and approximating the size of their symmetric difference. We construct explicit and efficient protocols with sublinear sketches of essentially optimal size, poly-logarithmic update time during the first phase, and poly-logarithmic communication and computation during the second phase. Our main technical contribution is an explicit and deterministic encoding scheme that enjoys two seemingly conflicting properties: incrementality and high distance, which may be of independent interest.
KW - Data stream model
KW - Massive data sets
KW - Sketch model
UR - http://www.scopus.com/inward/record.url?scp=57049113296&partnerID=8YFLogxK
U2 - 10.1145/1374376.1374471
DO - 10.1145/1374376.1374471
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:57049113296
SN - 9781605580470
T3 - Proceedings of the Annual ACM Symposium on Theory of Computing
SP - 651
EP - 660
BT - STOC'08
PB - Association for Computing Machinery (ACM)
T2 - 40th Annual ACM Symposium on Theory of Computing, STOC 2008
Y2 - 17 May 2008 through 20 May 2008
ER -