TY - JOUR
T1 - Sketching in adversarial environments
AU - Mironov, Ilya
AU - Naor, Moni
AU - Segev, Gil
PY - 2011
Y1 - 2011
N2 - We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch model, unifies the well-studied sketch and data stream models together with a cryptographic flavor that considers the execution of protocols in "hostile environments," and provides a framework for studying the complexity of tasks involving massive data sets. In the adversarial sketch model several parties are interested in computing a joint function in the presence of an adversary that dynamically chooses their inputs. These inputs are provided to the parties in an on-line manner, and each party incrementally updates a compressed sketch of its input. The parties are not allowed to communicate, they do not share any secret information, and any public information they share is known to the adversary in advance. Then, the parties engage in a protocol in order to evaluate the function on their current inputs using only their sketches. In this paper we settle the complexity of two fundamental problems in this model: testing whether two massive data sets are equal, and approximating the size of their symmetric difference. For these problems we construct explicit protocols that are optimal up to polylogarithmic factors. Our main technical contribution is an explicit and deterministic encoding scheme that enjoys two seemingly conflicting properties: incrementality and high distance, which may be of independent interest.
KW - Communication complexity
KW - Data stream model
KW - Sketch model
UR - http://www.scopus.com/inward/record.url?scp=84855568491&partnerID=8YFLogxK
U2 - 10.1137/080733772
DO - 10.1137/080733772
M3 - Article
AN - SCOPUS:84855568491
SN - 0097-5397
VL - 40
SP - 1845
EP - 1870
JO - SIAM Journal on Computing
JF - SIAM Journal on Computing
IS - 6
ER -