Abstract
We address risk estimation for a sample frequency table that an agency intends to release. Risk arises from non-empty sample cells that correspond to small population cells, and to population uniques in particular. Risk estimation therefore requires assessing which of the relevant population cells are indeed small. Various methods have been proposed for this task. We present a new method in which the estimate of a population cell frequency is obtained by smoothing over a local neighborhood of the cell, that is, over cells having similar or close values in all attributes. The statistical model we use is a generalized Negative Binomial model, which subsumes the Poisson and Negative Binomial models. We provide some preliminary results and experiments with this method. We compare the new approach with a method based on a Poisson regression hierarchical log-linear model, in which inference on a given cell is based on classical contingency-table models. Such models connect each cell to a ‘neighborhood’ of cells that share one or several attributes, but whose other attributes may differ significantly. We also compare with the Argus Negative Binomial method, in which inference on a given cell is based only on sampling weights, without learning from any type of ‘neighborhood’ of the given cell and without making use of the structure of the table.
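The neighborhood idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes the simpler Poisson special case rather than the generalized Negative Binomial model, two ordinal attributes, a known sampling fraction, and a plain moving average as a stand-in for the paper's smoothing. Under those assumptions, if the population count F is Poisson with mean λ and each unit is sampled independently with probability π, then the unsampled part F − f is Poisson with mean (1 − π)λ, so the probability that a sample unique is also a population unique is exp(−(1 − π)λ). The names `neighborhood_mean`, `unique_risk`, `sampling_fraction`, and `radius` are hypothetical.

```python
import math


def neighborhood_mean(counts, i, j, radius=1):
    """Average sample count over cells within `radius` of (i, j) in both attributes.

    Assumption: both attributes are ordinal, so adjacent indices are 'close' values.
    """
    rows, cols = len(counts), len(counts[0])
    cells = [
        counts[r][c]
        for r in range(max(0, i - radius), min(rows, i + radius + 1))
        for c in range(max(0, j - radius), min(cols, j + radius + 1))
    ]
    return sum(cells) / len(cells)


def unique_risk(counts, i, j, sampling_fraction, radius=1):
    """Estimate P(F = 1 | f = 1) for cell (i, j) under the Poisson assumption.

    The smoothed sample mean estimates pi * lambda, so dividing by the
    sampling fraction gives an estimate of the population mean lambda.
    """
    lam = neighborhood_mean(counts, i, j, radius) / sampling_fraction
    return math.exp(-(1.0 - sampling_fraction) * lam)


# Toy 3x3 sample table: two sample uniques, one near a dense cell.
sample_counts = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 5],
]
risk_corner = unique_risk(sample_counts, 0, 0, sampling_fraction=0.1)
risk_center = unique_risk(sample_counts, 1, 1, sampling_fraction=0.1)
```

In this toy table the center cell has a denser neighborhood than the corner cell, so its estimated population mean is larger and its estimated risk of being a population unique is smaller.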
Original language | English |
---|---|
Title of host publication | Privacy in Statistical Databases - CENEX-SDC Project International Conference, PSD 2006, Proceedings |
Editors | Josep Domingo-Ferrer, Luisa Franconi |
Publisher | Springer Verlag |
Pages | 82-93 |
Number of pages | 12 |
ISBN (Print) | 9783540493303 |
State | Published - 2006 |
Event | CENEX-SDC Project of International Conference on Privacy in Statistical Databases, PSD2006 - Rome, Italy; Duration: 13 Dec 2006 → 15 Dec 2006 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 4302 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | CENEX-SDC Project of International Conference on Privacy in Statistical Databases, PSD2006 |
---|---|
Country/Territory | Italy |
City | Rome |
Period | 13/12/06 → 15/12/06 |
Bibliographical note
Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2006.