Optimal histograms with outliers

Rachel Behar, Sara Cohen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation


Histograms are a well-studied and simple way to summarize data. As such, they are used extensively in a variety of applications that require estimates of data frequency values. Significant previous work has studied the problem of finding optimal histograms with respect to an error measure. In this paper we study the classic problem of finding an optimal histogram for a dataset, with a new twist: the histogram must contain at least n − k of the n data points. The k excluded data points are considered outliers. We consider two notions of excluding data items: allowing arbitrary items to be excluded, or removing items only while retaining a consistent histogram. Polynomial-time algorithms are presented for these problems. Extensive experiments demonstrate that our algorithms work well in practice to reduce the histogram error.
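To make the problem statement concrete, the following is a minimal sketch of the first notion only (arbitrary items may be excluded). It is not the authors' algorithm. It assumes the error measure is the sum of squared errors (SSE) of each retained value around its bucket mean, that buckets are contiguous and non-empty, and that an excluded item simply contributes no error. The function names `sse`, `bucket_cost`, and `optimal_histogram_error` are hypothetical and chosen for illustration.

```python
from functools import lru_cache


def sse(values):
    """Sum of squared errors of `values` around their mean (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def bucket_cost(values, drop):
    """Minimum SSE of one bucket after excluding `drop` of its values.

    For SSE, a best subset of a given size is a contiguous window of the
    sorted values, so it is enough to slide a window over the sorted bucket.
    """
    keep = len(values) - drop
    if keep <= 0:
        return 0.0
    ordered = sorted(values)
    return min(sse(ordered[a:a + keep]) for a in range(len(ordered) - keep + 1))


def optimal_histogram_error(data, num_buckets, k):
    """Minimum total SSE using `num_buckets` contiguous buckets and
    excluding at most `k` items (assumed cost model, see lead-in)."""
    data = list(data)

    @lru_cache(maxsize=None)
    def best(i, b, t):
        # Best error for the prefix data[:i] with exactly b buckets
        # and at most t excluded items.
        if b == 0:
            return 0.0 if i == 0 else float("inf")
        if i < b:
            return float("inf")
        result = float("inf")
        for j in range(b - 1, i):              # last bucket is data[j:i]
            bucket = tuple(data[j:i])
            for d in range(min(t, i - j) + 1):  # outliers spent in last bucket
                prev = best(j, b - 1, t - d)
                if prev < float("inf"):
                    result = min(result, prev + bucket_cost(bucket, d))
        return result

    return best(len(data), num_buckets, k)


if __name__ == "__main__":
    frequencies = [1, 1, 50, 2, 2, 9, 9, 9]
    print(optimal_histogram_error(frequencies, 2, 0))  # no outliers allowed
    print(optimal_histogram_error(frequencies, 2, 1))  # one outlier allowed
```

The sketch runs in polynomial time but is not tuned for speed. The second notion in the abstract (removing items while retaining a consistent histogram) is not covered here.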

Original language: American English
Title of host publication: Advances in Database Technology - EDBT 2020
Subtitle of host publication: 23rd International Conference on Extending Database Technology, Proceedings
Editors: Angela Bonifati, Yongluan Zhou, Marcos Antonio Vaz Salles, Alexander Bohm, Dan Olteanu, George Fletcher, Arijit Khan, Bin Yang
Number of pages: 12
ISBN (Electronic): 9783893180837
State: Published - 2020
Externally published: Yes
Event: 23rd International Conference on Extending Database Technology, EDBT 2020 - Copenhagen, Denmark
Duration: 30 Mar 2020 – 2 Apr 2020

Publication series

Name: Advances in Database Technology - EDBT
ISSN (Electronic): 2367-2005


Conference: 23rd International Conference on Extending Database Technology, EDBT 2020

Bibliographical note

Publisher Copyright:
© 2020 Copyright held by the owner/author(s).

