Abstract
Finding a subset of representative items from a large set of data items has been studied extensively, under a variety of conditions and constraints. In our setting, data items belong to a metric space and also have a sensitive attribute (e.g., gender, race). Our focus is on effectively choosing a set of representatives while taking into consideration two distinct notions of fairness. First, each data item in the dataset should be similar to a representative (precisely how similar depends on the data distribution). Second, representatives should satisfy a given social equity constraint specifying the number of representatives with each attribute value. To satisfy these two fairness requirements, we build upon previous results in fair facility location, extending this work to allow for social equity constraints. Our extension is parameterized by requirements on the neighborhood of data items, and we show lower and upper bounds for an optimal algorithm for some cases, and NP-completeness results for others. We then further extend this work to ensure that representatives are similar, in their attribute values, to the set of data that they represent. To this end, we develop methods to choose items that are highly representative of their surrounding data items, while still satisfying a social equity constraint. Combining these results yields a method that can be leveraged to choose representative data items while simultaneously meeting several fairness requirements. Experimental results demonstrate the quality of our solutions and show that, in practice, the cost of social equity (in terms of increased distance to representatives) is low.
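To make the two fairness notions in the abstract concrete, the following is a minimal sketch and not the paper's algorithm. It swaps in a simple greedy farthest-first heuristic, assumes Euclidean distance, and uses hypothetical names (`pick_representatives`, `points`, `groups`, `quotas`). It selects exactly the requested number of representatives per attribute value and reports the largest distance from any item to its nearest representative.

```python
from math import dist


def pick_representatives(points, groups, quotas):
    """Greedy farthest-first selection under per-group quotas.

    Illustrative heuristic only; it carries no optimality guarantee and is
    not the method described in the paper.
    """
    remaining = dict(quotas)  # representatives still needed per attribute value
    chosen = []               # indices of the selected representatives
    # Distance from each item to its nearest representative chosen so far.
    nearest = [float("inf")] * len(points)

    for _ in range(sum(quotas.values())):
        # Candidates: unchosen items whose attribute value still has quota left.
        candidates = [
            i for i in range(len(points))
            if i not in chosen and remaining.get(groups[i], 0) > 0
        ]
        if not candidates:
            break  # the quotas cannot be met with the available items
        # Pick the candidate currently farthest from every representative.
        best = max(candidates, key=lambda i: nearest[i])
        chosen.append(best)
        remaining[groups[best]] -= 1
        # Refresh each item's distance to its nearest representative.
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], dist(p, points[best]))

    # The second value is the worst-case distance to a representative.
    return chosen, max(nearest)


if __name__ == "__main__":
    points = [(0, 0), (1, 0), (10, 0), (11, 0)]
    groups = ["a", "b", "a", "b"]
    reps, radius = pick_representatives(points, groups, {"a": 1, "b": 1})
    print(reps, radius)
```

Comparing the returned radius with the radius from a run that ignores the quotas (for instance, by assigning every item the same attribute value) gives a rough sense of the "cost of social equity" that the abstract refers to.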
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025 |
| Publisher | IEEE Computer Society |
| Pages | 1153-1165 |
| Number of pages | 13 |
| ISBN (Electronic) | 9798331536039 |
| State | Published - 2025 |
| Event | 41st IEEE International Conference on Data Engineering, ICDE 2025 - Hong Kong, China. Duration: 19 May 2025 → 23 May 2025 |
Publication series
| Name | Proceedings - International Conference on Data Engineering |
|---|---|
| ISSN (Print) | 1084-4627 |
| ISSN (Electronic) | 2375-0286 |
Conference
| Conference | 41st IEEE International Conference on Data Engineering, ICDE 2025 |
|---|---|
| Country/Territory | China |
| City | Hong Kong |
| Period | 19/05/25 → 23/05/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Keywords
- diversity
- facility location
- fairness
- representativeness