Identification of Mixtures of Discrete Product Distributions in Near-Optimal Sample and Time Complexity

Spencer L. Gordon, Erik Jahn, Bijan Mazaheri, Yuval Rabani, Leonard J. Schulman

Research output: Contribution to journal › Conference article › peer-review

Abstract

We consider the problem of identifying, from statistics, a distribution of discrete random variables X_1, ..., X_n that is a mixture of k product distributions. The best previous sample complexity for n ∈ O(k) was (1/ζ)^{O(k^2 log k)} (under a mild separation assumption parameterized by ζ). The best known lower bound was exp(Ω(k)). It is known that n ≥ 2k - 1 is necessary and sufficient for identification. We show, for any n ≥ 2k - 1, how to achieve sample complexity and run-time complexity (1/ζ)^{O(k)}. We also extend the known lower bound of e^{Ω(k)} to match our upper bound across a broad range of ζ. Our results are obtained by combining (a) a classic method for robust tensor decomposition with (b) a novel way of bounding the condition number of key matrices, called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
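To make the model in the abstract concrete, the following minimal Python sketch sets up a mixture of k product distributions over n binary variables and checks one of its moments against samples. The parameter names pi, M, and the helper sample are hypothetical, introduced only for illustration; this sketches the model being identified, not the authors' identification algorithm.

# A minimal sketch (not the authors' algorithm) of the model in the abstract:
# a mixture of k product distributions over n binary variables X_1, ..., X_n.
# Hypothetical parameters: mixing weights pi (length k, summing to 1) and a
# k x n matrix M with M[j, i] = Pr[X_i = 1 | source j].
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 5                        # n >= 2k - 1, as required for identification
pi = rng.dirichlet(np.ones(k))     # mixing weights
M = rng.uniform(0.1, 0.9, (k, n))  # per-source marginals

def sample(num_samples):
    """Draw i.i.d. samples of (X_1, ..., X_n) from the mixture."""
    sources = rng.choice(k, size=num_samples, p=pi)
    return (rng.uniform(size=(num_samples, n)) < M[sources]).astype(int)

# Product structure within each source gives Pr[X = 1^n] = sum_j pi_j * prod_i M[j, i];
# compare the exact value to the empirical frequency.
X = sample(200_000)
exact = float(pi @ M.prod(axis=1))
print(exact, (X.sum(axis=1) == n).mean())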

Original language: English
Pages (from-to): 2071-2091
Number of pages: 21
Journal: Proceedings of Machine Learning Research
Volume: 247
State: Published - 2024
Event: 37th Annual Conference on Learning Theory, COLT 2024 - Edmonton, Canada
Duration: 30 Jun 2024 - 3 Jul 2024

Bibliographical note

Publisher Copyright:
© 2024 S.L. Gordon, E. Jahn, B. Mazaheri, Y. Rabani & L.J. Schulman.

