Abstract
We consider the problem of identifying, from statistics, a distribution of discrete random variables X_1, …, X_n that is a mixture of k product distributions. The best previous sample complexity for n ∈ O(k) was (1/ζ)^{O(k² log k)} (under a mild separation assumption parameterized by ζ). The best known lower bound was exp(Ω(k)). It is known that n ≥ 2k − 1 is necessary and sufficient for identification. We show, for any n ≥ 2k − 1, how to achieve sample complexity and run-time complexity (1/ζ)^{O(k)}. We also extend the known lower bound of exp(Ω(k)) to match our upper bound across a broad range of ζ. Our results are obtained by combining (a) a classic method for robust tensor decomposition with (b) a novel way of bounding the condition number of key matrices, called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
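To make the setup concrete, the following is a minimal illustrative sketch (not the paper's algorithm) of the generative model: k hidden sources with mixing weights π, each a product distribution over n binary variables, together with a Hadamard-extension matrix built from the per-source marginals. The parameter values, the binary-alphabet restriction, and the helper names `sample_mixture` and `hadamard_extension` are assumptions made for illustration; the paper's analysis concerns bounding the condition number of such matrices.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed, not from the paper): k = 2 sources over
# n = 2k - 1 = 3 binary variables, the smallest n known to suffice.
pi = np.array([0.6, 0.4])            # mixing weights over the k sources
P = np.array([[0.9, 0.2, 0.7],       # P[j, i] = Pr[X_i = 1 | source j]
              [0.3, 0.8, 0.1]])

def sample_mixture(pi, P, num_samples):
    """Pick a hidden source j ~ pi, then draw each X_i independently as
    Bernoulli(P[j, i]) -- the 'product distribution' part of the model."""
    sources = rng.choice(len(pi), size=num_samples, p=pi)
    return (rng.random((num_samples, P.shape[1])) < P[sources]).astype(int)

def hadamard_extension(M):
    """Matrix whose rows are entrywise (Hadamard) products of subsets of the
    rows of M: the row for subset S, column j equals prod_{i in S} M[i, j].
    The empty subset contributes the all-ones row."""
    n, k = M.shape
    rows = [np.ones(k)]
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            rows.append(M[list(S)].prod(axis=0))
    return np.vstack(rows)

# The statistics one can estimate from samples are Pr[X_i = 1 for all i in S]
# over subsets S; exactly, these moments equal H @ pi, where H is the
# Hadamard extension of the n x k matrix of per-source marginals.
X = sample_mixture(pi, P, num_samples=200_000)
H = hadamard_extension(P.T)           # P.T is n x k: rows indexed by variables
exact_moments = H @ pi

n = P.shape[1]
subsets = [S for size in range(n + 1) for S in combinations(range(n), size)]
empirical = np.array([X[:, list(S)].all(axis=1).mean() for S in subsets])
print("condition number of H:", np.linalg.cond(H))
print("max moment estimation error:", np.abs(empirical - exact_moments).max())
```

In this sketch the identification problem is: recover (π, P) from noisy estimates of the moments H @ π alone; how well-conditioned H is controls how much estimation error can be tolerated.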
| Original language | English |
| --- | --- |
| Pages (from-to) | 2071-2091 |
| Number of pages | 21 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 247 |
| State | Published - 2024 |
| Event | 37th Annual Conference on Learning Theory, COLT 2024 - Edmonton, Canada. Duration: 30 Jun 2024 → 3 Jul 2024 |
Bibliographical note
Publisher Copyright: © 2024 S.L. Gordon, E. Jahn, B. Mazaheri, Y. Rabani & L.J. Schulman.