Abstract
We give an algorithm for source identification of a mixture of k product distributions on n bits. This is a fundamental problem in machine learning with many applications. Our algorithm identifies the source parameters of an identifiable mixture, given, as input, approximate values of multilinear moments (derived, for instance, from a sufficiently large sample), using $2^{O(k^2)} n^{O(k)}$ arithmetic operations. Our result is the first explicit bound on the computational complexity of source identification of such mixtures. The running time improves previous results by Feldman, O’Donnell, and Servedio (FOCS 2005) and Chen and Moitra (STOC 2019) that guaranteed only learning the mixture (without parametric identification of the source). Our analysis gives a quantitative version of a qualitative characterization of identifiable sources that is due to Tahmasebi, Motahari, and Maddah-Ali (ISIT 2018).
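To make the algorithm's input concrete, the sketch below illustrates what "approximate values of multilinear moments derived from a sample" could look like. For a mixture with weights w_j and per-bit means m_{j,i}, the multilinear moment of a subset S of bits is E[prod_{i in S} X_i] = sum_j w_j prod_{i in S} m_{j,i}. This is not the paper's identification algorithm; it only estimates the moments such an algorithm takes as input. The function names, toy parameters, and sample size are assumptions chosen for illustration.

```python
import itertools
import random


def sample_mixture(weights, means, num_samples, rng):
    """Draw samples from a mixture of product distributions on n bits.

    weights[j] is the mixing weight of source j (hypothetical toy input);
    means[j][i] is the probability that bit i equals 1 under source j.
    """
    samples = []
    for _ in range(num_samples):
        # Pick a source according to the mixing weights.
        j = rng.choices(range(len(weights)), weights=weights)[0]
        # Given the source, the bits are independent Bernoulli variables.
        samples.append([1 if rng.random() < p else 0 for p in means[j]])
    return samples


def empirical_moment(samples, subset):
    """Estimate E[prod_{i in subset} X_i] by the sample average."""
    return sum(all(x[i] for i in subset) for x in samples) / len(samples)


def exact_moment(weights, means, subset):
    """Exact multilinear moment: sum_j w_j * prod_{i in subset} means[j][i]."""
    total = 0.0
    for w, m in zip(weights, means):
        prod = 1.0
        for i in subset:
            prod *= m[i]
        total += w * prod
    return total


# Toy instance (assumed values): k = 2 sources on n = 4 bits.
rng = random.Random(0)
weights = [0.3, 0.7]
means = [[0.9, 0.8, 0.1, 0.2], [0.2, 0.1, 0.7, 0.6]]
samples = sample_mixture(weights, means, 20000, rng)

# Compare empirical and exact moments for all subsets of size 1 and 2.
for size in (1, 2):
    for subset in itertools.combinations(range(4), size):
        est = empirical_moment(samples, subset)
        ref = exact_moment(weights, means, subset)
        print(subset, round(est, 3), round(ref, 3))
```

The empirical estimates approach the exact moments as the sample grows. Recovering the weights and means from those moments is the source identification step that the abstract's running-time bound refers to.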
Original language | English |
---|---|
Pages (from-to) | 2193-2216 |
Number of pages | 24 |
Journal | Proceedings of Machine Learning Research |
Volume | 134 |
State | Published - 2021 |
Event | 34th Conference on Learning Theory, COLT 2021, Boulder, United States, 15 Aug 2021 → 19 Aug 2021 |
Bibliographical note
Funding Information: Research supported in part by NSFC-ISF grant 2553-17, NSF-BSF grant 2018687, and NSF grants CCF-1618795 and 1909972. Part of this work was done while the third author visited Caltech. Thanks to anonymous reviewers for helpful comments.
Publisher Copyright:
© 2021 S.L. Gordon, B. Mazaheri, Y. Rabani & L.J. Schulman.