The domain adaptation problem in machine learning arises when the distribution generating the test data differs from the distribution generating the training data. A common approach to this issue is to train a standard learner for the task on the available training sample (generated by a distribution different from the test distribution). In this work we examine this approach, asking whether there exist successful learning methods for which a target task can be learned by replacing the standard target-generated sample with a (possibly larger) sample generated by a different distribution, without worsening the error guarantee on the learned classifier. We give a positive answer, showing that this is possible with a Nearest Neighbor algorithm under the assumptions of covariate shift and a bound on the ratio of the probability weights between the source (training) and target (test) distributions. We further show that these assumptions are not always sufficient to allow such a replacement of the training sample: for proper learning, where the output classifier must come from a predefined class, we prove that any learner needs access to data generated from the target distribution.
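The setting described above can be illustrated with a minimal sketch (not the paper's construction): under covariate shift the conditional P(y|x) is shared, only the marginal over x differs, and the target/source density ratio is bounded. All distributions, sample sizes, and function names below are hypothetical choices for illustration; the 1-NN rule is trained on source data only and evaluated on target data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariate shift: the labeling rule P(y|x) is identical for source and
# target; only the marginal distribution over x differs.
def label(x):
    return (x > 0.5).astype(int)

# Source marginal: uniform on [0, 1].
# Target marginal: density 2x on [0, 1], so the target/source weight
# ratio is bounded by 2 (a hypothetical instance of the assumption).
n = 500
x_source = rng.uniform(0.0, 1.0, n)
x_target = np.sqrt(rng.uniform(0.0, 1.0, n))  # inverse-CDF sampling of density 2x

y_source = label(x_source)

def nn_predict(x_train, y_train, x_test):
    """1-Nearest-Neighbor prediction in one dimension."""
    idx = np.abs(x_test[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[idx]

# Train on the source-generated sample, evaluate on target-generated points.
y_pred = nn_predict(x_source, y_source, x_target)
target_error = np.mean(y_pred != label(x_target))
print(f"target error of source-trained 1-NN: {target_error:.3f}")
```

With a dense source sample and a bounded weight ratio, the nearest source neighbor of a target point is close with high probability, so the source-trained 1-NN achieves low target error in this toy example.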
|Original language||American English|
|State||Published - 2012|
|Event||International Symposium on Artificial Intelligence and Mathematics, ISAIM 2012 - Fort Lauderdale, FL, United States|
Duration: 9 Jan 2012 → 11 Jan 2012