Abstract
In this paper, we perform an in-depth study of the properties and applications of aligned generative models.
We refer to two models as aligned if they share the same architecture, and one of them (the child) is obtained from the other (the parent) via fine-tuning to another domain, a common practice in transfer learning. Several works already utilize some basic properties of aligned StyleGAN models to perform image-to-image translation. Here, we perform the first detailed exploration of model alignment, also focusing on StyleGAN. First, we empirically analyze aligned models and provide answers to important questions regarding their nature. In particular, we find that the child model's latent spaces are semantically aligned with those of the parent, inheriting incredibly rich semantics, even for distant data domains such as human faces and churches. Second, equipped with this better understanding, we leverage aligned models to solve a diverse set of tasks. In addition to image translation, we demonstrate fully automatic cross-domain image morphing. We further show that zero-shot vision tasks may be performed in the child domain, while relying exclusively on supervision in the parent domain. We demonstrate qualitatively and quantitatively that our approach yields state-of-the-art results, while requiring only simple fine-tuning and inversion.
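To make the parent/child setup described in the abstract concrete, the following sketch fine-tunes a copy of a toy generator toward a new domain and then decodes the same latent code with both models. This is a minimal illustration only: `ToyGenerator`, `toy_domain_loss`, and all shapes and hyperparameters are illustrative assumptions, not the paper's StyleGAN implementation.

```python
# Minimal sketch of "aligned" parent/child generators, assuming a toy
# stand-in generator rather than an actual StyleGAN.
import copy
import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Maps a latent code z to a flattened image-like tensor."""
    def __init__(self, z_dim=64, img_dim=3 * 32 * 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

# Parent model: assumed pretrained on the source domain (e.g. human faces).
parent = ToyGenerator()

# Child model: starts as an exact copy of the parent (same architecture and
# weights) and is fine-tuned toward another domain; this shared initialization
# is what makes the two models "aligned".
child = copy.deepcopy(parent)
optimizer = torch.optim.Adam(child.parameters(), lr=1e-4)

def toy_domain_loss(fake_images):
    # Placeholder for an adversarial loss against target-domain data.
    return fake_images.pow(2).mean()

for step in range(100):  # illustrative fine-tuning loop
    z = torch.randn(8, 64)
    loss = toy_domain_loss(child(z))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Cross-domain translation in the spirit of the abstract: because the latent
# spaces remain semantically aligned, the same latent code (e.g. obtained by
# inverting an image into the parent's space) can be decoded by either model.
with torch.no_grad():
    z = torch.randn(1, 64)
    source_image = parent(z)   # rendering in the parent domain
    translated = child(z)      # semantic counterpart in the child domain
```

In practice, the paper applies this idea to pretrained StyleGAN generators, where inversion of a real image into the parent's latent space replaces the random code used above.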
Original language | English |
---|---|
Title of host publication | ICLR 2022 |
Subtitle of host publication | 10th International Conference on Learning Representations |
Publisher | OpenReview |
Number of pages | 44 |
State | Published - 2022 |
Event | 10th International Conference on Learning Representations, ICLR 2022 - Virtual, 25 Apr 2022 → … (Conference number: 10), https://openreview.net/group?id=ICLR.cc/2022/Conference |
Conference
Conference | 10th International Conference on Learning Representations, ICLR 2022 |
---|---|
Abbreviated title | ICLR 2022 |
Period | 25/04/22 → … |
Internet address | https://openreview.net/group?id=ICLR.cc/2022/Conference |
Keywords
- StyleGAN
- Transfer learning
- Model alignment
- Image-to-image translation
- Image morphing