The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

Omri Avrahami, Amir Hertz, Yael Vinker, Moab Arar, Shlomi Fruchter, Ohad Fried, Daniel Cohen-Or, Dani Lischinski

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent advances in text-to-image generation models have unlocked vast potential for visual creativity. However, users of these models struggle to generate consistent characters, a crucial aspect for numerous real-world applications such as story visualization, game development, asset design, advertising, and more. Current methods typically rely on multiple pre-existing images of the target character or involve labor-intensive manual processes. In this work, we propose a fully automated solution for consistent character generation, with the sole input being a text prompt. We introduce an iterative procedure that, at each stage, identifies a coherent set of images sharing a similar identity and extracts a more consistent identity from this set. Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency than the baseline methods, and these findings are reinforced by a user study. To conclude, we showcase several practical applications of our approach.
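
At a high level, the iterative procedure described in the abstract can be sketched as the loop below. This is only an illustrative sketch, not the authors' implementation: generate_images, extract_features, and personalize are hypothetical placeholders for a text-to-image sampler, an image feature encoder, and an identity fine-tuning step, and the cohesion criterion (mean distance to a k-means cluster centre) is an assumed stand-in for whatever measure the paper actually uses.

```python
# Illustrative sketch of the iterative consistent-identity loop described in
# the abstract. The generation, feature-extraction, and personalization steps
# are supplied by the caller as hypothetical placeholders.
from typing import Any, Callable, List

import numpy as np
from sklearn.cluster import KMeans


def most_cohesive_cluster(features: np.ndarray, n_clusters: int = 5) -> np.ndarray:
    """Cluster feature vectors and return the indices of the tightest cluster."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    best_idx, best_spread = np.arange(len(features)), np.inf
    for c in range(n_clusters):
        idx = np.where(kmeans.labels_ == c)[0]
        if len(idx) < 2:
            continue
        # Mean distance to the cluster centre: a proxy for identity coherence.
        spread = np.mean(
            np.linalg.norm(features[idx] - kmeans.cluster_centers_[c], axis=1)
        )
        if spread < best_spread:
            best_idx, best_spread = idx, spread
    return best_idx


def consistent_character(
    prompt: str,
    generate_images: Callable[[str], List[Any]],          # hypothetical: text-to-image sampler
    extract_features: Callable[[List[Any]], np.ndarray],  # hypothetical: image feature encoder
    personalize: Callable[[List[Any]], None],             # hypothetical: identity fine-tuning step
    n_iters: int = 5,
) -> None:
    """Iteratively consolidate a character identity from a single text prompt."""
    for _ in range(n_iters):
        images = generate_images(prompt)        # sample a batch with the current identity
        feats = extract_features(images)        # embed the images in a feature space
        idx = most_cohesive_cluster(feats)      # pick the subset sharing a similar identity
        personalize([images[i] for i in idx])   # distill a more consistent identity from it
```

In practice the loop would stop once the selected cluster is sufficiently coherent rather than after a fixed number of iterations; the fixed n_iters here is just to keep the sketch simple.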

Original language: English
Title of host publication: Proceedings - SIGGRAPH 2024 Conference Papers
Editors: Stephen N. Spencer
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400705250
DOIs
State: Published - 13 Jul 2024
Event: SIGGRAPH 2024 Conference Papers - Denver, United States
Duration: 28 Jul 2024 - 1 Aug 2024

Publication series

Name: Proceedings - SIGGRAPH 2024 Conference Papers

Conference

Conference: SIGGRAPH 2024 Conference Papers
Country/Territory: United States
City: Denver
Period: 28/07/24 - 1/08/24

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.

Keywords

  • Consistent character generation
