DiffUHaul: A Training-Free Method for Object Dragging in Images

Omri Avrahami, Rinon Gal, Gal Chechik, Ohad Fried, Dani Lischinski, Arash Vahdat, Weili Nie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Text-to-image diffusion models have proven effective for solving many image editing tasks. However, the seemingly straightforward task of seamlessly relocating objects within a scene remains surprisingly challenging. Existing methods addressing this problem often struggle to function reliably in real-world scenarios due to their lack of spatial reasoning. In this work, we propose a training-free method, dubbed DiffUHaul, that harnesses the spatial understanding of a localized text-to-image model for the object dragging task. Blindly manipulating layout inputs of the localized model tends to cause low editing performance due to the intrinsic entanglement of object representations in the model. To address this, we first apply attention masking in each denoising step to make the generation more disentangled across different objects and adopt a self-attention sharing mechanism to preserve the high-level object appearance. Furthermore, we propose a new diffusion anchoring technique: in the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance; in the later denoising steps, we pass the localized features from the source images to the interpolated images to retain fine-grained object details. To adapt DiffUHaul to real-image editing, we apply DDPM self-attention bucketing, which better reconstructs real images with the localized model. Finally, we introduce an automated evaluation pipeline for this task and showcase the efficacy of our method. Our results are reinforced through a user preference study.
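The diffusion anchoring idea described above can be sketched schematically. This is a minimal illustration, not the paper's implementation: the function name, the linear blending schedule, and the `switch_frac` cutoff between the early (interpolation) and late (source pass-through) phases are all assumptions for exposition.

```python
import numpy as np

def anchor_attention(src_feats, tgt_feats, step, n_steps, switch_frac=0.5):
    """Schematic blend of attention features across denoising steps.

    Early steps (step < switch_frac * n_steps): linearly interpolate
    between source and target features, so the new layout fuses
    smoothly with the original appearance.
    Late steps: pass the source features through unchanged, standing
    in for reusing localized source features to retain fine detail.
    """
    cutoff = switch_frac * n_steps
    if step < cutoff:
        alpha = step / cutoff  # interpolation weight grows over the early phase
        return (1.0 - alpha) * src_feats + alpha * tgt_feats
    return src_feats

# Toy features: early step blends, late step returns the source as-is.
src = np.ones((2, 2))
tgt = np.zeros((2, 2))
early = anchor_attention(src, tgt, step=10, n_steps=50)  # alpha = 0.4
late = anchor_attention(src, tgt, step=40, n_steps=50)
```

In the actual method the blending operates on attention features inside the denoising network rather than on standalone arrays; this sketch only conveys the two-phase schedule.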

Original language: English
Title of host publication: Proceedings - SIGGRAPH Asia 2024 Conference Papers, SA 2024
Editors: Stephen N. Spencer
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400711312
DOIs
State: Published - 3 Dec 2024
Event: SIGGRAPH Asia 2024 Conference Papers, SA 2024 - Tokyo, Japan
Duration: 3 Dec 2024 - 6 Dec 2024

Publication series

Name: Proceedings - SIGGRAPH Asia 2024 Conference Papers, SA 2024

Conference

Conference: SIGGRAPH Asia 2024 Conference Papers, SA 2024
Country/Territory: Japan
City: Tokyo
Period: 3/12/24 - 6/12/24

Bibliographical note

Publisher Copyright:
© 2024 Copyright held by the owner/author(s).

Keywords

  • Image Editing
  • Object Dragging
