MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation

1 University of Chicago, 2 Adobe Research, 3 University of Montreal

MeshUp is capable of deforming a source mesh into various concepts and into their weighted blends. The target objectives can be text prompts, images, or even meshes. Users can also input a set of control vertices to explicitly define where on the mesh each concept should be expressed.

Abstract

We propose MeshUp, a technique that deforms a 3D mesh towards multiple target concepts and intuitively controls the region where each concept is expressed. Conveniently, the concepts can be defined as either text queries, e.g., "a dog" and "a turtle," or inspirational images, and the local regions can be selected as any number of vertices on the mesh. We control the influence of the concepts and mix them together using a novel score distillation approach, which we call Blended Score Distillation (BSD). BSD operates on each attention layer of the denoising U-Net of a diffusion model: it extracts the per-objective activations and injects them into a unified denoising pipeline from which the deformation gradients are calculated. To localize the expression of these activations, we create a probabilistic Region of Interest (ROI) map on the surface of the mesh and turn it into 3D-consistent masks that control where the activations are expressed. We demonstrate the effectiveness of BSD empirically and show that it can deform various meshes towards multiple objectives.


Deformation into Multiple Concepts

MeshUp takes as input a 3D mesh and several target objectives, and deforms the source mesh by optimizing the per-triangle Jacobians of the mesh. The resulting deformation blends multiple concepts together according to the user-defined weight of each concept. Notice how the deformation smoothly transitions the source mesh into a mixture of different concepts, with the weights dictating the magnitude of each concept's expression. MeshUp supports as many targets as desired.
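To make the term "per-triangle Jacobian" concrete, the following minimal sketch computes the 3x3 linear map that takes a source triangle to its deformed counterpart, using the common construction of augmenting the two edge vectors with the unit normal. The function names `triangle_frame` and `triangle_jacobian` are hypothetical, and the sketch only illustrates what such a Jacobian is; it does not show how MeshUp parameterizes or optimizes them.

```python
import numpy as np

def triangle_frame(v0, v1, v2):
    """Return a 3x3 frame whose columns are two edges and the unit normal."""
    e1 = v1 - v0
    e2 = v2 - v0
    n = np.cross(e1, e2)
    n = n / np.linalg.norm(n)  # assumes a non-degenerate triangle
    return np.column_stack([e1, e2, n])

def triangle_jacobian(source_tri, deformed_tri):
    """Linear map J such that J @ (source edge) == (deformed edge).

    source_tri, deformed_tri: (3, 3) arrays, one vertex per row.
    """
    S = triangle_frame(*np.asarray(source_tri, dtype=float))
    D = triangle_frame(*np.asarray(deformed_tri, dtype=float))
    return D @ np.linalg.inv(S)

if __name__ == "__main__":
    src = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
    dst = [[0, 0, 0], [2, 0, 0], [0, 2, 0]]  # in-plane scaling by 2
    print(triangle_jacobian(src, dst))
```

For the in-plane scaling in the usage example, the Jacobian is diag(2, 2, 1): the edges double in length while the unit normal is left unchanged.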


Localized Deformation

MeshUp can also localize the deformation to specific regions on the mesh. The user can select a set of control vertices to define where on the mesh each concept should be expressed. The deformation (of one or multiple concepts) is then constrained to these regions, producing a more controlled and localized result.


Method Overview

At each iteration, MeshUp creates parallel U-Net branches, one for each input target objective (we call these the "Target Branches"). It then passes the same noised renderings and the corresponding text input through the U-Net of a pretrained text-to-image model. On a separate branch (the "Blending Branch"), it interpolates the activations extracted from the Target Branches according to their assigned weights. After replacing the activations in the Blending Branch with the interpolated activations, MeshUp backpropagates the gradients from the Blending Branch via Score Distillation Sampling (SDS) and updates the Jacobians of the mesh accordingly.
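The last step can be illustrated with a small sketch of an SDS-style update. As SDS is commonly formulated, the gradient on the rendering is a timestep-weighted residual between the predicted noise and the injected noise, which is then chained through the renderer to the optimized parameters without differentiating through the U-Net. Everything below is an assumption made for illustration: `eps_pred` stands in for the Blending Branch's noise prediction, `render_jacobian` is a hypothetical dense matrix standing in for the derivative of the rendering with respect to the parameters, and `w_t` is a scalar timestep weight.

```python
import numpy as np

def sds_image_gradient(eps_pred, eps, w_t=1.0):
    """SDS-style gradient on the rendering: w(t) * (predicted noise - injected noise)."""
    return w_t * (np.asarray(eps_pred, dtype=float) - np.asarray(eps, dtype=float))

def sds_parameter_gradient(render_jacobian, eps_pred, eps, w_t=1.0):
    """Chain the image-space gradient through a (hypothetical) renderer Jacobian.

    render_jacobian: (num_pixels, num_params) matrix d(rendering)/d(params).
    Returns a (num_params,) gradient; the U-Net itself is not differentiated.
    """
    g = sds_image_gradient(eps_pred, eps, w_t).ravel()
    return np.asarray(render_jacobian, dtype=float).T @ g

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.standard_normal(4)        # noise added to the rendering
    eps_pred = rng.standard_normal(4)   # stand-in for the blended prediction
    A = rng.standard_normal((4, 2))     # toy renderer Jacobian
    print(sds_parameter_gradient(A, eps_pred, eps, w_t=0.5))
```

In a real pipeline the chain rule through the renderer would be handled by automatic differentiation rather than an explicit matrix; the explicit form is used here only to keep the sketch self-contained.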


Blended Score Distillation Overview

The key idea of Blended Score Distillation (BSD) is to perform a weighted interpolation of the activations that represent different concepts. We first extract the activations from the Target Branches, interpolate them according to the target weights, and then inject the result into the Blending Branch. Because we perform this operation on each attention layer of the U-Net, the Blending Branch yields a score that effectively represents the weighted blend of the target objectives.
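The interpolation step can be sketched as a weighted average over the per-target activations of a single layer. This is a minimal illustration, not the authors' implementation: the function name `blend_activations` is hypothetical, the activations are plain NumPy arrays rather than attention-layer tensors, and normalizing the weights to sum to one is an assumption of this sketch.

```python
import numpy as np

def blend_activations(target_activations, weights):
    """Weighted interpolation of same-shaped activations, one per Target Branch.

    target_activations: list of K arrays with identical shapes.
    weights: K non-negative weights (normalized here; an assumption of this sketch).
    """
    acts = np.stack(target_activations, axis=0).astype(float)  # (K, ...)
    w = np.asarray(weights, dtype=float)
    if w.shape != (acts.shape[0],):
        raise ValueError("expected exactly one weight per target")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    # Contract the target axis: sum_k w[k] * acts[k]
    return np.tensordot(w, acts, axes=1)

if __name__ == "__main__":
    dog = np.zeros((2, 3))     # stand-in activation for one target
    turtle = np.ones((2, 3))   # stand-in activation for another target
    print(blend_activations([dog, turtle], weights=[1.0, 3.0]))
```

With weights 1 and 3, every entry of the blended activation is 0.75, that is, three quarters of the way from the first target to the second. In the method described above, an operation of this kind would be applied at each attention layer before the result is injected into the Blending Branch.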


Localization Overview

To localize the deformation, MeshUp creates a probabilistic Region of Interest (ROI) map on the surface of the mesh using the self-attention maps of the selected control vertices. It then uses this 3D ROI map to restrain the Jacobians of the mesh, ensuring that the deformation is localized to the selected regions. For localization objectives with multiple targets, we rasterize the 3D ROI map of each target from the same viewpoint as the renderings, and use the rasterized maps as binary masks to control the expression of the activations.
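One plausible reading of these two steps is sketched below; the exact formulation is not given here, so every detail is an assumption. `restrain_jacobians` pulls each per-triangle Jacobian toward the identity (no deformation) in proportion to how far the triangle lies outside the ROI, and `masked_blend` combines per-target activations only where each target's rasterized binary mask is active, falling back to a base activation elsewhere. Both function names, the per-triangle ROI values, and the 3x3 Jacobian shape are hypothetical.

```python
import numpy as np

def restrain_jacobians(jacobians, roi):
    """Blend per-triangle Jacobians toward the identity outside the ROI.

    jacobians: (T, 3, 3) array; roi: (T,) probabilities in [0, 1].
    """
    jacobians = np.asarray(jacobians, dtype=float)
    roi = np.clip(np.asarray(roi, dtype=float), 0.0, 1.0)[:, None, None]
    identity = np.eye(3)[None, :, :]
    return roi * jacobians + (1.0 - roi) * identity

def masked_blend(base, target_activations, masks, weights):
    """Blend (H, W, C) activations per pixel using (H, W) binary masks.

    Pixels covered by no mask keep the base activation.
    """
    out = np.array(base, dtype=float)
    numerator = np.zeros_like(out)
    denominator = np.zeros(out.shape[:2], dtype=float)
    for act, mask, w in zip(target_activations, masks, weights):
        m = w * np.asarray(mask, dtype=float)       # weighted mask, (H, W)
        numerator += m[..., None] * np.asarray(act, dtype=float)
        denominator += m
    covered = denominator > 0
    out[covered] = numerator[covered] / denominator[covered][:, None]
    return out

if __name__ == "__main__":
    J = np.stack([2.0 * np.eye(3), 2.0 * np.eye(3)])
    print(restrain_jacobians(J, roi=[1.0, 0.0]))    # second triangle stays rigid
```

In the usage example, the triangle with ROI probability 1 keeps its Jacobian, while the triangle with probability 0 is reset to the identity and is therefore left undeformed.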


Comparison with Various Methods

Compared to state-of-the-art text-to-3D generative models, MeshUp produces results with both better triangulation and finer geometric details. Our method is also the first to support local deformation, which not only guarantees the preservation of unselected regions, but also greatly surpasses the quality of meshes generated solely from text descriptions.

BibTeX

@misc{kim2024meshupmultitargetmeshdeformation,
      title={MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation}, 
      author={Hyunwoo Kim and Itai Lang and Noam Aigerman and Thibault Groueix and Vladimir G. Kim and Rana Hanocka},
      year={2024},
      eprint={2408.14899},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.14899}, 
}