In this work, we develop 3D Paintbrush, a technique for automatically texturing local semantic regions on meshes via text descriptions. Our method is designed to operate directly on meshes, producing texture maps that integrate seamlessly into standard graphics pipelines. We opt to simultaneously produce a localization map (to specify the edit region) and a texture map that conforms to it. This synergistic approach improves the quality of both the localization and the stylization. To enhance the details and resolution of the textured area, we leverage multiple stages of a cascaded diffusion model to supervise our local editing technique with generative priors learned from images at different resolutions. Our technique, referred to as Cascaded Score Distillation (CSD), simultaneously distills scores at multiple resolutions in a cascaded fashion, enabling control over both the granularity and global understanding of the supervision. We demonstrate the effectiveness of 3D Paintbrush in locally texturing a variety of shapes across different semantic regions.
3D Paintbrush produces highly detailed textures that effectively adhere to the predicted localizations. This enables seamless compositing of local textures without unwanted fringes (right).
Each point on the surface of the mesh is passed into three different branches to produce a localization probability, texture map, and background map. We texture three different variants of the same mesh with the localization, texture, and background maps and render them from the same viewpoint. Each image along with the corresponding text condition is used to compute the CSD loss.
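The data flow described above can be sketched for a single surface point. This is a minimal illustration, not our implementation: the three branch functions are toy stand-ins for learned networks, and `BASE_COLOR`, `HIGHLIGHT_COLOR`, and the blending rule used to build the three variants are assumptions introduced only for this example.

```python
import math

BASE_COLOR = (0.5, 0.5, 0.5)       # assumed neutral color for unedited surface
HIGHLIGHT_COLOR = (1.0, 1.0, 0.0)  # assumed color for visualizing the localization


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def localization_branch(point):
    """Toy stand-in: probability that the point lies in the edit region."""
    _, y, _ = point
    return sigmoid(10.0 * y)  # "upper half of the shape", purely illustrative


def texture_branch(point):
    """Toy stand-in: RGB color of the local texture at this point."""
    return tuple(sigmoid(c) for c in point)


def background_branch(point):
    """Toy stand-in: RGB color of the background at this point."""
    return tuple(1.0 - c for c in texture_branch(point))


def blend(weight, inside, outside):
    """Per-channel blend: weight * inside + (1 - weight) * outside."""
    return tuple(weight * a + (1.0 - weight) * b for a, b in zip(inside, outside))


def three_variants(point):
    """Colors of the same point in the three mesh variants (assumed blending)."""
    p = localization_branch(point)
    return {
        "localization": blend(p, HIGHLIGHT_COLOR, BASE_COLOR),
        "texture": blend(p, texture_branch(point), BASE_COLOR),
        "background": blend(p, BASE_COLOR, background_branch(point)),
    }
```

In a full pipeline, each variant would be rendered from the same viewpoint and paired with its own text condition before computing the loss; rendering is omitted here.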
3D Paintbrush produces highly detailed textures and localizations for a diverse range of meshes and prompts. Our method synthesizes meaningful local edits on shapes, demonstrating both global and local part-level understanding.
We distill scores across multiple stages of a cascaded diffusion model simultaneously in order to leverage both the global awareness of the first stage and the higher level of detail contained in later stages. The difference between the predicted noise and sampled noise is the image gradient for each stage.
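The per-stage gradient described above can be sketched as follows. This is a simplified illustration under stated assumptions: `toy_noise_predictor` is a hypothetical stand-in for a diffusion stage, images are flattened lists of pixel values, the same noise level `alpha_bar` is used for every stage, and the low-resolution conditioning that a super-resolution stage would normally receive is omitted.

```python
import math
import random


def add_noise(image, noise, alpha_bar):
    """Standard forward diffusion step: sqrt(a) * x + sqrt(1 - a) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(image, noise)]


def stage_image_gradient(predicted_noise, sampled_noise, weight=1.0):
    """Image gradient for one stage: weight * (predicted noise - sampled noise)."""
    if len(predicted_noise) != len(sampled_noise):
        raise ValueError("noise lists must have the same number of pixels")
    return [weight * (p - s) for p, s in zip(predicted_noise, sampled_noise)]


def toy_noise_predictor(noisy_image, stage_index):
    """Hypothetical stand-in for one stage of a cascaded diffusion model."""
    return [0.5 * v + 0.1 * stage_index for v in noisy_image]


def cascaded_image_gradients(renders, stage_weights, alpha_bar=0.5, seed=0):
    """One image gradient per stage, each at that stage's own resolution."""
    rng = random.Random(seed)
    gradients = []
    for stage_index, (image, weight) in enumerate(zip(renders, stage_weights)):
        noise = [rng.gauss(0.0, 1.0) for _ in image]
        noisy = add_noise(image, noise, alpha_bar)
        predicted = toy_noise_predictor(noisy, stage_index)
        gradients.append(stage_image_gradient(predicted, noise, weight))
    return gradients
```

Each returned gradient would then be backpropagated through the renderer to the shared parameters, where the contributions from all stages accumulate.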
3D Paintbrush is capable of producing a variety of local textures on the same mesh. Each result contains an accurate localization map (to specify the edit region) and a texture map that conforms to it.
Using only stage 1 (equivalent to SDS) produces results that lack fine-grained details. Incorporating the second, super-resolution stage of the cascaded model through our CSD increases the resolution and detail. Input text prompts (from left to right): Colorful crochet shell, Cactus base, Tiger stripe shirt.
Different stages of the cascaded model provide different levels of granularity and global understanding. Using only the (low resolution) stage 1 model gives a low-resolution result in roughly the correct location. Using only the (high resolution) stage 2 model gives a high-resolution result, but the texture is placed in the incorrect location. Our CSD uses stages 1 and 2 simultaneously, resulting in a highly detailed texture in the appropriate location.
Varying the weight between stage 1 and stage 2 gives control over the level of detail and the corresponding localization. Using only stage 1 (leftmost) gives a rather coarse result; using only stage 2 (rightmost) gives a highly detailed result with an incorrect localization. Increasing the stage 2 weight (moving left to right) progressively increases the detail and granularity of the supervision, enabling smooth and meaningful interpolation between stage 1 and stage 2.
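The interpolation between the two stages can be illustrated with a small sketch. It assumes a convex blend in which the stage 1 weight is one minus the stage 2 weight, and it assumes both gradients have already been mapped onto the same shared parameters; both are simplifications for illustration, and the function names are hypothetical.

```python
def weight_sweep(num_steps):
    """Evenly spaced stage 2 weights from 0.0 (stage 1 only) to 1.0 (stage 2 only)."""
    if num_steps < 2:
        raise ValueError("need at least two steps to span both endpoints")
    return [i / (num_steps - 1) for i in range(num_steps)]


def blend_stage_gradients(grad_stage1, grad_stage2, stage2_weight):
    """Assumed convex blend of two stage gradients on shared parameters."""
    if not 0.0 <= stage2_weight <= 1.0:
        raise ValueError("stage2_weight must lie in [0, 1]")
    if len(grad_stage1) != len(grad_stage2):
        raise ValueError("gradients must have the same length")
    stage1_weight = 1.0 - stage2_weight
    return [
        stage1_weight * g1 + stage2_weight * g2
        for g1, g2 in zip(grad_stage1, grad_stage2)
    ]


# Sweep from coarse (stage 1 only) to detailed (stage 2 only) supervision.
coarse, detailed = [1.0, 0.0], [0.0, 2.0]
for w in weight_sweep(5):
    print(w, blend_stage_gradients(coarse, detailed, w))
```

The endpoints of the sweep reproduce the single-stage cases, and intermediate weights mix the coarse, well-placed signal with the detailed one.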
Simultaneously optimizing the localization and texture (left) yields more detailed textures that effectively conform to the predicted localization. If we instead optimize the localization first and then optimize the texture within the localized region (in series, middle), both the localization and the texture are less detailed. Independent (right): if we optimize the localization on its own (independent: left) and the texture on its own (independent: middle), the texture does not align with the localization, so the masked texture contains fringe artifacts (independent: right).
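To make the masking step concrete, the sketch below composites a texture onto a background using per-point localization probabilities and flags points where the mask and the texture disagree. The blending rule and the notion of a "fringe" used here (a point inside the mask that the texture never painted) are assumptions chosen for illustration, and `composite` and `fringe_indices` are hypothetical helper names.

```python
def composite(localization, texture, background):
    """Blend per-point colors: p * texture + (1 - p) * background."""
    if not (len(localization) == len(texture) == len(background)):
        raise ValueError("all per-point lists must have the same length")
    return [
        tuple(p * t + (1.0 - p) * b for t, b in zip(tex, bg))
        for p, tex, bg in zip(localization, texture, background)
    ]


def fringe_indices(localization, painted, threshold=0.5):
    """Points inside the mask that the texture did not cover (assumed definition)."""
    return [
        i
        for i, (p, is_painted) in enumerate(zip(localization, painted))
        if p >= threshold and not is_painted
    ]


# Toy strip of four surface points: the mask covers points 1..3, but an
# independently optimized texture only painted points 0..2.
mask = [0.0, 1.0, 1.0, 1.0]
painted = [True, True, True, False]
print(fringe_indices(mask, painted))  # prints [3]
```

When the localization and texture are optimized together, the texture is supervised only through the mask, which is one way to see why the two stay aligned at the boundary.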
@article{decatur2023paintbrush,
author = {Decatur, Dale and Lang, Itai and Aberman, Kfir and Hanocka, Rana},
title = {3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation},
journal = {arXiv preprint arXiv:2311.09571},
year = {2023},
}