Figure: 3D Highlighter localizes text-specified regions such as headphones, necklace, arms, poncho, belt, hat, and hair.
We present 3D Highlighter, a technique for localizing semantic regions on a mesh using text as input. A key feature of our system is the ability to interpret “out-of-domain” localizations. Our system demonstrates the ability to reason about where to place non-obviously related concepts on an input 3D shape, such as adding clothing to a bare 3D animal model. Our method contextualizes the text description using a neural field and colors the corresponding region of the shape using a probability-weighted blend. Our neural optimization is guided by a pre-trained CLIP encoder, which bypasses the need for any 3D datasets or 3D annotations. Thus, 3D Highlighter is highly flexible, general, and capable of producing localizations on a myriad of input shapes.
3D Highlighter is able to reason about where to highlight a geometrically absent region on a shape. The resulting localizations demonstrate both global understanding and localized part awareness.
The Neural Highlighter maps each point on the input mesh to a highlight probability. The mesh is colored using a probability-weighted blend and then rendered from multiple views. The Neural Highlighter's weights are optimized to maximize the similarity between the CLIP embeddings of the augmented 2D renders and the input text description.
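To make the pipeline concrete, here is a minimal PyTorch sketch of the optimization loop. This is an illustration, not the authors' released implementation: the MLP architecture, learning rate, colors, prompt wording, and the `render_views` argument (a caller-supplied stand-in for a differentiable multi-view renderer with CLIP-style preprocessing, e.g. built on PyTorch3D) are all assumptions.

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git


class NeuralHighlighter(nn.Module):
    """Neural field that maps a 3D point to a highlight probability."""

    def __init__(self, hidden_dim=256, num_layers=6):
        super().__init__()
        layers, in_dim = [], 3
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, 1))
        self.mlp = nn.Sequential(*layers)

    def forward(self, points):
        # points: (V, 3) mesh vertex positions -> (V, 1) probabilities
        return torch.sigmoid(self.mlp(points))


def optimize_highlight(vertices, render_views, prompt,
                       num_steps=1000, device="cuda"):
    """Sketch of the optimization loop described above.

    vertices:     (V, 3) tensor of mesh vertex positions (assumed loaded).
    render_views: hypothetical differentiable renderer that turns per-vertex
                  colors into a batch of augmented, CLIP-preprocessed
                  224x224 views of the mesh.
    """
    clip_model, _ = clip.load("ViT-B/32", device=device)
    clip_model.eval()  # CLIP is frozen; only the highlighter is trained
    highlighter = NeuralHighlighter().to(device)
    optimizer = torch.optim.Adam(highlighter.parameters(), lr=1e-4)

    # Encode the text prompt once.
    tokens = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        text_emb = clip_model.encode_text(tokens).float()
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

    highlight = torch.tensor([1.0, 1.0, 0.0], device=device)  # yellow, illustrative
    base = torch.tensor([0.5, 0.5, 0.5], device=device)       # gray mesh color

    for _ in range(num_steps):
        probs = highlighter(vertices)                      # (V, 1)
        colors = probs * highlight + (1 - probs) * base    # probability-weighted blend
        images = render_views(colors)                      # (B, 3, 224, 224)
        img_emb = clip_model.encode_image(images).float()
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        loss = -(img_emb @ text_emb.T).mean()              # maximize CLIP similarity
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return highlighter
```

Because the probability-weighted blend and the rendering are differentiable, gradients from the CLIP similarity flow through the rendered pixels back to the per-point probabilities, so no 3D supervision is needed.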
@article{decatur2022highlighter,
  author  = {Decatur, Dale and Lang, Itai and Hanocka, Rana},
  title   = {3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions},
  journal = {arXiv},
  year    = {2022},
}