3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions

Dale Decatur, Itai Lang, Rana Hanocka
University of Chicago

3D Highlighter localizes semantic regions on a shape using text as input. Our technique reasons about where to place seemingly unrelated concepts in semantically meaningful locations on the 3D shape, such as a 'necklace' on a horse or 'shoes' on an alien.

Abstract

We present 3D Highlighter, a technique for localizing semantic regions on a mesh using text as input. A key feature of our system is its ability to interpret "out-of-domain" localizations: it can reason about where to place non-obviously related concepts on an input 3D shape, such as adding clothing to a bare 3D animal model. Our method contextualizes the text description using a neural field and colors the corresponding region of the shape with a probability-weighted blend. The optimization is guided by a pre-trained CLIP encoder, which bypasses the need for any 3D datasets or 3D annotations. As a result, 3D Highlighter is flexible, general, and capable of producing localizations on a myriad of input shapes.
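
Concretely, the probability-weighted blend amounts to a per-vertex convex combination; the symbols below are our own illustrative notation, not taken from the paper. A vertex v with predicted highlight probability p_v receives the color

    c_v = p_v * c_highlight + (1 - p_v) * c_base

so vertices with p_v near 1 take on the highlight color, while the rest retain the base mesh color.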


Semantic Highlighting

3D Highlighter can reason about where to highlight a geometrically absent region on a shape. The resulting localizations demonstrate both global understanding of the shape and localized part awareness.


Network Overview

The Neural Highlighter maps each point on the input mesh to a highlight probability. The mesh is colored with a probability-weighted blend of the highlight and base colors, then rendered from multiple views. The Neural Highlighter's weights are optimized to maximize the similarity between the CLIP embeddings of the augmented 2D renderings and the input text.
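
For readers who want the mechanics, below is a minimal sketch of this optimization loop in PyTorch with OpenAI's CLIP (github.com/openai/CLIP). It is a sketch under stated assumptions: the paper's differentiable multi-view renderer and 2D augmentations are replaced by a crude single-view color splat (splat_views), and every name here (HighlighterMLP, blend, the color constants) is illustrative rather than the authors' actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

class HighlighterMLP(nn.Module):
    """Neural field: maps each 3D point to a highlight probability."""
    def __init__(self, width=256, depth=4):
        super().__init__()
        layers, dim = [], 3
        for _ in range(depth):
            layers += [nn.Linear(dim, width), nn.ReLU()]
            dim = width
        layers.append(nn.Linear(dim, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, pts):                       # pts: (V, 3)
        return torch.sigmoid(self.net(pts))       # (V, 1) in [0, 1]

def blend(probs, highlight, base):
    """Probability-weighted blend of highlight and base colors per vertex."""
    return probs * highlight + (1 - probs) * base  # (V, 3)

def splat_views(verts, colors, res=224):
    """Toy orthographic 'renderer': splat vertex colors onto a white canvas.
    A stand-in for the differentiable multi-view mesh renderer in the paper;
    gradients still flow to the vertex colors through the assignment."""
    img = torch.ones(3, res, res)
    xy = ((verts[:, :2] + 1) / 2 * (res - 1)).long().clamp(0, res - 1)
    img[:, xy[:, 1], xy[:, 0]] = colors.t()
    return img.unsqueeze(0)                        # (1, 3, res, res)

device = "cpu"
model, _ = clip.load("ViT-B/32", device=device)
for p in model.parameters():                       # CLIP stays frozen
    p.requires_grad_(False)

text = clip.tokenize(["a gray horse with a highlighted necklace"]).to(device)
text_feat = F.normalize(model.encode_text(text).float(), dim=-1)

verts = torch.rand(2048, 3) * 2 - 1                # stand-in for mesh vertices
highlighter = HighlighterMLP()
opt = torch.optim.Adam(highlighter.parameters(), lr=1e-4)
HIGHLIGHT = torch.tensor([0.8, 0.1, 0.1])          # e.g. a red highlight
BASE = torch.tensor([0.5, 0.5, 0.5])               # gray base mesh color

for step in range(100):
    probs = highlighter(verts)                     # (V, 1)
    colors = blend(probs, HIGHLIGHT, BASE)         # (V, 3)
    imgs = splat_views(verts, colors)              # (1, 3, 224, 224)
    img_feat = F.normalize(model.encode_image(imgs).float(), dim=-1)
    loss = -(img_feat * text_feat).sum()           # maximize CLIP similarity
    opt.zero_grad()
    loss.backward()
    opt.step()

In the full method, replacing splat_views with a proper differentiable renderer and averaging the loss over multiple augmented views is what lets the 2D CLIP signal supervise a coherent 3D localization.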

BibTeX

@article{decatur2022highlighter,
  author    = {Decatur, Dale and Lang, Itai and Hanocka, Rana},
  title     = {3D Highlighter: Localizing Regions on 3D Shapes via Text Descriptions},
  journal   = {arXiv},
  year      = {2022},
}