Our method comprises three phases: initial creation, automatic refinement, and user-guided refinement. These are conceptual phases in the creation process, and each involves its own set of agent roles. The first phase creates an initial shape. The second phase automatically corrects implausible configurations, such as a disconnected backrest, and improves overly simplistic geometry. Afterwards, our system can accept additional edit instructions from the user, allowing for interactive and iterative 3D asset generation.
Generating and refining iteratively is thus the native mode of operation for LL3M. More than just error correction, the pipeline realizes an iterative, coarse-to-fine creation process, involving both automatic and user-guided refinement.
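The three phases can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the `Shape` dictionary, the function names, and the gap check are hypothetical stand-ins, not LL3M's agents or its generated Blender code. It only shows the control flow of create, auto-correct, then apply a user edit.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an asset: part name -> (z_min, z_max) vertical extent.
Shape = Dict[str, Tuple[float, float]]


def initial_creation(prompt: str) -> Shape:
    """Phase 1 (toy): a coarse chair whose backrest floats above the seat."""
    return {"seat": (0.25, 0.5), "backrest": (0.75, 1.25)}


def find_gaps(shape: Shape) -> List[str]:
    """Toy critic: list the parts that start above the top of the seat."""
    seat_top = shape["seat"][1]
    return [name for name, (z_min, _) in shape.items()
            if name != "seat" and z_min > seat_top]


def automatic_refinement(shape: Shape, max_rounds: int = 3) -> Shape:
    """Phase 2 (toy): lower floating parts until the critic reports no gaps."""
    for _ in range(max_rounds):
        issues = find_gaps(shape)
        if not issues:
            break
        seat_top = shape["seat"][1]
        for name in issues:
            z_min, z_max = shape[name]
            gap = z_min - seat_top
            shape = {**shape, name: (seat_top, z_max - gap)}
    return shape


def user_guided_refinement(shape: Shape, edit: Callable[[Shape], Shape]) -> Shape:
    """Phase 3 (toy): apply one user-requested edit to the current shape."""
    return edit(shape)


def add_armrests(shape: Shape) -> Shape:
    """Hypothetical edit standing in for an instruction like 'add armrests'."""
    seat_top = shape["seat"][1]
    return {**shape, "armrests": (seat_top, seat_top + 0.25)}


chair = initial_creation("a chair")
chair = automatic_refinement(chair)
chair = user_guided_refinement(chair, add_armrests)
```

In this sketch the user edit adds a part without touching the existing ones, which mirrors the coarse-to-fine flow described above.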
LL3M is capable of diverse shape generation. The results showcase detailed parts (e.g., the windmill's architectural features) in intricate arrangements (e.g., the piano keys, the drum kit), as well as rich appearance (the skateboard) and material properties (the glossy lamp base). A notable feature of our approach is that each mesh is generated through interpretable, editable Blender code.
Starting from different initial meshes produced by LL3M and the same refinement prompt, "change the style to steampunk", LL3M successfully interprets and applies the same style concept to each hat. Each stylized mesh exhibits distinct variations, including both geometric modifications and appearance changes.
Given an initial mesh it has produced, our system can edit the materials on a specific part of the mesh (the blade of the knife) by creating comprehensive procedural materials via shader nodes.
LL3M enables multiple successive edits of the same 3D asset. The modifications are faithful to the user's instructions, editing only the specified element while preserving the character's identity.
Our method generates Blender code that is easy to understand and follow. The code is well-documented with descriptive comments, clear variable names, and structured logic. This interpretability makes it easy to change variables (e.g. the key width) or even algorithmic logic (e.g. the keyboard pattern).
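To illustrate what "changing the key width" or "the keyboard pattern" could look like, here is a minimal pure-Python sketch of a parameterized key layout. It is not code generated by LL3M and does not call the Blender API; the names `KEY_WIDTH`, `KEY_GAP`, `BLACK_AFTER`, and `layout_keys` are hypothetical, and the values are placeholders.

```python
from typing import List, Sequence, Tuple

# Hypothetical, illustrative parameters (not taken from LL3M output).
KEY_WIDTH = 0.023   # width of one white key
KEY_GAP = 0.001     # spacing between neighbouring white keys
NUM_OCTAVES = 2     # number of octaves to lay out

# White-key indices within an octave (C=0 ... B=6) that are followed by a
# black key: C#, D#, F#, G#, A#. Editing this tuple changes the pattern.
BLACK_AFTER = (0, 1, 3, 4, 5)


def layout_keys(
    key_width: float,
    key_gap: float,
    num_octaves: int,
    black_after: Sequence[int] = BLACK_AFTER,
) -> Tuple[List[float], List[float]]:
    """Return x-centres of the white keys and of the black keys."""
    pitch = key_width + key_gap          # distance between white-key centres
    whites: List[float] = []
    blacks: List[float] = []
    for i in range(7 * num_octaves):
        x = i * pitch
        whites.append(x)
        if i % 7 in black_after:
            # A black key sits on the boundary between white key i and i + 1.
            blacks.append(x + pitch / 2)
    return whites, blacks


whites, blacks = layout_keys(KEY_WIDTH, KEY_GAP, NUM_OCTAVES)
```

Because the layout is driven by a few named values, widening the keys is a one-line change to `KEY_WIDTH`, and altering the pattern only requires editing `BLACK_AFTER`.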
By generating shapes through Blender code, LL3M allows intuitive user edits, because the relevant parameters are exposed both in the code and in the generated Blender nodes and structures. For example, when generating a material, our system creates a full set of shader nodes. Users can then easily adjust visual attributes, such as the color or stripe pattern, directly in Blender to achieve the desired output.
Despite visual differences, shapes often share high-level code patterns (such as loops, modifiers, and node setups) that recur across categories. This shared structure allows the model to transfer knowledge and generate diverse, editable, and modular code from a wide range of prompts.
LL3M is capable of generating multiple objects and arranging them with appropriate spatial relationships within a single scene. Our system achieves this task using complex operations such as instancing and parenting relationships to build the scene hierarchy.
The coding agent can also use parenting for more complex single objects, such as a lamp, when explicitly prompted to do so. This produces shapes with a human-readable hierarchy of parent-child relationships between parts within the scene. It enables scene graph behavior in Blender, where transformations applied to a parent propagate to its children. Each part in the graph is also assigned a meaningful semantic name.
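The propagation of a parent's transformation to its children can be sketched in plain Python. This is a simplified model under stated assumptions, not Blender's implementation or LL3M's output: the `Node` class and the part names are hypothetical, and only translation is modeled, whereas Blender's parenting also carries rotation and scale.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(eq=False)
class Node:
    """Hypothetical scene-graph node with a semantic name and a local offset."""
    name: str
    location: Vec3 = (0.0, 0.0, 0.0)   # offset relative to the parent
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def set_parent(self, parent: "Node") -> None:
        """Attach this node under `parent` in the hierarchy."""
        self.parent = parent
        parent.children.append(self)

    def world_location(self) -> Vec3:
        """Accumulate the local offsets from this node up to the root."""
        if self.parent is None:
            return self.location
        px, py, pz = self.parent.world_location()
        x, y, z = self.location
        return (px + x, py + y, pz + z)


# Toy lamp hierarchy: base -> pole -> shade.
base = Node("Lamp_Base")
pole = Node("Lamp_Pole", (0.0, 0.0, 0.5))
shade = Node("Lamp_Shade", (0.0, 0.0, 1.0))
pole.set_parent(base)
shade.set_parent(pole)

# Moving only the root moves every descendant in world space.
base.location = (2.0, 0.0, 0.0)
shade_world = shade.world_location()
```

In Blender itself, the analogous relationship is established by assigning an object's `parent` attribute; the sketch only mimics the resulting behavior.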
@misc{lu2025ll3m,
  title={LL3M: Large Language 3D Modelers},
  author={Sining Lu and Guan Chen and Nam Anh Dinh and Itai Lang and Ari Holtzman and Rana Hanocka},
  year={2025},
  eprint={2508.08228},
  archivePrefix={arXiv},
  primaryClass={cs.GR},
  url={https://arxiv.org/abs/2508.08228},
}