DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation


ECCV 2024

Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Tao Mei

Fudan University   HiDream.ai Inc.  



Abstract

Learning neural radiance fields (NeRF) with powerful 2D diffusion models has garnered popularity for text-to-3D generation. Nevertheless, the implicit 3D representations of NeRF lack explicit modeling of meshes and textures over surfaces, and such a surface-undefined approach may suffer from issues such as noisy surfaces with ambiguous texture details or cross-view inconsistency. To alleviate this, we present DreamMesh, a novel text-to-3D architecture that pivots on well-defined surfaces (triangle meshes) to generate high-fidelity explicit 3D models. Technically, DreamMesh capitalizes on a distinctive coarse-to-fine scheme. In the coarse stage, the mesh is first deformed by text-guided Jacobians, and then DreamMesh textures the mesh from multiple viewpoints with an interlaced use of 2D diffusion models in a tuning-free manner. In the fine stage, DreamMesh jointly manipulates the mesh and refines the texture map, leading to high-quality triangle meshes with high-fidelity textured materials. Extensive experiments demonstrate that DreamMesh significantly outperforms state-of-the-art text-to-3D methods, faithfully generating 3D content with richer texture details and enhanced geometry.
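
To make the coarse-stage deformation concrete, below is a minimal, self-contained PyTorch sketch of the per-face Jacobian parameterization: each face carries a learnable 3x3 matrix, and vertex positions are recovered by a differentiable least-squares (Poisson-style) solve so that deformed edges match the Jacobians applied to rest-pose edges. This is our own illustrative toy, not the authors' code; the function names, the tetrahedron, and the placeholder objective are assumptions, and in DreamMesh the objective would come from rendering the mesh and scoring it with a text-conditioned 2D diffusion model.

import torch

def solve_vertices(rest_verts, faces, jacobians, anchor=0):
    # Recover deformed vertices whose face edge vectors best match
    # J_f @ (rest-pose edge vectors), in the least-squares sense.
    n = rest_verts.shape[0]
    eqs = []
    for f in range(faces.shape[0]):
        i, j, k = faces[f]
        for a, b in ((i, j), (i, k)):
            e_rest = rest_verts[b] - rest_verts[a]   # rest-pose edge
            target = jacobians[f] @ e_rest           # desired deformed edge
            row = torch.zeros(n)
            row[a], row[b] = -1.0, 1.0               # selects (v_b - v_a)
            eqs.append((row, target))
    A = torch.stack([r for r, _ in eqs])             # (2F, n)
    B = torch.stack([t for _, t in eqs])             # (2F, 3)
    pin = torch.zeros(n)
    pin[anchor] = 1.0                                # pin one vertex to fix translation
    A = torch.cat([A, pin[None]])
    B = torch.cat([B, rest_verts[anchor][None]])
    # Normal equations keep the solve differentiable w.r.t. the Jacobians.
    return torch.linalg.solve(A.T @ A, A.T @ B)      # (n, 3) deformed vertices

# Toy usage on a tetrahedron: optimize the Jacobians to stretch it along y.
verts = torch.tensor([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
faces = torch.tensor([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
J = torch.eye(3).repeat(len(faces), 1, 1).requires_grad_()
opt = torch.optim.Adam([J], lr=1e-2)
for _ in range(200):
    deformed = solve_vertices(verts, faces, J)
    # Placeholder loss; DreamMesh would instead use a rendering-based,
    # text-guided objective from a 2D diffusion model here.
    loss = ((deformed[:, 1] - 1.5 * verts[:, 1]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()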


1. Qualitative Comparison Against Various Methods

Our DreamMesh pivots on a completely explicit 3D representation for text-to-3D generation, yielding high-quality 3D meshes with clean, well-organized topology, free of redundant vertices and faces.





2. Application in 3D Rendering Pipeline

It is worth noting that the 3D assets synthesized by our DreamMesh can be directly used in existing 3D rendering pipelines (e.g., Blender). We show several interesting application results below:

(1) Rigging and Animating 3D Assets

Here we directly import a camel synthesized by our DreamMesh (input prompt: "A high quality photo of a camel") into Blender, and then show how to rig and animate it.
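
For reference, a minimal Blender-Python (bpy) sketch of this import-and-rig step is given below. The file name "camel.obj" and the single-bone armature are illustrative assumptions; a production rig would add spine and leg bones in Edit Mode before skinning.

import bpy

# Import the exported mesh (use bpy.ops.wm.obj_import in Blender 4.x).
bpy.ops.import_scene.obj(filepath="camel.obj")
camel = bpy.context.selected_objects[0]   # the importer selects the new object

# Add a single-bone armature at the origin as a stand-in rig.
bpy.ops.object.armature_add(location=(0.0, 0.0, 0.0))
rig = bpy.context.active_object

# Parent the mesh to the armature with automatic skinning weights.
bpy.ops.object.select_all(action='DESELECT')
camel.select_set(True)
rig.select_set(True)
bpy.context.view_layer.objects.active = rig
bpy.ops.object.parent_set(type='ARMATURE_AUTO')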


(2) Rendering Animations of Multiple 3D Assets

Next, we synthesize two animals with our DreamMesh (input prompts: "A high quality photo of a camel" and "A high quality photo of a giraffe"), and render a walking animation of both the giraffe and the camel on a grassy field.
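
A corresponding bpy sketch for animating and rendering the two assets is shown below. The object names, frame range, and straight-line motion are our own simplifications; an actual walk cycle would be keyframed on the rigs rather than on the object locations.

import bpy

scene = bpy.context.scene
scene.frame_start, scene.frame_end = 1, 120

for name, x0 in (("camel", -4.0), ("giraffe", -2.0)):
    obj = bpy.data.objects[name]          # assumes both meshes are already imported
    obj.location = (x0, 0.0, 0.0)
    obj.keyframe_insert(data_path="location", frame=1)
    obj.location = (x0 + 8.0, 0.0, 0.0)   # move forward along +X
    obj.keyframe_insert(data_path="location", frame=120)

# Render the full animation to PNG frames next to the .blend file.
scene.render.filepath = "//walk_"
scene.render.image_settings.file_format = 'PNG'
bpy.ops.render.render(animation=True)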




3. More Qualitative Results of Our DreamMesh




Citation

@InProceedings{yang2024dreammesh,
  title     = {DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation},
  author    = {Haibo Yang and Yang Chen and Yingwei Pan and Ting Yao and Zhineng Chen and Zuxuan Wu and Yu-Gang Jiang and Tao Mei},
  booktitle = {ECCV},
  year      = {2024}
}