Reconstructing three-dimensional models from two-dimensional images is a central challenge in computer vision. Advances in machine learning, particularly through Large Reconstruction Models (LRMs), have produced impressive results in recent years. These models are trained on massive datasets and can generate complex 3D structures from images captured at different viewpoints. Despite this progress, difficulties remain, particularly in reconstructing from few input views and in separating an object's geometry from its texture.
New research introduces an innovative model called DiMeR (Disentangled Mesh Reconstruction Model) that addresses these challenges. The core of DiMeR lies in the decoupling of geometry and texture, both in the input and in the model's architecture. This separation allows for specialized processing of the respective information and reduces the complexity of the learning process, in line with Occam's razor.
In contrast to previous approaches, which often feed RGB images directly into the geometry pipeline, DiMeR uses normal maps to determine geometry. A normal map stores the surface normal at each pixel and thus describes the geometric structure precisely, independent of color and lighting. By relying exclusively on normal maps for geometry reconstruction, the influence of color and lighting variation is minimized, yielding more robust and accurate geometry.
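To make the idea concrete: the standard convention packs a unit surface normal n ∈ [−1, 1]³ into an RGB pixel via n·0.5 + 0.5, which is why flat surfaces facing the camera appear as the familiar light blue. A minimal NumPy sketch of this encoding (the function names and array shapes are illustrative, not taken from the paper):

```python
import numpy as np

def encode_normal_map(normals):
    """Map unit normals in [-1, 1]^3 to RGB values in [0, 1].

    normals: (H, W, 3) array of unit surface normals.
    """
    return normals * 0.5 + 0.5

def decode_normal_map(rgb):
    """Invert the encoding and re-normalize to unit length."""
    n = rgb * 2.0 - 1.0
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# A flat surface facing the camera: normal (0, 0, 1) at every pixel.
normals = np.zeros((4, 4, 3))
normals[..., 2] = 1.0

rgb = encode_normal_map(normals)       # (0.5, 0.5, 1.0): "normal-map blue"
recovered = decode_normal_map(rgb)     # round-trips back to the input
```

Because this representation is independent of albedo and lighting, two differently painted copies of the same object produce identical normal maps, which is exactly the invariance the geometry branch exploits.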
The object's texture is handled in a separate branch of the model that takes RGB images as input. This two-branch architecture allows each sub-task to be optimized independently, which improves overall performance.
Another important aspect of DiMeR is the integration of 3D ground-truth data into the mesh extraction algorithm. This allows for more direct supervision of the learning process and leads to a more accurate reconstruction of the 3D models.
The developers of DiMeR tested the model in several scenarios, including sparse-view reconstruction, single-image-to-3D generation, and text-to-3D synthesis. The results show that DiMeR significantly outperforms existing methods: on the GSO and OmniObject3D datasets, Chamfer distance (where lower is better) improved by over 30%. This highlights DiMeR's potential for applications such as virtual reality, augmented reality, and 3D printing.
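Chamfer distance, the metric cited above, measures how well two point clouds match by averaging nearest-neighbor distances in both directions. A minimal NumPy sketch of one common variant (papers differ in squaring and normalization, so this may not match the exact formula used in the DiMeR evaluation):

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3).

    Sums the mean nearest-neighbor distance from p to q and from q to p.
    Uses an O(N*M) pairwise matrix -- fine for small clouds.
    """
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = a.copy()
c = a + np.array([0.1, 0.0, 0.0])   # same cloud shifted 0.1 along x

print(chamfer_distance(a, b))       # identical clouds -> 0.0
print(chamfer_distance(a, c))       # 0.1 each direction -> 0.2
```

Because the metric is zero only for perfectly overlapping clouds and grows with geometric deviation, a 30% reduction corresponds directly to reconstructed surfaces lying closer to the ground truth.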
The decoupling of geometry and texture, the use of Normal Maps, and the integration of 3D ground-truth data represent important innovations in 3D reconstruction. DiMeR demonstrates that significant improvements can be achieved by the targeted use of specific input data and the adaptation of the model architecture to the respective task. These developments contribute to pushing the boundaries of 3D modeling and open up new possibilities for the creation and use of digital 3D content.
Bibliography:
- Jiang, L., et al. "DiMeR: Disentangled Mesh Reconstruction Model." arXiv preprint arXiv:2504.17670 (2025).
- Project page: https://me.kiui.moe/dimr/
- Code: https://github.com/ashawkey/dimr
- GShell: Geometry-guided Shell for 3D Shape Generation. https://gshell3d.github.io/static/paper/gshell.pdf
- https://openreview.net/forum?id=R1rNN22IoP