Semantic DUSt3R

4 months ago


Grenoble, France | NAVER LABS Europe | Full time

Dense and Unconstrained Stereo 3D Reconstruction (DUSt3R [1]) is a breakthrough approach to 3D reconstruction which, contrary to most existing methods, does not require any camera parameters: it reconstructs 3D geometry directly from image content. DUSt3R is robust and fast, and it works on any number of images, even when they do not overlap. It represents a major advance in 3D geometric vision, offering a substantial simplification over traditional methods and great versatility in handling diverse 3D vision challenges.

The goal of this internship is to make DUSt3R semantic-aware, that is, to endow the model with a semantic and contextual understanding of the 3D scene. DUSt3R can already efficiently decode 3D point maps with rich geometric details from pairs of images. Our goal is therefore to enrich the model so that it directly outputs semantically labeled point maps. In this internship we intend to explore several ways to jointly learn geometry and semantics. As a baseline, we can refine the geometric model with supervised data; however, more interesting paths can be explored, such as self-supervised learning from foundation models, as in [2,3], where the model learns to jointly decode point maps and semantic feature maps (e.g. CLIP features). Finally, we can go a step further: instead of predicting semantically enhanced 3D points, we could explore whether the model can predict semantically enhanced 3D Gaussian Splatting representations [3,4].

Applicants must have strong knowledge of computer vision and a solid deep learning background, including experience with Vision Transformers, and preferably good knowledge of semantic segmentation.

**Supervisors**: Gabriela Csurka, Yohann Cabon

References

[1] Wang et al., DUSt3R: Geometric 3D Vision Made Easy, CVPR'24

[2] Kerr et al., LERF: Language Embedded Radiance Fields, ICCV'23

[3] Zuo et al., FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding, arXiv:2401.01970

[4] Kerbl et al., 3D Gaussian Splatting for Real-Time Radiance Field Rendering, ACM Transactions on Graphics (ToG), 42(4): 1-14, 2023

Application instructions

Please note that applicants must be registered students at a university or other academic institution and that this establishment will need to sign an 'Internship Convention' with NAVER LABS Europe before the student is accepted.

About NAVER LABS

NAVER is the #1 Internet portal in Korea with activities that span a wide range of businesses including search, commerce, content, financial and cloud platforms.

NAVER LABS, co-located in Korea and France, is the organization dedicated to preparing NAVER’s future. NAVER LABS Europe is located in a spectacular setting in Grenoble, in the heart of the French Alps. Scientists at NAVER LABS Europe are empowered to pursue long-term research problems that, if successful, can have significant impact and transform NAVER. We take our ideas as far as research can take them to create the best technology of its kind. Active participation in the academic community and collaborations with world-class public research groups are among the important tools for achieving these goals. Teamwork, focus and persistence are important values for us.

NAVER LABS Europe is an equal opportunity employer.