Internship: Person-Centric Multimodal Large Language Model F/M

Posted 2 weeks ago

Meylan, Auvergne-Rhône-Alpes, France | NAVER LABS Europe | Full-time

About NAVER LABS Europe

NAVER LABS Europe is part of the R&D division of NAVER, Korea's leading Internet portal and a global tech company with a range of services that include search, commerce, content, fintech, robotics and cloud.

The position

Multimodal Large Language Models (MLLMs) [1,2] have recently made substantial progress in linking visual and linguistic understanding.

However, most current approaches treat visual content as a set of global features, overlooking fine-grained human-centered information that is essential for true multimodal reasoning. MLLMs typically offer limited opportunities for explanation or reasoning about the tasks they are given, due to an absence of explicit representation of multimodal objects. This lack of transparency is particularly problematic in human-centered applications, especially in robotics.

This internship aims to explore person-centric multimodal language modeling, building on recent progress in Video Question Answering (VQA) [7] and social interaction understanding.

The core idea is to extend Vision-Language Models (VLMs) with structured, per-person features that encode geometry, motion, and audio cues to enable a richer understanding of human activity and interaction.

This internship will investigate the integration of low-dimensional 3D pose embeddings, per-person audio streams, and inter-person relational cues into multimodal transformers.
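As an illustration only, the inter-person relational cues mentioned above could be as simple as pairwise geometric features computed from per-person 3D estimates. The sketch below is a minimal example under stated assumptions, not the team's method: the `Person` structure, its field names, and the choice of cues (distance and a "facing" cosine) are all hypothetical, and `facing` is assumed to be a unit vector.

```python
import math
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical per-person summary; field names are illustrative only."""
    pid: int
    root: tuple[float, float, float]    # 3D root (pelvis) position, in metres
    facing: tuple[float, float, float]  # body orientation, assumed unit length


def relational_cues(people):
    """Return {(i, j): (distance, facing_cosine)} for every ordered pair i != j.

    facing_cosine is the cosine of the angle between person i's facing
    direction and the direction from i to j: 1.0 means i faces j directly,
    -1.0 means i has their back to j.
    """
    cues = {}
    for a in people:
        for b in people:
            if a.pid == b.pid:
                continue
            offset = tuple(bc - ac for ac, bc in zip(a.root, b.root))
            dist = math.sqrt(sum(c * c for c in offset))
            if dist == 0.0:
                cos = 0.0  # coincident roots: direction is undefined
            else:
                cos = sum(f * c for f, c in zip(a.facing, offset)) / dist
            cues[(a.pid, b.pid)] = (dist, cos)
    return cues


# Two people 2 m apart along x, both facing +x.
people = [
    Person(0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    Person(1, (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
]
cues = relational_cues(people)
```

In a full system, features like these would presumably be projected into the model's token space alongside the pose and audio embeddings; how best to do that is precisely what the internship sets out to investigate.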

The objective is to evaluate these architectures on VQA-style and captioning benchmarks that involve social and multimodal reasoning, such as Social-IQ [4-5], STAR [6], and NExT-QA [3].

This internship will contribute toward developing multimodal reasoning systems capable of understanding not just what happens in a scene, but who does it, how, and why.

Your mission:

  • Propose an approach for efficiently adapting person-oriented representations from visual and audio modalities to an MLLM
  • Create a benchmark based on existing datasets, focusing on challenging person-centric instances
  • Propose an evaluation methodology outlining the benefits of the added features in terms of explainability

Details:

  • Duration: six months
  • Start date: as soon as possible

About the research team

In the Interactive Systems group, we develop AI capabilities that enable robots to interact safely with humans, other robots, and systems. For a robot to be truly useful, it must represent its knowledge of the world, share what it learns, and interact with other agents, particularly humans. Our research integrates expertise in human-robot interaction, natural language processing, speech, information retrieval, data management, and low-code/no-code programming to create AI components that empower next-generation robots to perform complex real-world tasks.

What we're looking for

  • Master's student with an excellent profile or already enrolled in a PhD program
  • Strong background in computer vision and deep learning
  • Good understanding of multimodal transformers, VLMs/LLMs, and 3D human representations
  • Experience with frameworks such as PyTorch, torchvision, and Hugging Face Transformers
  • Familiarity with LoRA fine-tuning, feature fusion, and multimodal training strategies
  • Curiosity for human-centered AI and an interest in modeling social interactions in videos

What we offer

  • We foster a collaborative environment dedicated to ambitious, multidisciplinary projects that translate advanced research into impactful, real-world solutions, supported by 30+ years of experience in AI and related fields.
  • Flexible work/life balance.
  • We are an equal opportunity employer that hires based on skills, experience, and merit. We foster an inclusive and diverse workplace where all qualified candidates are considered fairly, regardless of background.
  • We're based in Meylan, close to Grenoble, a city that offers the perfect balance of urban life, cutting-edge research and technology, and spectacular mountain landscapes that provide countless opportunities to relax, recharge, and enjoy the outdoors.

All applications will be carefully considered, even if not all required skills are met. We value diverse backgrounds and the potential of each candidate, and we offer training to support the development of necessary skills.

NAVER LABS, co-located in Korea and France, is the organization dedicated to preparing NAVER's future. Scientists at NAVER LABS Europe are empowered to pursue long-term research problems that, if successful, can have significant impact and transform NAVER. We take our ideas as far as research can take them to create the best technology of its kind. Active participation in the academic community and collaborations with world-class public research groups are among the important ways we achieve these goals. Teamwork, focus and persistence are important values for us.

When applying for this position online, please don't forget to upload your CV and cover letter. Incomplete applications will not be considered.

NAVER LABS Europe is subject to French jurisdiction requiring organisations to stipulate that a job/internship is open to both women and men. None of our jobs/internships are gender specific.

References

  [1] LLaVA: Large Language and Vision Assistant, Liu et al., NeurIPS'23
  [2] Qwen2.5-VL, Bai et al., arXiv
  [3] NExT-QA: Next Phase of Question-Answering for Explaining Temporal Actions, Xiao et al., CVPR'21
  [4] Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence, Zadeh et al., CVPR'19
  [5] Social-IQ 2.0 Challenge, ICCV'23
  [6] STAR: A Benchmark for Situated Reasoning in Real-World Videos, Wu et al., NeurIPS'21
  [7] MoReVQA: Exploring Modular Reasoning Models for Video Question Answering, Min et al., CVPR'24


