PhD Position F/M: Data Selection Techniques for LLM Reasoning Improvement

3 days ago


Villeneuve d'Ascq, France Inria Full-time

The job description below is in English.

**Contract type**: Fixed-term contract (CDD)

**Required degree level**: Master's degree (Bac + 5) or equivalent

**Role**: PhD student

**About the centre or functional department**:
The Inria University of Lille centre, created in 2008, employs 360 people including 305 scientists in 15 research teams. Recognised for its strong involvement in the socio-economic development of the Hauts-de-France region, the Inria University of Lille centre pursues a close relationship with large companies and SMEs. By promoting synergies between researchers and industry, Inria participates in the transfer of skills and expertise in digital technologies and provides access to the best European and international research for the benefit of innovation and companies, particularly in the region.

For more than 10 years, the Inria University of Lille centre has been located at the heart of Lille's university and scientific ecosystem, as well as at the heart of French Tech, with a technology showroom based on Avenue de Bretagne in Lille, on the EuraTechnologies site of economic excellence dedicated to information and communication technologies (ICT).

**Context and advantages of the position**:
Large Language Models (LLMs) have demonstrated remarkable capabilities, and reasoning models highlight the critical role of high-quality training data. While procedural generation offers infinite training datasets in domains like logical reasoning, games, and retrieval, not all synthetic data contributes equally. Generated examples often suffer from redundancy or inappropriate difficulty, or lack meaningful signal: for instance, large-number arithmetic may appear challenging but provides minimal educational value.

This PhD research addresses **optimal data selection from infinite procedural sources**, moving beyond ad-hoc metrics like diversity and difficulty. The work will develop principled methodologies for assessing training data **impact profiles** using influence techniques (influence functions, Shapley values) to quantify how individual examples contribute to model capabilities, with connections to curriculum learning principles.

**Keywords**: Large Language Models, Data Selection, Procedural Generation, Influence Functions, Training Efficiency

**Assigned mission**:
This PhD student will collaborate with Damien Sileo and the Adada consortium (engineers and interns) to develop **intelligent data selection methods** for procedurally generated datasets. The research focuses on extracting high-value training examples from massive synthetic data pools, moving beyond simple metrics of similarity to downstream tasks and toward principled selection criteria that optimize model performance and learning efficiency.

**Main activities**:
**Data Generation & Filtering**:

- Contribute marginally to synthetic problem generators to understand generation mechanisms
- Develop large-scale data filtering pipelines for procedurally generated datasets
- Explore data representation techniques for effective sample characterization

**Core Research Focus**:

- Extract optimal coresets from massive synthetic datasets tailored to specific downstream tasks
- Design adaptive curriculum strategies accounting for model scale (larger models requiring more challenging examples)
- Develop hyperparameter modulation techniques for controlled generation diversity and difficulty calibration
- Move beyond similarity-based metrics to develop principled selection criteria optimizing learning outcomes

**Validation & Dissemination**:

- Evaluate coreset extraction and curriculum strategies across diverse reasoning tasks
- Assess scalability and computational efficiency of proposed filtering methods
- Conduct controlled experiments measuring downstream performance improvements
- Write and disseminate research findings through publications and presentations

**Skills**:
Languages: English (French not mandatory)

Programming language: Python

Deep learning and statistics background

Knowledge of logic and symbolic AI is a plus

**Benefits**:

- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage

**Remuneration**:
2100 € (gross monthly salary)

**General information**:

- **Theme/Domain**: Data and knowledge representation and processing
Statistics (Big data) (BAP E)
- **City**: Villeneuve d'Ascq
- **Inria centre**: Centre Inria de l'Université de Lille
- **Desired start date**: 2025-09-01
- **Contract duration**: 3 years
- **Application deadline**:

