Internship - Comparative analysis of diffusion models and variational autoencoders as data-driven priors for speech enhancement
Posted 2 weeks ago
The job description below is in English
Contract type: Internship agreement
Required degree level: Bac + 4 or equivalent
Function: Research intern
Context and assets of the position
This master's internship is part of the REAVISE project ("Robust and Efficient Deep Learning based Audiovisual Speech Enhancement"), funded by the French National Research Agency (ANR). The general objective of REAVISE is to develop a unified, robust, and efficient audio-visual speech enhancement (AVSE) framework that leverages recent methodological breakthroughs in statistical signal processing, machine learning, and deep neural networks.
The intern will be supervised by Mostafa Sadeghi (researcher, Inria) and Romain Serizel (associate professor, University of Lorraine), both members of the MULTISPEECH team, and by Xavier Alameda-Pineda (Inria Grenoble), member of the RobotLearn team. The intern will benefit from the research environment, expertise, and powerful computational resources (GPUs & CPUs) of the team.
Assignment
Generative models have become a fundamental tool for solving inverse problems in an unsupervised way [1, 2]. The approach relies on the ability of generative models to learn the inherent characteristics of clean target data. For speech enhancement specifically, a generative model trained on clean speech acts as a data-driven speech prior, enabling the estimation of high-quality speech from noisy recordings without requiring corresponding pairs of clean and noisy data [2, 3, 4]. This is particularly advantageous because it removes the dependency on the large paired datasets that supervised methods rely on [5], which are often difficult and costly to obtain. Moreover, training on clean speech alone may allow these models to generalize better to noisy environments they have never encountered, offering potentially broader applicability in real-world scenarios where noise conditions are unpredictable.
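As a rough illustration of what "data-driven speech prior" can mean, here is one common way to write the problem. This is a sketch under assumptions that are not stated in this posting: an additive noise model, and the symbols x (noisy speech), s (clean speech), n (noise), and p_θ (the learned generative model) are introduced here only for illustration.

```latex
% Requires amsmath and amssymb. All symbols are illustrative assumptions.
\begin{align}
  x &= s + n
    && \text{(noisy observation = clean speech + noise)} \\
  p(s \mid x) &\propto p(x \mid s)\, p_\theta(s)
    && \text{(posterior = noise likelihood $\times$ learned speech prior)} \\
  \hat{s} &= \mathbb{E}\left[\, s \mid x \,\right]
    && \text{(one possible estimate: the posterior mean)}
\end{align}
```

Under this view, a VAE or a diffusion model trained on clean speech only would supply the prior p_θ(s), or its score, while the noise likelihood p(x | s) is fitted at test time on each noisy recording. No paired clean and noisy data is needed for training.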
Main activities
Variational autoencoders (VAEs) [3, 4] and diffusion models [2] are at the forefront of research on generative models for unsupervised speech enhancement. However, the field lacks a systematic, side-by-side comparison of these models under standardized conditions. This project aims to bridge that gap through carefully designed experiments comparing the effectiveness of VAEs and diffusion models on speech enhancement tasks. Each model will be implemented using similar network architectures, so that any difference in performance can be attributed to the model capabilities rather than to disparities in model complexity or configuration. The objective is not only to quantify enhancement performance but also to understand the models' operational differences, their resilience to various noise types, and their computational efficiency. The insights gained will guide future developments in speech processing, helping to select and configure models for specific enhancement needs.
More precisely, the objectives of this project are outlined below:
- Implement both variational autoencoders and diffusion models using similar architectures to ensure comparability. Conduct detailed performance evaluations focusing on speech quality, intelligibility, noise reduction, and model efficiency under various noise conditions.
- Analyze the strengths and limitations of each model in handling diverse environmental noises and document their operational differences to determine their suitability for different speech enhancement scenarios.
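To make the evaluation objective above more concrete, here is a minimal sketch of one widely used speech enhancement metric, the scale-invariant signal-to-distortion ratio (SI-SDR). The posting does not name specific metrics, so the choice of SI-SDR and the function names `si_sdr` and `si_sdr_improvement` are assumptions made for illustration. The code uses only the Python standard library to stay self-contained; a real pipeline would more likely operate on NumPy arrays or PyTorch tensors.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better).

    Illustrative helper, not taken from the posting.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference to obtain the scaled target.
    ref_energy = sum(r * r for r in ref)
    scale = sum(e * r for e, r in zip(est, ref)) / (ref_energy + eps)
    target = [scale * r for r in ref]
    # Everything not explained by the scaled reference counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(enhanced, noisy, clean):
    """Gain in SI-SDR (dB) of the enhanced signal over the noisy input."""
    return si_sdr(enhanced, clean) - si_sdr(noisy, clean)
```

Reporting the improvement over the noisy input, per noise type and per model, is one way to compare a VAE-based and a diffusion-based system on equal footing. Intelligibility and perceptual quality would need separate metrics.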
References
[1] G. Daras, H. Chung, C.-H. Lai, Y. Mitsufuji, J. C. Ye, P. Milanfar, A. G. Dimakis, and M. Delbracio, "A survey on diffusion models for inverse problems," arXiv preprint, 2024.
[2] B. Nortier, M. Sadeghi, and R. Serizel, "Unsupervised speech enhancement with diffusion-based generative models," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.
[3] X. Bie, S. Leglaive, X. Alameda-Pineda, and L. Girin, "Unsupervised speech enhancement using dynamical variational autoencoders," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, 2022.
[4] M. Sadeghi and R. Serizel, "Posterior sampling algorithms for unsupervised speech enhancement with recurrent variational autoencoder," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.
[5] J. Richter, S. Welker, J.-M. Lemercier, B. Lay, and T. Gerkmann, "Speech enhancement and dereverberation with diffusion-based generative models," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, 2023.
Skills
Preferred qualifications include a strong foundation in statistical (speech) signal processing and computer vision, as well as expertise in machine learning and proficiency with deep learning frameworks, particularly PyTorch.
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
€ 4.35/hour
General information
- Theme/Domain: Language, speech and audio
Scientific Computing (BAP E)
- City: Villers-lès-Nancy
- Inria center: Centre Inria de l'Université de Lorraine
- Desired start date:
- Contract duration: 6 months
- Application deadline:
Warning: Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
Instructions to apply
Defence security:
This position may be assigned to a restricted-access area (zone à régime restrictif, ZRR), as defined in the decree on the protection of national scientific and technical potential (PPST). Authorisation to access such an area is granted by the head of the institution, following a favourable ministerial opinion, as defined in the order of 3 July 2012 relating to the PPST. An unfavourable ministerial opinion for a position assigned to a ZRR would result in the cancellation of the recruitment.
Recruitment policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria team: MULTISPEECH
- Recruiter: Sadeghi Mostafa
Essentials for success
Prospective applicants are invited to submit their academic transcripts, a detailed curriculum vitae (CV), and, optionally, a cover letter. The cover letter should explain their interest in this specific project.
About Inria
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on a wide range of talent across more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects with worldwide impact emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society, and the economy.