Postdoctoral Research Visit F/M Deep generative models for robust and generalizable audio-visual speech enhancement

4 weeks ago

Villers-lès-Nancy, Grand Est, France INRIA Full-time

Context and advantages of the position

This postdoctoral research is part of the REAVISE project: "Robust and Efficient Deep Learning based Audiovisual Speech Enhancement" funded by the French National Research Agency (ANR). The general objective of REAVISE is to develop a unified audio-visual speech enhancement (AVSE) framework. This will leverage recent breakthroughs in statistical signal processing, machine learning, and deep neural networks to create a robust and efficient AVSE system.

The postdoctoral researcher will be supervised by (associate professor, University of Lorraine), as members of the (Inria Grenoble), member of the Work environment:

Assigned mission

Background. Audio-visual speech enhancement (AVSE) aims to improve the intelligibility and quality of noisy speech signals by utilizing complementary visual information, such as the lip movements of the speaker [1]. This technique is especially useful in highly noisy environments. The advent of deep neural network (DNN) architectures has led to significant advancements in AVSE, prompting extensive research into the area [1]. Existing DNN-based AVSE methods are divided into supervised and unsupervised approaches. In supervised approaches, a DNN is trained on a large audiovisual corpus, like AVSpeech [2], which includes a wide range of noise conditions. This training enables the DNN to transform noisy speech signals and corresponding video frames into a clean speech estimate. These models are typically complex, containing millions of parameters.

Unsupervised methods [3-5], on the other hand, combine statistical modeling with DNNs. These methods use deep generative models, such as variational autoencoders (VAEs) [6] and diffusion models [7], trained on clean datasets like TCD-TIMIT [8], to probabilistically estimate clean speech signals. Because these models are not trained on noisy data, they are generally lighter than supervised models, and their probabilistic formulation may offer better generalization and robustness to visual noise [3-5]. Despite these advantages, unsupervised methods remain less explored than their supervised counterparts.

Main activities

Objectives. In this project, we aim to develop a robust and efficient AVSE framework by thoroughly exploring the integration of recent deep-learning architectures designed for speech enhancement, encompassing both supervised and unsupervised approaches. Our goal is to leverage the strengths of both strategies, alongside cutting-edge generative modeling techniques, to bridge the gap between them. This includes the implementation of computationally efficient multimodal (latent) diffusion models, dynamical VAEs [9], temporal convolutional networks (TCNs) [10], and attention-based methods [11]. The main objectives of the project are outlined as follows:

Develop a neural architecture that assesses the reliability of lip images—whether they are frontal, non-frontal, occluded, in extreme poses, or missing—by providing a normalized reliability score at the output [12];
Design deep generative models that efficiently exploit the sequential nature of data and effectively fuse audio-visual features;
Integrate the visual reliability analysis network within the deep generative model to selectively use visual data. This will enable a flexible and robust framework for audio-visual fusion and enhancement.
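
As a rough illustration of the third objective, the sketch below shows one simple way a normalized reliability score could gate visual features before fusion. It is not the project's architecture: the function names (`reliability_score`, `fuse_frame`), the additive fusion rule, and the assumption that audio and visual features already share the same dimension are all hypothetical choices made for this example.

```python
import math
from typing import List, Optional, Sequence


def reliability_score(logit: float) -> float:
    """Map a raw (hypothetical) network output to a score in [0, 1].

    Uses a numerically stable sigmoid so that large negative inputs
    do not overflow.
    """
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def fuse_frame(
    audio: Sequence[float],
    visual: Optional[Sequence[float]],
    logit: float,
) -> List[float]:
    """Fuse one frame of audio and visual features.

    Assumption: both feature vectors were already projected to the same
    dimension. The visual contribution is scaled by the reliability
    score, so an unreliable lip image has little effect, and a missing
    frame (visual is None) falls back to audio-only features.
    """
    if visual is None:
        return list(audio)
    if len(visual) != len(audio):
        raise ValueError("audio and visual features must have equal length")
    r = reliability_score(logit)
    return [a + r * v for a, v in zip(audio, visual)]


if __name__ == "__main__":
    audio_feat = [1.0, 2.0]
    visual_feat = [2.0, 4.0]
    print(fuse_frame(audio_feat, visual_feat, logit=0.0))   # score of 0.5
    print(fuse_frame(audio_feat, None, logit=3.0))          # missing frame
```

In a learned system the score would come from the reliability network and the fusion would typically be a trained layer; the fixed additive rule here only makes the gating idea concrete.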

References:

[1] D. Michelsanti, Z. H. Tan, S. X. Zhang, Y. Xu, M. Yu, D. Yu, and J. Jensen, "An overview of deep learning-based audio-visual speech enhancement and separation," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, 2021.
[2] A. Ephrat, I. Mosseri, O. Lang, T. Dekel, K. Wilson, A. Hassidim, W. T. Freeman, and M. Rubinstein, "Looking-to-Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation," in SIGGRAPH 2018.
[3] M. Sadeghi, S. Leglaive, X. Alameda-Pineda, L. Girin, and R. Horaud, "Audio-visual speech enhancement using conditional variational auto-encoders," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1788–1800, 2020.
[4] A. Golmakani, M. Sadeghi, and R. Serizel, "Audio-visual Speech Enhancement with a Deep Kalman Filter Generative Model," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, June 2023.
[5] B. Nortier, M. Sadeghi, and R. Serizel, "Unsupervised Speech Enhancement with Diffusion-based Generative Models," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024.
[6] D. P. Kingma and M. Welling, "An introduction to variational autoencoders," in Foundations and Trends in Machine Learning, vol. 12, no. 4, 2019.
[7] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, "Score-based generative modeling through stochastic differential equations," in International Conference on Learning Representations (ICLR), 2021.
[8] N. Harte and E. Gillen, "TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech," in IEEE Transactions on Multimedia, vol. 17, no. 5, May 2015.
[9] L. Girin, S. Leglaive, X. Bie, J. Diard, T. Hueber, and X. Alameda-Pineda, "Dynamical variational autoencoders: A comprehensive review," in Foundations and Trends in Machine Learning, vol. 15, no. 1-2, 2021.
[10] C. Lea, R. Vidal, A. Reiter, and G. D. Hager, "Temporal convolutional networks: A unified approach to action segmentation," in European Conference on Computer Vision (ECCV), Springer, Cham, 2016.
[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems (NeurIPS), 2017, pp. 5998–6008.
[12] Z. Kang, M. Sadeghi, R. Horaud, and X. Alameda-Pineda, "Expression-preserving Face Frontalization Improves Visually Assisted Speech Processing," in International Journal of Computer Vision (IJCV), 2022.

Skills

The preferred profile is described below.

Master's degree, or equivalent, in the field of speech/audio processing, computer vision, machine learning, or in a related field;
Ability to work independently as well as in a team;
Solid programming skills (Python, PyTorch) and deep learning knowledge;
Good level of written and spoken English.

Advantages

Subsidized meals
Partial reimbursement of public transport costs
Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
Professional equipment available (videoconferencing, loan of computer equipment, etc.)
Social, cultural and sports events and activities
Access to vocational training
Social security coverage

Salary

2788€ gross/month
