Postdoctoral Research Visit F/M Deep generative models for robust and generalizable audio-visual speech enhancement

3 days ago

Villers-lès-Nancy, Grand Est, France INRIA Full-time

Context and strengths of the position

This postdoctoral research is part of the REAVISE project: "Robust and Efficient Deep Learning based Audiovisual Speech Enhancement" funded by the French National Research Agency (ANR). The general objective of REAVISE is to develop a unified audio-visual speech enhancement (AVSE) framework. This will leverage recent breakthroughs in statistical signal processing, machine learning, and deep neural networks to create a robust and efficient AVSE system.

The postdoctoral researcher will be supervised by (associate professor, University of Lorraine), as members of the (Inria Grenoble), member of the Work environment:

Mission entrusted

Background. Audio-visual speech enhancement (AVSE) aims to improve the intelligibility and quality of noisy speech signals by utilizing complementary visual information, such as the lip movements of the speaker [1]. This technique is especially useful in highly noisy environments. The advent of deep neural network (DNN) architectures has led to significant advancements in AVSE, prompting extensive research into the area [1]. Existing DNN-based AVSE methods are divided into supervised and unsupervised approaches. In supervised approaches, a DNN is trained on a large audiovisual corpus, like AVSpeech [2], which includes a wide range of noise conditions. This training enables the DNN to transform noisy speech signals and corresponding video frames into a clean speech estimate. These models are typically complex, containing millions of parameters.

On the other hand, unsupervised methods [3-5] employ statistical modeling combined with DNNs. These methods use deep generative models, such as variational autoencoders (VAEs) [6] and diffusion models [7], trained on clean datasets like TCD-TIMIT [8], to probabilistically estimate clean speech signals. Since these models are not trained on noisy data, they are generally lighter than supervised models, and their probabilistic formulation may offer better generalization and robustness to visual noise [3-5]. Despite these advantages, unsupervised methods remain less explored than their supervised counterparts.

Main activities

Objectives. In this project, we aim to develop a robust and efficient AVSE framework by thoroughly exploring the integration of recent deep-learning architectures designed for speech enhancement, encompassing both supervised and unsupervised approaches. Our goal is to combine the strengths of both strategies with cutting-edge generative modeling techniques in order to bridge the gap between them. This includes the implementation of computationally efficient multimodal (latent) diffusion models, dynamical VAEs [9], temporal convolutional networks (TCNs) [10], and attention-based methods [11]. The main objectives of the project are as follows:

Develop a neural architecture that assesses the reliability of lip images—whether they are frontal, non-frontal, occluded, in extreme poses, or missing—by providing a normalized reliability score at the output [12];
Design deep generative models that efficiently exploit the sequential nature of data and effectively fuse audio-visual features;
Integrate the visual reliability analysis network within the deep generative model to selectively use visual data. This will enable a flexible and robust framework for audio-visual fusion and enhancement.
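
As a rough illustration of the third objective, the sketch below scales the visual features of each frame by a reliability score in [0, 1] before concatenating them with the audio features. The function names (`gated_fusion`, `fuse_sequence`) and the simple multiplicative gate are hypothetical, chosen only for illustration; the posting does not specify how the project will integrate the reliability score, and an actual system would presumably operate on learned PyTorch tensors rather than plain lists.

```python
from typing import List, Sequence


def gated_fusion(audio: Sequence[float],
                 visual: Sequence[float],
                 reliability: float) -> List[float]:
    """Concatenate audio features with reliability-scaled visual features.

    Illustrative assumption: a score of 0 suppresses the visual stream
    (e.g. occluded or missing lips), a score of 1 keeps it unchanged.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    # Scale every visual feature by the per-frame reliability score.
    scaled_visual = [reliability * v for v in visual]
    return list(audio) + scaled_visual


def fuse_sequence(audio_frames: Sequence[Sequence[float]],
                  visual_frames: Sequence[Sequence[float]],
                  scores: Sequence[float]) -> List[List[float]]:
    """Apply the gate frame by frame over a synchronized sequence."""
    if not (len(audio_frames) == len(visual_frames) == len(scores)):
        raise ValueError("all streams must have the same number of frames")
    return [gated_fusion(a, v, r)
            for a, v, r in zip(audio_frames, visual_frames, scores)]
```

With this gate, a frame whose lip region is judged unreliable contributes only its audio features to the fused representation, while the output dimension stays fixed regardless of the score.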

References:

[1] D. Michelsanti, Z. H. Tan, S. X. Zhang, Y. Xu, M. Yu, D. Yu, and J. Jensen, "An overview of deep learning-based audio-visual speech enhancement and separation," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, 2021.
[2] A. Ephrat, I. Mosseri, O. Lang, T. Dekel, K. Wilson, A. Hassidim, W. T. Freeman, and M. Rubinstein, "Looking-to-Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation," in SIGGRAPH 2018.
[3] M. Sadeghi, S. Leglaive, X. Alameda-Pineda, L. Girin, and R. Horaud, "Audio-visual speech enhancement using conditional variational auto-encoders," in IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 28, pp. 1788–1800, 2020.
[4] A. Golmakani, M. Sadeghi, and R. Serizel, "Audio-visual Speech Enhancement with a Deep Kalman Filter Generative Model," in IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Rhodes Island, June 2023.
[5] B. Nortier, M. Sadeghi, and R. Serizel, "Unsupervised Speech Enhancement with Diffusion-based Generative Models," in IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024.
[6] D. P. Kingma and M. Welling, "An introduction to variational autoencoders," in Foundations and Trends in Machine Learning, vol. 12, no. 4, 2019.
[7] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, "Score-based generative modeling through stochastic differential equations," in International Conference on Learning Representations (ICLR), 2021.
[8] N. Harte and E. Gillen, "TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech," in IEEE Transactions on Multimedia, vol. 17, no. 5, pp , May 2015.
[9] L. Girin, S. Leglaive, X. Bie, J. Diard, T. Hueber, and X. Alameda-Pineda, "Dynamical variational autoencoders: A comprehensive review," in Foundations and Trends in Machine Learning, vol. 15, no. 1-2, 2021.
[10] C. Lea, R. Vidal, A. Reiter, and G. D. Hager, "Temporal convolutional networks: A unified approach to action segmentation," in European Conference on Computer Vision (ECCV), pp Springer, Cham, 2016.
[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems (NeurIPS), 2017, pp. 5998–6008.
[12] Z. Kang, M. Sadeghi, R. Horaud, and X. Alameda-Pineda, "Expression-preserving Face Frontalization Improves Visually Assisted Speech Processing," in International Journal of Computer Vision (IJCV), 2022.

Skills

The preferred profile is described below.

Master's degree, or equivalent, in the field of speech/audio processing, computer vision, machine learning, or in a related field;
Ability to work independently as well as in a team;
Solid programming skills (Python, PyTorch) and deep learning knowledge;
Good level of written and spoken English.

Advantages

Subsidized meals
Partial reimbursement of public transport costs
Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
Professional equipment available (videoconferencing, loan of computer equipment, etc.)
Social, cultural and sports events and activities
Access to vocational training
Social security coverage

Salary

2788€ gross/month

