PhD Position F/M: Explainable and frugal audio scene description

Posted 6 months ago


Paris, France, INRIA, Full-time

Context and advantages of the position

Inria Défense&Sécurité (Inria D&S) was created in 2020 to federate Inria’s actions for the benefit of military forces. The PhD will be carried out within the audio processing research team of Inria D&S, under the supervision of Jean-François Bonastre and co-supervised by Raphaël Duroselle.

The automatic audio scene description task presents operators with a summary of the information in the scene, in the form of augmented text. This text provides a visual summary of the most important information while structuring access to specific details. Here is an illustrative example of a summary: "This five-minute recording features three different speakers. Speaker A corresponds to a known identity in the database and speaks French with a strong Monawa accent. Speakers B and C are unknown in the database; they speak English in their interactions with A and use an unidentified language when talking to each other. The voices of B and C show strong similarities with speakers from the Eastern Quabar region. The main theme of the recording concerns a transfer of goods between the cities of Orienta and Flagrance. The date July 8, 2023 is mentioned three times." Clicking on A gives the operator information about A and details of the voice identification performed, with direct access to the time segments during which A spoke and to their transcription. The transcription will highlight named entities (names of people, places or dates).
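To make the idea of "augmented text" more concrete, here is a minimal sketch of how such a report could be stored so that a click on a speaker leads to the matching segments and named entities. All class names, field names, and sample transcripts are hypothetical and only serve as an illustration; the actual representation is part of the thesis work.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Segment:
    """One time span of speech attributed to a single speaker (hypothetical schema)."""
    start_s: float
    end_s: float
    speaker: str
    transcript: str
    # Named entities as (surface text, entity type) pairs.
    entities: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Speaker:
    """Speaker-level information shown when the operator clicks on a label."""
    label: str
    known_identity: Optional[str]  # None when the voice is not in the database
    languages: List[str]


@dataclass
class SceneReport:
    """Summary text plus the structured data it links to."""
    duration_s: float
    speakers: Dict[str, Speaker]
    segments: List[Segment]
    summary: str

    def segments_for(self, label: str) -> List[Segment]:
        """Return the segments spoken by the given speaker label."""
        return [s for s in self.segments if s.speaker == label]

    def entity_counts(self) -> Dict[Tuple[str, str], int]:
        """Count how often each named entity appears across all segments."""
        counts: Dict[Tuple[str, str], int] = {}
        for seg in self.segments:
            for entity in seg.entities:
                counts[entity] = counts.get(entity, 0) + 1
        return counts


# Made-up sample data, loosely following the example summary above.
report = SceneReport(
    duration_s=300.0,
    speakers={
        "A": Speaker("A", known_identity="db-entry-A", languages=["French"]),
        "B": Speaker("B", known_identity=None, languages=["English"]),
    },
    segments=[
        Segment(0.0, 12.0, "A", "(transcript of A)", [("July 8, 2023", "DATE")]),
        Segment(12.0, 20.0, "B", "(transcript of B)",
                [("July 8, 2023", "DATE"), ("Orienta", "PLACE")]),
    ],
    summary="This five-minute recording features ...",
)
```

With this layout, `report.segments_for("A")` gives the time segments to display when the operator clicks on A, and `report.entity_counts()` supports statements such as how many times a date is mentioned.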

Assignment

Goal

The aim of this thesis is to propose a general framework for processing audio recordings for intelligence purposes. This involves defining a high-level application adapted to the needs of end users, one that presents a recording as a summary report highlighting its salient points.

Approach

This approach is inspired both by textual description of video scenes [1] and by dialogue systems based on audio-visual scenes [2]. The system will be based on the extraction of speech signal representations at different scales (frame, speech segment or sound event, complete recording), possibly dedicated to different tasks. These representations, which feed the various building blocks of the system, will be embeddings extracted from deep neural networks, either generic [3] or dedicated to each task. The different levels of information can be fused with an architecture inspired by the multi-stream "Encoder-Decoder" scheme [4], in which several encoders produce sequences of representations and one or more decoders perform the tasks or sub-tasks required by the system. One of these decoders will produce a textual summary of the scene.
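As a rough illustration of the multi-stream idea, the toy sketch below encodes one signal at three scales and lets a decoder query attend over the pooled representations. It is only a sketch under strong assumptions: the "encoders" are simple statistics standing in for neural networks, the function names and window sizes are hypothetical, and it does not reproduce the architecture of [4].

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stats(chunk: List[float]) -> Vector:
    """Placeholder embedding: mean, max, min and energy of a chunk."""
    mean = sum(chunk) / len(chunk)
    energy = sum(x * x for x in chunk) / len(chunk)
    return [mean, max(chunk), min(chunk), energy]


def encode(signal: List[float], window: int) -> List[Vector]:
    """Stand-in encoder: one embedding per non-overlapping window."""
    return [stats(signal[i:i + window]) for i in range(0, len(signal), window)]


def multi_stream_encode(signal: List[float],
                        windows: Dict[str, int]) -> Dict[str, List[Vector]]:
    """Run one encoder per scale; each stream is a sequence of embeddings."""
    return {name: encode(signal, w) for name, w in windows.items()}


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention of a single query over the memory."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, vec)) / scale for vec in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * vec[i] for w, vec in zip(weights, memory)) for i in range(dim)]


def decode(query: Vector, streams: Dict[str, List[Vector]]) -> Vector:
    """Fuse all streams by concatenating them into one attention memory."""
    memory = [vec for seq in streams.values() for vec in seq]
    return attend(query, memory)


# Toy signal and hypothetical window sizes for the three scales.
signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
streams = multi_stream_encode(
    signal, {"frame": 2, "segment": 4, "recording": len(signal)}
)
context = decode([1.0, 0.0, 0.0, 0.0], streams)
```

In a real system each stream would come from a trained network and the decoder would generate tokens step by step; the point here is only that sequences of different lengths can share one attention memory.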

Potential research directions, which aim to go beyond an audio scene description system built by assembling existing components, can be discussed and refined with the candidate.

Main activities

Literature review, development and evaluation of deep learning systems

Definition of a new task, a corpus and an evaluation protocol

Work on the alignment between self-supervised representations of the speech signal and large language models

Weakly supervised system training

System evaluation

Skills

Master's degree in computer science, mathematics or phonetics.

Strong interest in applied research.

Written and spoken English

Signal processing

Machine learning and deep learning

Experience with deep learning toolkits such as PyTorch or Keras

Speech processing experience, knowledge of open source toolkits such as Kaldi or SpeechBrain.

References

[1] Aafaq, N., Mian, A., Liu, W., Gilani, S. Z., & Shah, M. (2019). Video description: A survey of methods, datasets, and evaluation metrics. ACM Computing Surveys (CSUR), 52, 1-37.

[2] Hori, Chiori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, et al. « End-to-End Audio Visual Scene-Aware Dialog Using Multimodal Attention-Based Video Features ». In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2352-56. Brighton, United Kingdom: IEEE, 2019.

[3] Zhang, C., & Tian, Y. (2016, December). Automatic video description generation via LSTM with joint two-stream encoding. In 2016 23rd International Conference on Pattern Recognition (ICPR) (pp. 2924-2929). IEEE.

[4] Pratap, Vineel, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, et al. 2023. « Scaling Speech Technology to 1,000+ Languages ». arXiv.

Benefits

Subsidized meals

Partial reimbursement of public transport costs

Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)

Possibility of teleworking and flexible organization of working hours

Professional equipment available (videoconferencing, loan of computer equipment, etc.)

Social, cultural and sports events and activities

Salary

1st and 2nd years: 2082 € gross per month

3rd year: 2190 € gross per month
