PhD Position F/M Code and Proof Generation with Large Language Models

Posted 3 months ago


Paris, France INRIA Full-time

Contract type: Fixed-term contract (CDD)

Required degree level: Master's degree (Bac+5) or equivalent

Position: PhD student

Assignment

Generative AI is gaining momentum and has raised significant interest in tackling a growing range of problems in linguistics, mathematics, commonsense reasoning, biology, physics, etc.

Transformers, introduced in [1], have quickly become the state-of-the-art neural network architecture for sequence processing, with applications ranging from natural language processing and computer vision to code generation [2]. Transformer performance scales with the number of parameters and the amount of training data [3], and with modern GPU/TPU chips it is now possible to train very large Transformer models with billions of parameters.

Large Language Models (LLMs), like GPT-4, are extremely large Transformer models trained for natural language processing tasks on huge datasets containing billions of words. After the initial training, LLMs can be specialized for a specific task using various techniques:

- Fine-tuning consists in adjusting the parameters of an LLM by re-training the model, or part of the model, on a specialized dataset starting from pre-trained parameters. In addition, direct preference optimization [4] can fine-tune LLMs to align with human preferences, achieving precise control of their behavior.
- Prompt augmentation techniques leverage the ability of general-purpose LLMs to learn and adapt by adding context directly to the user input in the prompt. Retrieval Augmented Generation (RAG) [5] is an advanced form of prompt augmentation where, given a prompt, relevant data are retrieved from an external database and added to the original prompt.

Beyond natural language, general-purpose LLMs quickly demonstrated emergent programming abilities due to the presence of code in the training dataset. There has been an explosion of specialized LLMs either entirely trained or fine-tuned on code: AlphaCode [6], StarCoder [7], Codex [2], CodeT5 [8], Code Llama [9], etc. Researchers are only beginning to explore the capabilities of LLMs for software development, and many challenges need to be addressed.

In this thesis, we will explore new research in neural code generation and applications to formal verification. To improve the reliability of LLM-based code assistants, we will explore possible interactions between the LLM and external tools like a Python interpreter, a test framework, or a proof assistant. While LLM-based interactive tools are only nascent, they have the potential to improve software development at every level.

LLMs have shown promise in proving formal theorems using interactive theorem provers (ITP) such as Isabelle, Lean or Coq. While full proof automation remains challenging, one of our goals in this thesis is to build a tool to enable the three-way human-ITP-LLM interaction for Coq. We will explore various fine-tuning and prompt augmentation techniques in this context and then focus more precisely on the verification of generated code. We want to use an LLM to formalize a specification in Coq, and to generate both the corresponding code and a proof of correctness using existing formalized semantics. The proof assistant then checks the proof to accept or reject the program, and the human can validate the formal specification, or refine it if necessary.

**References:**
- [1] Attention Is All You Need, Vaswani et al., 2017
- [2] Evaluating Large Language Models Trained on Code, Chen et al., 2021
- [3] Training Compute-Optimal Large Language Models, Hoffmann et al., 2022
- [4] Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Rafailov et al., 2023
- [5] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis et al., 2020
- [6] Competition-Level Code Generation with AlphaCode, Li et al., 2022
- [7] StarCoder: may the source be with you, Li et al., 2023
- [8] CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation, Wang et al., 2021
- [9] Code Llama: Open Foundation Models for Code, Rozière et al., 2023

Benefits

- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking
- Flexible organization of working hours (after 12 months of employment)
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
