Large Language Models for Information Extraction

2 months ago


Palaiseau, France CEA Full-time

Position description

**Category**:

- Mathematics, information, scientific, software

**Contract**:

- Internship

**Job title**:

- Large Language Models for Information Extraction H/F

**Subject**:

- We propose to study the performance of LLMs on Information Extraction tasks.

**Contract duration (months)**:

- 6

**Job description**:

- Large Language Models (LLMs) have been widely adopted by the Natural Language Processing (NLP) community and have been applied successfully to a variety of tasks (Le Scao et al., 2023; Touvron et al., 2023). These models are pretrained in a self-supervised fashion on large corpora of raw text and tested on standardized benchmarks devised by the community. Most of these benchmarks include Natural Language Understanding (NLU) tasks such as reasoning and common sense in a variety of domains (e.g. microeconomics, physics or maths) (Hendrycks et al., 2021; Srivastava et al., 2023). Others evaluate the capacity of these models to generate code or to translate a program into another language (Zheng et al., 2023). Only a few research efforts concentrate on evaluating these models on information extraction tasks. Among them, Wang et al. (2023) introduce IE INSTRUCTIONS, a benchmark composed of 32 information extraction datasets that includes Named Entity Recognition (NER), Relation Extraction (RE) and Event Extraction (EE) tasks.
- In this context, we propose to further study the performance of LLMs on Information Extraction tasks. Specifically, this study will focus on their few- and zero-shot capabilities for NER in a context where the number of entity types to identify in texts is very high, which results in a very small volume of annotated data.
- Among other tasks, the successful intern will have the following responsibilities:
- Perform and maintain an up-to-date literature review on the topic
- [...] and zero-shot setting
- Evaluate state-of-the-art models in this framework by relying on our computing cluster
- Devise, implement, and evaluate new methods for zero- and few-shot NER using pretrained LLMs
- References:
- Hendrycks et al. (2021). “Measuring Massive Multitask Language Understanding”. ICLR
- Le Scao et al. (2023). “BLOOM: A 176B-Parameter Open-Access Multilingual Language Model”. arXiv 2211.05100
- Srivastava et al. (2023). “Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models”. arXiv 2206.04615
- Touvron et al. (2023). “Llama 2: Open Foundation and Fine-Tuned Chat Models”. arXiv 2307.09288
- Wang et al. (2023). “InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction”. arXiv 2304.08085

**Applicant Profile**:

- Non-exhaustive list of required skills:
- Able to work in a Linux environment
- Background in natural language generation and language modeling
- Familiarity with pre-trained language models and large language models
- Familiarity with Python, specifically PyTorch and other AI/NLP-related libraries

Position location

**Site**:

- Saclay

**Job location**:

- France, Ile-de-France, Essonne (91)

**Location**:

- Palaiseau

**Prepared diploma**:

- Bac+5 - Engineering school degree

**PhD opportunity**:

- Yes

Requester

**Position start date**:

- 01/04/2024

General information

**Organisation**:
The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas:
- defence and security,
- nuclear energy (fission and fusion),
- technological research for industry,
- fundamental research in the physical sciences and life sciences.

Drawing on its widely acknowledged expertise, and thanks to its 16000 technicians, engineers, researchers and staff, the CEA actively participates in collaborative projects with a large number of academic and industrial partners.
The CEA is established in ten centers spread throughout France.

**Reference**:

- 2023-30280

**Unit description**:

- Based in Paris-Saclay, CEA List is one of the four institutes under CEA Tech, the technological research branch of CEA. Specializing in intelligent digital systems, it contributes to enhancing the competitiveness of businesses through technology development and transfer.
- The expertise and skills cultivated by the 800 research engineers and technicians at CEA List enable the institute to support annually over 200 French and international companies in applied research projects. These projects are based on four programs and nine technological platforms. Since 2003, 21 start-ups have been created as a result of these efforts. Designated as a "Carnot Institute" since 2006, CEA List is currently recognized as the "Digital Technologies Carnot Institute".
- The Laboratory of Semantic Analysis of Texts and Images (LASTI) is a team comprising around 25 individuals, including researchers, engineers, and doctoral students. They are engaged in research activities focusing on technologies for de


