Backdoor Attack Scalability and Defense Evaluation in Large Language Models H/F

il y a 3 jours


Saclay, Île-de-France CEA Temps plein

Position description
Category
Mathematics, information, scientific, software

Contract
Internship

Job title
Backdoor Attack Scalability and Defense Evaluation in Large Language Models H/F

Subject
Large Language Models (LLMs) deployed in safety-critical domains are increasingly vulnerable to backdoor and data poisoning attacks. Recent studies show that even a small number of poisoned samples can compromise models at massive scales, highlighting urgent security challenges. This internship focuses on empirically testing and advancing poisoning attacks and defenses in LLMs through systematic experimentation and adversarial evaluation. Tasks include implementing state-of-the-art attack methods (e.g., jailbreaks, denial-of-service, data extraction), evaluating defenses, analyzing attack scalability across model sizes, and establishing standardized evaluation metrics such as Attack Success Rate and Clean Accuracy to support reproducible benchmarking and robust model defense strategies.

Contract duration (months)
6

Job Description
Context:
Large Language Models (LLMs) deployed in safety-critical domains face significant threats from backdoor attacks. Recent empirical evidence contradicts previous assumptions about attack scalability: poisoning attacks remain effective regardless of model or dataset size, requiring as few as 250 poisoned documents to compromise models from up to 13B parameters. This suggests data poisoning becomes easier, not harder, as systems scale.

Backdoors persist through post-training alignment techniques like Supervised Fine-Tuning and Reinforcement Learning from Human Feedback, compromising current defenses. However, persistence depends critically on poisoning timing and backdoor characteristics. Current verification methods are computationally prohibitive—Proof-of-Learning requires full model retraining and complete training transcript access. While step-wise verification shows promise for runtime detection, scalability to production models and resilience against adaptive adversaries remain unresolved.

Existing defenses focus on post-training detection rather than preventing attack success during training. Advancing data poisoning scaling dynamics—understanding how attack success correlates with dataset composition, poisoning density, and model capacity—is essential for developing evidence-based threat models and defense strategies.

Objective:
This internship aims to empirically test and advance data poisoning attacks and defenses for LLMs through systematic experimentation and adversarial evaluation. Key responsibilities include: implementing state-of-the-art attack methods across multiple vectors (jailbreaking, targeted refusal, denial-of-service, information extraction); testing attacks on diverse model architectures and scales; establishing standardized evaluation protocols with metrics such as Attack Success Rate and Clean Accuracy; evaluating existing defenses, particularly step-wise verification; and developing reproducible test suites for objective defense benchmarking.

Applicant Profile
Requirements:

  • Background in computer science or a related field, with a focus on machine learning security, or adversarial machine learning.
  • Strong programming skills in languages commonly used for machine learning tasks (e.g., Python, C++).
  • Experience with machine learning systems, model training, or adversarial robustness is a plus.
  • Ability to work independently and collaborate in a research-driven environment.
  • Comfortable working in English, essential for documentation purposes.

Position location
Site
Saclay

Job location
France, Ile-de-France, Essonne (91)

Location
Gif-sur-Yvette

Candidate criteria
Languages
English (Fluent)

Prepared diploma
Bac+5 - Master 2

Recommended training
Computer Science

PhD opportunity
Oui

Requester
Position start date
27/10/2025



  • Saclay, Île-de-France CEA Temps plein

    Position descriptionCategoryInformation systemContractInternshipJob titleDesign of Fault Injection Models Within Pre-silicon Security Methodologies H/FSubjectFault-injection attacks exploit hardware perturbations to drive a processor into unexpected states or execution paths, which can leak secrets or enable privilege escalation. Fault-injection attacks are...


  • Saclay, Île-de-France CEA Temps plein

    Position descriptionCategoryMathematics, information, scientific, softwareContractInternshipJob titleFormal methodology for the exploration and the evaluation of complex critical SW architecture M/FSubjectThe internship aims to implement and improve the formalization and implementation of an iterative methodology for critical embedded software architectures...


  • Saclay, Île-de-France CEA Temps plein

    General information Organisation The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas :• defence and security,• nuclear energy (fission and fusion),• technological research for industry,• fundamental research in the physical sciences and life sciences.Drawing...

  • Internship position H/F

    il y a 2 semaines


    Saclay, Île-de-France CEA Temps plein

    Although formal verification is essential for ensuring the safety and security of software, it remains difficult to deploy and use effectively by non-experts due to its steep learning curve. Recent advances in large language models (LLMs) have demonstrated remarkable abilities in code understanding, synthesis, and reasoning. These advances open promising...


  • Saclay, Île-de-France CEA Temps plein

    General information Organisation The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas :• defence and security,• nuclear energy (fission and fusion),• technological research for industry,• fundamental research in the physical sciences and life sciences.Drawing...


  • Saclay, Île-de-France CEA Temps plein

    This post-doctoral position is part of a collaboration between LIAD (Laboratory of Artificial Intelligence and Data Sciences, CEA Saclay), the NRX Nanostructures and X-Rays Team at CEA Grenoble, the University of Lorraine, CentraleSupélec, and the European Synchrotron (ESRF). It is jointly supervised by:Aurore Lomet, Research Engineer in AI at LIAD, CEA...


  • Saclay, Île-de-France CEA Temps plein

    See illustrations on:During this internship, you will investigate state of the art techniques for detection of objects in images when the CAD model of the object is available (CAD-conditioned detection). Indeed, most state of the art approaches start with an object-agnostic segmentation (using Meta's SAM for instance) followed by template matching. But this...


  • Saclay, Île-de-France CEA Temps plein

    Informations générales Entité de rattachement Le CEA est un acteur majeur de la recherche, au service des citoyens, de l'économie et de l'Etat.Il apporte des solutions concrètes à leurs besoins dans quatre domaines principaux : transition énergétique, transition numérique, technologies pour la médecine du futur, défense et sécurité sur un...


  • Saclay, Île-de-France CEA Temps plein

    Informations générales Entité de rattachement Le CEA est un acteur majeur de la recherche, au service des citoyens, de l'économie et de l'Etat.Il apporte des solutions concrètes à leurs besoins dans quatre domaines principaux : transition énergétique, transition numérique, technologies pour la médecine du futur, défense et sécurité sur un...

  • development of a NET

    il y a 6 jours


    Saclay, Île-de-France CEA Temps plein

    Informations générales Entité de rattachement Le CEA est un acteur majeur de la recherche, au service des citoyens, de l'économie et de l'Etat.Il apporte des solutions concrètes à leurs besoins dans quatre domaines principaux : transition énergétique, transition numérique, technologies pour la médecine du futur, défense et sécurité sur un...