AI Evaluation Engineer

3 days ago


Greater Paris Metropolitan Region, France Braintrust Full-time

Job Description

This is a contract engagement, initially 6 months, with potential for a long-term engagement.

Location: Paris-based preferred; alternatively Europe remote for strong candidates

We are building and evaluating state-of-the-art large language models (LLMs) and are looking for experienced software engineers to join our evaluation and annotation team. This role sits at the intersection of real-world software engineering, model evaluation, and applied AI, and is critical to improving model reliability, reasoning, and code quality.

You will design challenging coding tasks, evaluate model outputs against rigorous benchmarks, identify failure modes, and contribute to reinforcement learning and model improvement workflows.

This is not a junior annotation role. We are looking for practitioners with deep hands-on coding experience who can think like both an engineer and an evaluator.

What You'll Do

  • Create high-quality coding prompts and reference answers (benchmark-style, e.g. SWE-Bench-like problems).
  • Evaluate LLM outputs for code generation, refactoring, debugging, and implementation tasks.
  • Identify and document model failures, edge cases, and reasoning gaps.
  • Perform head-to-head evaluations between private LLMs (Mistral-based) and leading external models.
  • Build or configure coding environments to support evaluation and reinforcement learning (RL).
  • Follow detailed annotation and evaluation guidelines with high consistency.

What We're Looking For

  • 5+ years of professional software development experience.
  • Strong Python skills (required).
  • Knowledge of at least one additional programming language (bonus).
  • 1+ year of coding annotation and/or LLM evaluation experience (part-time OK) for a major frontier AI lab or AI infrastructure company.
  • Prior code reviewer experience is a plus.
  • Proven ability to apply structured evaluation criteria and write clear technical feedback.
  • Fluent in English (written and spoken).
  • Team lead or mentoring experience is a strong plus.

Why This Role

  • Work hands-on with cutting-edge LLMs.
  • Apply real-world engineering judgment to model evaluation and improvement.
  • High-impact, technical work with a focused, senior team.



  • Greater Paris Metropolitan Region, France Monk AI Full-time

    The Opportunity: Monk provides visual expertise through Computer Vision and Deep Learning algorithms. Our AI can detect and classify damage on any vehicle from photos taken with a smartphone. The company already works with leading international players, such as our parent company ACV Auctions (USA), as well as CAT logistics (Europe), Getaround...

  • Staff Mechanical Engineer

    2 weeks ago


    Greater Paris Metropolitan Region, France Genesis AI Full-time

    What You'll Do: Design, prototype, and build robotic hardware systems optimized for performance and reliability. Develop scalable data collection platforms with robotics engineers to accelerate model learning. Improve simulation fidelity through system identification and close collaboration with the simulation team. Maintain and evolve in-house hardware...

  • Software Engineer

    2 weeks ago


    Greater Paris Metropolitan Region, France Kleep AI Full-time

    We're a product-first scale-up on a mission to transform the e-commerce experience with AI. Kleep AI is an AI-powered SaaS helping fashion brands make their e-commerce more personalized, more efficient, and more profitable. Our solutions leverage advanced AI and Computer Vision to improve product discovery, performance, and user experience while supporting a...


  • Greater Paris Metropolitan Region, France Genesis AI Full-time

    What You'll Do: Build low-latency inference pipelines for on-device deployment, enabling real-time next-token and diffusion-based control loops in robotics. Design and optimize distributed inference systems on GPU clusters, pushing throughput with large-batch serving and efficient resource utilization. Implement efficient low-level code (CUDA, Triton, custom...


  • Greater Paris Metropolitan Region, France Genesis AI Full-time

    What You'll Do: Develop and optimize a learning-based robotic manipulation control stack. Design and maintain a teleoperation system with smooth, precise motion and low latency. Train robotic policies for manipulation and locomotion with reinforcement learning and imitation learning. Deploy robotic policies and diagnose latency or bottlenecks in the control...


  • Greater Paris Metropolitan Region, France Cherry Pick Full-time

    In brief: Cherry Pick is looking for a "Data Platform Enablement Engineer" for a client in the banking sector. Description. The context: tech in the service of research. Your mission is at the heart of our technology engine. You don't just build tools; you empower our community of more than 50 Data...

  • Lead AI Engineer

    3 weeks ago


    Greater Paris Metropolitan Region, France Talent Seed Full-time

    **This role requires relocation to Riyadh/Dubai** You’ll shape the technical direction and bring ambitious AI concepts to life, turning prototypes into scalable, reliable products that power next-generation customer experiences. Working closely with engineers, tech leads, and the CTO, you’ll design systems that combine cutting-edge AI, strong...


  • Greater Paris Metropolitan Region, France Adeptis Group Full-time

    Senior Sales Engineer – Cloud Networking & Security. France | Permanent contract (CDI). Department: Sales / Pre-Sales. Adeptis Group is recruiting, on behalf of one of its clients, an international vendor of next-generation cloud solutions, a Sales Engineer to support its growth in the French market. This role is set in a context of strong innovation,...



  • FULL REMOTE

    1 day ago


    Greater Paris Metropolitan Region, France Ubby AI Full-time

    Senior Full Stack Engineer - React/FastAPI. JOB AVAILABLE FOR FRENCH RESIDENTS ONLY. PROFILE SOUGHT - SPECIAL NOTE: We are primarily looking for entrepreneurial profiles: entrepreneurs who have founded their own company or startup; former members of early-stage startup teams (first employees); developers who have launched and deployed their...