Current jobs related to Evaluation Scenario Writer - Paris, Île-de-France - Mindrift

finance operations

1 week ago

Paris, Île-de-France STATION F Full-time

About: rethinking cybersecurity to make it even more relevant, effective, and accessible. One of the main challenges we tackle is continuously analyzing and understanding emerging threats in order to define appropriate strategies and to be able to execute them at scale. By combining technology and a team...
Senior Manager

1 week ago

Paris, Île-de-France NG Audit Full-time

About the firm: NG AUDIT is a fast-growing audit and advisory firm, recognized for its expertise with SMEs, mid-sized companies (ETI), and family-owned groups. We support our clients in their strategic projects: statutory and contractual audit, organizational consulting, and strategic consulting including valuation, business transfer, and TS operations....
Loss Prevention Engineer, Oil Sector Specialist

1 week ago

Paris, Île-de-France Marsh McLennan Full-time

Company: Marsh. Description: Marsh, a global leader in risk management and insurance brokerage, is looking for a specialized Loss Prevention Engineer with proven expertise in the Oil & Gas sector. Based in Paris or in the regions near a well-connected major city, this permanent (CDI) position requires a minimum of 5 years of experience. The role...
Data Scientist Intern, AI Core Team

1 week ago

Paris, Île-de-France Back Market Full-time

Hi, we're Back Market. We're here to help make tech reliable, affordable, and better than new. We're a global marketplace for refurbished devices, helping lower our collective environmental impact by providing trustworthy, affordable tech with 92% less carbon emissions than new. Yep, you read that right. Turns out refurbished tech is way better for the planet...
AI Solutions Architect (F/M)

1 week ago

Paris, Île-de-France Onepoint Full-time

Position and missions: Contribute to the major transformations of companies and public-sector organizations by combining technological innovation and business expertise, in the service of our clients and of society, to help them move forward sustainably. Beyond CSR, we have developed our own approach, RESET, which encompasses all of our commitments in...
E-learning and Media Instructional Designer

1 week ago

Paris, Île-de-France ASSAS EXECUTIVE EDUCATION Full-time

General information: Job title: E-learning and media designer. Contract type: CDI (permanent). Location: Université de Paris-Panthéon-Assas, Paris, France, Notre Dame des Champs site. Reports to: Director of Pedagogy. Company context: To meet the challenges of the contemporary world, Université de Paris Panthéon-Assas,...
Business Analyst

1 week ago

Paris, Île-de-France Crédit Agricole Technologies et Services Full-time

Are you looking to find meaning in your work and to have a positive impact on the daily lives of millions of banking customers in France? Do you want to grow in an agile, collaborative, and technology-driven environment that is constantly evolving? Then our company, Crédit Agricole Technologies et Services, could be a match for you. What do we do? We...
Founder Associate Program

2 weeks ago

Paris, Île-de-France Fairmat Full-time

At FAIRMAT, our mission is to build a circular ecosystem that keeps advanced materials out of landfills and puts them back at the heart of product design. We are a pioneering deeptech company in carbon fiber recycling, offering a complete solution based on disruptive technologies and innovations...
Azure DataOps

1 week ago

Paris, Île-de-France Talan Full-time

Company description: Talan is an international consulting and technology expertise group that accelerates its clients' transformation through the levers of innovation, technology, and data. For more than 20 years, Talan has advised and supported companies and public institutions in implementing their projects...
Planning Engineer, Civil Infrastructure

1 week ago

Paris, Île-de-France ALDEBARAN Group Full-time

Planning Engineer, Civil Infrastructure, ref. JOB-1514. Summary: We are looking, on behalf of one of our clients based in Île-de-France, for a Planning Engineer to work on large-scale civil engineering projects. The position is part of major industrial and/or infrastructure projects, with the main mission...

Evaluation Scenario Writer

3 weeks ago

Paris, Île-de-France Mindrift Full-time

This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do

The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role

We're looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:

Create structured test cases that simulate complex human workflows.
Define gold-standard behavior and scoring logic to evaluate agent actions.
Analyze agent logs, failure modes, and decision paths.
Work with code repositories and test frameworks to validate your scenarios.
Iterate on prompts, instructions, and test cases to improve clarity and difficulty.
Ensure that scenarios are production-ready, easy to run, and reusable.
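
As a rough illustration of the first two tasks, a scenario could be captured as structured data with a gold path and a simple scoring rule. This is a hypothetical sketch, not Mindrift's actual format: the field names (`id`, `task`, `gold_path`), the action strings, and the `score_run` function are all assumptions made for the example.

```python
import json

# Hypothetical scenario definition; the field names and action strings are
# illustrative assumptions, not a format taken from the posting.
SCENARIO_JSON = """
{
  "id": "refund-request-001",
  "task": "Process a customer refund request",
  "gold_path": ["open_ticket", "verify_order", "issue_refund", "close_ticket"]
}
"""


def score_run(gold_path, agent_actions):
    """Return the fraction of the gold path the agent followed in order.

    Walks the agent's actions once and advances through the gold path each
    time the next expected step appears. Extra actions are ignored, and
    credit stops at the first gold step the agent never performs.
    """
    if not gold_path:
        return 1.0
    matched = 0
    for action in agent_actions:
        if matched < len(gold_path) and action == gold_path[matched]:
            matched += 1
    return matched / len(gold_path)


scenario = json.loads(SCENARIO_JSON)
agent_log = ["open_ticket", "search_faq", "verify_order", "close_ticket"]
print(score_run(scenario["gold_path"], agent_log))  # prints 0.5
```

Here the agent skips `issue_refund`, so only the first two gold steps are credited. A real project would likely use a richer rule (partial credit, penalties for harmful actions), which this sketch deliberately leaves out.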

How to get started

Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements

Bachelor's and/or Master's Degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems or other related fields.
Background in QA, software testing, data analysis, or NLP annotation.
Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
Strong written communication skills in English.
Comfortable with structured formats like JSON/YAML for scenario description.
Ability to define expected agent behaviors (gold paths) and scoring logic.
Basic experience with Python and JS.
Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.

Nice to Have

Experience in writing manual or automated test cases.
Familiarity with LLM capabilities and typical failure modes.
Understanding of scoring metrics (precision, recall, coverage, reward functions).
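
For readers unfamiliar with the metrics named above, precision and recall can be computed over the set of actions an agent took versus the set a gold standard expects. The sketch below is only one possible interpretation: the `precision_recall` function and the action names are hypothetical, and it treats actions as unordered sets, which ignores ordering and repetition.

```python
def precision_recall(gold_actions, agent_actions):
    """Compute set-based precision and recall of agent actions against gold actions.

    Precision: share of the agent's distinct actions that are in the gold set.
    Recall: share of the gold set that the agent actually performed.
    Both default to 0.0 when their denominator would be zero.
    """
    gold = set(gold_actions)
    predicted = set(agent_actions)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold standard and agent log, for illustration only.
gold = ["open_ticket", "verify_order", "issue_refund", "close_ticket"]
agent = ["open_ticket", "search_faq", "verify_order", "close_ticket"]
precision, recall = precision_recall(gold, agent)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.75
```

Three of the agent's four actions are expected (precision 0.75), and three of the four expected actions were performed (recall 0.75).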

Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:

Get paid for your expertise, with rates that can go up to $50/hour depending on your skills, experience, and project needs.
Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
Influence how future AI models understand and communicate in your field of expertise.
