Conversational AI Evaluator

1 day ago


Saint-Martin-Lacaussade, Nouvelle-Aquitaine, France Mercor Full-time

About The Job
Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.

Position: AI Model Evaluator
Type: Full-time or Part-time Contract Work
Compensation: $36/hour
Location: Europe, USA

Role Responsibilities

  • Evaluate LLM-generated responses on their ability to effectively answer user queries.
  • Conduct fact-checking using trusted public sources and external tools.
  • Generate high-quality human evaluation data by annotating response strengths, areas for improvement, and factual inaccuracies.
  • Assess reasoning quality, clarity, tone, and completeness of responses.
  • Ensure model responses align with expected conversational behavior and system guidelines.
  • Apply consistent annotations by following clear taxonomies, benchmarks, and detailed evaluation guidelines.

Qualifications
Must-Have

  • Bachelor's degree
  • Native speaker or ILR 5/primary fluency (C2 on the CEFR scale) in Italian
  • Significant experience using large language models (LLMs)
  • Excellent writing skills
  • Strong attention to detail
  • Adaptable and comfortable moving across topics, domains, and customer requirements
  • Background or experience in domains requiring structured analytical thinking (e.g., research, policy, analytics, linguistics, engineering)
  • Excellent college-level mathematics skills

Preferred

  • Prior experience with RLHF, model evaluation, or data annotation work
  • Experience writing or editing high-quality written content
  • Experience comparing multiple outputs and making fine-grained qualitative judgments
  • Familiarity with evaluation rubrics, benchmarks, or quality scoring systems

Application Process (Takes 20–30 mins to complete)

  • Upload resume
  • AI interview based on your resume
  • Submit form

Resources & Support

  • For details about the interview process and platform information, please check:
  • For any help or support, reach out to:

PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.


  • Bilingual LLM Evaluator

    24 hours ago

    Saint-Martin-Lacaussade, Nouvelle-Aquitaine, France Mercor Full-time

    About The Job: Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey. Position: AI Model Evaluator. Type: Full-time or Part-time Contract Work. Compensation: $40/hour. Location: Geography restricted to Europe,...


  • Saint-Martin-Lacaussade, Nouvelle-Aquitaine, France Welocalize Full-time

    About the Role: We are looking for experienced annotators with deep cultural, linguistic, and audio catalog expertise to help improve a major music platform's personalized experiences across multiple languages and regions. In this role, you will create high-quality ground truth and evaluate the quality of both training data and model-generated output...


  • Saint-Martin-Lacaussade, Nouvelle-Aquitaine, France Mercor Full-time

    About The Job: Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey. Position: AI Model Evaluator. Type: Full-time or Part-time Contract Work. Compensation: $36/hour. Location: Geography restricted to Europe,...

  • Legal Expert

    2 days ago

    Lacaussade, France Mercor Full-time

    Legal Expert. Base pay range: $105.00/hr - $105.00/hr. This range is provided by Mercor. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. About The Job: Mercor...

  • Legal Expert

    2 days ago

    Lacaussade, France Mercor Full-time

    Legal Expert - AI Evaluation. Base pay range: $105.00/hr - $105.00/hr. This range is provided by Mercor. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. About The Job: Mercor connects elite creative and technical talent...


  • Lacaussade, France Mercor Full-time

    Lawyer | Up to $105/hr, hourly. Base pay range: $105.00/hr - $105.00/hr. This range is provided by Mercor. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. About The Job: Mercor connects elite creative and technical talent with leading AI research labs. Headquartered...

  • Senior Lawyer

    2 days ago

    Lacaussade, France Mercor Full-time

    Base pay range: $105.00/hr - $105.00/hr. This range is provided by Mercor. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. About The Job: Mercor connects elite creative and technical talent with leading AI research labs....

  • Corporate Lawyer

    2 days ago

    Lacaussade, France Mercor Full-time

    Base pay range: $105.00/hr - $105.00/hr. This range is provided by Mercor. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. About The Job: Mercor connects elite creative and technical talent with leading AI research labs....


  • Saint-Rome-de-Tarn, France Sigma AI Full-time

    German Linguistic Projects (Remote). Sigma AI, Saint-Rome-de-Tarn, Occitanie, France (Remote). What Will You Do? Categorization – Annotation – Correction – Transcription – Evaluation – Conversational interactions – Voice recording – Content creation – Localization – Validation of audio, video, images, sentences, or words. All tasks are remote...