
AI Evaluation Engineer

Decides whether the model actually got better.

the gist

Eval engineers design the systems that score model outputs: golden datasets, automated graders, and human review pipelines. Without them, AI teams can't tell whether yesterday's prompt change made things better or worse. The role is relatively new but increasingly critical.

a typical day

Building a new eval set, running models against it, designing a rubric for human reviewers, and presenting eval results to the team.

you'd like this if…

You like rigor, you don't trust vibes-based decisions, and you find building benchmarks satisfying.

