Prompt Engineer
Coaxes good answers out of large language models, professionally.
AI Evaluation Engineer
Decides whether the model actually got better.
Eval engineers design the systems that score model outputs — golden datasets, automated graders, human review pipelines. Without them, AI teams can't tell whether yesterday's prompt change made things better or worse. A relatively new but increasingly critical role.
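The two core pieces named here, a golden dataset and an automated grader, can be sketched in a few lines. Everything below is an illustrative assumption, not any particular team's pipeline: the dataset entries, the exact-match grader, and the stubbed model are all invented for the example.

```python
# Minimal eval harness: score model outputs against a golden dataset.
# The "model" here is a stub; a real harness would call an actual LLM.

golden_dataset = [
    {"prompt": "2 + 2 =", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
]

def model(prompt: str) -> str:
    # Stub standing in for a real model call (hypothetical outputs).
    return {"2 + 2 =": "4", "Capital of France?": "Paris"}.get(prompt, "")

def exact_match(output: str, expected: str) -> bool:
    # Simplest possible automated grader; real ones use rubrics,
    # fuzzy matching, or LLM-as-judge scoring.
    return output.strip().lower() == expected.strip().lower()

def run_eval(dataset, model_fn, grader) -> float:
    # Returns the fraction of examples the model got right.
    scores = [grader(model_fn(ex["prompt"]), ex["expected"]) for ex in dataset]
    return sum(scores) / len(scores)

print(run_eval(golden_dataset, model, exact_match))  # 1.0 for the stub
```

Re-running this after each prompt change is exactly how a team answers "did yesterday's change make things better or worse."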
Building a new eval set, running models against it, designing a rubric for human reviewers, and presenting eval results to the team.
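A rubric for human reviewers, like the one mentioned above, can be as simple as named criteria with score anchors and weights. The criteria, 1-5 scale, and weights below are invented for illustration:

```python
# Hypothetical human-review rubric: weighted criteria with score anchors.
rubric = {
    "accuracy":   {"weight": 0.5, "anchors": {1: "factually wrong", 5: "fully correct"}},
    "clarity":    {"weight": 0.3, "anchors": {1: "confusing", 5: "easy to follow"}},
    "formatting": {"weight": 0.2, "anchors": {1: "ignores instructions", 5: "matches spec"}},
}

def weighted_score(ratings: dict) -> float:
    # Combine per-criterion ratings (1-5) into a single weighted score.
    return round(sum(rubric[c]["weight"] * r for c, r in ratings.items()), 2)

print(weighted_score({"accuracy": 5, "clarity": 4, "formatting": 3}))  # 4.3
```

Anchors matter because two reviewers given only "rate clarity 1-5" will disagree far more than two reviewers given concrete descriptions of what a 1 and a 5 look like.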
You like rigor, distrust vibes-based decisions, and find building benchmarks satisfying.
We'll send you to a fresh search for open AI Evaluation Engineer roles.
browse open AI Evaluation Engineer jobs ↗