Member of Technical Staff – Applied AI

  • Full Time

  • Direct access to the world’s leading frontier AI labs
  • Rapid career acceleration, with promotion opportunities available in the first year
  • Early-stage company with meaningful equity; join the founding engineering team

This company sits in a part of the AI ecosystem that most people haven’t really seen yet.

They’re not building a single product. They’re building the layer that sits behind how AI systems are evaluated, improved, and ultimately trusted in real-world environments.

They’re already working closely with some of the most advanced frontier AI labs globally, helping tackle a problem that’s still wide open: how do you reliably evaluate whether an AI system is actually performing well, and how do you generate the data that makes it better over time?

At a surface level, they combine domain expertise with structured evaluation — bringing in highly specialised perspectives to assess outputs in real-world contexts. But that’s not really where the long-term value sits.

The deeper layer is what they’re building internally. They define what “good” looks like, design evaluation frameworks, and create feedback loops that directly improve how models behave. That’s the part that actually shapes how systems get better in production.

Alongside that, there’s a genuine research angle. The work feeds into how modern AI systems are being tested, evaluated, and iterated on in practice — not just in theory.

The reason this role stands out is that you’re not sitting in one slice of that world.

You’re seeing how different teams approach building AI systems, how evaluation and data actually drive performance, and how things move from something that looks good in a demo to something that works reliably in production.

This is a genuinely early hire. You’d be working directly with the founders, helping define how these systems are built and evaluated, while also contributing to product and technical direction. It’s not a role where you come in and pick up tickets; you’re shaping the system itself.

Experience required:

  • Built and maintained agentic AI harnesses in production, including evaluation frameworks, benchmarking pipelines, replay testing, regression testing, and automated quality scoring for LLM-powered systems.
  • Designed verification and validation frameworks capable of programmatically assessing model outputs, implementing reward functions, golden datasets, success metrics, and guardrails for high-stakes financial workflows.
  • Strong software engineering fundamentals, with proven experience building scalable Python applications, APIs, tooling, and backend systems rather than purely experimentation-focused ML or prompt engineering work.
  • Experience translating complex financial workflows into structured AI tasks, working with subject matter experts to define business logic, evaluation criteria, correctness standards, and measurable outcomes.
  • Hands-on experience operationalising LLM systems, including agent orchestration, tool use, synthetic data generation, observability, failure analysis, and iterative improvement through robust evaluation harnesses and feedback loops.
