Posted May 1
Research, Post-Training
at Cognition
San Francisco, United States
On-site
Responsibilities
- Measure how choices compound across evals and production performance, not just isolated benchmarks.
- Evaluation Design and Integrity: Build evals that actually capture what matters.
Requirements
WHO WE ARE
- We are an applied AI lab building end-to-end software agents.
- We're the team behind Devin, the first AI software engineer, and Windsurf, an AI-native IDE.
- These products represent our vision for AI that doesn't just assist engineers, but works alongside them as a genuine teammate.
- Our team is small and talent-dense: world-class competitive programmers, former founders, and researchers from the frontier of AI, including Scale AI, Palantir, Cursor, Google DeepMind, and others.
- You'll be responsible for making numbers go up and making sure the numbers mean something.
- Deep Understanding: When training produces results that don't make sense, you dig until you understand why.
EXCEPTIONAL CANDIDATES HAVE DEMONSTRATED
- A track record of advancing ML systems through post-training, alignment, or related methods: RLHF, RLAIF, preference modeling, reward learning, or equivalent
- Strong fundamentals in probability, statistics, and ML theory
- The ability to look at experimental data and distinguish real effects from noise and bugs
- Evidence of original contributions: publications at top venues, open-source impact, or equivalent industry results
- Experience with large-scale distributed training and the debugging that comes with it
- Systems-level thinking: not just model optimization, but understanding how training pipelines, data, and evaluation interact
- Comfort with ambiguity and fast-moving research environments where priorities shift quickly
We care more about demonstrated capability than credentials. A PhD is one signal among many.
RESOURCES & ENVIRONMENT
- Small, highly selective team where research and product move together; prototypes reach real deployment quickly
- Compute is not a constraint: large allocations with training jobs routinely running across thousands of GPUs from day one
- The environment rewards speed, autonomy, and technical depth with minimal process overhead; this is one of the most competitive and fast-moving problems in AI
- Everything needed to operate at frontier scale from day one