Research, Post-Training

at Cognition

San Francisco, United States (On-site)

Responsibilities

- Measure how choices compound across evals and production performance, not just isolated benchmarks.
- Evaluation Design and Integrity: Build evals that actually capture what matters.

Requirements

WHO WE ARE
We are an applied AI lab building end-to-end software agents.
We're the team behind Devin, the first AI software engineer, and Windsurf, an AI-native IDE.
These products represent our vision for AI that doesn't just assist engineers, but works alongside them as a genuine teammate.
Our team is small and talent-dense: world-class competitive programmers, former founders, and researchers from the frontier of AI, including Scale AI, Palantir, Cursor, Google DeepMind, and others.
You'll be responsible for making numbers go up and making sure the numbers mean something.
- Deep Understanding: When training produces results that don't make sense, you dig until you understand why.
EXCEPTIONAL CANDIDATES HAVE DEMONSTRATED
- A track record of advancing ML systems through post-training, alignment, or related methods: RLHF, RLAIF, preference modeling, reward learning, or equivalent
- Strong fundamentals in probability, statistics, and ML theory. The ability to look at experimental data and distinguish real effects from noise and bugs
- Evidence of original contributions: publications at top venues, open-source impact, or equivalent industry results
- Experience with large-scale distributed training and the debugging that comes with it
- Systems-level thinking: not just model optimization, but understanding how training pipelines, data, and evaluation interact
- Comfort with ambiguity and fast-moving research environments where priorities shift quickly
- We care more about demonstrated capability than credentials. A PhD is one signal among many.
RESOURCES & ENVIRONMENT
- Small, highly selective team where research and product move together; prototypes reach real deployment quickly
- Compute is not a constraint: large allocations with training jobs routinely running across thousands of GPUs from day one
- The environment rewards speed, autonomy, and technical depth with minimal process overhead; this is one of the most competitive and fast-moving problems in AI
- Everything needed to operate at frontier scale from day one.
