Posted Dec 23, 2025
AI Engineer, Product
at Mistral AI
Paris, France
On-site
Responsibilities
- Design and run evaluations for your product area: reference tests, heuristics, and model-graded checks tailored to search relevance, chat quality, document understanding, or audio performance.
- Define and track metrics that matter: task success, helpfulness, hallucination proxies, safety flags, latency, cost.
- Own prompt and orchestration design: write, test, and iterate on prompts and system prompts as a core part of your work.
- Run A/B tests on prompts, models, and configurations; analyze results; make rollout or rollback decisions from data.
- Set up observability for LLM calls: structured logging, tracing, dashboards, alerts.
- Operate model releases: canary and shadow traffic, sign-offs, SLO-based rollback criteria, regression detection.
- Improve core behaviors in your product area, whether that's memory policies, intent classification, routing, tool-call reliability, or retrieval quality.
- Create templates and documentation so other teams can author evals and ship safely.
- Release operations: canary/shadowing, automated rollbacks, experiment platforms.
About Mistral
At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. We democratize AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments. Our offerings include le Chat, the AI assistant for life and work.
We are a dynamic, collaborative team passionate about AI and its potential to transform society. Join us to be part of a pioneering company shaping the future of AI. See more about our culture on https://mistral.ai/careers.
Role summary
Embedded directly in a product team such as search, chat, documents, or audio, you'll improve AI-powered features through rigorous evaluation, prompt and orchestration design, and rapid experimentation. You'll own your domain's AI quality end-to-end: define what "good" looks like, measure it, run experiments, and ship what works.
Requirements
- 3-4 years of experience; backgrounds that fit well include ML engineers moving closer to product, or software engineers with real AI/ML production experience.