Posted Dec 3, 2025
Senior Data Scientist, AI
at Poshenergy
New York City, United States
On-site
Responsibilities
- Define what good looks like, build the rubrics and benchmarks, and own the feedback loops that drive iteration.
- Build ETL/ELT pipelines that transform raw behavioral, transactional, and interaction data into clean evaluation inputs.
- Preparing High-Quality Data for AI Models: Own the data that feeds our evaluation pipeline, from ground truth datasets and labeled examples to behavioral signals and the semantic layer.
- Ensure every input is reliable, well-structured, and built to last.
- Instrumenting Agent Tests, Experiments, and Success Metrics: Build the testing infrastructure to evaluate agent performance across accuracy, relevance, and user satisfaction.
- Run structured experiments and pre/post analyses to assess the impact of model and product changes, and build dashboards that keep the team aligned on performance trends and regressions before they become problems.
- Collaborating with Product and Engineering on Instrumentation: Work closely with Engineering to ensure accurate logging of agent interactions and user signals.
ABOUT THE ROLE
- We are looking for an experienced Senior Data Scientist to own the evaluation framework for our AI agent, the data that feeds it, and the success analysis, testing, and metrics that determine how well it's working.
- As one of the early data hires at Posh, you'll shape the technical direction of our AI quality strategy and set the standards for how agent performance is defined, measured, and improved over time.
- Your work will directly inform how we iterate on our AI agent and how we know when it's ready to ship.
- This role offers a high-growth opportunity as we expand our AI and data capabilities.
AT A HIGH LEVEL, YOU’LL BE IN CHARGE OF:
- Building and Owning the AI Agent Evaluation Framework: Design and maintain the systems and methodologies we use to measure AI agent quality.
Requirements
- experience in data science or analytics engineering.
- Demonstrates a strong ability to design, build, and optimize scalable data systems.
- Expert in SQL and Python: Demonstrates strong proficiency in SQL and Python, with deep experience cleaning data, engineering features, and building efficient, production-ready modeling pipelines.
- Strong Ability to Analyze and Evaluate Models or AI Systems: Skilled in designing experiments, interpreting model performance, and communicating insights clearly to both technical and non-technical stakeholders.