Posted 2 weeks ago
Lead Data Scientist
at Middesk
San Francisco, United States
Hybrid
Responsibilities
- Design and implement knowledge graph solutions: Leverage LLMs for graph construction, querying, and retrieval to enhance entity resolution and business identity use cases.
Requirements
- Middesk came out of Y Combinator, is backed by Sequoia Capital and Accel Partners, and was recently named to the Forbes Fintech 50 list.
- ABOUT THE ROLE: We are actively building AI-driven applications that streamline customer workflows, focusing on business onboarding.
- With our proprietary identity data assets and deep domain expertise, we are uniquely positioned to expand into a broader set of AI-powered solutions that drive long-term growth.
- We’re looking for a hands-on applied ML expert to help build the technical foundation for these efforts.
- This is a highly technical, hands-on role with wide influence on how we design, build, and scale ML at Middesk.
- WHAT YOU'LL DO:
- Build risk & fraud ML applications: Deliver production ML models in fraud, trust & safety, KYB, and compliance domains, with measurable impact on customer workflows.
- Innovate in feature engineering & labeling: Use graph-based techniques, weak supervision, LLMs, and AI agents to improve signal extraction and automate the labeling process.
- Establish ML infrastructure foundations: Partner with the ML infra team to design feature services, model training pipelines, model serving standards, and orchestration to scale multiple ML use cases.
- experience in one or more of the following areas:
- Building production ML for risk, fraud, credit, or trust & safety: Track record of shipping external-facing ML applications in one or more of these domains.
- Knowledge graph applications: Hands-on
- Experience disambiguating and linking records across noisy, incomplete, or conflicting data sources, particularly in KYB, KYC, AML, or identity verification contexts where the same real-world entity may appear under different names, addresses, or tax IDs.
- Expertise in classification with real-world ML challenges, for example: imbalanced labels, sparse signals, cold start, and production version management.
- Hands-on ML infrastructure experience: feature stores, model management, ML training/serving pipelines.
- Comfort as a senior IC: setting technical direction, mentoring peers, and establishing best practices.