Posted Apr 23
Senior Data Scientist - Big Data R&D, Identity Graph & KYC
at Socure
United States (Hybrid)
Responsibilities
- Architect and optimize graph-based identity representations (identity graph structure, linkage rules, clustering) to improve match rates, reduce false positives/negatives, and support downstream fraud and KYC models.
- Lead A/B tests and offline/online experimentation for new models, features, and data sources; define success metrics, design experiments, and ensure rigorous validation before rollout.
Requirements
- As a Senior Data Scientist I, you will lead the design and deployment of advanced ML and graph algorithms on large-scale PII datasets, own end‑to‑end projects from problem definition through production validation, and serve as a key technical partner to Product, Engineering, and Client‑facing teams.
WHAT YOU'LL DO
- Own the design, development, and evaluation of machine learning, statistical, and graph-based algorithms for entity-resolution, identity trust scoring, and anomaly detection on massive datasets.
WHAT YOU BRING
- Master’s degree with 3+ years of relevant industry experience, or Ph.D. with 1+ years of experience in applied ML / data science roles; background in Computer Science, Statistics, Mathematics, or related quantitative fields preferred.
- Strong proficiency in Python (preferred) or Scala, including experience with ML libraries such as scikit‑learn, XGBoost, TensorFlow or PyTorch.
- Extensive experience with Spark or PySpark and distributed data systems (e.g., AWS EMR, Databricks) working on very large, messy datasets.
- Deep understanding of supervised and unsupervised learning, feature engineering, model evaluation, and experiment design (A/B testing, holdout strategies, stratification).
- Experience developing production-quality data pipelines and automated workflows using Airflow or similar orchestration tools.
- Practical familiarity with graph databases and/or graph frameworks (Neo4j, AWS Neptune, GraphFrames, DGL, PyTorch Geometric) and graph algorithms for clustering, link prediction, and community detection is strongly preferred.
- Solid SQL skills and experience working with large-scale analytical data stores.
- Experience in at least one of: identity verification, fraud detection, credit risk, or adjacent high‑stakes domains is a plus.
- Demonstrated ability to lead medium‑to‑large projects end‑to‑end, make sound trade‑off decisions under ambiguity, and influence cross‑functional stakeholders with data and clear reasoning.