Posted Aug 22, 2025
AI Engineer, AIOps & Infrastructure
at Eloquent AI
San Francisco, United States
On-site
Responsibilities
- Automate LLMOps and MLOps workflows, ensuring seamless model training, fine-tuning, deployment, and monitoring.
Meet Eloquent AI
- At Eloquent AI, we’re building the next generation of AI Operators: multimodal, autonomous systems that execute complex workflows across fragmented tools with human-level precision.
- Headquartered in San Francisco with a global footprint, Eloquent AI is a fast-growing company backed by top-tier investors.
- Join us to work alongside world-class talent in AI, engineering, and product as we redefine the future of financial services.
Your Role
- As a Senior Software Engineer, AIOps & Infrastructure at Eloquent AI, you will design, build, and optimize scalable, high-performance AI infrastructure to support the deployment and operation of our enterprise AI agents.
- Your work will enable machine learning engineers and AI teams to train, fine-tune, and deploy LLMs efficiently while ensuring stability, observability, and performance at scale.
- You’ll play a key role in automating LLMOps and MLOps workflows, optimizing GPU workloads, and ensuring resilient, production-ready AI systems.
- This role requires deep expertise in cloud infrastructure, Kubernetes, and LLM and ML deployment pipelines.
- If you’re passionate about scalable AI systems and optimizing ML models for real-world applications, this is your opportunity to work at the frontier of LLMOps.
You will:
- Design and build scalable ML infrastructure for deploying and maintaining AI agents in production.
- Optimize GPU and cloud compute workloads, improving efficiency and reducing latency for large-scale AI systems.
- Develop Kubernetes-based solutions, including custom operators for ML model orchestration.
- Improve system observability and reliability, implementing logging, monitoring, and performance tracking for AI models.
- Work with ML and engineering teams to streamline data pipelines, model serving, and inference optimizations.
- Ensure security, compliance, and reliability in AI infrastructure, maintaining high availability and scalability.
- Participate in on-call rotations, ensuring 24/7 reliability of critical AI systems.
Requirements
- 5+ years of experience in software engineering, MLOps, or infrastructure development.
- Strong expertise in Kubernetes and experience managing containerized ML workloads.
- Deep understanding of cloud platforms (AWS, GCP, Azure) and distributed computing.
- Proficiency in Python, with experience developing services for ML/AI applications.
- Experience with ML model deployment pipelines, including model serving, inference optimization, and monitoring.
- Familiarity with vector databases, retrieval systems, and RAG architectures is a plus.
- Strong problem-solving skills and the ability to work in a high-scale, production-focused AI environment.
Bonus Points If…
- You have experience with LLMOps, fine-tuning, and deploying large-scale AI models.
- You’ve worked with GPU workload optimization, ML model parallelization, or distributed training strategies.
- You have experience building infrastructure for AI-powered applications.
- You’ve contributed to open-source MLOps tools or AI infrastructure projects.
- You thrive in a fast-moving startup environment and enjoy solving complex technical challenges.
Additional details
- Our technology goes far beyond chat: it sees, reads, clicks, types, and makes decisions—transforming how work gets done in regulated, high-stakes environments.
- We’re already powering some of the world’s leading financial institutions and insurers, fundamentally changing how millions of people manage their finances every day.
- From automating compliance reviews to handling customer operations, our Operators are quietly replacing repetitive, manual tasks with intelligent, end-to-end execution.