Senior Machine Learning Ops Engineering

Bangalore, India

97 days ago

ResMed

Global Technology Solutions (GTS) at ResMed is a division dedicated to creating innovative, scalable, and secure platforms and services for patients, providers, and people across ResMed. The primary goal of GTS is to accelerate well-being and growth by transforming the core, enabling patient, people, and partner outcomes, and building future-ready operations.

The strategy of GTS focuses on aligning goals and promoting collaboration across all organizational areas. This includes fostering shared ownership, developing flexible platforms that can easily scale to meet global demands, and implementing global standards for key processes to ensure efficiency and consistency.

Sr. Machine Learning Ops Platform Engineer

As a Sr. Machine Learning Ops Platform Engineer, you will be responsible for building automation and leading-edge architecture around Data and AI/ML engineering on the ResMed AI platform. Specifically, you will code and help architect a production-grade, scalable platform to be used by dozens of data scientists. You will help define and ensure best coding and CI/CD practices within a team of excellent and engaged engineers. You will be given creative freedom and will work in a supportive team environment. This is a hands-on role that involves coding and regular interaction with business stakeholders.

Let's talk about responsibilities:

Ensure AI platforms are reliable, scalable, and resilient by establishing foundational blueprints for upgrade and release strategies, implementing comprehensive logging, monitoring, and metrics, and automating critical system management tasks.

Work with Generative AI development, including embeddings and fine-tuning of generative models.

Build and maintain systems using DevOps, LLMOps, and AIOps practices, with Kubernetes, Docker, AWS, Python, and Terraform.

Push the boundaries of what’s possible with AI by thinking beyond current technology and stack constraints, and collaboratively delivering innovative yet practical solutions.

Participate in and set up Proofs of Concept (POCs) to demonstrate proposed solutions.

Enable team members through training, culture-building, and mentorship.

Identify, design, and implement internal process improvements: automate manual processes, re-architect infrastructure for greater scalability, etc.

Build infrastructure for the AWS platform, including Lambdas, ECS, EC2, SNS/SQS, Bedrock, ML pipeline engineering, data monitoring, alerting, and networking.

Implement observability stacks such as Prometheus, Loki, VictoriaLogs, Grafana, and Datadog.

Design, build, and support Data & ML model pipelines using the latest CI/CD and deployment technologies.

Collaborate with stakeholders including Executive, Product, Data, and Design teams to resolve technical issues and support infrastructure needs.

Participate in code reviews and process improvements.

Let's talk about qualifications and experience:

7+ years of experience in a complex, technical environment. Proven experience developing production-grade code in Python, SQL, and Pandas.

Deep expertise in Kubernetes as a foundational platform for deploying, scaling, and managing AI/ML workloads in production.

Experience with three or more of the following AWS tools: Lambda, EC2, EMR, S3, Glue, Athena, RDS, Networking, IAM, Batch Processing, SageMaker, Airflow.

Proficiency in Terraform and common DevOps/DevSecOps tools and techniques such as Docker, GitHub, GitHub Actions, SonarQube, Checkmarx, and JFrog.

Experience with Kubeflow and MLflow is a strong advantage.

Skilled in creating and managing CI/CD pipelines and APIs tailored for AI/ML workloads.

Experience with both relational (SQL) and NoSQL databases; Snowflake experience is a plus.

Solid understanding of the OAuth 2.0 protocol for secure authorization.

Hands-on experience implementing A/B testing for models and AI applications.

Familiarity with AI governance, observability, and compliance practices across AI/ML workflows.

Experience working with LLMs, AI agents, and multi-agent coordination platforms (MCP); strong hands-on exposure to AI platform engineering.

Exposure to LLM orchestration frameworks such as Flowise, Langflow, and LangGraph is a plus.

Familiarity with AgentOps tools and methodologies is a strong advantage.

Demonstrated ability to work with AI/ML teams and cross-functional groups in fast-paced, dynamic environments.

All listed duties, requirements, and responsibilities are considered essential functions of this position; however, business conditions may require reasonable accommodation for additional tasks and responsibilities.

Let’s talk about what you can expect:
• A supportive environment that focuses on people development and best implementation.
• Opportunity to design, influence, and be innovative.
• Work with inclusive global teams that openly share new ideas. We want your ideas!
• Be supported both inside and outside of the work environment.
• The opportunity to build something meaningful and see a direct positive impact on people’s lives!
• Dream big, iterate, and experiment to drive innovation!

Joining us is more than saying “yes” to making the world a healthier place. It’s discovering a career that’s challenging, supportive, and inspiring, where a culture driven by excellence helps you not only meet your goals but also create new ones. We focus on creating a diverse and inclusive culture, encourage individual expression in the workplace, and thrive on the innovative ideas this generates. If this sounds like the workplace for you, apply now! We commit to responding to every applicant.
