USA
2 days ago
Director, Site Reliability Engineering (Hybrid/Flexible)

The Director of Site Reliability Engineering (SRE) will provide strategic leadership and technical direction for the reliability, scalability, and performance of our mission‑critical systems and services. This role combines deep SRE expertise with strong engineering leadership, driving organizational transformation toward reliability-first principles. The ideal candidate brings a strong software engineering foundation, a passion for automation, and a proven ability to lead and develop high‑performing teams.

The Director will partner with engineering, product, operations, and business stakeholders to design, deliver, and operate resilient, high‑availability systems that support our customers and business objectives at scale.

Responsibilities

Provide strategic direction for the organization-wide adoption, evolution, and maturity of SRE principles, cultivating a culture centered on reliability, efficiency, and continuous improvement.

Develop and oversee automation strategies, tools, and frameworks that improve system reliability, reduce operational toil, and enhance team productivity.

Architect and evolve robust observability, monitoring, and alerting systems to ensure availability, performance, and real‑time operational insight.

Lead and govern high‑severity incident response practices—ensuring rapid triage, thorough root cause analysis, and follow‑through on corrective and preventative actions.

Analyze reliability, performance, and capacity metrics to drive proactive optimization and long‑term system resilience.

Partner with engineering, product, and operations teams to embed SRE practices throughout the development lifecycle and influence architectural decisions for reliability.

Build, mentor, and develop a high‑performing SRE organization, fostering technical excellence, career growth, and a strong culture of knowledge sharing.

Oversee capacity planning, scalability assessments, and future‑state demand forecasting across critical systems.

Establish and maintain comprehensive documentation of SRE processes, standards, frameworks, and best practices.

Key Decision Rights

Define the technical strategy and roadmap for SRE, including automation, reliability frameworks, tooling, monitoring, and operational best practices.

Serve as final decision-maker during major incidents, including prioritization of remediation and long‑term reliability actions.

Allocate resources, manage staffing decisions, and oversee budget planning for SRE initiatives.

Establish, approve, and enforce service level objectives (SLOs), error budgets, and performance standards for systems and services.

Drive process and operational improvements that enhance system reliability and organizational efficiency.

Evaluate, select, and govern the adoption of third‑party tools and platforms related to observability, incident response, reliability testing, and automation.

Define and approve training, development programs, and readiness standards for the SRE organization.

Required Leadership & Interpersonal Skills

Ability to define a clear reliability vision and inspire teams and stakeholders toward long‑term reliability goals.

Demonstrated sound judgment and calm decision‑making under pressure, particularly during high‑severity incidents.

Strong people leadership skills, with experience coaching, mentoring, and developing engineering talent.

Strategic planning skills with a track record of aligning technical direction with organizational objectives.

Excellent communication skills; able to translate complex technical issues into clear, actionable insights for executive and non‑technical audiences.

Highly collaborative, with the ability to work effectively across engineering, product, operations, and business functions.

Skilled at navigating conflict and fostering healthy team dynamics.

Proactive problem solver who identifies risks and drives innovative solutions.

Strong sense of accountability for team outcomes, reliability standards, and operational excellence.

Required Skills and Competencies

Expertise with observability and monitoring platforms such as Datadog, Prometheus, Dynatrace, Grafana, ELK, or similar.

Strong proficiency in programming languages such as Python, Go, or Java.

Deep understanding of cloud platforms (AWS, Azure, GCP) and container orchestration technologies (Docker, Kubernetes).

Advanced knowledge of AWS services including VPC, Lambda, IAM, ELB, EC2, ECS, CloudWatch, API Gateway, S3, SQS, SNS, WAF, and Route53.

Hands-on experience with infrastructure‑as‑code tools such as Terraform, Ansible, or equivalents.

Expert troubleshooting and problem-solving skills across distributed systems.

Strong leadership and communication skills with a proven ability to work cross-functionally.

Demonstrated success leading and mentoring engineering teams.

Strong understanding of security best practices, compliance frameworks, and implementation of security controls.

Experience with chaos engineering, resilience testing, and failure-injection methodologies.

Familiarity with applying AI/ML approaches to reliability, operations, and incident management.

Education and Experience

Bachelor’s degree in Computer Science, Engineering, or a related field.

16 years of experience in the field, including 6+ years in Site Reliability Engineering, DevOps, or a similar role.

Proven experience architecting and managing highly available, scalable, and fault-tolerant systems.
 

NOTE: This position is eligible for hybrid working arrangements and requires on-site work from an Insulet office.

Additional Information:

Compensation & Benefits:

For U.S.-based positions only, the annual base salary range for this role is $188,300.00 - $282,500.00

This position may also be eligible for incentive compensation.

We offer a comprehensive benefits package, including:
• Medical, dental, and vision insurance
• 401(k) with company match
• Paid time off (PTO)
• And additional employee wellness programs

Application Details:
This job posting will remain open until the position is filled.
To apply, please visit the Insulet Careers site and submit your application online.

Actual pay depends on skills, experience, and education.

Insulet Corporation (NASDAQ: PODD), headquartered in Massachusetts, is an innovative medical device company dedicated to simplifying life for people with diabetes and other conditions through its Omnipod product platform. The Omnipod Insulin Management System provides a unique alternative to traditional insulin delivery methods. With its simple, wearable design, the tubeless disposable Pod provides up to three days of non-stop insulin delivery, without the need to see or handle a needle. Insulet’s flagship innovation, the Omnipod 5 Automated Insulin Delivery System, integrates with a continuous glucose monitor to manage blood sugar with no multiple daily injections and zero fingersticks, and it can be controlled by a compatible personal smartphone in the U.S. or by the Omnipod 5 Controller. Insulet also leverages the unique design of its Pod by tailoring its Omnipod technology platform for the delivery of non-insulin subcutaneous drugs across other therapeutic areas. For more information, please visit insulet.com and omnipod.com.

We are looking for highly motivated, performance-driven individuals to be a part of our expanding team. We do this by hiring amazing people guided by shared values who exceed customer expectations. Our continued success depends on it!

At Insulet Corporation all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

(Know Your Rights)
