Hyderabad, IND
3 days ago
Reliability Engineering Lead
**Position Title:** Reliability Engineering Lead

+ **Location: Hyderabad**

**Role Description (Process-First Responsibilities)**

**1. Service Level Management & Reliability Framework**

+ **Process Owner:** SLO-driven reliability decision making across digital services
+ **Establish SLO Foundation:** Define, implement, and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical services, ensuring alignment with business impact and patient safety requirements
+ **Error Budget Management:** Implement error budget policies that balance feature velocity with reliability, using budget consumption as the primary decision-making tool for release management and incident prioritization
+ **Reliability Governance:** Create and maintain reliability standards that comply with GxP, SOX, and other pharmaceutical regulatory frameworks while enabling innovation velocity
+ **Business Impact Correlation:** Translate technical reliability metrics into business language, demonstrating clear connections between SLO compliance and revenue, patient safety, or operational efficiency

**2. Incident Management & Learning Culture**

+ **Process Owner:** Blameless incident response and organizational learning
+ **Incident Command:** Lead critical incident response using structured protocols, focusing on rapid detection, mitigation, and recovery while maintaining detailed audit trails for regulatory compliance
+ **Blameless Postmortem Leadership:** Facilitate blameless postmortems that focus on system improvements rather than individual accountability, creating a culture of psychological safety for honest analysis
+ **Learning Repository Management:** Maintain and curate incident learning repositories with transparent sharing across digital units, enabling pattern recognition and systemic improvement
+ **Predictive Issue Prevention:** Implement proactive monitoring and alerting systems that identify potential failures before they impact users, shifting from reactive to preventive operations

**3. Toil Elimination & Engineering Balance**

+ **Process Owner:** Systematic automation of operational overhead
+ **Toil Measurement & Reduction:** Maintain operational work (toil) below 50% of time through systematic identification, measurement, and elimination of manual, repetitive tasks
+ **Automation Strategy:** Design and implement automation solutions using cost-benefit analysis, prioritizing work that scales linearly with service growth and requires minimal human judgment
+ **Engineering Project Delivery:** Dedicate minimum 50% of time to engineering projects that improve reliability, performance, or developer experience, delivering measurable improvements quarterly
+ **Knowledge Transfer:** Create self-service documentation, runbooks, and automation tools that reduce dependency on human intervention and enable team scaling

**4. Platform Engineering Integration & AI Enablement**

+ **Process Owner:** Reliability integration in AI-first platform services
+ **AI Workload Reliability:** Design and implement reliability practices for AI/ML workloads, including agent-to-agent communication systems, model serving infrastructure, and data pipeline reliability
+ **Platform Collaboration:** Partner with platform teams to embed reliability principles into Internal Developer Platforms (IDPs), enabling self-service infrastructure with built-in reliability guardrails
+ **Agentic System Support:** Provide reliability engineering expertise for Sanofi's agentic AI ecosystem, ensuring conversational AI systems meet enterprise reliability and compliance standards
+ **Developer Experience Enhancement:** Contribute to CI/CD pipeline reliability, infrastructure-as-code best practices, and observability integration that accelerates developer productivity

**5. Observability & Performance Engineering**

+ **Process Owner:** Comprehensive system visibility and performance optimization
+ **Full-Stack Observability:** Implement and maintain observability platforms covering metrics, logs, traces, and business KPIs, providing end-to-end visibility into service health and user experience
+ **Performance Optimization:** Conduct systematic performance engineering including capacity planning, bottleneck identification, and scalability improvements aligned with business growth projections
+ **Intelligent Monitoring:** Deploy AI-powered monitoring and alerting systems that reduce noise, provide intelligent root cause analysis, and enable predictive maintenance
+ **Cross-System Correlation:** Establish monitoring federation across diverse technology stacks (cloud, on-premises, legacy) while maintaining regulatory audit trails

**6. Security & Compliance Integration**

+ **Process Owner:** Reliability practices within regulatory frameworks
+ **Secure Reliability Engineering:** Implement reliability practices that enhance rather than compromise security posture, integrating DevSecOps principles with pharmaceutical compliance requirements
+ **Compliance Automation:** Automate compliance checks, audit trail generation, and regulatory reporting while maintaining system reliability and performance
+ **Risk Assessment Integration:** Conduct reliability impact assessments for changes affecting GxP systems, balancing innovation speed with regulatory validation requirements
+ **Disaster Recovery:** Design and test disaster recovery procedures that meet both technical recovery objectives and regulatory continuity requirements

**7. Team Leadership**

+ **Process Owner:** Represent the reliability engineering discipline
+ **Team Grooming:** Groom a team of SREs that can work independently across the key SRE principles
+ **Communication:** Provide crisp and strategic updates to the leadership team
+ **Lead by Example:** Demonstrate expertise by taking on complex scenarios and providing innovative solutions that can be leveraged by the team, documented for knowledge sharing, and scaled across the organization to drive systematic reliability improvements

**Pursue** **_progress_**, **discover** **_extraordinary_**

Better is out there. Better medications, better outcomes, better science. But progress doesn’t happen without people – people from different backgrounds, in different locations, doing different roles, all united by one thing: a desire to make miracles happen. So, let’s be those people.

At Sanofi, we provide equal opportunities to all regardless of race, colour, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, ability or gender identity.
Watch our ALL IN video (https://www.youtube.com/watch?v=SkpDBZ-CJKw&t=67s) and check out our Diversity, Equity and Inclusion actions at sanofi.com (https://www.sanofi.com/en/our-responsibility/equality-and-inclusiveness)!

Global Terms & Conditions and Data Privacy Statement (https://www.sanofi.com/en/careers/global-terms-and-conditions/)

Sanofi is dedicated to supporting people through their health challenges. We are a global biopharmaceutical company focused on human health. We prevent illness with vaccines and provide innovative treatments to fight pain and ease suffering. We stand by the few who suffer from rare diseases and the millions with long-term chronic conditions. With more than 100,000 people in 100 countries, Sanofi is transforming scientific innovation into healthcare solutions around the globe. Discover more about us by visiting www.sanofi.com or via our movie We are Sanofi (https://youtu.be/96EwNjb1TLo).

As an organization, we change the practice of medicine, reinvent the way we work, and enable people to be their best versions in career and life. We are constantly moving and growing, making sure our people grow with us. Our working environment helps us build a dynamic and inclusive workplace operating on trust and respect and allows employees to live the life they want to live.

All in for Diversity, Equity and Inclusion at Sanofi (http://www.youtube.com/watch?v=SkpDBZ-CJKw&t=2s)