Site Reliability Engineer (SRE) + Scrum Master
Pepsi
Overview

We are seeking a highly skilled, analytically strong Site Reliability Engineer (SRE) and Scrum Master with 6+ years of experience. The ideal candidate will have a proven track record of managing SRE responsibilities across multiple teams, with deep expertise in Active Directory (AD) groups, Databricks, architecture design, and enterprise tools such as Clarity and ServiceNow. Strong Scrum delivery experience and cross-functional collaboration are essential.

Responsibilities

- Lead SRE operations across distributed teams, ensuring system reliability, scalability, and performance.
- Design and implement robust monitoring, alerting, and observability frameworks.
- Lead Scrum ceremonies.
- Manage and optimize Active Directory (AD) group structures and access controls.
- Collaborate with data engineering teams to support Databricks environments.
- Contribute to architectural discussions and decisions for high-availability systems.
- Drive incident response, root cause analysis, and continuous improvement initiatives.
- Integrate and manage workflows using Clarity PPM and ServiceNow for change, incident, and problem management.
- Actively participate in Scrum ceremonies (daily stand-ups, sprint planning, reviews, retrospectives).
- Collaborate with Product Owners and Scrum Masters to ensure timely, high-quality delivery.

Qualifications

Education:
- Bachelor’s or Master’s degree in Computer Science, Information Systems, Business Analytics, or a related field.

Experience:
- 6+ years of experience in SRE, DevOps, or Infrastructure Engineering roles.
- Strong analytical thinking and troubleshooting skills.
- Hands-on experience with:
  - Active Directory (AD): group policy management, access provisioning.
  - Databricks: cluster management, job orchestration, performance tuning.
  - Architecture: designing scalable, fault-tolerant systems.
  - Clarity PPM: project tracking, resource planning.
  - ServiceNow: incident, change, and problem management workflows.
- Proficiency in monitoring tools (e.g., Prometheus, Grafana, Datadog).
- Experience with CI/CD pipelines and infrastructure as code (Terraform, Ansible).
- Familiarity with cloud platforms (Azure, AWS, or GCP).
- Strong scripting skills (Python, Bash, PowerShell).
- Solid understanding of Agile/Scrum methodologies and tools such as Jira or Azure DevOps.

Preferred Qualifications:
- Certified Scrum Master or equivalent Agile certification.
- Experience working in a global delivery model.
- Exposure to digital product and reporting services is a plus.