Krakow, POL
2 days ago
Site Reliability Engineering Manager
**Introduction**

A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions. Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career. IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.

**Your role and responsibilities**

As a Site Reliability Engineering (SRE) Manager in the IBM Software organization, you will manage and lead a team of site reliability engineers. You will be responsible for the reliability, scalability, and operational efficiency of IBM Asset Lifecycle Management services. You will hire, train, and mentor team members, assign tasks, set goals, and conduct performance evaluations. You will work closely with development teams, SRE peers, and engineering managers to automate infrastructure management, optimize system performance, and enhance monitoring capabilities.

Overall, an SRE Manager plays a crucial role in aligning engineering and operations to achieve reliable software systems. You will combine technical expertise with leadership and management skills to drive continuous improvement and ensure high-quality service delivery.

Key Responsibilities:

Leadership
* Provide strategic guidance to engineering teams on architectural decisions and directions.
* Empower teams to achieve technical excellence, with a focus on reliability, scalability, and simplicity.
* Foster collaboration across engineering, product, and other cross-functional teams to deliver optimal solutions.
Monitoring & Observability
* Design and implement monitoring solutions to gain insights into system health, performance, and reliability.
* Build and maintain intuitive dashboards for real-time visibility into critical system metrics.
* Set up proactive alerting mechanisms to detect and resolve issues before they impact end users.

Incident Management
* Lead incident response, performing root cause analysis (RCA) and implementing long-term fixes to improve system resilience.
* Build observability solutions with monitoring, logging, and alerting using tools like Prometheus, Grafana, and Instana.
* Define and monitor Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to ensure service reliability.

Security & Compliance
* Ensure compliance with security best practices and regulatory requirements across all infrastructure components.
* Implement secret management, encryption, and access control for sensitive systems and data.
* Participate in security audits, vulnerability assessments, and compliance automation efforts.

Cross-Team Collaboration & DevOps Culture
* Collaborate closely with development, operations, and security teams to design and implement resilient architectures.
* Promote SRE best practices, such as blameless postmortems, incident retrospectives, and operational readiness reviews.
* Mentor junior engineers and contribute to knowledge sharing across teams to build a strong SRE culture.

**Required technical and professional expertise**

* Bachelor's degree in computer science engineering/information technology
* 5+ years of experience working in global organizations, with the ability to communicate effectively with executives, leaders, and individual contributors across the organization.
* 5+ years of SRE experience working with telemetry, observability, self-healing solutions, and platform automation.
* Cloud & Infrastructure: Expertise in Kubernetes, OpenShift, Docker, IBM Cloud and other cloud platforms

IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.