Site Reliability Engineering Professional
IBM
**Introduction**
A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions.
Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career.
IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
**Your role and responsibilities**
As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting to deploying the latest software updates and fixes.
Your primary responsibilities include:
· Troubleshoot, monitor, and support critical production systems.
· Perform root cause analysis and manage incidents to ensure timely resolution.
· Provision and deploy environments in a cloud infrastructure.
· Handle initial intake for customer ticket requests for configuration changes, ensuring SLA commitments are met.
· Provide on-call support, sharing rotation duties with global resources to minimize MTTR (Mean Time to Recovery).
· Perform regular patching and upgrades and collaborate with product support to resolve issues.
· Execute on a number of tasks in an interrupt-driven environment without losing sight of the customer requirements.
**Required technical and professional expertise**
* Hands-on experience as a DevOps or SRE Engineer.
* Experience with at least one major public cloud provider or large-scale private/hybrid cloud using container orchestration.
* Proven experience in providing on-call support for critical production systems, with a focus on root cause analysis (RCA).
* Familiarity with Kubernetes, EKS, ROSA, AKS, GKE, OpenShift.
* Strong problem-solving skills and attention to detail.
* Proficiency in scripting languages like Python and related tools.
* Good understanding of CI/CD processes and tools (e.g., Jenkins).
* Hands-on experience with Linux systems administration.
**Preferred technical and professional experience**
* Familiarity with customer case management software and processes.
* Experience with monitoring tools and incident management platforms.
* Ability to work efficiently in a global, distributed team environment.
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.