Heredia, CRI
2 days ago
Site Reliability Engineering Professional
**Introduction**

A career in IBM Software means you’ll be part of a team that transforms our customers’ challenges into solutions. Seeking new possibilities and always staying curious, we are a team dedicated to creating the world’s leading AI-powered, cloud-native software solutions for our customers. Our renowned legacy creates endless global opportunities for our IBMers, so the door is always open for those who want to grow their career. IBM’s product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.

**Your role and responsibilities**

As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting to deploying the latest software updates and fixes.

Your primary responsibilities include:

Infrastructure & Cloud Management:
• Design, build, and manage scalable cloud infrastructure using IBM Cloud, AWS, GCP, and Azure.
• Implement Infrastructure as Code using Terraform.
• Deploy and configure applications using container orchestration platforms like Kubernetes/OpenShift.

Automation & CI/CD:
• Develop and maintain automation scripts and tools using Python, Groovy, and Ansible.
• Build and manage robust CI/CD pipelines using tools like Jenkins, IBM Continuous Delivery, and ArgoCD.

System Monitoring & Reliability:
• Monitor health and performance of production systems (24x7 observability).
• Use tools like Instana, Grafana/Prometheus, and New Relic to build alerts and dashboards.
• Troubleshoot and resolve production issues in collaboration with engineering and support teams.

Security & Compliance:
• Perform regular patching and upgrades, and collaborate with product support to resolve issues.

Database & Middleware:
• Manage open-source middleware and databases such as PostgreSQL, CouchDB, Redis, Kafka, and Spark.
• Participate in incident response and on-call rotations.

**Required technical and professional expertise**

- Strong working knowledge of Kubernetes and cloud infrastructure, with a preference for AWS.
- Proven experience in providing on-call support for critical production systems, with a focus on root cause analysis (RCA).
- Proficiency in scripting languages like Python and related tools.
- Strong problem-solving skills and attention to detail.
- Expertise in automation platforms such as AWX.

**Preferred technical and professional experience**

- Familiarity with Salesforce infrastructure and case management processes.
- Experience with monitoring tools and incident management platforms.
- Ability to work efficiently in a global, distributed team environment.

IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.