BANGALORE, IND
1 day ago
Site Reliability Engineer – Compute Operations
**Introduction**

At IBM Infrastructure & Technology, we design and operate the systems that keep the world running, from high-resiliency mainframes and hybrid cloud platforms to networking, automation, and site reliability. Our teams ensure the performance, security, and scalability that clients and industries depend on every day. Working in Infrastructure & Technology means tackling complex challenges with curiosity and collaboration. You’ll work with diverse technologies and colleagues worldwide to deliver resilient, future-ready solutions that power innovation. With continuous learning, career growth, and a supportive culture, IBM provides the opportunities to build expertise and shape the infrastructure that drives progress.

Site Reliability Engineers apply software engineering principles to perform infrastructure management tasks more efficiently. They focus on reliability and resiliency, and build systems that proactively detect issues before they cause customer impact. They are responsible for maintaining a high-performance, secure, and stable infrastructure for our clients. SREs also resolve customer issues and problems detected through monitoring. They participate in data center build and configuration activities, perform tests, and deploy new features and capacity.

**Your role and responsibilities**

As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting to deploying the latest software updates and fixes.
Site Reliability Engineering (SRE) professionals are engineers who specialize in reliability and resiliency, with the right mix of software and systems knowledge and skills. They are responsible for analyzing business needs, performing problem determination, and advising on, designing, building, testing, deploying, changing, and maintaining well-engineered information systems and ecosystems.

Responsibilities: As a Compute Operations Site Reliability Engineer working US shift hours, you will perform the following tasks:
• Monitor provisioning tests and investigate/resolve any failures
• Perform code stack updates on infrastructure systems (VIOS, firmware, PowerVC, HMC, NovaLink, NIM servers) as well as cloud supporting systems (jump servers, sobox, network nodes, gateways, TSM servers)
• Upload and maintain stock images
• Maintain user IDs (add/delete) and passwords
• Monitor daily/weekly backups to ensure they are working
• Manage and maintain the Nagios monitoring environment, and troubleshoot scripts/plug-ins when issues arise
• Perform periodic LPMs (Live Partition Mobility operations), inactive migrations, or remote restarts of customer VMs to perform system maintenance, balance workloads, or free up resources
• Monitor and report the capacity utilized in each data center
• Attend scheduled cutover/maintenance window meetings planned by customers
• Verify capacity requirements when customers report provisioning failures
• Work with customers to resolve any RSCT issues so that LPM activities can be performed without impacting customer workloads

**Required technical and professional expertise**

The candidate should be willing to work US shift hours.
• 5-7 years of relevant industry work experience
• In-depth knowledge of Power server hardware (models, I/O adapters, etc.)
• Knowledge of and operating experience with HMC
• In-depth knowledge of PowerVM, including installation, configuration, and operation
• Experience with PowerVC, including installation, configuration, and operation
• Experience with Linux administration, commands, and networking
• Knowledge of NovaLink, including minimal installation/configuration
• High-level knowledge of the operating systems supported on Power Systems (AIX and IBM i)
• In-depth knowledge of how storage is connected and allocated to Power Systems via NPIV connections
• Good understanding of Power Systems network configuration at the system level

**Preferred technical and professional experience**
• Experience configuring and tuning PowerVS
• Experience training new personnel on tooling and processes
• Storage & Power RTS, MVS Network for Cisco, Juniper; general support skills

IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.