Some careers have more impact than others.
If you’re looking for a career where you can make a real impression, join HSBC and discover how valued you’ll be.
We are currently seeking an experienced professional to join our team in the role of Sr. Associate Director, Software Engineering Specialist.
Business: CTO Infrastructure
Principal responsibilities
The Internal Kubernetes Platform (IKP) team follows a Site Reliability Engineering (SRE) model, and this role works as an Engineer for IKP. The platform is built on GKE Kubernetes clusters and includes the management of bundled services such as Istio and Prometheus. Members of the SRE team are expected to work closely with L1 support, Services Engineering, and the IKP Core team. Team members should address incidents and resolve issues, while striving to improve monitoring and build automation.
Objectives of this Role:
Run the IKP clusters by monitoring availability and taking a holistic view of system health.
Build tools and automation to manage platform infrastructure and services.
Improve reliability, quality, and time to upgrade cluster and service versions.
Measure and optimize system performance and resource utilization, and plan for future capacity.
Build dashboards and visualizations to graph system health.
Define system alerts and automate responses where possible.
Provide operational support and engineering for multiple software development teams.
Work with the senior team members on driving the platform forward to align with the Bank's 2027 goals.
Work closely with application teams that consume the IKP hosting platform. Note that the IKP team is not responsible for the application development of services deployed onto IKP.
Daily and Monthly Responsibilities:
Gather and analyze metrics from cluster components and services to assist in performance tuning and fault finding.
Partner with Core Engineering and Services Engineering teams to improve services through rigorous testing and release procedures.
Participate in system design consulting, platform management, and capacity planning.
Create sustainable systems and services through automation and uplifts.
Balance feature development speed and reliability with well-defined service level objectives.
Confirm the health of clusters to assist developers when they have issues deploying a new workload.
Be proactive and identify opportunities where you and the team can provide increased customer service, reliability, and scalability of the IKP Platform.