Shanghai, Shanghai, China
12 days ago
Senior Production Support Engineer

Digital Business Services (DBS)

Our GCIO organisation plays a critical role for the bank. This team partners with the businesses to build the platforms, systems, and products that our customers use every day. We keep people’s money and data safe, and are at the forefront of driving innovation for our businesses, customers, and colleagues.

We are currently seeking an experienced professional to join our team.

In this role, you will:

Incident Management:
•Monitor, troubleshoot, and resolve production incidents for local and global banking applications in a timely manner to minimize downtime.
•Provide L1 and L2 support, including initial triage, diagnostics, and resolution, and provide partial L3 support for complex issues in collaboration with application support teams and vendors.
Global and Regional Coordination:
•Act as the primary point of contact between local teams in China and global/regional system teams, including SRE and DevOps teams, to ensure seamless incident resolution and system alignment.
•Coordinate with global teams to manage incidents affecting distributed banking systems, ensuring consistency in processes and standards.
Collaboration:
•Work closely with application support teams to resolve escalated issues and implement fixes for production systems.
•Engage with the bank's operational resilience project team to align on initiatives for system robustness, disaster recovery, and regulatory compliance.
•Collaborate with internal IT/tech center staff and external vendors to manage service-level agreements (SLAs) and ensure effective incident resolution.
Problem Management:
•Lead post-incident root cause analysis (RCA) and coordinate with problem management teams to identify and implement preventive measures.
•Drive initiatives to reduce recurring incidents and improve system stability.
Monitoring and Reporting:
•Oversee monitoring systems (e.g., Splunk, Nagios) to proactively detect issues and analyze performance metrics.
•Provide regular reports to senior management on system health, incident trends, and SLA adherence.
Process Improvement:
•Enhance support processes, tools, and documentation to improve operational efficiency and response times.
•Collaborate with SRE and DevOps teams to integrate automation and resilience practices into production support workflows.
Compliance and Security:
•Ensure compliance with China's regulatory requirements (e.g., Cybersecurity Law, data localization) and global banking standards.
•Work with security teams to protect sensitive financial data during incident resolution.
Team Leadership:
•Manage and mentor a team of production support engineers, fostering a culture of collaboration, accountability, and technical excellence.
•Ensure team readiness for on-call support and efficient incident handling.
