Site Reliability Engineer III - Data and AWS - Glasgow, Lanarkshire, United Kingdom

16 hours ago

JP Morgan

There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.

As a Site Reliability Engineer III at JPMorgan Chase within the AIML Data Platforms and Chief Data and Analytics Team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.

Job responsibilities

Assists in operating and maintaining the managed AWS and Data platforms; provides day-to-day engineering and operational support to SRE and application teams under guidance.
Supports platform design, setup, and configuration; performs workspace administration, resource monitoring, and basic troubleshooting for data engineering, Data Science/ML, and application/integration teams.
Participates in evaluation activities with external vendors, startups, and internal teams; documents findings and recommendations for senior review.
Contributes to improvements in system observability, alerting, and capacity planning by building dashboards, updating runbooks, and implementing basic automation.
Collaborates with engineering and data teams to optimize infrastructure and deployment processes, focusing on automation and operational excellence; writes and maintains scripts or pipelines following standards.
Implements and troubleshoots software solutions; contributes to design and development tasks and escalates complex issues appropriately.
Writes secure, high-quality production code for features and fixes; performs basic peer reviews and debugs own code when needed.
Identifies recurring issues and proposes or implements automation and remediation steps to improve operational stability of applications and systems.
Contributes to a team culture of inclusion, respect, and continuous learning.
Applies Site Reliability Engineering best practices (e.g., SLIs/SLOs, error budgets, incident response) with direction from senior engineers to support reliability, scalability, and performance of data platforms.
Participates in incident response following established procedures; assists with root-cause analysis, postmortem documentation, and implementation of corrective actions.

Required qualifications, capabilities, and skills

Formal training or certification on software engineering concepts and applied experience.
Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform.
Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others.
Understanding of SRE principles, including SLIs, SLOs, error budgets, and incident management.
Experience with monitoring tools, automation frameworks, and CI/CD pipelines.
Experience writing Python applications or scripts and using automated unit testing frameworks.
Experience with Terraform development and understanding of Terraform Enterprise.
Experience contributing to system design discussions, application development, testing, and supporting operational stability.
Familiarity with big data distributed compute frameworks such as Apache Spark, AWS Glue, and MapReduce.
Strong troubleshooting, analytical, and communication skills.

Preferred qualifications, capabilities, and skills

Familiarity with distributed systems and large-scale data processing.
Experience with AWS and Python.
Knowledge of containerization (Docker, Kubernetes) and orchestration.
