Jersey City, NJ, USA
1 day ago
Lead Site Reliability Engineer

Firmwide Planning & Analysis (FW P&A) is a key Finance function within the Office of the CFO (Chief Financial Officer) that supports each Line of Business CFO and the Firmwide CFO. We are a data engineering team responsible for financial reporting, forecasting, budgeting, and strategic oversight. Our applications and platform are data-based and run on a combination of on-premises (Linux) and public cloud (AWS) infrastructure. We use cloud technologies such as Databricks, RDS Postgres, EKS, ECS, S3, Lambda, and Step Functions, with all code written in Python.

As a Lead Site Reliability Engineer at JPMorgan Chase within Corporate Technology, you hold a leadership role by providing advice and mentorship to other engineers on your team and in your line of business. It is critical to demonstrate strong knowledge across multiple technical domains and advise others on the technical and business issues facing them. You will facilitate resiliency design reviews, deconstruct complex problems into digestible work for other engineers, and act as a technical lead for medium- to large-sized products.

Job responsibilities

Consistently models and champions site reliability culture and practices and exerts technical influence throughout your team
Leads initiatives to improve the reliability and stability of your team’s applications and platforms using data-driven analytics to improve service levels
Drives collaboration with your team to identify comprehensive service level indicators, and with stakeholder partners to establish reasonable service level objectives and error budgets with your customers
Offers a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise
Serves as the main point of contact during major incidents for your application and has the skills to identify and solve the issue quickly to avoid financial loss to the business
Documents and shares knowledge within your organization via internal forums and communities of practice

Required qualifications, capabilities, and skills

Demonstrated proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices
Extensive experience with the AWS cloud platform, including setting up infrastructure using Terraform
Fluent in at least one programming language such as Python, Java/Spring Boot, or .NET
Advanced knowledge of software applications and technical processes with emerging depth in one or more technical disciplines
Proficient knowledge and experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others
Proficient with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
Proficient with containers and container orchestration (ECS, Kubernetes, Docker)
Experience with troubleshooting common networking technologies and issues
Experience identifying and solving complex data structures and algorithms-related problems
Actively self-educates, evaluates new technology, and recommends suitable ones
Possesses 7+ years of experience, ideally working with data/Python applications in production environments
Experience with automation tools and solutions such as Ansible, AutoSys, and Control-M