System Development Engineer (Level 5), CTOS
Amazon.com
Amazon Central Technical Operations Services (CTOS) maintains high availability for the Amazon Retail Website and provides the first line of incident response to protect it. We make customer-impacting events shorter, less frequent, less severe, and less impactful by providing large-scale incident and response management. The Amazon Retail Website has hundreds of millions of customers globally who can be affected by these incidents; the work we do to mitigate them helps real people at a tremendous scale.
It is a complex and constantly changing space, operating across dozens of countries, consisting of thousands of cloud-based services, built and maintained by tens of thousands of engineers, and serving hundreds of millions of customers. When it experiences major issues, your team will respond within minutes to ensure the best course of action is taken and impacts are minimized. This experience will expose you to everything Amazon has to offer, providing opportunity to interact with leaders from across the Stores and Corporate businesses.
This position will be part of a globally distributed team of 45+ professionals across Seattle, Austin, Dublin, and Sydney providing around-the-clock coverage.
As a System Development Engineer, you will build tooling to automate the detection and resolution of issues within Amazon’s Retail Website infrastructure. You will also spend a portion of your time directing the resolution of high-visibility incidents by leading conference calls and taking notes to collect data and help improve our processes. Using data and insights learned from those incidents, you will drive further improvements into our automation, tooling, and processes so that the next event is shorter, less severe, or avoided entirely. You will participate in project teams to expand use of our tooling to additional areas across Amazon. This position will be part of a globally distributed team of 20+ engineers across Austin, Dublin, and Sydney to allow for 24x7 coverage. Each group will work 10-hour shifts, 4 days a week. If you're looking for a team with great growth potential and an opportunity to make a huge impact, this is the team to join.
Key job responsibilities
• Drive the resolution of large-scale, customer-impacting issues as part of a globally rotating team
• Design, build, and enhance incident detection and management tools
• Participate in Agile sprints to evolve business processes and technologies
• Create and review documentation and design new standard operating procedures
• Identify and troubleshoot recurring platform issues and own projects to drive improvements
• Mentor peers in your areas of technical and operational strength