SRE, Chaos Engineering, Search Resilience
Amazon.com
Join the Chaos Engineering team in Amazon Search. We perform experiments in production to harden Search against outages and make sure that whenever a customer searches for products, they find what they are looking for.
In this role you will:
- Design, implement, execute, and automate chaos experiments to continuously test Amazon Search's resilience against hardware failures, dependency outages, traffic spikes and more.
- Collaborate with service owners to remedy vulnerabilities, minimize blast radius and harden Amazon Search.
- Research tools and practices in resilience engineering and adopt them as appropriate.
Joining this team, you’ll experience the benefits of working in an entrepreneurial environment, while leveraging the resources of Amazon.com (AMZN), one of the world's leading internet companies. We are a diverse, customer-obsessed and passionate team located in Meguro, Tokyo.
Key job responsibilities
- Develop and maintain our chaos experiment orchestrator
- Design, execute, automate, and maintain chaos experiments
- Develop and maintain our distributed load generator
- Develop and maintain our petabyte-scale log archival and query service
- Join a 12/12 on-call rotation for incident response and mitigation