Lead Engineer, ML Network Stack - Annapurna Labs
Amazon.com
We are seeking an experienced engineer and technical leader to join our team that owns the network stack for EC2 distributed AI/ML systems. The team develops support for a variety of frameworks and communication libraries including NCCL, NVSHMEM, NIXL, NCCL GIN, and Perplexity kernels. Solid knowledge of Linux, networking, and performant coding is important. Experience with embedded systems is valued, and experience with high-speed networking or HPC/RDMA interconnects is highly valued.
If you like solving hard problems, want to work with HPC and ML customers, iterate fast and deliver meaningful solutions at scale, then come join us! This truly is a role at the forefront of AI/ML—you'll be working on features for the largest clusters, with the largest customers, for the largest AI models. This is a role for a technical lead with the expectation to grow into a technical manager role. We are specifically seeking candidates who want to develop their career as a technical manager.
The organization you would be joining is Annapurna Labs, an integral part of AWS that develops hardware and software components that are critical building blocks for EC2 infrastructure. Every instance in EC2 is running some type of hardware designed by Annapurna Labs. We specialize in designing software, systems, and chips that optimize the AWS customer experience.
Key job responsibilities
Be the lead engineer on a team that builds and maintains the infrastructure that monitors and reports on functionality and performance of massive testing workloads run at scale. Use internal Amazon CI/CD tools, Linux, and public AWS products to automate the delivery of our software to customers, saving developer time. Write Python code that effortlessly spools up large clusters and runs benchmarks and applications for ML and HPC workloads. Use AWS Managed Grafana and Athena to digest the massive amount of performance data generated by these workloads and create dashboards for developers and stakeholders. Invent automatic mechanisms to alert developers to functional and performance regressions so they never reach reach customers. Manage the complexity of infrastructure that covers many instance types, software stacks, Linux operating systems, cutting-edge releases and make it easy to evolve.
About the team
The organization you would be joining is Annapurna Labs, an integral part of AWS that develops hardware and software components that are critical building blocks for EC2 infrastructure. Every instance in EC2 is running some type of hardware designed by Annapurna Labs. We specialize in designing software, systems, and chips that optimize the AWS customer experience.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
If you like solving hard problems, want to work with HPC and ML customers, iterate fast and deliver meaningful solutions at scale, then come join us! This truly is a role at the forefront of AI/ML—you'll be working on features for the largest clusters, with the largest customers, for the largest AI models. This is a role for a technical lead with the expectation to grow into a technical manager role. We are specifically seeking candidates who want to develop their career as a technical manager.
The organization you would be joining is Annapurna Labs, an integral part of AWS that develops hardware and software components that are critical building blocks for EC2 infrastructure. Every instance in EC2 is running some type of hardware designed by Annapurna Labs. We specialize in designing software, systems, and chips that optimize the AWS customer experience.
Key job responsibilities
Be the lead engineer on a team that builds and maintains the infrastructure that monitors and reports on functionality and performance of massive testing workloads run at scale. Use internal Amazon CI/CD tools, Linux, and public AWS products to automate the delivery of our software to customers, saving developer time. Write Python code that effortlessly spools up large clusters and runs benchmarks and applications for ML and HPC workloads. Use AWS Managed Grafana and Athena to digest the massive amount of performance data generated by these workloads and create dashboards for developers and stakeholders. Invent automatic mechanisms to alert developers to functional and performance regressions so they never reach reach customers. Manage the complexity of infrastructure that covers many instance types, software stacks, Linux operating systems, cutting-edge releases and make it easy to evolve.
About the team
The organization you would be joining is Annapurna Labs, an integral part of AWS that develops hardware and software components that are critical building blocks for EC2 infrastructure. Every instance in EC2 is running some type of hardware designed by Annapurna Labs. We specialize in designing software, systems, and chips that optimize the AWS customer experience.
Diverse Experiences
AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.
Work/Life Balance
We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.
Mentorship & Career Growth
We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.
Confirmar seu email: Enviar Email
Todos os Empregos de Amazon.com