Austin, TX, US
Sr. Manufacturing Engineer, Trainium Manufacturing, Quality and Reliability
Lead reverse manufacturing engineering operations for AI servers and systems based on Trainium chips across geographically distributed reverse logistics sites, ODMs and CMs. You will be part of the Manufacturing, Quality and Reliability Team in AWS Annapurna Labs, which focuses on Machine Learning products and designs cutting-edge AI platforms for the world’s largest Cloud Services provider.

AWS Utility Computing (UC) provides product innovations, from foundational services such as Amazon’s Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) to consistently released new product innovations that continue to set AWS’s services and features apart in the industry. As a member of the UC organization, you’ll support the development and management of Compute, Database, Storage, Internet of Things (IoT), Platform, and Productivity Apps services in AWS, including support for customers who require specialized security solutions for their cloud services.

Within AWS, the Annapurna Labs team is building the next generation of cloud server infrastructure. Our success depends on delivering world-class server infrastructure; we're handling massive scale and rapid integration of emergent technologies. Our servers include accelerators such as AWS Trainium and AWS Inferentia, which are machine learning products designed to deliver high performance at low cost.

The Trainium Manufacturing, Quality and Reliability Team is part of AWS Annapurna Labs. It focuses on Machine Learning products and designs cutting-edge AI platforms for the world’s largest Cloud Services provider. We are seeking a talented and motivated Manufacturing Engineer with a proven track record of implementing best-in-class test techniques and processes within a complex supply chain. As a member of the Cloud-Scale Machine Learning Acceleration team, you will be the interface between the system engineering team and the ODM and CM partners.
As a Senior Manufacturing Engineer, you will engage with an experienced cross-disciplinary staff to own and drive datacenter reverse logistics and ODM RMA programs as a single unified program. You will own reverse manufacturing operations with the same rigor as forward manufacturing. You will work closely with an internal interdisciplinary team and outside partners to drive key aspects of failure triage, manufacturing retest, test infrastructure and execution. A successful candidate will be responsive, flexible and able to succeed within an open, collaborative peer environment. You will:

* Be responsible for the failure triage and retest of servers and components that have failed during forward manufacturing or in the datacenter.
* Drive manufacturing process improvements to address reliability issues and concerns.
* Lead the identification and validation of product/component risks, work with design teams to mitigate them, and define the test methodology and test coverage to improve product quality.
* Establish and maintain retest capacity, infrastructure and requirements across datacenter reverse logistics and ODM sites.
* Provide technical leadership and mentor engineers.
* Work with multiple vendors and ODMs to standardize component manufacturing and reliability expectations.

The successful candidate will be capable of making wide-ranging business decisions on behalf of the organization and willing to “roll up sleeves and do what needs to get done” to consistently deliver results. We’re changing an industry, and we want individuals who are ready for this challenge.

Key job responsibilities
- Manage warehouse inventory tracking, including card locations and ownership assignments for internal tracking
- Oversee spare parts inventory for testing equipment deployed at ODMs
- Review test logs for customer-returned cards and conduct technical assessments for RMA acceptance
- Generate comprehensive 8D reports while monitoring manufacturing and PCB revision changes, and communicate the resolution of identified issues to the customer
- Analyze failure analysis reports on RMA FA requests, correlating data with yield metrics to identify patterns across component vendors, manufacturing sites, and individual testers
- Track RMA cases from creation through final closure
- Provide disposition path for unique customer returns based on failure mode/mechanism and rework process
- Coordinate with manufacturing teams on revision changes and issue resolution
- Conduct and coordinate reliability validation for reworked components on the product, validating the reliability of the rework process in terms of solderability and ensuring no unintended consequences
- Evaluate, investigate and introduce new manufacturing technology and methodology to enhance product quality and production efficiency at ODM and CM
- Develop or adapt manufacturing processes at the ODM and CM, including defining fixture requirements, critical assembly requirements, test methodology, and signal integrity, power and heat management requirements
- Work with engineering teams to clearly represent processes and reviews, enabling smooth New Product Introduction and changes
- Support cost reduction and sustaining activities

About the team
Annapurna Labs is a wholly owned subsidiary of AWS, focused on developing custom silicon and servers including the Nitro (K2), Graviton, Inferentia, and Trainium families of processors.
Machine Learning Annapurna functions as a vertically integrated team including software, firmware, hardware, and silicon design in a single organization.
We are the Trainium Servers and Systems organization under MLA focused on Hardware Development, Software Development, Fleet Ops Systems, and Manufacturing, Quality, and Reliability.
This position is in the Manufacturing, Quality and Reliability team.