Seattle, WA, US
1 day ago
Sr. Quality & Reliability Engineer, Hardware Engineering Services
AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain — and we’re looking for talented people who want to help.

We are seeking a Senior Component Quality & Reliability Engineer to own the end-to-end quality and reliability of liquid cooling components used in next-generation server systems. This role will focus on ensuring that liquid cooling solutions meet performance, durability, and field reliability targets across both new product introduction (NPI) and sustaining phases.

You will serve as the primary owner for liquid cooling component quality and reliability, including cold plates, pumps, manifolds, hoses, quick disconnects, and CDU interfaces. You will drive qualification strategy, monitor supplier quality, and provide field performance insights. This is a highly cross-functional role requiring strong technical judgment, structured problem-solving, and the ability to influence design and manufacturing decisions.


Key job responsibilities
• Define and execute reliability validation strategies for liquid cooling components and subsystems
• Develop test plans (e.g., HALT, stress testing, lifecycle validation) aligned to real-world use conditions
• Lead DFMEA and Design for Reliability (DFR) activities to identify and mitigate risks early
• Partner with system teams to ensure proper integration testing and margin validation
• Drive root cause analysis for failures found during component qualification and in the fleet
• Drive corrective and preventive actions with internal teams, manufacturing partners, and component suppliers
• Translate failure mechanisms into actionable design, material, and process improvements
Lead Supplier Quality
• Define critical-to-quality requirements and process controls for suppliers
• Conduct supplier audits and support qualification readiness
• Partner with suppliers to improve process capability, reliability performance, and defect detection
• Analyze fleet performance data (e.g., failure trends, AFR, ARR) for liquid cooling components
• Apply statistical methods (e.g., Weibull analysis) to predict risk and inform actions
• Drive systemic issue identification and resolution across platforms
Influence Cross-Functional Decisions
• Communicate risks clearly
• Provide data-driven input to design and architecture tradeoffs


About the team
This team is responsible for the end-to-end lifecycle of mechanical components that enable high-performance, large-scale data center systems. Engineers work as primary owners of their components, partnering closely with suppliers and manufacturing teams to ensure solutions perform reliably from initial development through deployment in the field.
Our mission is to deliver highly reliable, scalable solutions that support increasingly demanding compute workloads. This includes solving challenges related to performance and long-term operation in real-world data center environments.
Engineers develop deep expertise in their component domains, enabling more effective design reviews, targeted validation strategies, and rapid resolution of issues observed during development and in production systems.