Austin, TX, US
13 hours ago
Senior Software Engineer - SoC DevOps, MLA-MI - Annapurna Labs
The Senior SoC Software DevOps Engineer role centers on enabling the rapid and reliable development of software for AWS's most advanced custom machine learning chips. This position is critical to supporting the Trainium and Inferentia families of silicon, which power large-scale AI training at AWS. The engineer will serve as the primary owner of infrastructure that directly affects how quickly software teams can iterate on code for both pre-silicon simulation environments and post-silicon production deployments. By building robust automation and tooling, the role ensures that tape-outs for new chips stay on schedule and that software is ready to function immediately when first silicon becomes available. This work has a direct impact on AWS's ability to deliver advanced ML infrastructure to its largest customers.

This role operates at the intersection of hardware and software, requiring deep expertise in infrastructure engineering to solve unique challenges such as coordinating releases across isolated environments and validating firmware on real silicon. It is a foundational position for the SoC software teams: it frees engineers from infrastructure burdens so they can focus on feature development. Success in this role will be measured by improvements in development velocity, release quality, and the stability of systems that support multiple teams. The position demands a proactive approach to identifying bottlenecks and a strong ability to operate in novel technical contexts; prior domain knowledge in machine learning or chip design is not required.

Key job responsibilities
The engineer will own the end-to-end CI/CD pipelines and release processes for all SoC software components, including firmware, hardware abstraction layers, and modeling tools. This involves designing, maintaining, and evolving systems that produce reliable releases for both internal verification teams and external AWS services. A key task is ensuring these pipelines function across heterogeneous environments such as Corp networks and VPC. The role requires building qualification workflows that guarantee software meets strict quality standards before reaching customers or verification teams.

Another core duty is developing hardware-in-the-loop test infrastructure that validates SoC software on actual silicon in laboratory and automated testing settings. This includes creating frameworks to run tests on real chips, simulate pre-silicon environments, and integrate results into continuous integration workflows. Additionally, the engineer must build observability tools, such as dashboards that track build health, test coverage, and pipeline performance, along with alerting systems that notify teams of regressions. A significant focus will be on identifying and removing friction in development workflows, such as slow build times or complex release steps, using data-driven insights to prioritize improvements that accelerate team productivity. The role also involves solving novel problems like bridging disconnected environments and orchestrating synchronized releases across multiple domains.

About the team
We're part of the SoC Software organization within Annapurna Labs (AWS). Our three software teams — uCode, HAL (Hardware Abstraction Layer), and Modeling — build the firmware, drivers, and virtual platforms for AWS's custom ML accelerator chips. We operate like a startup: small teams, high ownership, direct impact on AWS's most strategic silicon programs. This DevOps engineer will work across all three teams, with a mandate to improve velocity, quality, and developer experience for the entire SoC software organization.

Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we're building an environment that celebrates knowledge-sharing and mentorship.