Systems Development Engineer, Managed Edge Compute (Amazon Robotics)

Austin, TX, US


Amazon.com

We're seeking a Systems Development Engineer to join the Unified Workcell Compute team. This is a hands-on, high-impact role where you'll design and build systems that manage Amazon's edge device fleet — over a million devices across thousands of locations worldwide. You'll work at the intersection of cloud infrastructure, device management, robotics systems, and operational excellence, solving complex technical problems that enable Amazon's robotics and fulfillment operations to scale globally.

As a SysDE II, you'll be a strong individual contributor who delivers high-quality technical solutions, contributes to architectural discussions, and builds reliable systems that enable robotics and automation teams to deploy and manage their edge compute solutions with the same ease as deploying to AWS. You'll work within established technical strategies while identifying opportunities for improvement, translating well-scoped business problems into concrete technical solutions, and balancing short-term delivery with long-term system health. This role requires solid technical depth across multiple domains — Linux systems, AWS services, IoT platforms, robotics compute infrastructure, and distributed systems — combined with the ability to partner effectively with engineers across the team and organization.

Key job responsibilities

- Build and maintain resilient, scalable distributed systems that operate at Amazon scale, contributing to the management of robotics device fleets across thousands of sites with 99.99%+ availability requirements.
- Contribute to the technical strategy for your team's systems within the UWC architecture, participating in decisions around hyperscale deployments, robotics compute patterns, fleet management, and edge device automation.
- Participate in architectural reviews and design discussions across UWC and robotics customer teams, contributing technical input on device lifecycle management, software distribution, multi-compute workcell support, and operational excellence patterns.
- Develop automation solutions using Python, Rust, CDK, and AWS services that eliminate entire classes of operational load and enable self-service for robotics solution teams.
- Implement and optimize Linux-based systems, OS image creation pipelines (Yocto/mkosi), and BSP solutions for diverse robotics hardware platforms including x86, ARM, NVIDIA GPU systems, and embedded devices.
- Create tooling and frameworks that enable robotics teams to provision, configure, and manage their edge compute fleets — from AI perception systems to manipulation robotics — with minimal hands-on-keyboard time.
- Apply established standards for engineering, testing, and operational excellence best practices, and suggest improvements to processes within your team.
- Identify and implement opportunities to streamline or eliminate excess processes, improving agility and reducing complexity for robotics teams building on UWC.
- Proactively identify and escalate risks at the product and service level, contributing to the resilience, performance, and cost efficiency of UWC systems supporting critical robotics operations.
- Troubleshoot complex production issues across the full stack — from robotics device hardware and Linux kernel to AWS cloud services — identifying patterns and implementing solutions that prevent future incidents.
- Partner with robotics solution teams (Amazon Robotics, manipulation systems, AI perception, workcell automation) to understand their device management challenges and contribute to solutions that meet their specific requirements.
- Foster the growth of peers on your team through code reviews, knowledge sharing, and collaborative problem-solving that raises the technical bar.
- Deliver solutions that are inventive, resilient, and extensible, making it easier for robotics teams to build on UWC.
- Participate in hiring and contribute to technical assessm

A day in the life
Your day might start by investigating an issue where robotics devices across multiple fulfillment centers are experiencing intermittent kernel panics during high-load operations. You dive deep into kernel logs, memory dumps, and device telemetry, correlating the failures with a recent driver update for NVIDIA GPU systems. You develop a Python or Rust-based diagnostic tool to capture more granular system metrics and partner with senior engineers to roll back the problematic driver version while working on a fix that addresses the underlying memory management issue.

Mid-morning, you're troubleshooting why a new OS image isn't booting correctly on ARM-based manipulation robotics devices. You boot into a recovery environment, examine the initramfs, trace through systemd unit dependencies, and discover a race condition in the device initialization sequence. You modify the Yocto recipe to fix the boot ordering, test across multiple hardware variants, and document the pattern for other teams building custom images. You then join a sync with an Amazon Robotics team to help them debug why their software components are failing to deploy, walking through IoT certificate validation, network connectivity from the edge device, and AWS IAM permissions until you identify a misconfigured security group.

After lunch, you're participating in a code review for a new credential rotation service, providing written feedback on error handling patterns, memory safety, and how to better structure the state machine for resilience. You spend time optimizing a Linux system configuration that's causing performance bottlenecks on AI perception systems, tuning Linux system parameters to enable high-performance compute workloads. You pair with a teammate who's working through a complex Yocto build failure, sharing what you know about layer dependencies and BitBake recipe inheritance while collaborating on debugging techniques.

The afternoon includes responding to a page where devices in a specific building can't connect to AWS IoT Core. You systematically eliminate possibilities (checking DNS resolution, testing TLS handshakes, examining certificate chains, and analyzing network packet captures) until you discover a misconfigured firewall rule blocking MQTT traffic. You implement a monitoring enhancement to detect this class of issue proactively across all sites. You then contribute to a technical design document proposing improvements to UWC's device provisioning workflow that will reduce provisioning time from 20 minutes to under 10 minutes by parallelizing certificate generation and optimizing the Linux boot sequence. You end your day reviewing system metrics across the fleet, flagging devices with degraded disk I/O that need proactive maintenance, and syncing with your team on priorities for tomorrow.

Amazon offers a full range of benefits that support you and eligible family members, including domestic partners. Benefits can vary by location, the number of regularly scheduled hours you work, length of employment, and job status such as seasonal or temporary employment. The benefits that generally apply to regular, full-time employees include:
1. Medical, Dental, and Vision Coverage
2. Maternity and Parental Leave Options
3. Paid Time Off (PTO)
4. 401(k) Plan

If you are not sure that every qualification on the list above describes you exactly, we'd still love to hear from you! At Amazon, we value people with unique backgrounds, experiences, and skillsets. If you’re passionate about this role and want to make an impact on a global scale, please apply!

About the team
The Unified Workcell Compute (UWC) team is at the forefront of Amazon's robotics and automation efforts, building and operating the foundational device management platform for Amazon's on-premises edge compute fleet. Our services manage over a million robotic devices across thousands of locations worldwide, from the latest NVIDIA GPU offerings enabling AI perception efforts to bleeding-edge manipulation robotics systems, industrial PCs, thin clients, Drive Units, and embedded devices across Amazon's global fulfillment network.
Our mission is to enable robotics solution teams to deploy to Operations buildings with the same self-service, ownership, and accountability as deploying to AWS cloud. We're revolutionizing Amazon's logistics and fulfillment operations by pushing the boundaries of what's possible in automation and compute management at unprecedented scale.
We're a team of builders who value automation, operational excellence, and customer obsession. We own a critical technology ecosystem that powers device provisioning, software distribution, credential management, and fleet operations for robotics workcells and fulfillment systems. Our work directly impacts millions of customer orders and enables Amazon's promise of fast, reliable delivery. We're solving problems that few organizations face, building systems that have never existed before, and defining the future of edge compute management for robotics at Amazon scale.
We foster a culture that encourages personal and professional growth, empowering our team members to continually expand their skills and knowledge. Work-life balance is a priority for us, and we strive to create an environment where our team can thrive both professionally and personally.
