DoorDash is building the world's most reliable on-demand logistics engine. Behind the scenes, our Machine Learning Platform (MLP) powers real-time decision-making for millions of orders each day, supporting business-critical use cases like Ads, Groceries, Logistics, Fraud, and Search.
About the Role

We're looking for a Staff Software Engineer with deep expertise in ML model serving to drive the next generation of our inference platform. This is a highly technical, hands-on role: you'll design and build systems that power real-time predictions across millions of requests per second, tackling challenges in reliability, efficiency, and cost-aware scaling.

Success in this role requires both technical mastery and the ability to lead through collaboration. You'll partner with core infrastructure teams (compute, storage, networking, dev platform) and with applied ML teams across Ads, Fraud, Logistics, Search, and more who depend on our platform to bring their models to production. You'll also tap into the best of open-source frameworks and vendor solutions - contributing back where it makes sense - to accelerate innovation.

As a Staff Software Engineer, you'll pair deep technical execution with influence on the roadmap, ensuring our serving systems scale reliably as model architectures and business needs evolve.
In this role, you will:

- Scale richer models at low latency - Design serving systems that handle large, complex models while balancing cost, throughput, and strict latency SLOs.
- Bring modern inference optimizations into production - Operationalize advances from the ML serving ecosystem (e.g., efficient caching, attention optimizations, batching, quantization) to deliver better user experience, latency, and cost efficiency across our fleet.
- Enable platform-wide impact - Build abstractions and primitives that let serving improvements apply broadly across many workloads, rather than point solutions for individual models.
- Leverage and contribute to OSS - Apply the best of the open-source serving ecosystem and vendor solutions, and contribute improvements back where it helps the community.
- Drive cost and reliability - Design autoscaling and scheduling across heterogeneous hardware (GPU/TPU/CPU), with strong isolation, observability, and tail-latency control.
- Collaborate broadly - Partner with ML engineers, infra teams, external vendors, and open-source communities to ensure our serving stack evolves with the needs of the business.
- Raise the engineering bar - Establish metrics and processes that improve developer velocity, system reliability, and long-term maintainability.

We're excited about you because you:

- Have 8+ years of engineering experience, including building or operating large-scale, high-QPS ML serving systems.
- Bring deep familiarity with ML inference and serving ecosystems.
- Know how to leverage and extend open-source frameworks and evaluate vendor solutions pragmatically.
- Balance hands-on execution with long-term platform thinking, making sound trade-offs.
- Care deeply about reliability, performance, observability, and security in production systems.
- Lead by example - collaborating effectively, mentoring peers, and setting a high bar for craftsmanship.

Nice To Haves

- GPU serving expertise - Experience with frameworks like NVIDIA Triton, TensorRT-LLM, ONNX Runtime, or vLLM, including hands-on use of KV caching, batching, and memory-efficient inference.
- Familiarity with deep learning frameworks (PyTorch, TensorFlow) and large language models (LLMs) such as GPT-OSS or BERT.
- Hands-on experience with Kubernetes/EKS, microservice architectures, and large-scale orchestration for inference workloads.
- Cloud experience (AWS, GCP, Azure) with a focus on scaling strategies, observability, and cost optimization.
- Prior contributions to OSS serving ecosystems (e.g., vLLM, Triton plugins, KServe) or active participation in developer communities.
Notice to Applicants for Jobs Located in NYC or Remote Jobs Associated With Office in NYC Only
We use Covey as part of our hiring and/or promotional process for jobs in NYC, and certain features may qualify it as an Automated Employment Decision Tool (AEDT) under NYC law. As part of the hiring and/or promotion process, we provide Covey with job requirements and candidate-submitted applications. We used Covey Scout for Inbound from August 21, 2023, through December 21, 2023, and resumed using it on June 29, 2024.
The Covey tool has been reviewed by an independent auditor. Results of the audit may be viewed here: Covey