USA
101 days ago
Staff Site Reliability Engineer
Attentive® is the AI-powered mobile marketing platform transforming the way brands personalize consumer engagement. Attentive enables marketers to craft tailored journeys for every subscriber, driving higher recurring revenue and maximizing campaign performance. Activating real-time data from multiple channels and advanced AI, the platform personalizes content, tone, and timing to deliver 1:1 messages that truly resonate.
With a top-rated customer success team recognized on G2, Attentive partners with marketers to provide strategic guidance and optimize SMS and email campaigns. Trusted by leading global brands like Neiman Marcus, Samsung, Wayfair, and Dyson, Attentive ensures enterprise-grade compliance and deliverability, supporting trillions of interactions across more than 70 industries. To learn more or request a demo, visit the Attentive website or follow us on social media.
Attentive’s growth has earned industry recognition, all thanks to the hard work of our global employees!
About the Role
Our Platform Infrastructure team is the backbone of everything we do at Attentive, providing a resilient and cost-effective platform that seamlessly handles billions of events from over 100 million customers daily. We own everything from compute, persistence, and networking to observability and deployments. Joining our team offers a high-growth career opportunity to collaborate with some of the world’s most talented engineers in a high-performance, high-impact culture.
As part of the Infrastructure and Platform organization, the Production Engineering Team is focused on delivering a fast and reliable platform that empowers Attentive engineers to deliver solutions quickly and safely. We build scalable systems that automate routine tasks so we can focus on other impactful efforts. Reliability, scalability, and security are our areas of expertise. We focus on release, observability, and cost optimization. Our mission is to create robust platforms and tools that allow stakeholders to concentrate on delivering exceptional products.
As a Staff Engineer, you will take a strategic role in designing and implementing solutions that enhance the reliability and scalability of our systems, while mentoring others and influencing technical roadmaps across the organization.

What You'll Accomplish
- Design and Deliver High-Impact Solutions: Design and implement systems that enhance reliability, observability, traceability, and incident management, ensuring the platform scales effectively
- Lead Strategic Initiatives: Take ownership of cross-team collaborations and drive impactful projects by providing technical leadership and guidance
- Partner Across Teams: Collaborate with engineers from AI/ML, Data, Platform, Product, and other groups to deliver best-in-class services
- Establish Standards and Best Practices: Define and enforce production standards, processes, and tools to ensure operational excellence
- Champion Reliability Goals: Advocate for and implement SLIs, SLOs, and other reliability-focused metrics across the engineering organization
- Mentorship and Knowledge Sharing: Guide and mentor team members, fostering technical growth and helping to develop the next generation of engineering leaders
- Innovate and Inspire: Drive continuous improvement by bringing creative ideas and challenging the status quo

Your Expertise
- 7+ years of experience in Production Engineering, Backend Engineering, SRE, DevOps, or a similar role
- Proficient Problem-Solver: Strong coding ability in at least one language (e.g., Go, Python, Java, TypeScript) with the capability to solve complex issues through code
- Track Record of Success: Demonstrated experience delivering medium to large-scale projects that drive meaningful improvements in platform reliability and scalability
- Reliability Expertise: Deep understanding of production reliability concepts, including SLIs, SLOs, and incident management
- Strong Communicator: Excellent verbal and written communication skills with the ability to influence and collaborate across technical and non-technical teams
- Fast-Paced Experience: Familiarity with working in dynamic, reliability-focused production environments (preferred)

What We Use
- Our infrastructure runs primarily on Kubernetes, hosted on AWS EKS
- Infrastructure tooling includes Istio, Datadog, Terraform, Cloudflare, and Helm
- Our backend is Java / Spring Boot microservices, built with Gradle, coupled with things like DynamoDB, Kinesis, Airflow, Postgres, PlanetScale, and Redis, hosted via AWS
- Our frontend is built with React and TypeScript, and uses tools like GraphQL, Storybook, Radix UI, Vite, esbuild, and Playwright
- Our automation is driven by custom and open source machine learning models and lots of data, and is built with Python, Metaflow, and Hugging Face