At Schwab, you’re empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us “challenge the status quo” and transform the finance industry together.
We believe in the importance of in-office collaboration and fully intend for the selected candidate for this role to work on site in the specified location(s).
In this role, you’ll lead the technical vision and architecture for our Site Reliability Engineering (SRE) and AIOps function, shaping how reliability, automation, and intelligent operations scale across the enterprise. This is not a traditional production support role; it requires engineering and coding experience. You’ll work at the intersection of cloud-native platforms, distributed systems, and AI-driven operations, partnering closely with Engineering, Product, Security, and Infrastructure leaders to build resilient, self-healing systems that support millions of clients. This is a highly visible leadership role where your expertise influences both technology strategy and how teams operate day to day.
Key Responsibilities
SRE Architecture & Reliability Strategy: Define and own the end-to-end reliability architecture, including SLO/SLI frameworks, error budget policies, observability standards, and resilience patterns across distributed microservices environments.
AIOps Platform Architecture: Design and architect the AIOps platform encompassing ML-driven anomaly detection, predictive alerting, automated root cause analysis, event correlation, and intelligent remediation workflows.
Infrastructure & Platform Design: Lead architecture decisions for cloud-native infrastructure (GCP/AWS/Azure), Kubernetes orchestration, service mesh (Istio/Envoy), infrastructure-as-code (Terraform/Pulumi), and multi-region disaster recovery strategies.
Observability & Monitoring Architecture: Architect the unified observability stack integrating metrics, logs, traces, and events using technologies such as OpenTelemetry, Grafana, Datadog, and custom ML pipelines for intelligent alerting.
Automation & Self-Healing Systems: Drive the architecture of automated remediation frameworks, self-healing infrastructure, chaos engineering pipelines, and progressive deployment strategies (canary, blue-green, feature flags) to achieve zero-touch operations.
Technical Leadership & Governance: Establish architecture review boards, technical standards, design patterns, and reference architectures; lead technical due diligence and drive consistency across SRE and platform teams.
Team Development & Mentorship: Build, mentor, and grow a team of senior SRE architects and engineers; foster a culture of engineering excellence, continuous learning, and innovation in reliability and AI-driven operations.
Stakeholder & Executive Engagement: Partner with Engineering, Product, Security, and Infrastructure leadership to align reliability and AIOps investments with business priorities; present technical strategies to executive stakeholders.
What you have
Required Qualifications
12+ years of experience in software development and engineering, infrastructure, or SRE, with 5+ years in a senior architecture or technical leadership role.
Deep expertise in distributed systems, cloud-native architectures, and large-scale production environments.
Hands-on experience with Kubernetes, Docker, service mesh, CI/CD pipelines, and infrastructure-as-code tools.
Strong understanding of ML/AI concepts and their application to operational intelligence: anomaly detection, predictive scaling, log analysis, and automated diagnostics.
Proven experience designing observability platforms using OpenTelemetry, Prometheus, Grafana, Datadog, Splunk, or equivalent.
Expertise in incident management frameworks, chaos engineering, and SLO-driven reliability practices.
Experience with major cloud platforms (AWS, GCP, Azure) at scale.
Strong communication and executive presence with the ability to translate complex technical concepts for non-technical stakeholders.
In addition to the salary range, this role is also eligible for bonus or incentive opportunities.
What’s in it for you
At Schwab, you’re empowered to shape your future. We champion your growth through meaningful work, continuous learning, and a culture of trust and collaboration—so you can build the skills to make a lasting impact. Our Hybrid Work and Flexibility approach balances our ongoing commitment to workplace flexibility, serving our clients, and our strong belief in the value of being together in person on a regular basis.
We offer a competitive benefits package that takes care of the whole you – both today and in the future:
401(k) with company match and Employee stock purchase plan
Paid time for vacation, volunteering, and 28-day sabbatical after every 5 years of service for eligible positions
Paid parental leave and family building benefits
Tuition reimbursement
Health, dental, and vision insurance