Plano, TX, United States
17 hours ago
Sr Lead SRE

We are seeking a Delivery SRE leader who will ensure security applications are delivered with strong SDLC discipline and measurable reliability. This role partners closely with Product Owners and engineering leadership to challenge assumptions, sharpen the Definition of Done, and bake SRE requirements into the design and build phases. The leader will govern operational readiness, quality gates, and resilience practices so that every release meets agreed SLOs and is production-ready.

Key Responsibilities:

Define and enforce quality gates across requirements, design, secure coding, testing, release, and post-production monitoring.
Translate business objectives into clear, testable requirements that include reliability, availability, performance, security, and observability.
Establish and manage SLOs/SLIs and error budgets; ensure they are integrated into product roadmaps and delivery plans.
Challenge Product Owners and teams to meet a rigorous, objective Definition of Done (DoD) before release.
Sample DoD checklist: SLOs defined and monitored; alerts tuned; runbooks and escalation paths in place; automated tests (unit, integration, security) passing; performance and capacity validated; resilience and failover tested; rollback verified; vulnerability findings remediated; compliance controls and audit artifacts complete; documentation and support readiness confirmed.
Lead operational readiness reviews and triage risks; ensure timely remediation and prevention of recurrence through root-cause analysis and auto-remediation.
Maintain logging, alerting, and monitoring platforms; ensure dashboards provide health and performance visibility.
Govern CI/CD pipeline controls for security, reliability, and change management; promote automation to eliminate toil.
Lead and participate in critical incident response (including outside business hours when needed); drive post-incident reviews and resilience improvements.
Monitor delivery health and operational KPIs; lead continuous improvement across teams and products.
Oversee capacity planning and resilience management for large-scale, distributed systems.
Partner with engineering on public cloud best practices (AWS or equivalent) for compute, storage, networking, messaging, automation (CloudFormation, Terraform), and data services.
Build a culture of collaboration, reliability, and continuous improvement; coach teams to adopt DevOps and SRE principles.
Partner with regional engineering leaders to drive operational best practices and consistent execution.
Provide concise, outcome-focused updates to management and stakeholders; influence decisions across Product, Engineering, SRE, and Security.

Required Qualifications, Capabilities, and Skills

Formal training or certification, with 5+ years of experience supporting critical security-focused applications in large-scale environments and managing and mentoring teams.
Experience with monitoring/logging tools (e.g., Splunk, AppDynamics) and dashboard technologies; Splunk Administrator certification desired.
Strong grasp of SDLC, secure development, and DevOps/CI/CD tooling; capable of implementing top-tier continuous improvement with root-cause analysis and auto-remediation.
Effective under pressure; accountable, with excellent stakeholder management and communication skills.
Global team collaboration, with flexibility to engage during critical incidents outside standard business hours.
This position may require HSA system access. Enhanced screening (criminal and credit background checks, and/or other screening) is required prior to employment and annually thereafter.

Preferred Qualifications, Capabilities, and Skills

Experience implementing and managing SLOs/SLIs, error budgets, and operational readiness reviews for distributed systems, including leading post-incident analysis and resilience improvements.
Deep expertise in public cloud platforms (AWS or equivalent), infrastructure automation tools (CloudFormation, Terraform), and capacity planning for large-scale environments, with a track record of driving DevOps and SRE adoption across teams.
