Prague, Czechia
143 days ago
Lead Site Reliability Engineer

As a Lead Site Reliability Engineer, you’ll be at the forefront of building scalable, resilient, and observable systems that power Tricentis SaaS products globally. This is a hands-on engineering leadership role that balances technical delivery, process ownership, and team mentorship.

You will drive initiatives across multiple products, shape SRE standards, and serve as a trusted partner to both engineering and product leaders. You will be responsible for elevating engineering quality and reliability while enabling scale and speed.

Your Impact 🚀

Lead and deliver cross-cutting initiatives to improve platform scalability, resilience, and cost efficiency.

Architect and implement cloud-native infrastructure that supports multi-region, multi-tenant deployments.

Improve observability strategy across systems and teams, including SLOs, error budgets, and alerting standards.

Coach and mentor engineers, guiding technical design reviews and promoting engineering excellence.

Own post-incident analysis and ensure learning loops are completed with preventive action.

Influence product reliability from early-stage design to production readiness reviews.

Establish and evolve standards for deployments, operational readiness, and incident response.

Serve as a technical advisor for engineering and product managers across the org.

As a valuable member of our SRE team, you'll have the opportunity to 💪 

Drive architectural discussions and make decisions that influence the SRE org and wider engineering teams.

Define and evolve technical roadmaps and execution plans aligned with company goals.

Partner with peers in security, infrastructure, and product to drive platform-wide improvements.

Lead incident response for high-impact outages and continuously reduce incident recurrence.

Contribute to SRE hiring through interviews, onboarding, and process refinement.

Guide the adoption of modern tooling and practices across teams (e.g., GitOps, self-service platforms, chaos engineering).

Represent SRE in leadership forums, bringing insights, trade-offs, and forward-looking strategies.

Our Tech Stack 🌐 

Azure, AWS, Terraform, GitHub Actions, Kubernetes, Datadog, Prometheus, Grafana, Betterstack, incident.io, Jira, and more

Our Culture 🦄 

We don't just preach our values; we embody them in everything we do. We are committed to creating an environment that empowers, supports, and includes individuals, where trust, transparency, creativity, curiosity, and continuous improvement thrive on a daily basis. 

About You 🎯 

6+ years of experience in SRE, Infrastructure, or DevOps roles, including technical leadership.

Expertise in building and operating production systems in public cloud (AWS or Azure).

Deep understanding of observability principles (SLOs, SLIs, metrics, traces, logs).

Strong experience with infrastructure-as-code, container orchestration, and CI/CD (Terraform, K8s, GitHub Actions).

Proven track record in leading technical projects, influencing architecture, and mentoring engineers.

Excellent communication and cross-functional collaboration skills.

Proactive, ownership-driven mindset with a passion for reliability and continuous improvement.

You can look forward to:

Flexible working schedule (no core hours)

Learning and career growth opportunities

25 days of paid time off

3 sick days

2 days of paid volunteering leave per year to get involved in your local community or in a cause that matters to you

Hybrid work environment, with home-office allowance

Meal allowance

Pension contribution

Life & disability insurance

Paid sickness leave

A team of passionate professionals who are experts in their fields

Events for employees to learn, celebrate, and socialize (training sessions, hackathons, parties, sports events, board game gatherings, BBQs), and much more