Site Reliability Engineer II - (Microsoft 365 Enterprise + Cloud)
We are looking for a Site Reliability Engineer (SRE) with the right mix of systems engineering, data science, software development, AI, and online services experience, and a passion for quality, to envision, design, and deliver Microsoft 365 (M365) Enterprise + Cloud service offerings.
Team Overview: Within the vast framework of M365 Office Engineering Direct (OED), our SRE team is instrumental to the success of Exchange Online. With the service spanning hundreds of components, our goal is clear: ensure unmatched service availability and continually elevate user satisfaction.
What We Do & Our Impact: Our approach is layered and precise. By implementing proactive engineering solutions, we identify and tackle incidents head-on, keeping disruptions to a minimum. Comprehensive, nuanced monitoring remains our cornerstone, capturing anomalies that conventional systems miss. With swift diagnostics steering our course, we channel our efforts towards automation, efficiently managing the incident lifecycle from detection to resolution. Additionally, with a commitment rooted in understanding our users, we meticulously prioritize and execute Design Change Requests, ensuring Exchange Online's evolution aligns with user expectations.
The Future – Artificial Intelligence (AI) & Machine Learning (ML) in Focus: As we look to the horizon, the fusion of AI and ML with our SRE practices promises a transformative era for Online Cloud Services in M365. We are in the initial stages of integrating predictive analytics to anticipate issues before they manifest, allowing us to stay a step ahead. Customized ML models are being developed to intelligently sift through vast data lakes, identifying patterns and correlations previously overlooked. Our journey with AI and ML is not just about enhancement; it is about redefining reliability, precision, and the user experience in the M365 suite.