Lenovo is a US$57 billion revenue global technology powerhouse, ranked #248 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world’s largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high-performance computing, and software-defined infrastructure), software, solutions, and services. Lenovo’s continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).
To find out more, visit www.lenovo.com and read about the latest news via our StoryHub.
Description and Requirements
We are seeking a skilled Cloud Observability Engineer to design and implement comprehensive monitoring and observability solutions for our cloud infrastructure. This role is responsible for building scalable monitoring systems that provide real-time visibility into system health and performance across Linux, OpenStack, and Kubernetes environments.
Key Responsibilities:
1. Monitoring System Design & Implementation:
- Design and deploy end-to-end monitoring solutions for cloud infrastructure and core services.
- Implement monitoring pipelines using Prometheus for metrics collection and Zabbix for alerting.
- Ensure comprehensive coverage of system health and performance metrics.
2. Architecture Optimization & Troubleshooting:
- Optimize monitoring architectures for scalability and low-latency data processing.
- Troubleshoot complex monitoring issues including metric collection failures and performance bottlenecks.
- Implement high-availability monitoring solutions with efficient resource utilization.
3. Automation & Tool Development:
- Develop automation tools for monitoring workflows using Golang and Python.
- Create dynamic alert generation systems and anomaly detection capabilities.
- Integrate monitoring solutions with CI/CD pipelines and cloud-native workflows.
4. Collaboration & Integration:
- Work closely with infrastructure and DevOps teams to align observability strategies with product requirements.
- Integrate monitoring systems with AI-driven analytics and cloud platform services.
- Provide monitoring insights to support performance optimization and capacity planning.
Qualifications:
- Bachelor's degree or above in Computer Science, Software Engineering, or a related field.
- 3+ years of experience in cloud monitoring engineering with expertise in Prometheus and Zabbix.
- Strong proficiency in Linux system monitoring and cloud platform metrics (OpenStack, Kubernetes).
- Expertise in programming languages: Golang for high-performance applications and Python for automation.
- Experience with distributed tracing, logging frameworks, and time-series databases.
- Knowledge of CI/CD pipelines and GitOps practices for monitoring infrastructure.
- Analytical mindset with strong problem-solving and performance tuning skills.
- Experience with Grafana dashboards and log analysis tools (ELK Stack) preferred.