Bangalore, Karnataka, India
17 hours ago
Associate Site Reliability Engineer

Role Overview (Learning):

 

The primary focus of the Associate Site Reliability Engineer (SRE) will be on acquiring and honing the essential skills required to excel in the role. The Associate SRE will work closely with more experienced engineers, who will provide mentoring and guidance along the way. The responsibilities span many facets of site reliability and cloud engineering, from incident response, application deployment, and configuration to system monitoring and security protocols.

Key Responsibilities:

Cloud Fundamentals: Build a foundational understanding of cloud design, hosting, and delivery in AWS, GCP, and/or Azure. Contribute to CI/CD pipelines and develop Infrastructure as Code (IaC) for our products and services. Gain an understanding of the vast array of service offerings from our cloud provider partners.
Tooling & Workflow: Build proficiency in the team’s tech stack tooling to automate provisioning and manage infrastructure components efficiently using IaC. Using Git, apply best practices for version control, branching, and collaborative development to maintain an organized and efficient code management process. Additionally, gain the skills to use Jira effectively for issue tracking and streamlining workflows.
Automation: Acquire scripting skills to automate routine tasks, data collection, and deployments. Automation will streamline operations and enhance efficiency.
Peer Review: Participate in the code review process, reviewing contributions from peers and receiving feedback on your own submissions to continually improve coding and troubleshooting skills.
Security Protocols: Under the guidance of experienced SREs, gain familiarity with security measures and assist in their implementation. Learning how to safeguard systems is an essential skill in site reliability engineering.
Monitoring & Alerting: Contribute to setting up, configuring, and maintaining monitoring and alerting systems. The focus will be on understanding and improving key performance indicators, which are crucial for ensuring system reliability.
Incident Response: Collaborate with other engineers to diagnose and resolve incidents. This will involve data gathering, issue tracking, and the application of problem-solving skills, all vital for SRE success.
Post-Incident Reviews: Actively engage in post-incident discussions to understand the root causes of issues and learn from the insights shared by senior team members. This learning process is integral to continuous improvement.
Collaboration: Foster collaboration with team members across various roles, including developers, operations, and other SREs. Sharing knowledge and working towards team objectives will be key to professional growth.
Basic Troubleshooting: Develop skills in identifying and resolving straightforward issues using monitoring tools and logs.
Cost Optimization: Assist in collecting, analyzing, and interpreting cloud cost data to identify trends, anomalies, and cost-saving opportunities.
Agile & Scrum Practices: Learn and apply Agile methodologies and Scrum frameworks, as they are integral to the engineering workflow. Actively participate in sprint planning, daily stand-ups, and sprint reviews.
Documentation: Contribute to the creation and updating of procedural guides, processes, and troubleshooting documentation.
On-Call Support: Participate in the on-call rotation and learn how to respond effectively to incidents and troubleshoot issues under high-pressure scenarios. This experience will build the ability to maintain system stability even during challenging situations.

Skills:

Ability to work effectively under pressure
Basic understanding of system monitoring tools
Eagerness to learn and adapt
Strong communication skills
Receptiveness to constructive feedback
Foundational knowledge of cloud services and essential networking principles
Time management

 
