Site Reliability Engineer - Lead
- Drive the SRE team by defining the scope & processes for availability, reliability and scalability of critical production systems, aligned with the SLA needs of the business.
- Work closely with the Prod-Ops team to refine the processes for production incident reporting and post-mortem reports, and conduct retrospective sessions. Assist Prod-Ops in resolving critical incidents in the production environment.
- Define the application monitoring, alerting and reporting framework with required documentation.
- Collaborate with the solution architects and application development leads to co-create a robust and scalable application framework by providing best practices in design & coding.
- Collaborate with development & other relevant teams to define non-functional requirements for various applications and ensure adherence.
- Lead the resiliency validation exercises to identify areas for improvement in availability, reliability and scalability. Also ensure that the contingency plan is comprehensive.
- Find automation opportunities in mundane tasks related to monitoring & reporting of application/system KPIs.
- Engage with the Infra/Prod-Ops team to forecast capacity requirements.
- Minimum of a Bachelor's degree in Computer Science, with equivalent work experience of 12 years.
- 3+ years of hands-on experience in container technologies such as Red Hat OpenShift, Docker and Kubernetes, and DevOps tools such as Jenkins, Ansible and Bitbucket.
- 3+ years of hands-on experience in application monitoring technologies such as CA Wily, Grafana, Kibana, Prometheus and Elasticsearch.
- Thorough understanding of & practical experience in the microservices ecosystem and service-oriented architecture.
- Good technical knowledge of implementing, troubleshooting and performance tuning physical & virtual infrastructure.
- Hands-on experience in chaos engineering.
- Ability to work effectively in small (often ad-hoc) teams; experience in managing a team of associates.
- Experience with the Agile/Scrum software development approach and familiarity with TDD & BDD.
- Ability to manage stakeholders & experience in presenting to senior management.
- Strong analytical and problem-solving skills.
- Experience in the banking/finance industry is a plus.
- Strong interpersonal and communication skills.
- Positive attitude towards continuous learning.