Site Reliability Engineer - Lead

in Singapore, Singapore, Singapore
Permanent, Full time
  • Drive the SRE team by defining the scope and processes for availability, reliability and scalability of critical production systems, aligned with the SLA needs of the business.
  • Work closely with the Prod-Ops team to refine the process for production incident reporting and post-mortem reports, and conduct retrospective sessions. Assist Prod-Ops in resolving critical incidents in the production environment.
  • Define the application monitoring, alerting and reporting framework with required documentation.
  • Collaborate with the solution architects and application development leads to co-create a robust and scalable application framework by providing best practices in design & coding.
  • Collaborate with development & other relevant teams to define non-functional requirements for various applications and ensure adherence.
  • Lead resiliency validation exercises to identify areas for improvement in availability, reliability and scalability, and ensure the contingency plan is comprehensive.
  • Identify automation opportunities in routine tasks related to monitoring & reporting of application/system KPIs.
  • Engage with the Infra/ProdOps team to forecast capacity requirements.

  • Minimum of a Bachelor's degree in Computer Science, with equivalent work experience of 12 years.
  • At least 5 years of hands-on experience in Java/J2EE, Spring Boot, JavaScript, Ajax, SQL and the Linux platform.
  • At least 3 years of hands-on experience in container technologies such as Red Hat OpenShift, Docker and Kubernetes, and DevOps tools such as Jenkins, Ansible and Bitbucket.
  • At least 3 years of hands-on experience in application monitoring technologies such as CA Wily, Grafana, Kibana, Prometheus and Elasticsearch.
  • Thorough understanding of, and practical experience in, the microservices ecosystem and service-oriented architecture.
  • Good technical knowledge of implementing, troubleshooting and performance-tuning physical & virtual infrastructure.
  • Hands-on experience in chaos engineering.
  • Ability to work effectively in small (often ad hoc) teams, and experience in managing a team of associates.
  • Experience with the Agile/Scrum software development approach and familiarity with TDD & BDD.
  • Ability to manage stakeholders & experience in presenting to senior management.
  • Strong analytical and problem-solving skills.
  • Experience in the banking/finance industry is a plus.
  • Strong interpersonal and communication skills.
  • Positive attitude towards continuous learning.