Site Reliability Engineer

  • negotiable
  • Singapore
  • Permanent, Full time
  • Keyteo Consulting
  • 14 Feb 19

Our non-production environments currently suffer from instability and poor availability, which disrupts development and testing activities during working hours. Keyteo is seeking a Site Reliability Engineer to help us keep those environments up and running.

You’ll spend time:

Analysing system performance and identifying ways to stabilize our environments

Working on monitoring systems

Learning how to leverage automation to drive efficiencies

Troubleshooting critical infrastructure

 

Successful candidates will possess an innate desire to take on challenging problems and enjoy working cross-functionally with members of the Support, Release Management and Application Development teams.

 

We're looking for engineers who are passionate about building infrastructure and planning projects, and who love diving head first into challenging problems.

 

Responsibilities 

Collaborating with software engineers and the APAC infrastructure team to design robust, performant infrastructure

Working with external vendors to plan for and seamlessly integrate new technologies

Designing and documenting procedures to be used as standard operating guidelines

Providing real-time support during critical service disruptions

Working alongside our production support team to provide post-mortem analysis of why services broke or became degraded

Proactively analysing client environments and identifying opportunities to improve performance

Leveraging your diverse technical skills to educate others

Providing exceptional verbal and written customer communications

Facilitating the restoration of services

Facilitating and supporting lessons-learned reviews

Ensuring that all security, availability, confidentiality and privacy policies and controls are adhered to

 

Skills & Experience: 

Bachelor’s Degree or equivalent experience required

5+ years of overall experience 

2-4+ years of operations experience in a high-availability Linux environment (CentOS/RHEL)

 

Qualifications 

Foundational knowledge of VMware or other virtualization solutions

Expertise with one or more of the following scripting languages: Ruby, Bash, PowerShell, Node.js or Python

A passion for automated, scalable, and repeatable infrastructure

Good knowledge of standard web technologies such as Tomcat, HAProxy, Nginx and Redis

Deep knowledge of Linux system internals and the command line

Hands-on experience with monitoring and logging tools such as Datadog and Logstash

The ability to prioritize tasks, work independently, and respond to emergent issues accordingly

Exceptional interpersonal communication skills and the ability to work well within a team

A strong sense of ownership over system uptime and performance

Basic knowledge of Windows Systems Administration

Experience working with ticket management and knowledge base systems such as ServiceNow, JIRA and Confluence

Self-driven and effective in communication

World-class problem-solving skills

 

Bonus points for: 

VMware vCenter experience

Storage technologies

A history of capacity planning and establishing technical roadmaps for future scaling

Experience in managing automated build, test, and deployment infrastructures

Any open-source side projects that display your passion and prowess

Networking experience (load balancers, WAF, etc.)

File transfer protocols