Site Reliability Engineer
- Permanent, Full time
- Keyteo Consulting
- 14 Feb 19
Our non-production environments currently suffer from instability and poor availability, which disrupts development and testing activities during working hours. Keyteo is seeking a Site Reliability Engineer to help us keep those environments up and running.
You’ll spend time:
• Analysing system performance and identifying ways to stabilize our environments
• Working on monitoring systems
• Learning how to leverage automation to drive efficiencies
• Troubleshooting critical infrastructure
Successful candidates will possess an innate desire to take on challenging problems and enjoy working cross-functionally with members of the Support, Release Management and Application Development teams.
We're looking for engineers who are passionate about building infrastructure and planning projects, and who love diving head first into challenging problems.
• Collaborating with software engineers & APAC infrastructure to design a robust and performant infrastructure
• Working with external vendors to plan for and seamlessly integrate new technologies
• Designing and documenting procedures to be used as standard operating guidelines
• Real-time support of critical service disruptions
• Working alongside our production support team to provide post-mortem analysis of why services broke or became degraded
• Proactively analysing client environments and identifying opportunities to improve performance
• Leveraging your diverse technical skills to educate others
• Demonstrating the ability to provide exceptional verbal and written customer communications
• Facilitating the restoration of services
• Facilitating and supporting lessons-learned reviews
• Ensuring that all security, availability, confidentiality and privacy policies and controls are adhered to
Skills & Experience:
• Bachelor’s Degree or equivalent experience required
• 5+ years of overall experience
• 2-4+ years of operations experience in a high-availability Linux environment (CentOS/RHEL)
• Foundational knowledge of VMware or other virtualization solutions
• Expertise with one or more of the following scripting languages: Ruby, Bash, PowerShell, Node.js or Python
• A passion for automated, scalable, and repeatable infrastructures
• Good knowledge of standard web technologies such as Tomcat, HAProxy, Nginx and Redis
• Deep knowledge of Linux system internals and the command line
• Hands-on experience with monitoring and logging tools such as Datadog and Logstash
• The ability to prioritize tasks, work independently, and respond to emergent issues accordingly
• Exceptional interpersonal communication skills and the ability to work well within a team
• A strong sense of ownership over system uptime and performance
• Basic knowledge of Windows Systems Administration
• Experience working with ticket management and knowledge base systems like ServiceNow, JIRA and Confluence
• Self-driven and effective in communication
• World-class problem-solving skills
Bonus points for:
• VMware vCenter experience
• Storage technologies
• A history of capacity planning and establishing technical roadmaps for future scaling
• Experience in managing automated build, test, and deployment infrastructures
• Any open-source side projects that display your passion and prowess
• Networking experience (LB, WAF, etc.)
• File transfer protocols