Site Reliability Engineer
What is the opportunity?
Everything is cool when you're part of an awesome team. We are looking to add a Site Reliability Engineer to the API team at RBC. The API team delivers enterprise services to both internal applications and external partners, built on a modern hybrid cloud platform using the latest microservices architecture. We design, build and operate the services we own, and we give our DevOps teams all the autonomy they need to move fast and innovate.
As a Site Reliability Engineer (SRE), you will work with our DevOps teams to ensure their services are delivered securely and are available and performing well for customers 24x7. We are very focused on service quality, and the role of the SRE is to own that quality and innovate with new tools and processes so that RBC customers continue to have a seamless and delightful experience. We expect an SRE to own production problems through to resolution and to lead retrospectives with our DevOps teams so that we take action to avoid similar problems in the future. This is not a traditional operations job, however. We are looking for a top-notch engineer with both systems administration experience and real coding chops. You will get plenty of opportunity to innovate with our DevOps teams on code scalability, resiliency architecture, deployment automation, fault injection testing and more. More than anything else, we're looking for people who want to be part of an elite engineering team and who want the opportunity to learn and grow.
RBC believes in building diverse teams. We actively try to bring together people with a wide variety of backgrounds, experiences and perspectives. We encourage collaboration with internal partners and team members to achieve a collective result. We want our team members to think big and simplify things that are complicated. Our engineering culture is built on continuous learning and supported by transparency, trust and cooperation. We have a great mix of senior and junior engineers, which gives everyone the chance, through teamwork and mentorship, to build their knowledge and experience.
What will you do?
- Support services before they go live through activities such as capacity planning, monitoring setup, logging and production readiness reviews.
- Monitor existing systems and scale them for growth.
- Troubleshoot production incidents, practice sustainable incident response, conduct blameless post-mortems and drive issue resolution with our DevOps teams.
- Participate in an on-call rotation with other team members.
- Improve processes through automation, and optimize infrastructure utilization and cost.
- Develop effective tooling, alerts and automated responses to identify and address reliability risks.
- Steadily improve performance, availability and security (site reliability) through active analysis, development and management of service health dashboards, and design of production systems.
- Perform in-depth data analysis to gauge service trends and drive improvements.
What do you need to succeed?
Must Have
- B.S. degree in Computer Science or a related technical field (e.g. EE, physics or mathematics), or equivalent practical experience.
- 5+ years of experience in a Systems Engineering/Operations role.
- Experience with Unix/Linux (Red Hat, CentOS or Ubuntu) and API technologies and platforms (Apigee, IBM Bluemix, SOAP, REST).
- Experience with monitoring and alerting tools (Grafana, Graphite, Carbon, Elasticsearch, Logstash, Kibana, CloudWatch) and centralized logging services (Splunk, Logstash).
- Experience with 24x7 production support and troubleshooting within an on-call rotation model.
- Experience with automating builds and releases (CI/CD), configuration management and infrastructure as code.
- Experience with orchestration and automation tools (Puppet, Salt, Chef, Ansible, etc.), continuous integration platforms (Jenkins, etc.) and containerized application management using Docker.
- Knowledge of web and application server frameworks (e.g. Tomcat, Apache HTTP Server) as well as relational and NoSQL datastores (MySQL, DynamoDB).
- Knowledge of source control management (SCM) and SCM tools (e.g. Bitbucket, Git, SVN).
Nice To Have
- Experience with databases (Oracle, Cassandra, PostgreSQL, Couchbase, Solr).
- Experience with Ansible, Chef, Puppet or other deployment/provisioning tools.
- Experience with version control tools such as SVN and Git.
- Experience with monitoring tools such as Nagios, Zabbix, Redis, etc.
- Experience managing high-volume transaction services (10,000+ requests per second).
- Experience with the Pivotal Cloud Foundry (PCF) platform.
- Experience debugging and maintaining Java-based applications.
Learn more about RBC Tech Jobs: http://www.rbc.com/techjobs/?utm_campaign=jobpostingupdate_tech