SRE Engineer, Group Consumer Banking and Big Data Analytics Technology, Technology & Operations
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
At DBS, we see ourselves as a start-up, leveraging start-up thinking while relying on the latest innovations to design and develop technology solutions for our customers and people. With a strong culture of innovation, experimentation with new technology and collaboration with the FinTech community, we aim to simplify payments so we can help others Live More, Bank Less. With this ambition, we invented DBS PayLah!, one of our digital Lifestyle apps. DBS PayLah! is more than just Singapore's favorite payments app. It's your everyday app for booking a ride, ordering lunch, scoring seats to a show and finding all your favorite DBS/POSB Cards rewards and deals. You can even track and redeem your DBS/POSB Cards points and enjoy personalized rewards, all on PayLah!
Anyone of any age or from any bank can enjoy the convenience of PayLah! Students under 16 can now also register for their first digital wallet with parental consent. There's an ever-growing list of partners and over 180,000 acceptance points like hawker centers, retail outlets, sports centers, and restaurants. Use PayLah! to discover a world of DBS/POSB Cards rewards and exclusive deals on food, shopping, transport, movies and more!
In our payment digital transformation journey ahead, all DBS Lifestyle apps are adopting common services, platforms, architectural principles, and design patterns. The DBS PayLah! tech team intends to build loosely coupled but tightly aligned components that are designed for reuse while anticipating change. The common set of parameters and tools for software development provides a consistent approach to security, maintainability and reliability. In addition, architectural agility has a causal relationship with potential strategic and operational benefits.
Responsibilities
- Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. This position is for a Site Reliability Engineer responsible for the development and implementation of processes necessary to improve application / system reliability along with operational support.
- The position comprises an approximately equal focus on the software development lifecycle and operations disciplines. It also involves automating operational processes and reducing toil.
- Establish SLIs and SLOs for enterprise applications, and calculate error budgets, MTTD, and MTTR. Educate the Dev community on observability culture, help implement it, and assist them in identifying golden signals.
- Responsible for the availability, performance, change management, monitoring, and capacity management of the services.
- Manage incidents: troubleshoot business-critical incidents, conduct postmortems and ensure permanent closure of the incidents.
- Analyze patterns of production incidents, develop permanent remediation plans, and use software engineering to implement automation that prevents future incidents.
- Implement and integrate microservice applications with monitoring/logging tools such as ELK, Grafana, AppDynamics, and Alog.
- Engage with both the development and support teams throughout the life cycle to help build for reliability. Collaborate closely with them to maintain and improve the service against established Service Level Objectives by applying software engineering principles.
- Contribute to the design and architecture of a highly resilient microservice application based on an open-source stack. Enhance, optimize and migrate to new solutions if required.
- Manage the split of effort between manual operational work and engineering work.
- Work with partner organizations and vendors to provide solutions to current business issues.
- Participate in a shift model covering 24x7x365 support.
Requirements
- Bachelor's degree or higher in Computing / Computer Science / Engineering.
- Minimum 8 years' experience in IT/software production support and minimum 3 years' experience in a lead support role for one or more enterprise applications, with a track record of promoting a culture of collaboration and teamwork.
- Experience with Site Reliability Engineering principles with regard to performance, reliability, monitoring, alerting and maintenance in a production environment. Experience with proactive capacity monitoring and observability of production infrastructure, automated alerting, performance monitoring and reporting tools.
- Experience in identifying golden signals, defining SLIs and SLOs for enterprise applications, and calculating error budgets, MTTD, and MTTR.
- Technical experience across technology areas such as business applications, middleware and database technology, as well as best practices, quality improvements and productivity improvements.
- Experience in supporting a highly resilient open-source stack and Java-based microservice applications in PCF or public cloud. An SRE and/or cloud practitioner certification would be a plus.
- Working experience in production support and improvement, and in incident management, is a must.
- Strong problem-solving skills, with the ability to solve unstructured problems and challenge the status quo.
- Must be comfortable working in an extremely fast-paced environment, with an ability to prioritize accordingly to meet deadlines.
- Strong communication and interpersonal skills. Self-driven, committed, and reliable team player. Ability to contribute to discussions on design and strategy as well as processes.
- Operating System - Linux. AIX would be a plus.
- Cloud platforms - Pivotal Cloud Foundry (PCF) / AWS
- Database - MariaDB. In-memory Redis is a plus.
- Application Servers - Apache Tomcat, JBoss
- Monitoring and Observability - AppDynamics, Kibana & Grafana
- Automation - Shell scripting, Python. Groovy is a plus.
- CI/CD and software configuration, quality and version control. A plus would be Git, Jira, Bitbucket, Jenkins, Maven, Nexus Repo, SonarQube, Fortify, Nexus IQ.
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.