Incident Response Engineer

  • Location: Sydney, New South Wales, Australia
  • Salary: Competitive
  • Job Type: Full time

Amazon Web Services' Technical Operations team (TechOps) is Amazon's central defense against large-scale incidents and drives operational excellence across all of Amazon's businesses. Our key offering to Amazon is best-in-class Incident Management. Our engineers are front and center in driving down event duration through experience in operational excellence, best current practices and incident management tools.

We're looking for engineers who have owned or participated in operational and incident management for at least one large-scale enterprise. You should have a passion for working with new technologies and not be afraid to exercise your creativity in pushing the boundaries of existing technologies. Running incident management for AWS is unique in that AWS supports more than 30% of the internet's businesses, and our ability to identify and mitigate issues is of the utmost importance to every Amazon employee. Because of our unique role, you will have limitless exposure to all things Amazon.

TechOps engineers are encouraged to build solutions to problems while sharing the benefit of those solutions with other AWS service teams. This is an excellent opportunity to join one of Amazon's world-class teams of engineers and work with some of the best and brightest, while also developing your skills and career within one of the most dynamic, innovative and progressive technology companies anywhere.

In addition to a stimulating and fun working environment, Amazon offers mentoring programs with experienced engineers, regular tech talks with technology Principals, and well-defined career paths for motivated engineers who want to contribute to our culture of operational excellence and relentless customer-focused technical innovation.

Responsibilities
• Provide critical support, incident response, and management to internal customers across all of Amazon including management of communications and coordination of service owners via conference calls
• Be a technology evangelist and use your deep knowledge to solve business problems
• Reduce mean time to resolution for all incident types
• Update and/or build world-class listening systems
• Participate in Agile sprints to evolve business processes and technologies
• Get there first; be the first to detect and diagnose high-severity service-impacting events
• Identify and troubleshoot recurring platform issues and engage service owners to assist with resolution
• Automate tasks through creation and maintenance of scripts and tools
• Respond to and complete customer requests within SLA via a trouble ticketing system
• Take part in a "follow the sun" rotation split between Seattle, Dublin and Sydney sites, including weekends and holidays
• Create and review documentation, and design new standard operating procedures
• Mentor peers in your areas of technical and operational strength
• Participate in the interviewing process

Basic Qualifications
• A degree in Computer Science or at least two years of relevant experience in a large-scale online technical operations environment
• Knowledge of the Linux operating system and a good understanding of networking concepts
• Excellent troubleshooting skills and a commitment to document findings
• Knowledge of best current practice frameworks (ITIL, COBIT), particularly incident, problem and change management
• Development/scripting skills in at least one interpreted language (e.g. Perl/Python/Ruby) as well as shell. Working knowledge of a compiled language is a plus
• Understanding of routing protocols to help facilitate troubleshooting and remediation of networking issues
• Experience in Agile/Scrum or a related collaborative workflow
• Excellent English language written and verbal communication skills to facilitate efficient and effective interaction with peers and customers
• Confidence to initiate, drive, and manage company-wide conference calls
• Effective organizational skills to maintain a consistently high standard of operations in a busy environment

Preferred Qualifications
• Experience driving collaborative projects from conception to delivery
• The ability to maintain a high level of alertness and attention to detail for extended periods
• Experience dealing effectively with customers during problem resolution and operating efficiently under pressure
• Effective prioritization and time management

we pioneer

We're a company of pioneers. It's our job to make bold bets, and we get our energy from inventing on behalf of customers. Success is measured against the possible, not the probable. For today’s pioneers, that’s exactly why there’s no place on Earth they’d rather build than Amazon.