Site Reliability Engineer

  • $200,000
  • New York, NY, USA
  • Permanent, Full time
  • Landover
  • 31 Jan 2018

In this role, you will build and maintain monitorable, performant, reliable, and highly scalable software systems. You will join a small, fast-paced, growing team of engineers tackling challenging problems at scale. Software and systems engineers with an interest or experience in system automation are encouraged to apply.

The Role:

  • Evangelize best practices for building and operating highly reliable systems
  • Serve as subject matter expert in observability and monitoring
  • Consult in system design to meet reliability and capacity requirements
  • Automate infrastructure and configuration management 
  • Conduct timely post-mortems of production infrastructure incidents 
  • Assist with all aspects of operational security and compliance
  • Seek out potential threats to security and reliability and advocate solutions
  • Participate in an on-call rotation to receive escalations
  • Work with Amazon Web Services, Chef, Python, Ubuntu, Nginx, Jenkins, Terraform, Akamai, Elemental

Desired Skills & Expertise:

  • Judgment about when to triage and when to dig into root-cause analysis
  • Passion for reliable, scalable, observable software and a strong sense of ownership
  • Experience with Linux system administration
  • Experience developing and monitoring mission-critical systems
  • Experience with a programming language such as Python, Perl, Ruby, Bash, Java, or C
  • Working knowledge of a centralized configuration management tool such as Chef, Puppet, or Ansible
  • Experience or interest in learning about streaming applications and media servers