This job is no longer active - but you can still view the details below.

Site Reliability Engineer

Greater Denver Area


The Site Reliability Engineer (SRE) combines the skills of a software engineer and a systems enthusiast, providing technical leadership to a growing team focused on applying software engineering practices to operations at scale. SREs work on operational procedures, code fixes, and similar tasks to increase the automation, repeatability, and consistency of operations. The successful candidate will have the breadth of knowledge to solve complex problems across the entire technology stack.

 

 

Responsibilities

  • Design and architect operational solutions for managing applications and infrastructure
  • Monitor and report on service level objectives for system-wide application and infrastructure services. Work with service and product owners to establish KPIs that identify trends and quantify whether we are improving at the site/system level
  • Define standards for configuration, monitoring, reliability, and performance
  • Participate actively and critically in retrospectives on incidents that had broad impact and/or are leading indicators of potential site issues
  • Provide deep troubleshooting for production issues
  • Engage with service owners on root cause analysis for service interruption recovery and create preventive measures
  • Analyze and interpret metrics, and put them to use

 

Required Skills

  • Strong DevOps background
  • Experience working in a large-scale enterprise environment with infrastructure spanning hundreds or thousands of servers and dozens of technologies
  • Experience with system engineering and the SDLC
  • Bachelor's degree in Computer Science or Information Systems Management, or relevant work experience
  • Work experience required: 5 years
  • Experience applying requirements from NISPOM, DCID 6/3, ICD 503, NIST 800-53, and related US Government standards and requirements is a plus
  • Advanced knowledge of Unix/Linux systems; very comfortable at the command line
  • Proficient with at least one programming language (e.g., Python, Ruby, Java, etc.)
  • Familiarity with configuration management and remote execution tools (Ansible, Puppet, etc.)
  • Knowledge of continuous integration and continuous delivery tools such as Jenkins, with an understanding of Docker
  • Familiarity with distributed version control systems such as Git
  • Good understanding of networking fundamentals
  • In-depth understanding of cloud technologies, including AWS
  • Familiarity with infrastructure as code
  • US citizenship and the ability to qualify for a TS/SCI clearance

 

Preferred Skills

  • Make-it-happen attitude!
  • Effectively prioritize work and encourage best practices in others
  • The ability to “smell out” potential issues in the system (not just individual services) is important
  • A knack for troubleshooting tough problems: a high level of ownership and curiosity empowers this skill
  • Meticulous and cautious: identify and consider all risks and balance those with performing the task efficiently
  • Organized - able to document and communicate ongoing work tasks and projects
  • Positive, flexible, and personable – adaptive to change
  • Receptive to giving, receiving, and implementing feedback in a highly collaborative environment
  • Learn rapidly in a fast-paced environment while being extremely curious about how things work

 

Location

Westminster, CO / Longmont, CO

 

DigitalGlobe and Radiant Solutions offer a generous compensation package including a competitive salary; choice of medical plan; dental, life, and disability insurance; a 401(k) plan with competitive company match; paid holidays; and paid time off.

Location

Our location is just steps away from plenty of expansive open space, restaurants, and bars. We are less than a 5-minute walk from an RTD Park-n-Ride.