Site Reliability Engineer

DigitalGlobe

Sorry, this job was removed at 4:09 p.m. (MST) on Tuesday, December 4, 2018

Please review the job details below.

The Site Reliability Engineer (SRE) is a combination of a software engineer and a systems enthusiast who provides technical leadership to a growing team focused on applying software engineering practices to operations at scale. SREs focus on operational procedures, code fixes, and similar work that increases the automation, repeatability, and consistency of operational tasks. The successful candidate will have the breadth of knowledge to solve complex problems across the entire technology stack.

Responsibilities

Design and architect operational solutions for managing applications and infrastructure
Monitor and report on service level objectives for system-wide application and infrastructure services. Work with service and product owners to establish KPIs that identify trends and quantify whether we are improving at the site/system level
Define standards for configuration, monitoring, reliability, and performance
Participate actively and critically in retrospectives on events that had broad impact and/or are leading indicators of potential site issues
Provide deep troubleshooting for production issues
Engage with service owners on root cause analysis for service interruption recovery and create preventive measures
Analyze and interpret metrics and put them to use

Required Skills

Strong DevOps background
Experience working in a large-scale enterprise environment with infrastructure spanning hundreds or thousands of servers and dozens of technologies
Experience with systems engineering and the SDLC
Bachelor's degree in Computer Science or Information Systems Management, or relevant work experience
Work experience required: 5 years
Experience applying requirements from NISPOM, DCID 6/3, ICD 503, NIST 800-53, and related US Government standards and requirements is a plus
Advanced knowledge of Unix/Linux systems: feel very comfortable at the command line
Proficient with at least one programming language (e.g., Python, Ruby, Java)
Familiarity with configuration management and remote execution tools (Ansible, Puppet, etc.)
Knowledge of continuous integration and continuous delivery tools such as Jenkins, with an understanding of Docker
Familiarity with distributed version control systems such as Git
Good understanding of networking fundamentals
In-depth understanding of cloud technologies, including AWS
Familiarity with infrastructure as code
US citizenship and the ability to qualify for a TS/SCI clearance

Preferred Skills

Make it happen attitude!
Effectively prioritize work and encourage best practices in others
The ability to “smell out” potential issues in the system (not just individual services) is important
A knack for troubleshooting tough problems: a high level of ownership and curiosity empowers this skill
Meticulous and cautious: identify and consider all risks and balance those with performing the task efficiently
Organized - able to document and communicate ongoing work tasks and projects
Positive, flexible, and personable – adaptive to change
Receptive to giving, receiving, and implementing feedback in a highly collaborative environment
Learn rapidly in a fast-paced environment while being extremely curious about how things work

Location

Westminster, CO / Longmont, CO

DigitalGlobe and Radiant Solutions offer a generous compensation package, including a competitive salary; a choice of medical plans; dental, life, and disability insurance; a 401(k) plan with a competitive company match; paid holidays; and paid time off.
