Site Reliability Engineer

Drud Technology

Sorry, this job was removed at 3:06 a.m. (MST) on Saturday, July 15, 2017

Site Reliability Engineer

DRUD Tech is currently looking for a motivated, proactive, team-centric individual who thrives on high-performance distributed systems and open source to join our engineering team as a full-time employee.

About Us

We are a funded Denver-based start-up consisting of a passionate team of open source developers with a desire to build a fruitful and sustainable business that can impact the world as a whole. Our mission is to create open source, enterprise-grade products that help individuals and organizations unlock their potential and become top performers in their respective domains. To achieve this, we are building a suite of tools that spans the entire web development lifecycle, from a best-in-class local development experience all the way through multi-cloud, high-availability hosting (PaaS or self-hosted). To learn more, please visit https://www.drud.com/, our GitHub (https://github.com/drud/), and governance (https://github.com/drud/community) pages.

Roles and Responsibilities

Be professional, courteous, kind and responsive to others.
Integrate with a fast-paced engineering team to design, develop and deliver our local development and hosting products.
Help maintain 24x7 uptime on existing AWS-based infrastructure.
Be a first responder during outages of client-facing hosting in AWS.
Help design a transition from existing AWS infrastructure to our Kubernetes-based hosting platform on GCE.

Requirements

An overall team-centric philosophy and a strong Emotional Intelligence score are a must. Google spent a tremendous amount of effort discovering the keys to high-performing teams through Project Aristotle, and we feel that we have a lot to gain by standing on the shoulders of giants when building out our team. We have a strong affinity for organizations like the Cloud Native Computing Foundation, and you should share it. You must love highly distributed, mission-critical computing using modern technologies and languages.

Qualifications

3+ years in a combination of DevOps or SRE roles.
Demonstrated understanding of containers and container orchestration.
Troubleshooting skills that span systems, network (TCP/IP), and code.
Must have experience building or managing large-scale systems and application architectures.
Experience in one or more languages such as Go, Python, JavaScript, Java, C++, or similar.
Solid understanding of system performance and monitoring.
Working knowledge of cloud computing including virtualization, hosted services, multi-tenant cloud infrastructures, distributed storage systems and content delivery networks.
Experience with UI/REST API technologies.
Excellent verbal and written communication skills.

Nice to Haves

Experience with modern container orchestration systems: Kubernetes, Mesos, Swarm.
Experience with messaging technologies: Kafka, RabbitMQ, ActiveMQ.
Experience with infrastructure configuration and automation processes and tools: Ansible, Fabric, Terraform, Puppet, Chef.
Experience with monitoring solutions: Prometheus, ELK, Splunk, SUMO, Nagios.
Experience with various data technologies including relational and nonrelational databases and message queues.
Experience with distributed file systems: Ceph, GlusterFS.

Benefits

Flexible vacation/time-off.
Competitive salaries and performance-based raises.
Health, vision and dental insurance.
Professional development opportunities.
A fantastic team of like-minded individuals to create with.
