Site Reliability Engineer
DRUD Tech is currently looking for a motivated, proactive, team-centric individual who thrives on high-performance distributed systems and open source to join our engineering team as a full-time employee.
About Us
We are a funded Denver-based start-up consisting of a passionate team of open source developers with a desire to build a fruitful and sustainable business that can impact the world as a whole. Our mission is to create open source, enterprise-grade products that help individuals and organizations unlock their potential and become top performers in their respective domains. To achieve this, we are building a suite of tools that span the entire web development lifecycle, ranging from a best-in-class local development experience all the way through multi-cloud, high-availability hosting (PaaS or self-hosted). To learn more, please visit https://www.drud.com/, our GitHub (https://github.com/drud/), and governance (https://github.com/drud/community) pages.
Roles and Responsibilities
Be professional, courteous, kind and responsive to others.
Integrate with a fast-paced engineering team to design, develop and deliver our local development and hosting products.
Help maintain 24x7 uptime on existing AWS-based infrastructure.
Be a first responder during outages of client-facing hosting in AWS.
Help design a transition from existing AWS infrastructure to our Kubernetes-based hosting platform on GCE.
An overall team-centric philosophy and a strong Emotional Intelligence score are a must. Google spent a tremendous amount of effort discovering the keys to high-performing teams through Project Aristotle, and we feel that we have a lot to gain by standing on the shoulders of giants when building out our team. We have a strong affinity for organizations like the Cloud Native Computing Foundation, and you should share it. You must love highly distributed, mission-critical computing using modern technologies and languages.
Qualifications
3+ years in a combination of DevOps or SRE roles.
Demonstrated understanding of containers and container orchestration.
Troubleshooting skills that span systems, network (TCP/IP), and code.
Must have experience building or managing large-scale systems and application architectures.
Experience in one or more languages such as Go, Python, JavaScript, Java, C++, or similar.
Solid understanding of system performance and monitoring.
Working knowledge of cloud computing including virtualization, hosted services, multi-tenant cloud infrastructures, distributed storage systems and content delivery networks.
Experience with UI/REST API technologies.
Excellent verbal and written communication skills.
Experience with modern container orchestration systems: Kubernetes, Mesos, Swarm.
Experience with messaging technologies: Kafka, RabbitMQ, ActiveMQ.
Experience with infrastructure configuration and automation processes and tools: Ansible, Fabric, Terraform, Puppet, Chef.
Experience with monitoring solutions: Prometheus, ELK, Splunk, Sumo Logic, Nagios.
Experience with various data technologies including relational and nonrelational databases and message queues.
Experience with distributed file systems: Ceph, GlusterFS.
Flexible vacation/time-off.
Competitive salaries and performance-based raises.
Health, vision and dental insurance.
Professional development opportunities.
A fantastic team of like-minded individuals to create with.