Senior Site Reliability Engineer
Posted: 1/13/21
AMP Robotics is a pioneer and industry leader in artificial intelligence and robotics for the recycling industry. Every day, we’re working to reimagine and actively modernize the world’s recycling infrastructure. With headquarters and manufacturing operations in Louisville, Colorado, we build and deploy cutting-edge technology solutions that solve many of the central challenges of recycling to make it more efficient, cost-effective, scalable, and sustainable.
We’re fostering an environment where talented, driven individuals can grow and create impact. We are looking for unconventional thinkers to join our mission; at AMP, your contributions have meaning and can spur change. With backing from top-tier investors including Sequoia Capital and recognition including Fortune’s Impact 20, Fast Company’s Most Innovative Companies, and Forbes’ most promising artificial intelligence companies in America, we’re always seeking ways to better our operations, raise the bar on innovation, and collaborate and improve each day in what we do. Learn more at AMPRobotics.com.
AMP Robotics is hiring a Senior Site Reliability Engineer, reporting to the DevOps Technical Lead, to focus on increasing the scale and reach of AMP’s tools and infrastructure. As our fleet grows, our infrastructure must be elastic enough to grow with it. Your job will be to make our robots easier to monitor and maintain than ever; to increase our aggregate fault tolerance; and to capitalize on opportunities to automate manual processes. You will also be empowered to help build an engineering culture at AMP that prioritizes reliability.
As the first member of our growing Site Reliability Engineering team, you will work to:
- Continuously improve visibility into our fleet
- Implement monitoring/observability technologies in a green field
- Develop SLIs and SLOs
- Automate and improve critical software engineering and operations processes
- Examples of automation project work include:
- Automating the software update rollout process and optimizing for minimal unit downtime
- Finding single points of failure and making them highly available
- Streamlining and consolidating CI pipelines
- Productionizing and promoting useful homegrown tools
- Build and maintain critical cloud and on-prem infrastructure
- We use HashiCorp Terraform and Packer for IaC and Ansible for bulk fleet maintenance
- We’re primarily a GCP shop
- Linux system administration skills are a must; we run Ubuntu
- “Connect the dots” for our other engineering teams to help them see the bigger picture
- You’ll spend some time embedded with other teams to help provide them with operational context for cross-functional projects
- You’ll also conduct blameless postmortems and write/improve our documentation
- Help us cultivate a culture of reliability, where fault tolerance and monitoring are baked into every aspect of our software
Supervisory Responsibilities:
- None
The successful candidate will have:
Required:
- 2+ years of experience programming in a scripting language such as Python or Ruby
- 2+ years of experience with Linux system administration
- Good working familiarity with Docker and Docker Compose
- Experience operating Kubernetes clusters
Preferred:
- 3-5 years of experience programming in a scripting language such as Python or Ruby
- 2+ years of experience building and operating Kubernetes clusters
- We don’t expect you to be an expert in every SRE sub-discipline, but it would be advantageous to be highly experienced in at least 2 of the following areas (or to be a generalist, i.e., to have had moderate exposure to many of them):
- Software Engineering and/or Scripting (we use Python for most DevOps tasks)
- Containerization and Orchestration (e.g. Docker, Docker Swarm, Docker Compose, Kubernetes, Nomad)
- SQL-like Database Administration
- TCP/IP networking (routing & switching, VPNs, managing VPCs, running and interpreting packet captures)
- Automation (e.g. Jenkins, Ansible/Tower)
- CI/CD (e.g. GitLab CI, Bazel)
- Cloud Platform Administration (especially GCP)
- Infrastructure as Code (e.g. Terraform, Puppet, Chef)
- Security (e.g. network security, SDLC, AAA tools and techniques)
- Debian-like Unix/Linux System Administration (advanced filesystem management, iptables/networking configuration, server fleet management, bash scripting)
Bonus:
- Interest in/experience with Machine Learning/Artificial Intelligence and/or robotics
- Ability to read and/or write any of the following: C++, Rust, JS
Education:
- Bachelor’s Degree in Computer Science or equivalent experience
Experience:
- 5+ years of experience in DevOps and/or Site Reliability Engineering
Working Conditions/Physical Demands:
The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
- Prolonged periods of sitting at a desk and working on a computer.
- Must be able to lift up to 15 pounds at times.
Working Location(s):
- Louisville, Colorado
Travel Requirements:
- None
Affirmative Action/EEO Statement:
AMP Robotics is an equal opportunity employer. In order to provide equal employment and advancement opportunities to all individuals, employment decisions at the Company will be based on job openings, merit, qualifications, and abilities as required by the position. The Company does not discriminate, and does not permit its employees to discriminate against other employees, applicants, customers, or independent contractors because of:
- Race
- Color
- Religion
- Sex
- Sexual orientation (including gender identity or expression, and including a person’s orientation toward heterosexuality, homosexuality, bisexuality, or transgender status, or the Company’s perception thereof)
- Pregnancy, childbirth, and related conditions
- Marital status
- National origin
- Citizenship
- Military or veteran status
- Ancestry
- Age (40 or over)
- Disability (including genetic information)
- Any other consideration made unlawful by applicable laws.
Equal employment opportunity will be extended to all persons in all aspects of the employer-employee relationship, including recruitment, hiring, upgrading, training, promotion, transfer, compensation, benefits, discipline, layoff, recall, and termination.
Other duties:
Please note this job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee for this job. Duties, responsibilities, and activities may change at any time with or without notice.
We recognize that there is more to work than the day-to-day responsibilities. In addition to a collaborative, high-performing team environment, we’re pleased to offer competitive base salaries; medical, dental and vision insurance; a 401(k) plan; paid time off and sick time; flexible work hours; and the opportunity to quickly accelerate your learning and growth.
Benefits information:
- Medical – The company covers 85% to 100% of the premium for Cigna healthcare plans, depending on the plan selected. Employees pay the difference in premium if they select a more expensive plan. Up to 75% of the premium is covered for dependents.
- 401(k) retirement plan (non-matching).
- Paid holidays – 7 company-designated holidays and 2 floating holidays (salaried employees only).
- Referral bonuses for staff positions.
- Paid Vacation Leave – Accrues at a rate of 4.67 hours (0.58 days) per pay period (2 weeks). Unused PTO carries over each year with a 1-year limit.
- Paid Sick Leave – 64 hours per year, given in full on start date, refreshes on anniversary.