Site Reliability Engineer
Posted: 4/27/21
AMP Robotics is a pioneer and industry leader in artificial intelligence and robotics for the recycling industry. Every day, we’re working to re-imagine and actively modernize the world’s recycling infrastructure. With headquarters and manufacturing operations in Louisville, Colorado, we build and deploy cutting-edge technology solutions that solve many of the central challenges of recycling to make it more efficient, cost-effective, scalable, and sustainable.
We’re fostering an environment where talented, driven individuals can grow and create impact. We are looking for unconventional thinkers to join our mission; at AMP, your contributions have meaning and can spur change. With backing from top-tier investors including Sequoia Capital and recognition including Fortune’s Impact 20, Fast Company’s Most Innovative Companies, and Forbes’ most promising artificial intelligence companies in America, we’re always seeking ways to better our operations, raise the bar on innovation, and collaborate and improve each day in what we do. Learn more at AMPRobotics.com.
We are looking for a Site Reliability Engineer to focus on automating the hard stuff, maintaining our fleet of robots, and reducing friction in the software deployment process. You’ll work closely with our other engineering teams to make sure that we can keep delivering the best possible accuracy with our sorting processes without missing a beat, and that our feature velocity doesn’t require us to sacrifice quality. You’ll touch just about every system at AMP, and you’ll take the initiative to “be the glue” and help every part of the engineering organization work better.
Job Responsibilities:
- Automating the software update rollout process and optimizing for minimal unit downtime
- Finding single points of failure and making them highly available
- Streamlining and consolidating CI pipelines
- Maintaining cloud and on-prem networks to ensure fleet responsiveness
- Productionizing and promoting useful homegrown tools
- Assisting the QA team in writing and running tests
- Keeping our fleet in top condition! This can take the shape of projects like:
  - Keeping the fleet up to date with the latest GPU drivers and software packages, and then making sure that this task never has to be done manually again
  - Developing metrics, monitoring, and alerting solutions and strategies, and tuning monitors/alerts to maximize their utility
  - Tracking performance across software versions on key metrics and working with other key software stakeholders to identify bottlenecks
  - Continuously identifying and addressing areas for improvement in our infrastructure (especially in the domains of networking, security, and high availability)
The ideal candidate will work to:
- Automate and improve critical software engineering and operations processes
- Continuously improve visibility into our fleet
- Implement monitoring/observability technologies in a greenfield environment
- Develop SLIs and SLOs
- Build and maintain cloud infrastructure to support a rapidly growing fleet of remote devices
- Gain familiarity with the technical domains of our other software engineering teams, and help them to achieve their goals
- “Connect the dots” for our other engineering teams to help them see the bigger picture
  - You’ll spend some time embedded with other teams to help provide them with operational context for cross-functional projects
  - You’ll also conduct blameless postmortems and write/improve our documentation
- Help us to cultivate a culture of Reliability, where fault-tolerance and monitoring are baked into every aspect of our software
Requirements (absolutely must have; 3-5 years of experience is preferred):
- 2+ years of experience in a Software Engineering, Site Reliability Engineering, Network Engineering, DevOps, and/or Systems Administration role
- 2+ years of experience programming with a scripting language like Python or Ruby
- 2+ years of experience building CI pipelines
- 2+ years of experience with system administration (Linux preferred)
- Good working familiarity with container orchestration and cloud/on-prem networking
Desired Qualifications (nice to have):
- 3-5 years of experience in a relevant technical role
- Expertise in at least 2 of the following software engineering sub-disciplines (or a generalist background, i.e., moderate exposure to many of them):
  - Scripting (we use Python for most tasks)
  - Containerization and Orchestration (e.g. Docker, Docker Swarm, Docker Compose, Kubernetes, Nomad)
  - TCP/IP networking (routing & switching, VPNs, managing VPCs, running and interpreting packet captures)
  - Automation & CI/CD (e.g. Jenkins, Ansible/Tower, GitLab CI)
  - Cloud Platform Administration (especially GCP)
  - Infrastructure automation (e.g. Terraform, Ansible, Puppet, Chef)
  - Relational and Time-series Database Administration
  - Security (e.g. network security, SDLC, AAA tools and techniques)
  - Debian-like Unix/Linux System Administration (advanced filesystem management, iptables/networking configuration, server fleet management, bash scripting)
Bonus (opportunity to show some personality, and to include some of the very far-reaching desired qualifications here):
- Experience or interest in AI, ML, Computer Vision, Robotics, IoT
- Ability to read or write any of the following: C++, Rust, JS
- Passionate about the recycling industry
- Strong desire to work in a fast-moving startup environment
Affirmative Action/EEO Statement:
AMP Robotics is an equal opportunity employer. In order to provide equal employment and advancement opportunities to all individuals, employment decisions at the Company will be based on job openings, merit, qualifications, and abilities as required by the position. The Company does not discriminate, and does not permit its employees to discriminate against other employees, applicants, customers, or independent contractors because of:
- Race
- Color
- Religion
- Sex
- Sexual orientation (including gender identity or expression, and including a person's orientation toward heterosexuality, homosexuality, bisexuality, or transgender status, or the Company’s perception thereof)
- Pregnancy, childbirth, and related conditions
- Marital status
- National origin
- Citizenship
- Military or veteran status
- Ancestry
- Age (40 or over)
- Disability (including genetic information)
- Or, any other consideration made unlawful by applicable laws.
Equal employment opportunity will be extended to all persons in all aspects of the employer-employee relationship, including recruitment, hiring, upgrading, training, promotion, transfer, compensation, benefits, discipline, layoff, recall, and termination.
Other duties:
Please note this job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee for this job. Duties, responsibilities, and activities may change at any time with or without notice.
We recognize that there is more to work than the day-to-day responsibilities. In addition to a collaborative, high-performing team environment, we’re pleased to offer competitive base salaries; medical, dental and vision insurance; a 401(k) plan; paid time off and sick time; flexible work hours; and the opportunity to quickly accelerate your learning and growth.
Benefits information:
Full-Time / Salaried Employees
- Medical - The company covers up to 85% of the premium of UHC Gold Choice Plus POS 1250 BP9K. Employees pay the difference in premium if they select a more expensive plan. The company covers up to 25% of the premium for dependents.
- Group Life, AD&D – 100% paid.
- Long Term Disability – 100% paid.
- Dental Insurance – 75% paid.
- Vision Insurance* - 75% paid.
- Employee Assistance Program - Provided through United Healthcare.
- Paid Vacation Leave – Accrues at a rate of ~4.31 hours (0.54 days) per pay period (2 weeks) starting day 1. Unused PTO carries over each year with a 1-year limit.
- Paid Sick Leave – 64 hours per year, given in full on start date, refreshes on anniversary.
- 401(k) retirement plan (non-matching).
- Paid holidays – seven (7) company-designated holidays and 2 floating holidays.
- Referral bonuses for staff positions.
Part-Time / Hourly Employees
- Medical - The company covers up to 85% of the premium of UHC Gold Choice Plus POS 1250 BP9K. Employees pay the difference in premium if they select a more expensive plan. The company covers up to 25% of the premium for dependents.
- Paid Vacation Leave – Accrues at a rate of ~4.31 hours (0.54 days) per pay period (2 weeks) starting day 1. Unused PTO carries over each year with a 1-year limit.
- Paid Sick Leave – 64 hours per year, given in full on start date, refreshes on anniversary.
- 401(k) retirement plan (non-matching).