Checkr’s mission is to build a fairer future by improving understanding of the past. We believe all candidates, regardless of who they are, should have a fair chance to work. Established in 2014 and valued at $2.2B, Checkr is using technology to bring hiring to the next level. Our People Trust Platform uses machine learning to help thousands of companies modernize their background check process and make hiring safer, more efficient, and more inclusive. Our customers include Uber, Instacart, DoorDash, Netflix, Compass Group, and Adecco.
A career with Checkr is an opportunity to work with some of the best and brightest minds, disrupt an industry for a better future, and give otherwise overlooked candidates access to employment. Checkr has been recognized in Forbes Best Startup Employers and is a top Y Combinator company by valuation.
We’re looking for Site Reliability Engineers (SREs) who can help us design, build, and maintain high-performance, scalable, reliable services. At Checkr, Reliability Engineering is a cross-functional role that combines operations work with software engineering principles to enable high availability of Checkr’s production systems. As an SRE at Checkr, you will work closely with our DevOps and Platform teams to build and run the core components and tools that power Checkr. You will also partner with our Product Engineering teams to help make their services more performant, scalable, observable, and reliable. We believe every engineering team at Checkr should be responsible for the software they build, and SREs play a critical part in providing the tools, practices, and expertise to make that happen.
We are growing and evolving the SRE team to help meet Checkr’s product-first reliability goals for 2021 and beyond. Having established a strong foundation--including a containerized microservices architecture (AWS, Kong, Kubernetes, Kafka, MySQL and MongoDB), CI/CD, full-stack monitoring, structured incident response and a blameless postmortem culture--we are focused on implementing new capabilities like:
- Chaos Engineering and Game Day Simulations to discover and test fixes for weak spots that would otherwise not be identified until a real-life production incident occurred
- Automating observability and alerting across an ever-changing landscape of microservices
- Short-term embedded rotations to help Product Engineering teams tune their services, drive adoption of tooling and best practices, provide coaching and build a strong feedback loop
- Software engineering project work, proposed and driven by individual SRE team members, to remove operational bottlenecks and increase velocity in ways we’ve never considered before
What a typical week may look like at Checkr:
- Debug production issues across services and levels of the stack
- Improve common operational challenges through tooling and automation
- Serve as the on-call incident commander to drive resolution of production reliability issues, contribute to the postmortem, and identify and implement follow-up work to prevent recurrence
- Participate in design and production reviews for new features, products, or pieces of infrastructure to harden rollout/rollback plans and capability observability implementation
- Update runbooks or create documentation to enable self-service
- Build new metrics from logs
- Run a workshop to help teams define and implement user-focused SLOs
- Audit and tune the configuration of systems owned by other engineering teams
- Plan for the growth, reliability, and resiliency of Checkr’s infrastructure
What we value in an SRE:
SREs combine some level of experience in both software engineering and operations and may hail from a variety of backgrounds and job titles, including production or application engineers, software developers with a strong DevOps mindset, SysAdmins with solid systems engineering and programming skills, and platform or DevOps engineers. We are hiring for multiple roles at different levels and encourage any applicants who meet most of the following criteria to apply:
- Strong Linux and command-line skills
- Experience with microservices and public cloud services
- Experience with on-call support of highly available production systems
- Experience automating repetitive tasks using a programming language like Python, Go, or Ruby
- Strong command of Git and CI/CD pipelines
- Understanding of how application components interact, and the ability to contribute to architectural discussions
- Experience with monitoring systems using tools like Datadog or Prometheus
- Unwavering commitment to operational security and best practices
- Ownership: identify problems but also propose solutions, then go out and implement them--from submitting a merge request on another team’s repository to scoping out a new reliability project
- Connection: motivated to help other teams improve their service reliability through reviews, pair programming, hands-on training and continuous improvement of tooling and services
- Experience defining user-focused SLOs and implementing targeted SLIs to measure them
- Experience running Chaos Engineering experiments
- Ability to assess, design and evaluate large systems, including capacity and resource planning, component isolation, and graceful degradation
- Experience with Infrastructure as Code tools such as Terraform
- Experience with Kubernetes and Kafka
What you get:
- A fast-paced and collaborative environment
- Learning and development allowance
- Competitive compensation and opportunity for advancement
- 100% medical, dental and vision coverage
- Unlimited PTO policy
- Monthly wellness stipend, home office stipend
Equal Employment Opportunities at Checkr
Checkr is committed to hiring talented and qualified individuals with diverse backgrounds for all of its tech, non-tech, and leadership roles. Checkr believes that the gathering and celebration of unique backgrounds, qualities, and cultures enriches the workplace.
Checkr also welcomes the opportunity to consider qualified applicants with prior arrest or conviction records. Checkr’s commitment to diversity extends to hiring talented individuals in spite of a prior criminal history in accordance with local, state, and/or federal laws, including San Francisco’s Fair Chance Ordinance.