Senior Site Reliability Engineer

Checkr | Remote

Sorry, this job was removed at 5:33 a.m. (MST) on Wednesday, May 26, 2021

Checkr’s mission is to build a fairer future by improving understanding of the past. We believe all candidates, regardless of who they are, should have a fair chance to work. Established in 2014 and valued at $2.2B, Checkr is using technology to bring hiring to the next level. Our People Trust Platform uses machine learning to help thousands of companies modernize their background check process and make hiring safer, more efficient, and more inclusive. Our customers include Uber, Instacart, DoorDash, Netflix, Compass Group, and Adecco.

A career with Checkr is an opportunity to work with some of the best and brightest minds, disrupt an industry for a better future, and give otherwise overlooked candidates access to employment. Checkr has been recognized in Forbes Best Startup Employers and is a top Y Combinator company by valuation.

We’re looking for Site Reliability Engineers (SREs) who can help us design, build, and maintain high-performance, scalable, reliable services. At Checkr, Reliability Engineering is a cross-functional role that combines operations work with software engineering principles to enable high availability of Checkr’s production systems. As an SRE at Checkr, you will work closely with our DevOps and Platform teams to build and run the core components and tools that power Checkr. You will also partner with our Product Engineering teams to help make their services more performant, scalable, observable, and reliable. We believe every engineering team at Checkr should be responsible for the software they build, and SREs play a critical part in providing the tools, practices, and expertise to make that happen.

We are growing and evolving the SRE team to help meet Checkr’s product-first reliability goals for 2021 and beyond. Having established a strong foundation--including a containerized microservices architecture (AWS, Kong, Kubernetes, Kafka, MySQL and MongoDB), CI/CD, full-stack monitoring, structured incident response and a blameless postmortem culture--we are focused on implementing new capabilities like:

Chaos Engineering and Game Day Simulations to discover and test fixes for weak spots that would otherwise not be identified until a real-life production incident occurred
Automating observability and alerting across an ever-changing landscape of microservices
Short-term embedded rotations to help Product Engineering teams tune their services, drive adoption of tooling and best practices, provide coaching and build a strong feedback loop
Software engineering project work, proposed and driven by individual SRE team members, to remove operational bottlenecks and increase velocity in ways we’ve never considered before

What a typical week may look like at Checkr:

Debug production issues across services and levels of the stack
Improve common operational challenges through tooling and automation
Serve as the on-call incident commander to drive resolution of production reliability issues, contribute to the postmortem, and identify and implement follow-up work to prevent recurrence
Participate in design and production reviews for new features, products, or pieces of infrastructure to harden rollout/rollback plans and capability observability implementation
Update runbooks or create documentation to enable self-service
Build new metrics from logs
Run a workshop to help teams define and implement user-focused SLOs
Audit and tune the configuration of systems owned by other engineering teams
Plan for the growth of Checkr’s infrastructure and infrastructure reliability/resiliency

What we value in an SRE:

SREs combine some level of experience in both software engineering and operations and may hail from a variety of backgrounds and job titles including: production or application engineers, software developers with a strong DevOps mindset, SysAdmins with solid systems engineering and programming skills, platform or DevOps engineers. We are hiring for multiple roles at different levels and encourage any applicants who meet most of the following criteria to apply:

Strong Linux and command-line skills
Experience with microservices and public cloud services
On-call support of highly available production systems
Automating repetitive tasks using a programming language like Python, Go, Ruby, etc.
Strong command of Git and CI/CD pipelines
Understand how application components interact and contribute to architectural discussions
Experience monitoring systems with tools like Datadog or Prometheus
Unwavering commitment to operational security and best practices
Ownership: identify problems but also propose solutions, then go out and implement them--from submitting a merge request on another team’s repository to scoping out a new reliability project
Connection: motivated to help other teams improve their service reliability through reviews, pair programming, hands-on training and continuous improvement of tooling and services

Brownie points:

Experience defining user-focused SLOs and implementing targeted SLIs to measure them
Experience running Chaos Engineering experiments
Ability to assess, design and evaluate large systems, including capacity and resource planning, component isolation, and graceful degradation
Experience with Infrastructure as Code (Terraform)
Experience with Kubernetes and Kafka

What you get:

A fast-paced and collaborative environment
Learning and development allowance
Competitive compensation and opportunity for advancement
100% medical, dental and vision coverage
Unlimited PTO policy
Monthly wellness stipend, home office stipend

Equal Employment Opportunities at Checkr

Checkr is committed to hiring talented and qualified individuals with diverse backgrounds for all of its tech, non-tech, and leadership roles. Checkr believes that the gathering and celebration of unique backgrounds, qualities, and cultures enriches the workplace.

Checkr also welcomes the opportunity to consider qualified applicants with prior arrest or conviction records. Checkr’s commitment to diversity extends to hiring talented individuals in spite of a prior criminal history in accordance with local, state, and/or federal laws, including San Francisco’s Fair Chance Ordinance.
