Cutover

Site Reliability Engineer

Reposted 7 Days Ago

Remote

Hiring Remotely in US

120K-130K Annually

Mid level

As a Site Reliability Engineer, you will ensure production system reliability, optimize performance, respond to incidents, and collaborate on infrastructure improvements.

The summary above was generated by AI

An inclusive work environment is an empowering one. At Cutover, we lead with empathy and enable others to succeed through curiosity, kindness, and self-expression.

Location: Remote, United States

This role requires on-call shifts, roughly 1 in 4 weeks and 1 in 4 weekends. 2nd Shift: 2:00 PM - 11:00 PM PST (10:00 PM - 7:00 AM UTC)

Cutover provides enterprise technology operations teams with an AI-powered SaaS solution that automates and streamlines complex processes with intelligent runbooks. The Cutover solution enables teams to respond to incidents quickly, recover from IT outages, and manage cloud migrations with precision and efficiency. Cutover is used by many of the world's largest financial institutions to support their critical technology operations, including 5 of the 6 largest asset managers and 3 of the top 5 US banks.

We’re looking for a Site Reliability Engineer (SRE) to add to our US team. This role will report to our SRE Lead.

Cutover’s SRE team is responsible for ensuring the reliability and performance levels of our production systems and applications. As a team, we’re committed to constantly improving our engineering culture to maintain a balance between risk and reliability.

What tech stack do we use here at Cutover?

The platform is built on a ReactJS frontend with a Ruby on Rails API, all hosted on Amazon Web Services (AWS).

Your role will involve close collaboration with our support and engineering teams. Together, we actively engage in maintaining and optimizing the platform's reliability, utilizing cutting-edge tools and occasionally leveraging in-house software and scripts.

If you're passionate about ensuring the dependability and efficiency of complex systems and thrive in an environment where technologies like React, Ruby, AWS, Kubernetes, Terraform, Git, and Ansible are at the forefront, we invite you to join our team. Together, let's elevate the reliability of our Cutover Enterprise platform to new heights.

As a Site Reliability Engineer, here's what you'll be up to:

Incident Response: Respond to incidents and alerts, triaging urgency and investigating root cause
Documentation: Contribute regularly to our documentation on system design, troubleshooting, best practices, and engineering processes
Root Cause Analysis: Contribute to post-mortems and help identify long-term improvements under guidance
Collaboration: Support cross-functional teams during investigations and post-incident reviews
Observability: Support and enhance observability tools and techniques by identifying metrics, logging, and alerting improvements
Automation: Write and execute simple automation scripts (e.g. Python, Ruby, Bash) to improve reliability and reduce toil
Development: Work on internal tools, pipelines, and IaC solutions to help improve the speed of software delivery and recovery
System Reliability: Work on efforts to enhance the reliability and performance of our application and systems, ensuring optimal uptime and minimal disruptions.
Infrastructure Optimization: Work closely with the development and platform engineering teams to optimize the infrastructure on AWS, ensuring scalability and efficiency.

Please note that this role involves a rotating on-call schedule, which will require occasional evening and weekend availability.

What we'd like you to bring to the table…

A genuine excitement for complex problem solving within our tech stack, applying what you know to our unique problems.
Familiarity with at least one scripting language such as Ruby, JavaScript, Python, Bash
Experience with containerization (e.g. Docker) or IaC (e.g. Terraform, Helm, CloudFormation)
An eagerness to follow modern engineering practices and learn from others
Familiarity with observability tools such as DataDog, New Relic, Grafana, Prometheus, ELK, or OpenTelemetry
Understanding of core networking concepts (DNS, HTTP/S, Load Balancing, etc.)
A collaborative mindset with clear communication skills
A willingness to ask questions to gain a better understanding of new or complex concepts

Nice to haves…

Exposure to major incident response processes
AWS Certified Cloud Practitioner or hands-on experience with cloud environments

The good stuff…

We're excited to offer Share Options as part of our compensation package.
20 days of PTO per year + public holidays, and we want you to take all of them!
3 volunteer days to use for any charitable/voluntary cause you would like.
A top-tier private health insurance package.
401k contribution plan
Work from home stipend
A personal learning and development budget through Learnerbly. You’ll be supported in your quest for knowledge, whatever that looks like to you.
If you’re thinking of starting or growing your family, then you’ll be in great company - more than half of our team are parents and we’ve built a globally consistent parental leave approach that we’re proud of.
Employee Referral Scheme.
Safeguarding the mental health of our teams is paramount for us. If you’d like to, then you’ll be able to avail yourself of multiple Cutover mental health initiatives, from fully subsidised therapy sessions to subscriptions to leading wellbeing platforms.

Target compensation package: $120,000 - $130,000 base, + stock options + benefits.

The final offer may vary from the target compensation package, taking into consideration factors such as your experience level and skill set. If we aren't aligned on salary at this stage, we’d still love to hear from you to better understand if there are more suitable opportunities at Cutover.

Diversity Statement - Empowering Our Teams

We encourage our team to bring their authentic selves to work, which we have found has strengthened workplace relationships and fostered a genuine sense of community.

If you are excited by this role, we invite you to apply! Even if your profile doesn’t check all the boxes, please don't simply scroll past! We recognize that talent lies everywhere and that some demographic groups are more likely to apply for a "stretch role" than others. We are always open to different perspectives and professional backgrounds to keep Cutover's culture evolving and to ensure that we never stop learning.

Cutover is an Equal Opportunity Employer. Maintaining an equitable hiring process is imperative to our mission. All applicants are considered without regard to race, ethnicity, national origin, religion, sex, gender identity, sexual orientation, age, mental or physical disability, marital status, protected veteran or parental status.

Learn more about Life at Cutover, our Guiding Principles, and our latest news on LinkedIn.

Top Skills

Ansible, AWS, Bash, Datadog, Docker, ELK, Git, Grafana, Kubernetes, New Relic, OpenTelemetry, Prometheus, Python, React, Ruby, Ruby on Rails, Terraform
