Site Reliability Engineer II

Todyl

Posted Yesterday

Hybrid

Denver, CO, USA

130K-160K Annually

Mid level

Build and operate the production platform (Kubernetes, AWS, IaC, CI/CD, observability), automate self-service deployment, embed security and secrets management, run and modernize on-call, drive cost efficiency, mentor teammates, and maintain runbooks and post-incident reviews.

The summary above was generated by AI

About Us

At Todyl, we are on a mission to protect small and medium-sized businesses from ever-changing cyber threats. The Todyl platform fully integrates threat, risk, and compliance management to provide exceptional, affordable, unified cybersecurity solutions to MSPs (Managed Service Providers) and their end customers.

At the end of the day, we're here to keep our partners and customers safe and to help them manage risk and comply with regulations. Protecting others requires a team that works together with trust and cares deeply about carrying out our mission.

About the Role

The Site Reliability Engineering team at Todyl exists to make our platform reliable, secure, and easy for engineering teams to ship to. We do that by building automation, self-service tooling, and operational standards that let developers move fast without putting customers at risk. Our success is measured by how much production reliability and developer velocity we enable, not by how much work flows through us.

You'll spend your time building the tooling and platform capabilities that let engineering teams deploy, scale, and configure their services without having to file a ticket with us. You'll partner closely with developers, take operational reliability seriously, and bring an automation-first mindset to a platform that handles security workloads at the heart of our product.

In this role, we're looking for someone who:

Has a bias for action and a strong sense of ownership. They finish what they start and stay with the work through stabilization, not just through a successful deploy.
Sees SRE as a service to the engineering organization, not a gate. They build trust with developers and make other teams' jobs easier.
Treats security as a normal part of platform operations, not an afterthought, and brings a growth mindset to security regardless of starting expertise.
Gets energized by eliminating toil. They look at repetitive work and ask, "How do we make this go away?"
Actively uses AI tooling in their day-to-day work and is curious about where it goes next.
Can communicate technical decisions clearly to engineers and non-engineers, and is comfortable saying no or pushing back constructively when it matters.

What you'll do:

You'll build and operate the production platform, including Kubernetes, CI/CD pipelines, infrastructure-as-code, observability, secrets management, and the AWS foundation on which our services run.
You'll automate the path to production, investing in self-service capability so engineering teams can deploy and scale without depending on you for routine work. We're shifting from reactive to proactive, and we'd rather build guardrails than approve every deploy.
You'll drive cost visibility and efficiency across our cloud footprint, including AWS resource tagging, COGS attribution, and right-sizing across the platform.
You'll modernize how we run on-call: living runbooks, alerting we trust, and post-incident reviews as a normal part of how the team operates.
You'll embed security into day-to-day operations through patching, access controls, secrets rotation, and dependency hygiene, as part of the platform you operate rather than a separate workstream.
You'll partner with product teams early on reliability for high-stakes projects, helping shape the design rather than reviewing it the week before launch.
You'll participate in a weekly on-call rotation, resolve most issues independently, and update documentation after incidents.
You'll plan and estimate honestly. Break work into smaller increments, communicate delays early, and write tests for the automation you build because it runs in production.
You'll treat code review as a quality lever, not a checkbox. Catch missing tests, push back on tech debt, and watch dashboards and logs to verify your own changes after they ship.
You'll mentor less-tenured teammates through pairing, documentation, and the example you set. The team has engineers at different stages, and we expect knowledge to flow across them.
When something you've built is mature and stable, you'll look for ways to hand it off or make it self-managing rather than holding onto it forever.

Important note: We expect the person in this role to actively use AI tools, including Claude, to accelerate automation development, reduce toil, and solve infrastructure problems more quickly. Pairing strong SRE fundamentals with AI-assisted development is increasingly how modern platform teams move at the speed the market requires, and we want a teammate who is comfortable working this way. We'll talk about how you use AI in your work during the interview, and we expect your fluency with these tools to grow as part of your professional development here.

We don't expect deep knowledge across every item below, but familiarity with several of these will help you ramp quickly. Most importantly, we're looking for a strong technical background and the willingness to learn what you don't already know.

Kubernetes and containerization
AWS and cloud-native infrastructure
Infrastructure-as-code (Terraform, Salt)
CI/CD pipelines and automation
Observability stack (Grafana, Prometheus)
Linux at scale
Python or Bash for tooling
Networking fundamentals
Git and modern development workflows

What We Offer:

For full-time employees, Todyl offers comprehensive benefits including:

Medical, dental, and vision coverage
Health savings and flexible spending accounts (HSA/FSA)
Life insurance
Short- and long-term disability
Access to on-demand healthcare and telehealth services
Employee Assistance Program (EAP)
Flexible PTO in addition to 13 company holidays
401(k)
Generous parental leave programs

This role is based in Denver, CO, or Atlanta, GA, with 3 days per week in our office.

The salary range for this position is $130,000–$160,000, plus equity. The actual annual salary for this role will depend on each candidate's experience, qualifications, and work location, with most new hires placed near the midpoint of the posted range to ensure fairness and consistency across our team.

Todyl provides equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, transgender status, gender identity or expression, national origin, age, disability, marital status, genetic information, military status, or any other status protected by applicable federal, state, or local laws.

We encourage you to apply even if you don't meet all the requirements listed. We're looking for the best person for the job, someone who brings a unique combination of skills and experience that makes them exceptional, even if they don't check every box.
