Site Reliability Engineer (SRE)

This job was removed at 12:11 p.m. (MST) on Tuesday, May 5, 2026.

In-Office

Boulder, CO, USA

Similar Jobs

Vertafore

Site Reliability Engineer

5 Days Ago

Hybrid

Denver, CO, USA

160K-180K Annually

Expert/Leader

Information Technology • Insurance • Software

The Principal Site Reliability Engineer will drive reliability, scalability, and performance of production services, influence system design, and lead incident management. They will establish SLOs and error budgets, foster a blameless culture, and operate across AWS and hybrid environments.

Top Skills: .NET, AWS, C#, CI/CD, Infrastructure-as-Code, Java, Kubernetes, Linux, Python, React, Windows

Applied Systems

Site Reliability Engineer

14 Days Ago

Remote or Hybrid

65K-135K Annually

Mid level

Cloud • Insurance • Payments • Software • Business Intelligence • App development • Big Data Analytics

The Site Reliability Engineer will ensure system reliability and scalability, manage infrastructure, automate tasks, and collaborate cross-functionally while mentoring junior engineers and supporting production environments.

Top Skills: Ansible, Argo CD, Bash, Datadog, GitHub Actions, GitLab, Go, HashiCorp Consul, Helm, Kubernetes, Packer, Postgres, PowerShell, Python, SQL Server, Terraform, TypeScript

Applied Systems

Senior Site Reliability Engineer

14 Days Ago

Remote or Hybrid

65K-160K Annually

Senior level

Cloud • Insurance • Payments • Software • Business Intelligence • App development • Big Data Analytics

As a Senior Site Reliability Engineer, you will ensure software reliability and scalability, manage IaC and CI/CD, monitor systems, and mentor junior engineers while collaborating across teams.

Top Skills: Ansible, Argo CD, Bash, Datadog, GitHub Actions, GitLab, Go, HashiCorp Consul, Helm, Kubernetes, Packer, Postgres, PowerShell, Python, SQL Server, Terraform, TypeScript

The Opportunity

We're hiring an experienced Site Reliability Engineer to own the reliability of the Freeplay platform and drive success for our most advanced enterprise customers. In this role, you will bridge the gap between core infrastructure engineering and high-stakes customer deployments. You won’t just be maintaining our internal SaaS environment; you will be the technical expert guiding Fortune 100 engineering teams as they deploy Freeplay into their own private clouds.

This is an exciting chance to join a fast-growing startup with a front-row seat to how AI products are being built at some of the largest and most innovative companies in the world. You’ll be hands-on with customers, learning about cutting-edge AI architectures while ensuring our platform runs flawlessly in their diverse and complex environments.

What's Freeplay?

Freeplay is the end-to-end platform for software teams to ship great AI products. We give product development teams the power to test, evaluate, monitor, and optimize AI in production. Our customers use Freeplay to build better LLM features, chatbots, and agents. Today we serve leading software companies, from growing startups to Fortune 100 companies.

Your Mission

Build the infrastructure that powers Freeplay and ensure successful deployments for our enterprise customers.

Partner with Enterprise Customers: Act as a key technical contact for our "Bring Your Own Cloud" (BYOC) deployments. You will jump on calls with customer engineering teams to guide them through installation, debug configuration issues in their VPCs, and ensure they are successful running Freeplay.
Own the Multi-Cloud Architecture: Help manage and improve our internal production infrastructure across AWS, GCP, and Azure, ensuring high availability and seamless networking.
Solve the "Shipped Software" Challenge: Drive the engineering efforts to package and distribute Freeplay using tools like Helm, Replicated, and KOTS. You will help ensure our software is portable, installing as reliably in a customer's cloud environment as it does in our SaaS.
Master Infrastructure as Code: Drive our Terraform strategy, building modular, reusable, and secure infrastructure definitions that treat operations with the same rigor as application code.
Champion Observability: Implement and tune our monitoring stack (Datadog) to provide deep visibility into system health, and help customers implement similar observability for their private instances.
Scale Data & Messaging: Manage the stateful components of our stack, including PostgreSQL, Elasticsearch, and NATS JetStream, ensuring data integrity and performance under load.

About You

Experience: We are open to candidates ranging from Mid-Level (3+ years) to Senior/Staff (7+ years). We will tailor the scope and responsibilities to your expertise.
Customer-facing confidence. You are comfortable interacting directly with external engineering teams. You can troubleshoot a failed deployment while on a Zoom call with a client and explain complex architectural requirements clearly.
Production Kubernetes fluency. You are confident managing EKS/GKE/AKS clusters, debugging complex pod failures, managing ingress controllers, and handling autoscaling in production.
Deep Terraform expertise. You have experience structuring IaC for scale and have managed multi-environment setups.
Database operational experience. You aren't just an infrastructure plumber; you understand how to manage and tune databases (Postgres) and search indices (Elasticsearch) at scale.
Security-first thinking. You are familiar with cloud security best practices, including VPC networking, IAM/Workload Identity, and secrets management, and you can explain these concepts to security-conscious enterprise clients.

Bonus Points

Experience in a Solutions Engineering or Field Engineering capacity.
Experience with Replicated / KOTS or similar tools for packaging enterprise software for on-premises/VPC deployments.
Experience operating message queues like NATS (including JetStream) or Kafka.
Background in AI/ML infrastructure or high-throughput data systems.

Compensation & Benefits

Competitive salary commensurate with experience, plus equity package.
Medical, dental, and vision insurance.
Premium hardware setup (MacBook, monitor, peripherals).
Four weeks of Paid Time Off per year (and we encourage you to take it!).

Location

We prefer candidates able to work full-time on-site in Boulder, CO, but we're open to exceptional remote candidates who can visit Boulder every 6 weeks for team collaboration.

Boulder, Colorado, United States

What you need to know about the Colorado Tech Scene

With a business-friendly climate and research universities like CU Boulder and Colorado State, Colorado has made a name for itself as a startup ecosystem. The state boasts a skilled workforce and high quality of life thanks to its affordable housing, vibrant cultural scene and unparalleled opportunities for outdoor recreation. Colorado is also home to the National Renewable Energy Laboratory, helping cement its status as a hub for renewable energy innovation.

Key Facts About Colorado Tech

Number of Tech Workers: 260,000; 8.5% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Lockheed Martin, CenturyLink, Comcast, BAE Systems, Level 3
Key Industries: Software, artificial intelligence, aerospace, e-commerce, fintech, healthtech
Funding Landscape: $4.9 billion in VC funding in 2024 (PitchBook)
Notable Investors: Access Venture Partners, Ridgeline Ventures, Techstars, Blackhorn Ventures
Research Centers and Universities: Colorado School of Mines, University of Colorado Boulder, University of Denver, Colorado State University, Mesa Laboratory, Space Science Institute, National Center for Atmospheric Research, National Renewable Energy Laboratory, Gottlieb Institute

Freeplay.AI Boulder, Colorado, USA Office