Top Remote Senior Site Reliability Engineer Jobs in Denver & Boulder, CO
eCommerce • Fintech • Payments • Software
The role involves ensuring software reliability and performance, managing incidents, developing infrastructure automation, and mentoring junior engineers within a platform team.
Top Skills:
AWS, CloudFormation, Datadog, Kubernetes, OpenTelemetry, Ruby, Ruby on Rails, Terraform
Information Technology • Insurance • Software
The Sr. Site Reliability Engineer at Vertafore will own the reliability and performance of production services, design incident response protocols, and enhance system observability while applying software engineering practices.
Top Skills:
.NET, AWS, C#, CI/CD, Java, Kubernetes, Linux, Python, React, Windows
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.
Top Skills:
AWS, ClickHouse, Kubernetes, LLM-Based Tools (Copilots), MySQL, Postgres, Redis
19 Days Ago
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.
Top Skills:
ABAC, Auth0, AWS, Azure, C#, CI/CD, Container Orchestration, Duo, Entra ID, GCP, Generative AI, Git, Go, IaC, Java, MFA, Okta, Ping, Python, RBAC, Ruby, SSO, Terraform
19 Days Ago
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.
Top Skills:
Ansible, AWS, Bash, Chef, CI/CD, Docker, EC2, Git, Go, Kubernetes, Linux, Puppet, Python, Ruby, Salt, Terraform
Healthtech • Information Technology • Software • Telehealth
The Senior Site Reliability Engineer will develop, monitor, and maintain distributed production systems, ensuring uptime for patients and providers while automating processes and supporting a large engineering team.
Top Skills:
AWS, Docker, GCP, Kubernetes
HR Tech • Information Technology • Professional Services • Sales • Software
Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.
Top Skills:
AI/LLM, ArgoCD, AWS, CI/CD, Datadog, GitHub Actions, GitOps, Go, Kubernetes, Python
Reposted 25 Days Ago
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills:
AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform
Reposted 4 Days Ago
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will develop and support distributed storage services, ensuring reliability and operational safety, with a focus on automation and efficiency.
Top Skills:
AWS, Azure, DNS, Go, Google Cloud Platform, Kubernetes, Linux, Python, TCP/IP, TLS
Other • Social Impact
As a Senior Site Reliability Engineer, you will design, develop, and maintain reliable infrastructure for Wikimedia's API services, ensuring performance and availability while driving reliability engineering practices and improving developer experience.
Top Skills:
Ansible, ArgoCD, AWS, Azure, GCP, GitLab, Go, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform
Other • Social Impact
The Senior Site Reliability Engineer is responsible for maintaining Wikimedia's infrastructure, improving reliability, automating processes, and collaborating with teams. The role involves troubleshooting, managing deployments, and leading incident responses while working remotely.
Top Skills:
Ansible, Bash, Cassandra, Debian, Go, Grafana, HHVM, Kubernetes, MariaDB, Memcached, PHP, Prometheus, Puppet, Python, Redis, Ruby, Shell
Financial Services
Prototype, write, test, document, and deploy release automation across environments. Build and maintain pipelines, collaborate with engineers and product teams, troubleshoot issues, participate in on-call rotation, and improve software delivery, configuration, monitoring, and operations.
Top Skills:
Ansible, Bash, Docker, GitLab, Jenkins, Kubernetes, MSSQL, Postgres, PowerShell, Python, Redis, TeamCity
3 Days Ago
Blockchain • Fintech • Software • Cryptocurrency • Metaverse
Design, build, and maintain internal monitoring and alerting for high-load real-time systems; automate production testing; troubleshoot and resolve performance issues; coordinate cross-team incident resolution; recommend architectural and process improvements; research vendor solutions and enforce security best practices.
Top Skills:
AWS, GCP, JavaScript, Linux, Node.js, REST API, WebSockets
Healthtech • Social Impact • Software
Own the operational lifecycle of cloud-native data infrastructure: design and automate reliable deployments, observability, incident response, SLIs/SLOs, autoscaling and IaC, and improve platform efficiency and data freshness across GKE and Cloud Run.
Top Skills:
Bash, BigQuery, Cloud Build, Cloud Monitoring, Cloud Run, Datadog, Docker, GCP, GitHub Actions, GKE, Go, Grafana, Jira, Kubernetes, Prometheus, Pulumi, Python, Sentry, Slack, Snyk, SonarQube, Terraform
Aerospace • Defense
Lead design, implementation, and operation of scalable, secure hybrid-cloud infrastructure for satellite ground systems. Improve developer experience, automate CI/CD and IaC, own observability, troubleshoot reliability issues, and collaborate with developers and satellite operators to advance SatDevOps practices.
Top Skills:
C/C++, CI/CD, GCP, Go, Grafana, Infrastructure as Code (IaC), Java, Kubernetes, Loki, Prometheus, Python, Rust, Software Defined Networking (SDN)
Software
Own and improve platform performance, reliability, and deployment automation. Manage cloud infrastructure, implement IaC, monitor systems with observability tools, provide operational support for distributed applications, and integrate production learnings into development workflows.
Top Skills:
AIOps Tooling, AWS Elastic Containers, AWS RDS, AWS S3, Claude Code, Claude Cowork, Datadog, Harness Engineering, Infrastructure as Code, Kubernetes, LLMs, Prompt Engineering, Rigor, Splunk
Real Estate • Software
As a Senior Site Reliability Engineer, you'll enhance system performance and reliability, optimize databases, and implement AI-assisted solutions for operational efficiency.
Top Skills:
Ansible, Datadog, ELK, Grafana, Kubernetes, Linux, MariaDB, MySQL, Postgres, Prometheus, Puppet, Python, Ruby on Rails, Ruby, Terraform, Terragrunt
Database • Analytics
This role involves ensuring the reliability and performance of ClickHouse's cloud infrastructure, collaborating with engineering teams, incident management, and driving continuous improvement in service availability.
Top Skills:
Ansible, AWS, Azure, ClickHouse, Docker Swarm, Go, Google Cloud Platform, Kubernetes, Puppet, Python, Terraform
Information Technology • Security • Cybersecurity
Operate and harden regulated cloud platforms (FedRAMP/DoD IL) by owning production reliability, designing resilient infrastructure, leading incident response and postmortems, automating compliance (NIST 800-53/STIG), supporting ATO and continuous monitoring, building secure IaC and CI/CD pipelines, and improving observability and operational tooling.
Top Skills:
AWS GovCloud, Bash, CI/CD, Container Hardening, DoD IL4, DoD IL5, FedRAMP High, GitOps, Go, Grafana, Image Security, Kubernetes, Linux/Unix, NIST 800-53, Prometheus, Python, STIG, Terraform
Big Data • Analytics
Own production reliability for customer-facing radar and weather data services across Azure, colocation, and edge Kubernetes. Refactor C#/.NET services for multi-replica safety, design multi-cluster HA, operate self-managed Kubernetes, improve observability and automation, lead incident response and postmortems, and drive operational excellence and capacity planning.
Top Skills:
.NET, Ansible, C#, Datadog, GPU-Enabled Workloads, Grafana, Helm, Istio, Kubernetes, Loki, Longhorn, Azure, NATS, Octopus Deploy, OpenTelemetry, PostGIS, Postgres, Prometheus, RabbitMQ, Rancher, RKE2, Terraform
Artificial Intelligence
Own operational excellence for cloud infrastructure: run incident management, improve reliability through automation, own a platform domain (e.g., Kubernetes, Temporal, observability), manage vendor and cost relationships, and deliver measurable reductions in incidents and costs within 12 months.
Top Skills:
AWS, Kubernetes, LLM APIs, MongoDB, Observability, Python, Temporal
Legal Tech • Software
Lead observability and incident management efforts: define SLIs/SLOs, build monitoring/alerting, dashboards, logging, and tracing. Drive incident response, postmortems, and reliability improvements to reduce MTTD/MTTR. Integrate observability into CI/CD, maintain AWS and Kubernetes infrastructure, automate operations, and mentor engineers on SRE best practices.
Top Skills:
AWS, Bash, CI/CD, Datadog, Distributed Tracing, Dynatrace, Grafana, Kubernetes, New Relic, OpenTelemetry, PowerShell, Prometheus, Python
Legal Tech • Software
As a Senior Site Reliability Engineer, you will lead reliability initiatives, design and maintain systems, enhance CI/CD pipelines, and mentor junior engineers while ensuring system availability and performance.
Top Skills:
AWS, Bash, CloudWatch, EC2, EKS, IAM, Kubernetes, Lambda, PowerShell, Python, S3
Healthtech
Design, scale, and operate secure AWS cloud infrastructure (EKS, IAM, RBAC); build and maintain IaC (Terraform/Terragrunt), GitHub Actions CI/CD, Datadog observability, and Python automation; document runbooks, participate in on-call rotations, postmortems, and Agile workflows to improve reliability and security.
Top Skills:
AWS, Datadog, EC2, EKS, Fargate, GitHub Actions, GitHub Advanced Security, Helm, IAM, Jira, Kubernetes, Lambda, Python, RBAC, Secrets Manager, Serverless, Terraform, Terragrunt, VPC
Semiconductor • Manufacturing
The role involves leading reliability initiatives, designing patterns for AI operations, managing SLOs, and mentoring junior engineers. You'll ensure platform resilience and optimize CI/CD pipelines for an AI-first intelligence platform.
Top Skills:
AWS, Bash, CloudWatch, Datadog, Docker, EKS, GitOps, Java, Kubernetes, Lambda, Python, Spring Boot, Terraform