Top Remote Senior Site Reliability Engineer Jobs in Denver & Boulder, CO
eCommerce • Fintech • Payments • Software
The role involves ensuring software reliability and performance, managing incidents, developing infrastructure automation, and mentoring junior engineers within a platform team.
Top Skills:
AWS, CloudFormation, Datadog, Kubernetes, OpenTelemetry, Ruby, Ruby on Rails, Terraform
Information Technology • Insurance • Software
The Sr. Site Reliability Engineer at Vertafore will own the reliability and performance of production services, design incident response protocols, and enhance system observability while applying software engineering practices.
Top Skills:
.NET, AWS, C#, CI/CD, Java, Kubernetes, Linux, Python, React, Windows
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.
Top Skills:
AWS, ClickHouse, Kubernetes, LLM-Based Tools (Copilots), MySQL, Postgres, Redis
12 Days Ago
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.
Top Skills:
ABAC, Auth0, AWS, Azure, C#, CI/CD, Container Orchestration, Duo, Entra ID, GCP, Generative AI, Git, Go, IaC, Java, MFA, Okta, Ping, Python, RBAC, Ruby, SSO, Terraform
12 Days Ago
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.
Top Skills:
Ansible, AWS, Bash, Chef, CI/CD, Docker, EC2, Git, Go, Kubernetes, Linux, Puppet, Python, Ruby, Salt, Terraform
Healthtech • Information Technology • Software • Telehealth
The Senior Site Reliability Engineer will develop, monitor, and maintain distributed production systems, ensuring uptime for patients and providers while automating processes and supporting a large engineering team.
Top Skills:
AWS, Docker, GCP, Kubernetes
HR Tech • Information Technology • Professional Services • Sales • Software
Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.
Top Skills:
AI/LLM, ArgoCD, AWS, CI/CD, Datadog, GitHub Actions, GitOps, Go, Kubernetes, Python
Reposted 18 Days Ago
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills:
AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform
Legal Tech • Software
Lead observability and incident management efforts: define SLIs/SLOs, build monitoring/alerting, dashboards, logging, and tracing. Drive incident response, postmortems, and reliability improvements to reduce MTTD/MTTR. Integrate observability into CI/CD, maintain AWS and Kubernetes infrastructure, automate operations, and mentor engineers on SRE best practices.
Top Skills:
AWS, Bash, CI/CD, Datadog, Distributed Tracing, Dynatrace, Grafana, Kubernetes, New Relic, OpenTelemetry, PowerShell, Prometheus, Python
eCommerce
Ensure reliability and availability of Tradeweb's global AWS platform through IaC automation, observability and SLO definition, incident triage and resolution, on-call duties, collaboration with development teams, and security-focused platform improvements.
Top Skills:
ArgoCD, AWS, AWS Lambda, EKS, GitSecOps, Infrastructure as Code (IaC), Kubernetes (K8s), Kustomize, LGTM, Linux/Unix, Pulumi, Python, SMS, SNS
Healthtech
Design, scale, and operate secure AWS cloud infrastructure (EKS, IAM, RBAC); build and maintain IaC (Terraform/Terragrunt), GitHub Actions CI/CD, Datadog observability, and Python automation; document runbooks, participate in on-call rotations, postmortems, and Agile workflows to improve reliability and security.
Top Skills:
AWS, Datadog, EC2, EKS, Fargate, GitHub Actions, GitHub Advanced Security, Helm, IAM, Jira, Kubernetes, Lambda, Python, RBAC, Secrets Manager, Serverless, Terraform, Terragrunt, VPC
Edtech
Lead SRE work to improve availability, reliability, observability, and security for a distributed SaaS platform. Build and maintain IaC (Terraform, CloudFormation), support CI/CD, manage containerized production environments (Kubernetes/EKS), run disaster recovery exercises, participate in on-call rotation, collaborate cross-functionally, and mentor teams while integrating tooling including AI into SRE workflows.
Top Skills:
.NET, Ansible, AWS EKS, CI/CD, CloudFormation, Docker, Java, JavaScript, Kubernetes, Python, Terraform
Reposted 10 Days Ago
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.
Top Skills:
Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform
Big Data
You will manage AWS infrastructure, automate deployments, debug application issues, and improve the operational health of Metabase Cloud.
Top Skills:
AWS, Datadog, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
Cloud • Software
The Senior Site Reliability Engineer will automate operations using Python, manage Kubernetes and OpenStack clusters, and ensure high availability for enterprise infrastructures.
Top Skills:
Kubernetes, Linux, OpenStack, Python
Software
As a Senior Site Reliability Engineer at Regrello, you'll shape the developer platform, collaborate with customers, and ensure the reliability and security of infrastructure and applications.
Top Skills:
AWS, Azure, CircleCI, GCP, GitHub Actions, GitLab CI, Go, Kubernetes, Terraform
Artificial Intelligence • Software
As a Software Engineer in Reliability, you'll architect and manage multi-cloud GPU infrastructure, ensuring performance, security, and scale while debugging complex hardware/software issues.
Top Skills:
AMD, AWS, Bash, Go, GPU, InfiniBand, Linux, NVIDIA, OCI, Python, RDMA
Legal Tech • Software
As a Site Reliability Engineer, you'll develop autonomous systems, improve CI/CD pipelines, mentor junior engineers, and ensure software reliability and security in a 24/7 environment.
Top Skills:
Bash, PowerShell, Python
Reposted 14 Days Ago
Big Data • Cloud • Software • Database
As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.
Top Skills:
AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
3D Printing • Aerospace • Hardware • Robotics • Software
Lead the reliability and scalability of BRINC's production systems, building secure cloud infrastructure and improving incident response. Collaborate with teams for optimal system performance.
Top Skills:
AWS, Infrastructure as Code, JavaScript, Node.js, Python
Information Technology • Software • Consulting
The Senior SRE will design and implement automated Dynatrace configurations, integrate REST APIs, and develop TypeScript tooling for platform reliability, while ensuring observability and automation practices are followed.
Top Skills:
APIs, AWS CloudFormation, AWS CodeBuild, AWS Lambda, Dynatrace, TypeScript
Internet of Things
Maintain and evolve an EKS-based Kubernetes platform, build CI/CD pipelines (GitHub Actions, OIDC), manage IaC with Pulumi/Terraform/OpenTofu on AWS, operate observability stack, enforce security best practices, diagnose production incidents, participate in on-call rotation, and produce runbooks and documentation to improve reliability.
Top Skills:
AWS, AWS Secrets Manager, EKS, External Secrets Operator, GitHub Actions, Grafana, IAM, Kubernetes, OIDC, OpenTofu, Pulumi, Terraform, Vector, VictoriaLogs, VictoriaMetrics
Internet of Things
Operate and evolve an EKS-based Kubernetes platform, build CI/CD pipelines, manage infrastructure as code (Pulumi/Terraform/OpenTofu) across AWS, maintain observability and security practices, respond to incidents and perform post-mortems, participate in on-call rotation, and produce runbooks and architecture documentation while collaborating with distributed engineering teams.
Top Skills:
ArgoCD, AWS, AWS Secrets Manager, External Secrets Operator, Flux, GitHub Actions, Grafana, IMAP, Keycloak, Kubernetes (EKS), OIDC, OpenTofu, Pulumi, SMTP, Terraform, Vector, VictoriaLogs, VictoriaMetrics
Reposted 13 Days Ago
Other • Social Impact
As a Senior Site Reliability Engineer, you will manage and improve Wikimedia's infrastructure, handle operational tasks, automate processes, and provide mentorship while participating in a 24/7 on-call rotation.
Top Skills:
Ansible, Bash, Debian, Go, Grafana, HHVM, Kubernetes, Memcached, PHP, Prometheus, Puppet, Python, Redis, Ruby
Cloud • Software
Design, implement, and support Kubernetes and compute platforms in a private cloud. Oversee architecture and standardization across hardware, OS, and cloud orchestration.
Top Skills:
Ansible, Bash, CI/CD, Helm, Kubernetes, Linux, OpenStack, Python, Terraform, Ubuntu