Maximum of 25 job preferences reached.
Top Remote Senior Site Reliability Engineer Jobs in Denver & Boulder, CO
Reposted 23 Hours AgoSaved
Easy Apply
Easy Apply
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Own reliability, automation, and DevOps for Coinbase's corporate IAM platform: on-call/incident response, CI/CD and IaC pipelines, identity lifecycle tooling, observability and disaster recovery, documentation, and cross-team IAM advisement to ensure secure, scalable access for a global workforce.
Top Skills:
AbacAuth0AWSAzureC#Ci/CdContainer OrchestrationDuoEntraidGCPGenerative AiGitGoIacJavaMfaOktaPingPythonRbacRubySsoTerraform
Reposted 23 Hours AgoSaved
Easy Apply
Easy Apply
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Senior SRE on the IT Operations team owning reliability, monitoring, and incident response for AI infrastructure. Build automation, CI/CD and Kubernetes tooling, improve observability and documentation, and develop internal full-stack tools using Go or Python. Partner with Infrastructure, Security, and Compliance to scale secure, resilient AI deployment pipelines.
Top Skills:
AnsibleAWSBashChefCi/CdDockerEc2GitGoKubernetesLinuxPuppetPythonRubySaltTerraform
HR Tech • Information Technology • Professional Services • Sales • Software
Own and operate production-grade Kubernetes infrastructure on AWS, build GitOps CI/CD with GitHub Actions and ArgoCD, develop AI agents and internal DevOps tooling, maintain Datadog-based observability, and manage on-call incident response while collaborating with engineering teams to improve reliability and delivery speed.
Top Skills:
Ai/LlmArgocdAWSCi/CdDatadogGithub ActionsGitopsGoKubernetesPython
Reposted 6 Days AgoSaved
Easy Apply
Easy Apply
Big Data • Cloud • Software • Database
Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.
Top Skills:
AWSAzureCert-ManagerCorednsCrdsCriCsiGatekeeperGCPGoHelmKubernetesKustomizeOperatorsPythonTerraform
eCommerce • Fintech • Payments • Software
The role involves ensuring software reliability and performance, managing incidents, developing infrastructure automation, and mentoring junior engineers within a platform team.
Top Skills:
AWSCloudFormationDatadogKubernetesOpentelemetryRubyRuby On RailsTerraform
Information Technology • Insurance • Software
The Sr. Site Reliability Engineer at Vertafore will own the reliability and performance of production services, design incident response protocols, and enhance system observability while applying software engineering practices.
Top Skills:
.NetAWSC#Ci/CdJavaKubernetesLinuxPythonReactWindows
Cloud • Software
The Senior Site Reliability Engineer will automate operations using Python, manage Kubernetes and OpenStack clusters, and ensure high availability for enterprise infrastructures.
Top Skills:
KubernetesLinuxOpenstackPython
Software
As a Senior Site Reliability Engineer at Regrello, you'll shape the developer platform, collaborate with customers, and ensure the reliability and security of infrastructure and applications.
Top Skills:
AWSAzureCircleCIGCPGithub ActionsGitlab CiGoKubernetesTerraform
Artificial Intelligence • Consumer Web • Digital Media • Information Technology • Social Impact • Software
Lead SRE work to keep Circle highly available and performant: respond to incidents, own monitoring/alerting/log management, manage and optimize MySQL/Postgres/ClickHouse/Redis databases, maintain server infrastructure and deployment pipelines, collaborate with engineering teams, and build internal SRE tooling and automation.
Top Skills:
AWSClickhouseKubernetesLlm-Based Tools (Copilots)MySQLPostgresRedis
Reposted 2 Days AgoSaved
Easy Apply
Easy Apply
Big Data • Cloud • Software • Database
As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.
Top Skills:
AWSAzureDnsGCPGoHTTPLinuxPythonRubyTls
Information Technology • Software • Consulting
The Senior SRE will design and implement automated Dynatrace configurations, integrate REST APIs, and develop TypeScript tooling for platform reliability, while ensuring observability and automation practices are followed.
Top Skills:
APIsAws CloudformationAws CodebuildAws LambdaDynatraceTypescript
Cloud • Information Technology
As a Sr. Site Reliability Engineer, you'll ensure service reliability, build automation, and collaborate on infrastructure improvements while mentoring others.
Top Skills:
AnsibleCatchpointDockerElkGoGrafanaHashicorp VaultJenkinsKubernetesLinuxPrometheusPythonTerraform
New
Track Smarter, Apply Better.
Ditch the spreadsheets. Organize your job search with our freeApplication Tracker.
Use For Free
Reposted 2 Days AgoSaved
Other • Social Impact
As a Senior Site Reliability Engineer, you will manage and improve Wikimedia's infrastructure, handle operational tasks, automate processes, and provide mentorship while participating in a 24/7 on-call rotation.
Top Skills:
AnsibleBashDebianGoGrafanaHhvmKubernetesMemcachedPHPPrometheusPuppetPythonRedisRuby
Cloud • Security • Software • Cybersecurity
Design and maintain reliable infrastructure solutions for a cloud data protection platform. Ensure application scalability and support through CI/CD and monitoring tools while collaborating in a global team.
Top Skills:
AppinsightsAws CloudformationAzure Api ManagementAzure Arm TemplatesAzure Cosmos DbAzure DevopsAzure Entra IdAzure FunctionsAzure MonitorAzure Storage ServicesBashBitbucketElastic StackGitGoMicrosoft TfsPowershellPythonServerless FrameworkTerraform
Cloud • Software
Design, implement, and support Kubernetes and compute platforms in a private cloud. Oversee architecture and standardization across hardware, OS, and cloud orchestration.
Top Skills:
AnsibleBashCi/CdHelmKubernetesLinuxOpenstackPythonTerraformUbuntu
Artificial Intelligence • Blockchain • Information Technology • Consulting
Lead design and build of production-grade Azure infrastructure using Terraform, ensuring scalable, secure, and repeatable deployments. Provide technical leadership, platform enhancements, observability and incident response improvements, and Tier 2 infrastructure support while collaborating with engineering, security, and product teams to meet enterprise readiness and feature parity goals.
Top Skills:
ArgoAzureGoGrafanaKubernetesPrometheusPythonSpaceliftTerraform
Digital Media • Software • Sports
Seeking a Senior Site Reliability Engineer to enhance system reliability, performance, and scalability. Focus on automation, observability, and improving CI/CD practices while collaborating with engineering teams for better incident response and metrics improvement.
Top Skills:
AWSAzureC++Ci/CdDatadogDockerElkGCPGoGrafanaJavaKubernetesLinuxPrometheusPythonTerraform
5 Days AgoSaved
Robotics • Software
Own reliability across vehicle and cloud stacks for AUV operations: onboard Jetson/ROS2 compute, topside systems, cloud ingestion/processing and customer platform. Build automation, observability, runbooks, and self-recovery to reduce on-call toil; manage AWS infrastructure, IaC, container orchestration, and reliability targets. Participate in shared 12-hour on-call shifts and field deployments, mentor team on operational excellence.
Top Skills:
AWSBashContainerizationDockerGoGrafanaIamJetsonKubernetesLinuxPrometheusPythonRosRos 2Terraform
Reposted 5 Days AgoSaved
Artificial Intelligence • Cloud • Information Technology • Software
Design and operate large-scale GPU infrastructure for distributed AI training, ensuring reliability, performance, and efficient customer partnerships.
Top Skills:
AnsibleCudaDeepspeedFsdpGpuHelmInfinibandKubernetesLinuxMegatronNcclNvidia A100Nvidia B200Nvidia H100NvlinkPyTorchRoceTerraform
Cloud • Security • Software • Cybersecurity
Lead reliability, automation, and observability for high-density AI hardware infrastructure. Build Python-based IaC tooling, telemetry pipelines, Prometheus/Grafana dashboards, and AI-assisted tooling. Run 24x7 incident response, coordinate vendors and field technicians, define operational readiness, and drive post-mortems to improve uptime and performance.
Top Skills:
Bare-MetalBgpGrafanaIpv4Ipv6LlmsLokiOpentelemetryPagerdutyPrivate CloudPrometheusPythonSlackTimeseries EnginesVirtualized Environments
Cloud • Security • Software • Cybersecurity
Design, build, and operate scalable infrastructure and CI/CD/IaC systems. Implement observability (monitoring, logging, alerting), automate reliability improvements, mentor engineers, collaborate on incident response, and participate in on-call rotations to maintain Akamai Cloud services.
Top Skills:
AlertingAnsibleBashChefCi/CdGithub ActionsGitlab Ci/CdGoInfrastructure As CodeJenkinsLoggingMonitoringPuppetPythonSaltstackTelemetryTerraform
Software
Drive SRE practices for VA enterprise healthcare platforms: automate infrastructure and CI/CD, define SLIs/SLOs, improve observability and reliability, support incident response, and ensure cloud-native, secure, compliant operations in AWS and containerized environments.
Top Skills:
AnsibleAWSBashCi/CdCloudwatchDockerEcsEksElkGoGrafanaInfrastructure As CodeKubernetesLinuxOpentelemetryPowershellPrometheusPythonSplunkTerraform
Reposted 11 Days AgoSaved
Easy Apply
Easy Apply
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will develop and support distributed storage services, ensuring reliability and operational safety, with a focus on automation and efficiency.
Top Skills:
AWSAzureDnsGoGoogle Cloud PlatformKubernetesLinuxPythonTcp/IpTls
Artificial Intelligence • Information Technology • Software • Database
As a Site Reliability Engineer, you will design, implement, and maintain scalable infrastructure, ensure system reliability, automate processes, and collaborate with engineering teams.
Top Skills:
DockerElk StackGoGrafanaJavaKubernetesNode.jsPrometheusPulumiPythonRubyTerraform
Reposted 8 Days AgoSaved
Other • Social Impact
As a Senior Site Reliability Engineer, you will design, develop, and maintain reliable infrastructure for Wikimedia's API services, ensuring performance and availability while driving reliability engineering practices and improving developer experience.
Top Skills:
AnsibleArgocdAWSAzureGCPGitlabGoKubernetesOpentelemetryPrometheusPythonTerraform
Let Your Resume Do The Work
Upload your resume to be matched with jobs you're a great fit for.
Success! We'll use this to further personalize your experience.
Top Denver & Boulder, CO Companies Hiring Remote Senior Site Reliability Engineers
See AllPopular Job Searches
All Filters
Total selected ()
No Results
No Results
.png)


.png)






















