The Senior Site Reliability Engineer will deploy and maintain observability infrastructure, automate processes, and troubleshoot complex systems within DoD networks.
ABOUT THE ROLE
Second Front Systems' (2F) Product team is seeking a highly skilled and motivated Senior Site Reliability Engineer to join our Observability team. We are a small team working to accelerate the deployment of emerging technology into national security use cases. We want technical professionals who are eager to operate on the front lines of an exciting and disruptive mission.
As a Senior SRE for Second Front Systems, you'll be responsible for deploying, maintaining, and scaling our observability infrastructure across multiple DoD networks. You'll work with Kubernetes-based platforms and BigBang charts from DoD Platform One, and you'll build automation to make our monitoring stack easier to deploy for new customers. You'll be empowered to collaborate with others to implement infrastructure that delivers unique capabilities for our commercial and government customers, including the Department of Defense.
The Observability team is looking for a strong SRE with deep DevSecOps and Kubernetes experience: someone who has deployed and maintained monitoring infrastructure at scale, with an eye for security in highly regulated environments. Experience with DoD software deployments, Platform One, and single-tenant architectures is highly valued.
We are a fast-growing entrepreneurial team working at the convergence of technology and national security. If this type of effort interests you, come join us!
Note: This position requires U.S. citizenship due to government contract requirements.
Candidates must be located in one of the following geographic areas: DMV (DC/Maryland/Virginia), Raleigh/Durham/Chapel Hill, Denver/Colorado Springs, or Dallas/Fort Worth.
What You’ll Do
- Deploy and maintain the observability stack (Grafana, Mimir, Prometheus) across multiple customer clusters and DoD networks
- Build Helm chart abstractions and automation to streamline monitoring deployments for new customers
- Troubleshoot and debug complex Kubernetes issues, networking problems, and monitoring stack failures
- Configure and maintain BigBang charts and DoD Platform One integrations
- Design and implement infrastructure automation using tools like Pulumi, ArgoCD, and Flux
- Work with Istio service mesh and Keycloak for authentication in secure environments
- Monitor and optimize the performance of the monitoring infrastructure across multiple environments
- Collaborate with security teams to ensure compliance with NIST requirements and DoD standards
- Participate in on-call rotation and incident response for production environments
Skills You’ll Bring to Our Team
- 5+ years of Site Reliability Engineering or DevOps experience
- Deep experience with Kubernetes administration, troubleshooting, and scaling
- Hands-on experience deploying and maintaining observability tools (Prometheus, Grafana, Mimir/Cortex)
- Strong understanding of Helm charts, GitOps practices, and CNCF tooling
- Experience with service mesh technologies (Istio preferred)
- Proven ability to debug complex distributed systems and networking issues
- Understanding of authentication systems and security in regulated environments
- Ability to work independently and collaborate with team members in a remote environment
Preferred Qualifications
- Active security clearance or ability to obtain a Secret-level security clearance
- Previous experience with DoD software deployments and Platform One
- Experience with BigBang charts and Iron Bank containers
- Experience working in national security or highly regulated environments
- Familiarity with compliance frameworks (NIST, FedRAMP, etc.)
- Experience with infrastructure as code (Pulumi, Terraform)
Technologies We Use
- Observability: Grafana stack, Prometheus, custom alerting tools
- Kubernetes: Helm, ArgoCD, Flux, Tekton, BigBang charts
- Security: Istio, Keycloak, Kyverno
- Infrastructure: AWS/GCP/Azure, Pulumi, Git/GitLab
- Languages: YAML, Bash, Go