Lead observability and incident management efforts: define SLIs/SLOs, build monitoring/alerting, dashboards, logging, and tracing. Drive incident response, postmortems, and reliability improvements to reduce MTTD/MTTR. Integrate observability into CI/CD, maintain AWS and Kubernetes infrastructure, automate operations, and mentor engineers on SRE best practices.
Filevine is a Legal AI company delivering Legal Operating Intelligence for the future of legal work. Grounded in a singular system of truth, Filevine brings together data, documents, workflows, and teams into one unified platform—where modern legal work happens with clarity and consistency.
Powered by LOIS, the Legal Operating Intelligence System, Filevine connects context across every matter to transform legal operations from reactive to proactive. LOIS reads, understands, and reasons across your data to surface insight, automate complexity, and give professionals the clarity and confidence to see more, know more, and do more. Fueled by a team of exceptional collaborators and innovators, Filevine’s rapid growth has earned AI awards and recognition from Deloitte and Inc. as one of the most innovative and fastest-growing technology companies in the country.
What you will do
- Own and evolve observability strategy, including monitoring, alerting, dashboards, logging, and distributed tracing.
- Define and manage SLIs, SLOs, and reliability metrics.
- Lead incident response, postmortems, and continuous improvement initiatives.
- Improve MTTD and MTTR through automation and operational excellence.
- Integrate observability into CI/CD pipelines and software delivery workflows.
- Build and maintain reliable cloud infrastructure on AWS and Kubernetes.
- Mentor engineers and promote SRE best practices across the organization.
What we are looking for
- 8+ years of experience in software engineering, infrastructure, or operations.
- 5+ years of Site Reliability Engineering experience.
- Deep expertise with observability platforms such as New Relic, Datadog, Dynatrace, Grafana, or Prometheus.
- Strong experience with monitoring, alerting, incident management, and reliability engineering practices.
- Hands-on experience with AWS, Kubernetes, and cloud-native technologies.
- Proficiency in Python, Bash, PowerShell, or similar scripting languages.
- Excellent communication and collaboration skills.
Preferred Experience
- Leading observability platform implementations or migrations at scale.
- Building SLI/SLO frameworks and reliability programs.
- Experience with OpenTelemetry, distributed tracing, and modern observability architectures.
Cool Company Benefits:
- A dynamic, rapidly growing company, focused on helping organizations thrive
- Medical, Dental, & Vision Insurance (for full-time employees)
- Competitive & Fair Pay
- Maternity & paternity leave (for full-time employees)
- Short & long-term disability
- Opportunity to learn from a dedicated leadership team
- Top-of-the-line company swag
Privacy Policy Notice
Filevine will handle your personal information according to what’s outlined in our Privacy Policy.
Communication about this opportunity, or any open role at Filevine, will only come from representatives with email addresses using "filevine.com". Messages from any other address are not affiliated with Filevine; please do not respond to them.



