Sauce Labs

Senior Systems Engineer

Posted Yesterday

In-Office or Remote

2 Locations

135K-165K Annually

Senior level

The Senior Systems Engineer is responsible for designing and operating scalable Kubernetes clusters, infrastructure automation, and optimization within a hybrid cloud environment.

The summary above was generated by AI

Location preference: Remote based out of Atlanta, GA, or Raleigh, NC.

About Us:

At Sauce Labs, we empower the world's top enterprises (such as Walmart, Bank of America, and Indeed) to deliver quality web and mobile applications at speed. Our industry-leading platform ensures continuous quality across the SDLC, using AI-powered analytics to identify key quality signals from development through production. With our unified solution, teams can release and innovate with confidence, knowing their apps will always look, function, and perform exactly as they should. Backed by TPG and Riverwood Capital, we are shaping the future of digital confidence. Join us!

The Role:

Sauce Labs is looking for a Senior Systems Engineer to join our high-performance Ops Team. This critical role is responsible for the architecture, operation, and massive scaling of the hybrid cloud infrastructure and software that powers Sauce Labs and launches over 10 million VMs a month. You will be a key decision-maker in how we evolve our platforms for the future, with a strong focus on building and optimizing our extensive Kubernetes environment.

Responsibilities:

Kubernetes and Cloud Native Architecture: Lead the design, deployment, and lifecycle management of highly available, scalable Kubernetes clusters across both our data centers and public cloud providers (GCP, AWS).
Infrastructure Automation: Write and maintain expert-level infrastructure-as-code using Terraform to deploy and manage services in our hybrid cloud environment. Develop robust automation and self-service tooling in Python or Go to empower engineering teams.
System and Hardware Operations: Install, configure, debug, and manage a diverse range of hardware and systems in our global data centers, including Dell and Supermicro servers, storage arrays (NAS/SAN), and custom mobile device appliances.
Scalability and Performance Engineering: Creatively solve complex scaling challenges within our rapidly expanding environment. Optimize hardware, hypervisor (KVM/QEMU), and Kubernetes configurations to enhance performance and efficiency.
Observability and Monitoring: Engineer and enhance our observability stack (Prometheus, Grafana) to provide deep insights into the health and performance of our Kubernetes clusters, applications, and underlying infrastructure.
Disaster Recovery and Resiliency: Design, implement, and maintain robust disaster recovery strategies for critical production services, with a focus on multi-cluster and multi-region Kubernetes deployments.
Bare Metal Provisioning: Automate the deployment and lifecycle management of operating systems on bare metal servers using tools like PXE and Foreman.
Documentation and Runbooks: Create and maintain clear, comprehensive documentation, architectural diagrams, and NOC runbooks for the environments you manage.
Troubleshooting: Act as a senior escalation point for complex troubleshooting of application, server, and network issues within our containerized and virtualized environments.
On-Call: Participate in a 24x7 on-call rotation to ensure the stability and availability of the Sauce Labs platform.

Examples of Projects You Might Work On:

Architecting and implementing a multi-cluster Kubernetes strategy using federation or service mesh technologies to improve global service delivery.
Developing custom Kubernetes Operators in Go or Python to automate complex application lifecycle management.
Leading the migration of a stateful, on-premises service to a resilient, cloud-native architecture running on GKE or EKS.
Building new pipeline-based tooling to fully automate the lifecycle of Kubernetes cluster nodes, from bare metal provisioning to decommissioning.
Driving the evolution of our Prometheus-based monitoring stack to handle ever-increasing scale and provide richer, more actionable insights for developers.

Required Skills:

Proven ability to execute on high-level goals independently and to lead technical initiatives within cross-functional teams.
5+ years of experience as a Linux administrator/engineer at scale (hundreds of systems), with a deep understanding of designing and deploying highly available solutions.
3+ years of recent, hands-on professional experience architecting, operating, and scaling Kubernetes clusters in a large-scale production environment.
Expertise in configuration management solutions, preferably Ansible, for managing infrastructure at scale.
Strong skills in at least one programming language: Python (preferred) or Go.
Solid experience in Linux performance tuning, profiling, and monitoring.
Deep experience deploying and managing services in GCP and/or AWS using Terraform.
Experience with virtualization technologies, specifically KVM/QEMU.
A solid understanding of cloud, networking, and distributed computing concepts (TCP/IP, firewalls, VLANs, load balancing, etc.).
Experience with testing frameworks for infrastructure automation (e.g., InSpec, Ansible Molecule).
Familiarity with ZFS on Linux and managing storage appliances (iSCSI, NFS).
Deep experience with modern observability tooling (Prometheus, Grafana).
Excellent communication skills (verbal and written) and the ability to collaborate effectively across all levels of the organization.
Familiarity with software engineering best practices and agile methodologies.

We are a hybrid workplace that recognizes the importance of flexibility while valuing in-person collaboration and relationship building. As a result, Saucers located near an office location must be able and willing to come into the office. Those hired remotely must be able and willing to travel to an office as required by the specific role.

Please note our privacy terms when applying for a job at Sauce Labs.

Sauce Labs is proud to be an Equal Opportunity employer and values diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender identity/expression/status, sexual orientation, age, marital status, veteran status or disability status.

Security responsibilities at Sauce

At Sauce, we are committed to supporting the health and safety of employees and properties, partnering with internal stakeholders to learn and act on ever-evolving security protocols and procedures. You’ll be expected to fully comply with all security-related policies and procedures at the department and organization-wide levels, and to take a ‘security first’ approach to how we design, build, and run our products and services.

We are excited to share the base salary range for this position, exclusive of fringe benefits, potential bonuses, or stock-based compensation. Your base salary will be determined based on factors such as geographic location, skills, education, and/or experience, along with its relationship to the base salaries of similarly situated team members at Sauce Labs.
Benefits and Perks that we offer include health coverage (medical, dental, and vision) along with disability and life insurance. In addition, Sauce Labs offers parental leave benefits, flexible time off, professional development, and a 401(k) retirement plan with match. To see more about benefits and perks at Sauce Labs, please check out our careers page at saucelabs.com/company/careers.

US Compensation Range

$135,000-$165,000 USD

Top Skills

Ansible

AWS

GCP

Grafana

Kubernetes

KVM/QEMU

Prometheus

Python

Terraform

ZFS

Similar Jobs

PwC

Systems Engineer

6 Days Ago

Remote or Hybrid

77K-202K Annually

Mid level

Artificial Intelligence • Professional Services • Business Intelligence • Consulting • Cybersecurity • Generative AI

The role involves designing data architecture strategies, collaborating with stakeholders, analyzing complex problems, and mentoring team members in data engineering and analytics.

Top Skills: Docker, Java, Python, Scala, SQL

Dandy

Senior UX Engineer, CAD/3D Systems (North America)

9 Days Ago

Remote

USA

177K-208K

Senior level

Computer Vision • Healthtech • Information Technology • Logistics • Machine Learning • Software • Manufacturing

As a Senior UI/UX Engineer, you'll create user interfaces for 3D dental design products, collaborating with cross-functional teams to enhance user experience and integrate technologies.

Top Skills: Emscripten, GCP, Node.js, React, Three.js, TypeScript, Wasm, WebGL

Dandy

Senior UX Engineer, CAD/3D Systems (Europe)

9 Days Ago

Remote

USA

Senior level

Computer Vision • Healthtech • Information Technology • Logistics • Machine Learning • Software • Manufacturing

Design and implement user interfaces for 3D CAD applications, collaborating with cross-functional teams to enhance user experiences and workflows.

Top Skills: Emscripten, GCP, Node.js, React, Three.js, TypeScript, Wasm, WebGL

What you need to know about the Colorado Tech Scene

With a business-friendly climate and research universities like CU Boulder and Colorado State, Colorado has made a name for itself as a startup ecosystem. The state boasts a skilled workforce and high quality of life thanks to its affordable housing, vibrant cultural scene and unparalleled opportunities for outdoor recreation. Colorado is also home to the National Renewable Energy Laboratory, helping cement its status as a hub for renewable energy innovation.

Key Facts About Colorado Tech

Number of Tech Workers: 260,000; 8.5% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Lockheed Martin, CenturyLink, Comcast, BAE Systems, Level 3
Key Industries: Software, artificial intelligence, aerospace, e-commerce, fintech, healthtech
Funding Landscape: $4.9 billion in VC funding in 2024 (Pitchbook)
Notable Investors: Access Venture Partners, Ridgeline Ventures, Techstars, Blackhorn Ventures
Research Centers and Universities: Colorado School of Mines, University of Colorado Boulder, University of Denver, Colorado State University, Mesa Laboratory, Space Science Institute, National Center for Atmospheric Research, National Renewable Energy Laboratory, Gottlieb Institute