Nexthink

Platform Site Reliability Engineer

Posted Yesterday
Remote or Hybrid
Hiring Remotely in Colorado Springs, CO
Senior level
Design, implement, and maintain multi-tenant SaaS infrastructure, ensuring reliability, security, and scalability. Collaborate on incident response and system monitoring.
The summary above was generated by AI
Company Description

Nexthink is the leader in digital employee experience management software. The company provides IT leaders with unprecedented insight allowing them to see, diagnose and fix issues at scale impacting employees anywhere, with any application or network, before employees notice the issue. As the first solution to allow IT to progress from reactive problem solving to proactive optimization, Nexthink enables its more than 1,200 customers to provide better digital experiences to more than 15 million employees. Dual headquartered in Lausanne, Switzerland and Boston, Massachusetts, Nexthink has 9 offices worldwide.

 

Job Description

Nexthink is looking for a strong Platform Engineer with SRE operations experience to strengthen our infrastructure and accelerate our ability to deploy, monitor, and scale systems effectively. As a SaaS provider, our customers rely on us to deliver a seamless, reliable, and scalable experience 24/7.

Join Nexthink's vibrant team where cutting-edge technology meets innovation. Be a part of Nexthink's Digital Employee Experience technological revolution, ensuring our global customers enjoy a seamless user experience. Embrace the future with Nexthink in the US; apply now and become a key player in our dynamic Platform Engineering/SRE organization.

What You'll Do:

  • Design, build, and maintain the infrastructure powering our multi-tenant SaaS platform with reliability, security, and scalability in mind.
  • Implement and manage cloud-native systems (AWS) using best-in-class tools and automation.
  • Operate and enhance Kubernetes clusters, deployment pipelines, and service meshes to support continuous delivery.
  • Establish and enforce SLOs, SLAs, and error budgets, and proactively address availability and performance issues.
  • Develop infrastructure as code (Terraform or similar) for repeatable and auditable provisioning.
  • Program solutions for platform tooling, such as automation, monitoring, and provisioning.
  • Apply a solid understanding of the network stack (TCP/IP, VPN, HTTP, SSL, routing, etc.), cloud topologies (VPC, virtual subnets, NACLs, NSG, ILB, ELB, etc.), and storage (S3, EBS, Azure Files, etc.).
  • Monitor system health, application performance, and user-facing SLAs using tools such as Datadog, Prometheus, and Grafana.
  • Play a leading role in improving incident response practices and help reduce mean time to detect (MTTD) and mean time to recover (MTTR), coordinating teams and individuals to maintain an SLA.
  • Troubleshoot, narrow down, and fix incidents with minimal intervention from other functions.
  • Participate in a shared on-call rotation, responding to incidents, troubleshooting outages, and driving timely resolution and communication.
  • Work closely with software engineers to embed reliability and observability into every service.
  • Develop automated runbooks, health checks, and alerting to support reliable operations with minimal manual intervention.
  • Support automated testing, canary deployments, and rollback strategies to ensure safe, fast, and reliable releases.
  • Contribute to security best practices, compliance automation, and cost optimization.

Qualifications

  • Minimum of a BS in Computer Science/Engineering.
  • 5+ years in an SRE/platform engineering role supporting SaaS platforms.
  • Strong hands-on experience with public cloud services (AWS, GCP, Azure).
  • Proficiency with Kubernetes, container-based deployment and related ecosystems (e.g., Helm), and containerized microservices.
  • Strong programming or scripting skills (e.g., Python, Go, Bash).
  • Experience with CI/CD pipelines (e.g., GitHub Actions, GitLab CI, ArgoCD).
  • Experience with observability stacks (Prometheus, ELK/EFK, Datadog, etc.).
  • Comfort with being part of a rotating on-call schedule, including handling critical incidents and conducting post-incident reviews.
  • Strong system-level troubleshooting skills and a proactive mindset toward incident prevention.
  • Deep understanding of Linux systems, networking, and common troubleshooting practices.
  • Experience supporting multi-tenant microservices architectures.
  • Familiarity with service mesh, e.g., Istio.
  • Knowledge of zero-downtime deployment strategies, blue/green and canary releases.
  • Exposure to compliance standards such as SOC 2, ISO 27001, or HIPAA. FedRAMP experience is a big plus.
  • Experience with chaos engineering or resilience testing practices.

Additional Information

We are the pioneers and trailblazers of a global IT Market Category (DEX) that is shaping the future of how the world works, giving our customers’ IT Teams total digital visibility across their enterprise. Our innovative solutions integrate real-time analytics, automation, and employee feedback across all endpoints. This enables our IT teams to solve complex technical challenges, create ever more productive workplaces, and deliver happy, satisfied employees in the digital workplace.

With over 1,000 employees across 5 continents, Nexthink operates as One Team, connecting, collaborating and innovating to continuously grow. We call our employees ‘Nexthinkers’ and our commitment to diversity, inclusion, and equity is second to none. We currently have over 75 nationalities working with us, from all cultures and backgrounds, speaking many different languages.

If you are looking for a change and like a nice atmosphere, lots of challenges, and having fun while working, this is a great opportunity for you! Check what we offer:

  • 🏖️ Flexible Hours and unlimited vacation (employees have unlimited paid time off on top of the 15 days of holidays we offer), 11 company-paid holidays, and 3 extra days for volunteering.
  • 🏡 Hybrid work model that balances office and remote work, with structured onboarding to foster connections and team integration.
  • 📚 Free access to professional training platforms to explore your interests and enhance your skills.
  • 🍼 Up to 16 weeks of paid leave for birthing parents/primary caregivers, 6 weeks for secondary caregivers.
  • 💰 Plan for the future with a 401(k) plan featuring up to 4% company matching contributions, vesting immediately, to grow your retirement savings.
  • 📣 Bonuses for referring successful hires after three months of continuous employment.

Please note that not all the benefits listed above are available for temporary, contract, and internship roles. To ensure you have the most up-to-date information, we recommend checking with your Recruitment Partner.

Total Rewards @ Nexthink

At Nexthink, we offer one of the most comprehensive and generous benefits plans. Your total rewards compensation package includes base salary and may also include a commission or performance bonus plan, as well as equity. We provide our US employees with 100% covered company benefits that consist of health, dental, and vision, as well as access to life insurance, long-term disability, and accidental death/personal loss coverage.

Base salary ranges are determined by country, role, level, experience, and skills. The range displayed on each job posting reflects Nexthink’s good faith determination of the minimum and maximum targets for new hire salaries across all US locations. Individual pay is determined by related factors, including job skills, experience, and relevant education or training, which may impact a final offer. Your Talent Acquisition Partner can share more about the specific salary range during the hiring process.
 

Top Skills

ArgoCD
AWS
Bash
Datadog
GitHub Actions
GitLab CI
Go
Grafana
Kubernetes
Linux
Prometheus
Python
Terraform
