LivePerson

Site Reliability Engineer (SRE) II

Posted 3 Days Ago

In-Office or Remote

Hiring Remotely in Remote, OR

Senior level

Maintain and improve reliability of the Echo platform by operating GKE production workloads, implementing GitOps deployments, defining SLOs/SLIs, enhancing observability with OpenTelemetry, troubleshooting incidents, and collaborating with developers on safe CI/CD and progressive delivery.

LivePerson (NASDAQ: LPSN) is the global leader in enterprise conversations. Hundreds of the world’s leading brands — including HSBC, Chipotle, and Virgin Media — use our award-winning Conversational Cloud platform to connect with millions of consumers. We power nearly a billion conversational interactions every month, providing a uniquely rich data set and safety tools to unlock the power of Conversational AI for better customer experiences.

At LivePerson, we foster an inclusive workplace culture that encourages meaningful connection, collaboration, and innovation. Everyone is invited to ask questions, actively seek new ways to achieve success, and reach their full potential. We are continually looking for ways to improve our products and make things better. This means spotting opportunities, resolving ambiguities, and seeking effective solutions to the problems our customers care about.

Overview:

We are looking for a Site Reliability Engineer (Level II) to support and enhance reliability across the Echo ecosystem. This role is responsible for maintaining existing production systems while actively supporting new platform initiatives and feature rollouts.

The ideal candidate has strong hands-on experience with GKE, GitOps-driven deployments, cloud-native networking, and proactive reliability engineering. This role requires close collaboration with application development teams to ensure safe, reliable, and observable production releases.

You will:

Production Reliability & Ownership

Maintain and support existing products within the Echo ecosystem.
Ensure high availability, performance, and reliability of platform services.
Define, monitor, and improve SLOs, SLIs, and error budgets.
Proactively identify system risks and implement reliability improvements.
Participate in incident response, troubleshooting, and post-incident reviews.

Cloud & Kubernetes (GKE)

Deploy, manage, and optimize workloads on Google Kubernetes Engine (GKE).
Manage cluster capacity, scaling strategies, and resource allocation.
Optimize CPU, memory, and storage utilization to improve performance and reduce cost.
Ensure cluster security, upgrades, and best practices are followed.
Troubleshoot networking, service mesh (if applicable), ingress, and service-to-service communication issues.

GitOps & Release Engineering

Implement and manage GitOps-based deployment workflows.
Ensure infrastructure and application changes are version-controlled and automated.
Work closely with developers to safely release code to production using CI/CD best practices.
Support progressive delivery techniques (e.g., canary, blue/green deployments).
Reduce deployment risk through automation and validation mechanisms.

Observability & Monitoring

Implement and enhance observability practices across services.
Build and maintain dashboards, alerts, and health metrics.
Implement and manage OpenTelemetry (OTEL) for tracing and metrics collection.
Ensure proactive alerting aligned with SLOs.
Drive improvements in monitoring coverage and signal quality.

Networking & System Understanding

Apply a strong understanding of Kubernetes networking, services, ingress, load balancing, DNS, and service communication.
Diagnose latency, connectivity, and traffic routing issues.
Understand how distributed services interact across the ecosystem.

You have:

4–7 years of experience in SRE, DevOps, or Platform Engineering roles.
Strong hands-on experience managing production workloads on GKE.
Solid experience with GitOps practices (ArgoCD, Flux, or similar).
Strong understanding of Kubernetes networking and cloud networking fundamentals.
Experience optimizing resource allocation and scaling in Kubernetes.
Experience implementing observability solutions using OpenTelemetry (OTEL).
Experience defining and operating with SLOs and SLIs.
Hands-on experience with CI/CD pipelines and automated deployments.
Strong troubleshooting and incident management experience.

Benefits:

Health: medical, dental, and vision
Time away: vacation and holidays
Development: generous tuition reimbursement and access to internal professional development resources
Equal opportunity employer

Why you’ll love working here:

As leaders in enterprise customer conversations, we celebrate diversity, empowering our team to forge impactful conversations globally. LivePerson is a place where uniqueness is embraced, growth is constant, and everyone is empowered to create their own success. We are also proud to have earned recognition from Fast Company, Newsweek, and BuiltIn for being a top innovative, beloved, and remote-friendly workplace.

Belonging at LivePerson:

We are proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants with criminal histories, consistent with applicable federal, state, and local law.

We are committed to the accessibility needs of applicants and employees. We provide reasonable accommodations to job applicants with physical or mental disabilities. Applicants with a disability who require reasonable accommodation for any part of the application or hiring process should inform their recruiting contact upon initial connection.

The talent acquisition team at LivePerson has recently been notified of a phishing scam targeting candidates applying for our open roles. Scammers have been posing as hiring managers and recruiters in an effort to access candidates' personal and financial information. This phishing scam is not limited to LivePerson and has been documented in news articles and media outlets. Please note that any communication from our hiring teams at LivePerson regarding a job opportunity will only be made by a LivePerson employee with an @liveperson.com email address.

LivePerson does not ask for personal or financial information as part of our interview process, including but not limited to your social security number, online account passwords, credit card numbers, passport information, and other related banking information. If you have any questions or concerns, please feel free to contact [email protected]

Top Skills

Google Kubernetes Engine (GKE), Kubernetes, GitOps, ArgoCD, Flux, OpenTelemetry (OTEL), CI/CD, Service Mesh, Ingress, Load Balancing, DNS, Cloud Networking
