Site Reliability Engineer at HomeAdvisor
At HomeAdvisor, our Engineering organization is driven by innovation and teamwork. We are motivated by new challenges every day to solve unique problems while setting the standard for the home services marketplace. Our co-workers are our collaborators and teammates, and everyone has the opportunity to contribute and build. We believe strongly in a culture of independent engineering teams, each considered expert in its own business domain and empowered to make decisions within that expertise. The SRE team defines the standard of excellence within engineering around reliability, transparency, and availability.
Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other HomeAdvisor production systems running smoothly. SREs are a hybrid of operators and software engineers who apply engineering principles, operational experience, and automation to our environments. You will help shape our infrastructure and build the foundation our team relies on for the rapid, reliable delivery of our product. We'll rely on you to instill best practices for building scalable distributed systems, with a keen focus on observability and fault tolerance. Our stack includes technologies such as Kubernetes, Java Spring Boot, Oracle, Postgres, Coherence, Redis, and Elasticsearch, running in a hybrid cloud.
We are looking for experienced Site Reliability Engineers who meet the following criteria:
- Breadth of knowledge across our infrastructure and application stack.
- Contributes small improvements across all codebases to resolve issues.
- Experience with container orchestration technologies like Kubernetes, Mesos, or Nomad. (We use Kubernetes.)
- A track record of leveraging automation whenever and wherever possible.
- An appreciation of and enthusiasm for software engineering best practices, such as infrastructure as code, testing, and continuous delivery.
- Identifies changes to the product or infrastructure architecture from a reliability, performance, and availability perspective, using a data-driven approach.
- Proactively works on efficiency and capacity planning to set clear requirements and reduce system resource usage, so that HomeAdvisor operates with cost as a discipline.
- Identifies parts of the system that do not scale and provides both immediate and long-term resolutions.
- Identifies Service Level Indicators (SLIs) that align the team to meet availability and latency objectives.
Collaboration and Communication:
- Know a domain really well and spread that knowledge across the rest of the engineering organization.
- Run blameless root cause analyses (RCAs) on incidents and outages, and drive the follow-up work that prevents them from recurring.
- Show ownership of a major part of the infrastructure.
As an SRE you will:
- Be part of an on-call rotation to respond to incidents and provide support for software engineers across HomeAdvisor initiative teams.
- Build visibility into SLIs, SLOs, SLAs, and dependency graphs to reduce operational burden and toil.
- Drive instrumentation patterns that alert on symptoms rather than outages, leveraging our monitoring stack of Grafana, Prometheus, and Elasticsearch.
- Use your on-call shift to prevent incidents from occurring.
- Run our infrastructure with Terraform and Kubernetes.
- Take a data-driven approach to findings, turning them into repeatable actions and then into automation.
- Improve the deployment process to make it as quick and dependable as possible.
- Design, build, and maintain core infrastructure pieces that allow HomeAdvisor to scale to meet market demand.
- Debug production issues across the full stack.
- Plan and shape the growth of HomeAdvisor’s ever-evolving infrastructure.
You may be a fit for this role if you:
- Think about systems: edge cases, failure modes, behaviors, and specific implementations.
- Have an understanding of large-scale system design, monitoring, and operational practices.
- Have strong programming skills in Ruby and/or Go.
- Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it.
- Have a burning desire to deliver quickly and iterate fast.
- Have experience with Nginx, HAProxy, Docker, Kubernetes, Terraform, or similar technologies.
Projects you could work on:
- Improve our monitoring stack across the board.
- Migrate our ingress controllers to a more cloud-native paradigm (Istio, Envoy, Traefik).
- Instrument our Rails app to collect important information about our applications.
- Automate an immutable Kubernetes upgrade pattern.
- Build tooling to help reduce toil across the engineering organization.
At HomeAdvisor, we create the digital tools and services that empower millions of service professionals to connect with hundreds of millions of homeowners. We are a dual-sided marketplace that nurtures the growth of independent small businesses and delivers a seamless experience in home improvement. As #1 in the Homeservices category, our tremendous scale is the launchpad to boundless inventions in technology and product!
HomeAdvisor is the self-contained operating business of ANGI Homeservices (NASDAQ: ANGI), a federation of spirited technology companies that build for a better economy. In 2017, Angie's List and HomeAdvisor combined to create the world's largest Homeservices marketplace. We love the software that we invent, the millions of service professionals whom we empower, and the hundreds of millions of homeowners whose lives we beautify. As a purposeful technology company, we foster a culture of collaboration and nurture growth through innovation. We are proud to have been recognized as a Top Workplace in Denver for six years and counting!