Senior Site Reliability Engineer

Remote
Sorry, this job was removed at 11:02 a.m. (MST) on Saturday, August 13, 2022

Angi® is transforming the home services industry, creating an environment for homeowners, service professionals and employees to feel right at “home.” For most home maintenance needs, our platform makes it easier than ever to find a qualified service professional for indoor and outdoor jobs, home renovations (or anything in between!). We are on a mission to become the home for everything home by helping small businesses thrive and providing solutions to financing and booking home jobs with just a few clicks.  

Over the last 25 years we have opened our doors to a network of over 200K service professionals and helped over 150 million homeowners love where they live. We believe home is the most important place on earth and are embarking on a journey to redefine how people care for their homes. Angi is an amazing place to build your dream career. Join us; we cannot wait to welcome you home!

About the Role

Site Reliability Engineers (SREs) on the Telemetry team are responsible for ensuring that Angi’s Insights Platform can be relied upon to support the needs of our mission-critical systems. The SRE role at Angi differs from the same role at many other organizations. You will work in a team of SREs tasked with completing company objectives instead of being embedded in development teams. The team works together to address client needs as any development group would. This allows for easier sharing of knowledge between team members and a more consistent experience for clients. We build all of our solutions on EKS in AWS with Terraform and use Weave Flux, Prometheus, Cortex, Loki, Tempo, and Grafana to provide telemetry services for our clients. Every day you’ll find yourself managing these tools, providing solutions based on their data, or working with clients on how to properly use our Telemetry Platform.

We are looking for experienced Site Reliability Engineers who meet the following criteria:

Technical: 

  1. A working knowledge of metrics, logs, and distributed tracing practices.
  2. Depth of knowledge in at least one of those practices. 
  3. Comfortable contributing to a shared codebase.
  4. Understand Kubernetes and the container orchestration concepts it uses.
  5. Passionate about process automation and familiar with enough different approaches to entertain several before deciding on which to pursue.
  6. A healthy amount of curiosity for containerized technology and how it works.

Execution: 

  1. Experience identifying changes that improve processes from a reliability and performance perspective.
  2. Enjoy finding solutions in low information situations.
  3. Comfortable using telemetry data to spot parts of a system that do not scale, research solutions, and implement a migration plan that mitigates the situation.
  4. Enjoy working to determine what service information is important enough to drive service levels and creating the means for teams to use that data.

Collaboration and Communication: 

  1. Have a curiosity for current and new practices that lead to collaboration and process change.
  2. Enjoy documenting and sharing solutions to interesting challenges with others. 
  3. Have participated in post-mortems and have definite opinions on how they serve the organization.
  4. Experience working as a team to support a critical core system. 

As an SRE you will: 

  • Determine what information is important enough to drive service levels for our services.
  • Use service level information to determine reliability on our Telemetry Platform. 
  • Participate in an on-call rotation that responds to incidents concerning the Telemetry Platform.
  • Contribute to solutions defined in GitLab projects and GitHub repositories.
  • Maintain AWS EKS clusters using our Terraform modules.
  • Automate complex business challenges that require your specific skill set.
  • Contribute to core infrastructure pieces that allow Angi to scale to meet the needs of its clients.
  • Use the Telemetry Platform to assist in investigations that happen across the organization.
  • Plan and shape the growth of Angi’s infrastructure as we iterate it over time.

You may be a fit for this role if you: 

  • Think about systems - edge cases, failure modes, behaviors, specific implementations. 
  • Have an understanding of large scale system design, monitoring, observability, and operational practices. 
  • Have strong programming skills in Go, Python, and/or Ruby.
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it. 
  • Have experience with Weave Flux, Nginx, Kubernetes, Terraform, Prometheus, Loki, Cortex, Tempo, or similar technologies.
  • Are compelled to keep a constant eye on the Observability space, identifying changes in practices and technologies as they arise and planning ahead for them.

Projects you could work on: 

  • Contribute to our team’s Telemetry Platform that consists of Prometheus, Cortex, Loki, Tempo, and Grafana deployed in EKS using Terraform and Weave Flux on AWS. 
  • Contribute to projects across the organization where your skill set can help address challenges.
  • Work with our dev teams to determine how to make their paging strategy more meaningful and less problematic.
  • Develop ways to aid our development teams in instrumenting their services to collect important information about our applications that allows for investigation.
  • Work to reduce the level of effort needed to utilize the instrumentation that the teams are creating.
  • Provide valuable feedback and collaborate with the teams whose products we use as we iterate on our own infrastructure.

Compensation & Benefits: 

  • The salary band for this position ranges from 140k to 200k, commensurate with experience and performance.
  • Full medical, dental, and vision package and a retirement plan to fit your needs.
  • Flexible vacation policy; work hard and take time when you need it.
  • The rare opportunity to work with sharp, motivated teammates solving some of the most unique challenges and changing the world.



Location

Nestled within the River North Art District east of the South Platte River is the home of our Denver HQ office. Catty-corner to a variety of popular local restaurants and bars, this location provides access to the after-work happenings residents enjoy, and the office itself has a variety of amenities.
