Manager, Site Reliability Engineering

US | Remote
Employer Provided Salary: 164,000-210,000 Annually
Salary data is provided by the employer. Please note this is not a guarantee of compensation.
Sorry, this job was removed at 9:55 a.m. (MST) on Tuesday, June 25, 2024

Every developer has a tab open on Stack Overflow.  

We are one of the most popular websites in the world - a community-based space focused on increasing productivity, decreasing cycle times, accelerating time to market, and protecting institutional knowledge. 

Innovation is at the heart of everything we do. We embrace collaboration and transparency, and we believe in leading with empathy, creating an environment where every Stacker knows they belong. We recognize that the unique contributions and points of view of all Stackers drive our success.

We are a Best Company to Work For, in addition to being recognized for Best Company Leadership, Best Company Happiness, Best Company Perks and Benefits, Best Company Work-Life Balance, Best Company Compensation, and Best Company Outlook.

We are a remote-first company with hiring hubs based in the US, Canada, UK, and Germany.

At Stack Overflow, our infrastructure needs keep growing as our products evolve. We’re looking for a Site Reliability Engineering Manager to join our existing SRE organization and help us grow our cloud infrastructure as we transition away from our on-premises footprint. As we scale, we need a manager with SRE experience to support our team on a day-to-day and tactical basis.

We’re looking for a manager who has prior experience with Google Cloud Platform and familiarity with the .NET ecosystem, and who will support the team by:

  • Ensuring they have the skills, direction, and information they need to build robust infrastructure
  • Providing strong technical leadership as we transition our on-premises footprint to cloud-based infrastructure
  • Collaborating with various stakeholders to identify gaps and opportunities to improve reliability across the product
  • Working with the team to automate manual work
  • Creating repeatable, scalable systems and processes

We firmly believe in managers having demonstrated technical expertise. The right candidates will have prior experience supporting services and infrastructure in GCP. We don’t expect you to know every other part of our stack coming in, so we’ll pair you with other members of the team to learn and gain experience across our entire infrastructure. We operate in mixed Windows and Linux environments and expect someone in this role to have experience with one environment and a working understanding of the other.

What you’ll work on:

  • Remotely manage a team of 8 SREs ranging from junior to very senior. Continuously develop your team through coaching and feedback, and partner with Talent Acquisition to recruit new team members as needed.
  • Collaborate with product management, application teams, and other infrastructure teams on roadmaps to evolve and scale our public Stack Overflow products.
  • Advocate for your team and ensure they have everything they need to do their jobs.
  • Proactively identify and remove roadblocks.
  • Review technical specs and implementation with the team, create internal processes that enable the team to better collaborate, and enable the team to:
    • Set a high bar for reliability through formalized policies and processes.
    • Work with Technical PMs to ensure technical debt is prioritized and planned.
    • Reduce toil through software solutions by eliminating or automating manual tasks, steps, and workflows as we further streamline deployments and upgrades.
    • Improve the observability of our systems to help identify issues or bottlenecks by iterating on our monitoring and alerting strategies.
    • Improve our security patching and compliance strategy for cloud solutions.
  • Be available as an escalation point of contact for outages and incidents and direct the team to drive the issue to resolution.

Our current ecosystem includes:

  • Microsoft Azure
  • Self-hosted infrastructure
  • Terraform, PowerShell, Python, Go
  • Windows Server, IIS, and .NET Core
  • Linux - CentOS
  • Our toolchain includes: GitHub, TeamCity (CI), GitHub Actions (GHA), Octopus Deploy, HAProxy / NGINX, Elasticsearch, Redis, Argo Workflows, Kubernetes, Datadog

What we’re looking for:

  • Results-Focused SRE Background: We are looking for someone with a solid SRE background (5+ years). While we don’t expect you to be a hands-on SRE, you have a pragmatic attitude toward technology and know how to handle complex projects and get them done.
  • Manager and Coach: You have leadership experience and love helping other people get better at their job. You have a knack for helping people see their problems in a new light and guiding them to a solution, and you make an excellent rubber ducky.
  • Cross-Team Facilitator: You like working with other teams and figuring out how to get things running more smoothly. You dive into communication problems and conflict and know when to create and modify existing processes and practices to move things forward. You’re eager and able to work with a wide variety of stakeholders and functional groups. This is particularly important given our remote working environment.
  • Collaborative and Inclusive: You are committed to creating an inclusive and collaborative engineering culture. You help build understanding and empathy within your team and actively work to bring people into the conversation and understand their viewpoint.
  • Technical Expertise: You have demonstrated experience with SRE concepts in your career and have experience working in Google Cloud Platform, with a proven track record of designing and deploying modular cloud-based systems that leverage relevant GCP technology.
  • Experience with Terraform or similar IaC tools.
  • Experience writing mature software solutions in a high-level programming language (for example, but not limited to, Python, Golang, C#), and a track record of getting stuff done.
  • A strong practical understanding of software development lifecycle phases, from planning and development through production deployment and monitoring.
  • Prior experience working in a cloud environment (Azure, AWS, or GCP).
  • Experience with Agile methodologies such as Scrum, XP, or Kanban. We are looking for someone who understands why Agile methodologies are beneficial to the team and product.
  • Experience working with cross-functional teams where SREs are embedded in product and platform engineering teams.
  • Willingness to learn new technologies and adapt to changing priorities.

Salary range: $164K - $210K + bonus

What you’ll get in return:

  • Competitive Base Salary 
  • Generous paid vacation
  • Generous parental leave (16 weeks at 100% pay), family care leave, and unlimited sick days
  • Equity (RSUs) for all employees at all levels
  • Industry-leading health benefits that are applicable per country of residence for all our full-time employees
  • Company-paid Life Insurance
  • Home Internet stipend
  • Professional allocation for your growth and development
  • One-time allowance to assist with your home office setup
  • Company-paid access to Calm, Bravely, LinkedIn Learning, MyAcademy and Overdrive

Stack Overflow is proud to be an equal opportunity workplace. We value diversity, inclusion, equity and belonging and these pillars are at the heart of how we work together here at Stack. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying. 

For individuals based in California, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.
