Why These 2 Colorado Companies Are Making the Shift to Site Reliability Engineering

Both leaders agree that SRE could be the key to unlocking the full potential of DevOps.

Written by Colin Hanner
Published on Jun. 30, 2021

The culture of DevOps has created success stories for engineering teams all over, yet the methods for implementing its principles have varied. In some cases, that’s created a messy division of labor and responsibilities, leaving some of the full potential of this system untapped. 

So how can a team achieve peak DevOps in the workplace? Enter site reliability engineers, also known as SREs or the shepherds of the back end.

“To use an analogy, [site reliability engineers are] not the actors on stage; we’re the folks behind the scenes wearing the headsets and making sure everything is running smoothly,” wrote Google Search SRE Andrew Widdowson in 2012 (today, he is the global sites data and resilience lead at the company). “Alternatively, our work is like being a part of the world’s most intense pit crew. We change the tires of a race car as it’s going 100 mph.”

SRE can operate independently of DevOps principles, as it did at Google during the early 2000s, but as both have grown in popularity in recent years, the two have become interwoven. SREs have become the facilitators of successful DevOps implementation, which is how the situation turned out for two Colorado-based companies, Vendavo and Alchemer.

With SREs in the ranks, both organizations have seen increased support within teams, better communication and collaboration, and faster delivery. Barriers have been eliminated and teams are more aligned than ever before. 

Below, leaders from both organizations explain their philosophies on the DevOps-SRE partnership, their reasoning for shifting to SRE and the impact it’s had on their work. 

 

Brandi Vandegriff
Chief Technology Officer • Alchemer

Brandi Vandegriff is the new CTO of Alchemer, a survey and data insights platform. Though the division between DevOps and SRE is new at the company, Vandegriff said they’re hoping to get to a point where SREs, running on a DevOps philosophy, are embedded in delivery teams. 

What are the key differences between DevOps and SRE? Tell us a bit about your team's experiences with the two.

DevOps vs. site reliability engineering can often be a philosophical debate in the industry. I distinguish them in that DevOps should be a mantra that the team lives by. The industry often describes DevOps as a dedicated person or role that bridges the gap of getting code to the infrastructure in as fast and automated a way as possible. I believe SRE should be a part of DevOps as a culture, where there is a focus on building applications from the app code to the infrastructure and back around with monitoring, automation and scaling inherent in the programming and the team.  

The difference between DevOps and SRE here at Alchemer is new — and so am I. Alchemer has been in a place where developers rode the line and worked with operations to create DevOps over the course of trying to modernize the product. As a leader, I stand behind getting code in a pipeline and in production fast, but I emphasize the team's focus on reliability, scaling and modernizing the app. We have already built out a significant release pipeline, and that is often what DevOps engineers are focused on. We are shifting beyond that by leveraging a strong team of engineers.

What prompted your team to shift from DevOps to SRE? 

Alchemer needed help solving the big challenge of scaling as usage and data grow, as well as continuing to provide stability to its growing customer base. This challenge has Alchemer going through a big shift to scale organizationally and modernize the application. This has involved change from product ideation to support management to how we deliver code. To react to this level of change, we added a director of cloud operations who is hiring up his SRE team. Having a group of dedicated SREs will allow us to programmatically build out new features, deploy them on a continuous integration/continuous delivery pipeline, and monitor and maintain them from concept to cash.

What impact has this shift had on your engineering organization, the tech you build and the business as a whole? 

We are not quite where we want to be with the SRE team. They are still somewhat siloed from the engineers, which leads to an on-call rotation for support and the knowledge transition for deployments and releases at the end of a sprint. Where we want to be is to have SREs embedded with each delivery team and truly running DevOps as a philosophy within our organization.  

That said, we have created a solid team to help build out Alchemer’s future, and we are looking to add more and increasingly integrate the teams.   

In the meantime, having the SRE and cloud ops team has allowed us to improve our monitoring and application support. It has provided a clear line of communication on changes and releases to the application. Having the SRE team has directly improved the pace of code delivery for new features, as well as provided a whole new philosophy on how to build out new application environments at the push of a button. 

 

Nelson Normahomed
Senior Director of Cloud Operations & IT • Vendavo

For Nelson Normahomed, the senior director of cloud operations and IT at B2B software company Vendavo, the reason behind shifting from DevOps to SRE was a simple one. “We needed to work more efficiently, deploy faster, reduce issue resolution times and maintain a stable, performant and reliable system,” he said. 

What are the key differences between DevOps and SRE? Tell us a bit about your team's experiences with the two.

I often refer back to this quote from a Google article written by Liz Fong-Jones and Seth Vargo in 2018: “DevOps and SRE are not two competing methods for software development and operations, but rather close friends designed to break down organizational barriers to deliver better software faster.”

DevOps, as a framework or methodology, is the combination of best practices, principles, tools and the removal of barriers between traditional product development and operations teams. In it, teams work together across the entire development cycle, all the way through testing, deployment and operations. The goal is to provide an efficient pipeline with a continuous mechanism for feedback, resolutions and improvements throughout the process. 

The role of an SRE is to embrace these DevOps principles, understand how to interact and engage with the various teams within product development, and prescribe how to achieve and succeed in the various measures defined within the system. The SRE is tasked with normalizing and contributing to the pace of innovation and production stability, which tends to be the common misalignment between traditional development and operations teams. The role of a Vendavo SRE shifts from typical system administration tasks and Band-Aids to developing automation to spend less time nursing mundane tasks. Integrated within the product development team, the SRE exposes product faults and provides feedback on postmortems, reliability, performance and monitoring to ensure they are resolved or enhanced at the core.

What prompted your team to shift from DevOps to SRE? 

The shift from DevOps to SRE was more of a shift away from the traditional development and operations team structures. Simply, we needed to work more efficiently, deploy faster, reduce issue resolution times and maintain a stable, performant and reliable system. In order to achieve this goal, we had to eliminate the barriers and the team silos and create an environment of continuous collaboration and feedback. 

Typically, dev teams tend to create applications with little understanding of how they will operate in a production environment. Features such as high availability, performance and monitoring also tend to get overlooked. Ops teams tend to create environments, infrastructure and automation mainly based on budget, skillset or reactions to the latest issue of the day. We needed to control the pace of changes made to the environment. What we got with this disparity was both teams working and operating with misaligned priorities. So when production issues arose, tensions quickly escalated: it most often began with troubleshooting, then became a finger-pointing exercise over fault ownership and internal escalations. Scaling, achieving company goals and improving customer satisfaction proved difficult.
 

What impact has this shift had on your engineering organization, the tech you build and the business as a whole? 

As we continue the shift of adopting the DevOps mindset within our organization, we’re integrating the role of the SRE within each of our product development teams. We are already realizing the benefits of improved communication and collaboration between teams, which provides us with the ability to strategically align on development and operational goals with combined priorities and workflows. Additionally, it gives us the ability to directly communicate and share issues, roadmaps, customer expectations and operational challenges on a regular basis. 

We have seen improved alignment on technologies deployed, including an integrated centralized observability platform allowing all teams to have full visibility of the utilization and performance of all product environments. That's a solution that would typically be limited to the operations team. We have also increased our security posture and, most importantly, continued to deliver product and infrastructure enhancements in a cost- and resource-efficient manner.
