Senior Site Reliability Engineer at Alchemer
Founded in 2006, SurveyGizmo is an enterprise data collection, orchestration, and analytics platform that helps some of the biggest brands deliver on their missions. It enables them to operationalize feedback by integrating critical data from their customers (employees, users, patients, vendors, etc.) into their key business systems through an easy-to-use, low-code SaaS platform.
About the Job:
Do you have a passion for operational reliability, scalability, and efficiency? Have you previously contributed to the modernization and enablement of a successful SRE journey with a growing SaaS application? Do you seek a challenge to be an integral part of stabilizing and improving IT operations within the AWS cloud? Would you like to see the results of your hard work improve the velocity of deployments and quality of features to our important customers?
We are hiring a Senior Site Reliability Engineer to create and improve a fully automated build infrastructure that reduces friction in the development process and improves the quality of our application platform. As part of the SRE team, you will help lead and build the operational reliability, scalability, and efficiency of our platform. You will be a mentor and coach for other members of the SRE team.
You will ensure that applications can be continuously released with high levels of confidence while also improving the scalability, reliability, quality, and performance of the infrastructure and build systems. You will participate in the development and implementation of software delivery standards, policies, and procedures through automation. As an SRE team member, you will collaborate with other joint customer teams, as well as mentor customer team members on SRE practices, processes, and coding best practices. As a Senior SRE, you are skilled and experienced at analyzing and resolving complex software configuration and infrastructure issues to maintain product integrity and overall stability of the platform. Most importantly, you are passionate about helping our team learn to build awesome every day, because that is just what we do.
Key responsibilities & duties:
- Responsible for building innovation in the areas of distributed system flow and resilience, continuous feedback and delivery.
- Help lead the transformation from an IaaS to a PaaS architecture and process workflows.
- Create efficiency and cultural transformation through the curation of new systems and capabilities.
- Build platforms that teams can leverage to accelerate innovation in the areas of reliability, scalability and velocity.
- Participate in incident response and improve the associated processes and tooling
- Refine and create new service level objectives and meaningful metrics
- You will troubleshoot complex technical issues, and design robust, scalable systems.
- You will help to improve our build and release process for the application and platform features
- Responsible for building out more intelligent alerting and support workflows
- Identify and reduce manual toil through automation, standards, and agile processes.
- As a senior member of the SRE team, you will mentor and coach other engineers, along with leading large technical projects
- You can troubleshoot, write and review code in various languages, but also know your way around a Linux server or a SQL database.
- Take part in a 24x7 on-call rotation.
Qualifications:
- 7+ years SRE engineering experience, with previous team lead experience
- You have hands-on experience in public cloud (AWS, GCP or others), automation tools (such as Ansible, Terraform, Puppet) as well as container technologies (Docker, Kubernetes or similar)
- Experience with implementing and tuning monitoring tools (CloudWatch, Nagios, Elastic)
- You have familiarity with build automation, source control and CI/CD tools (Git, Artifactory)
- Experience in developing and operationalizing runbook automation including error budgets, SLOs and availability metrics
- Experience with immutable infrastructure and platform automation; large-scale infrastructure automation experience is a plus
- Experience with Atlassian tools (Jira, Confluence, Bitbucket, Agile Frameworks)
- Hands-on database experience (MySQL, RDS, Aurora, DynamoDB)
- You have development experience with distributed systems and messaging
- Influence architectural decisions with a focus on security, scalability and high performance.
- You work well within and across teams to deliver high-quality software, foster solid engineering principles and represent our engineering values
- Strong customer advocate with excellent written and verbal communication skills
Our team members enjoy:
- 401(k) with a 6% company match per payroll and immediate vesting
- Founder’s Pool profit-sharing program, with an annual profit-sharing bonus and additional units awarded annually
- HSA and FSA with optional yearly SurveyGizmo contribution
- Flexible Cafeteria Plan with reimbursement for wellness, education, commuter, and dependent care expenses (including pets!)
- Generous time off policy
- 14 paid holidays, including the week between Christmas and New Year’s. Plus, you get 4 floating holidays in addition to your PTO!
- Relaxed, open and highly collaborative environment
- Nearby bike and walking trails
- Fully stocked kitchen, including wine and beer