Hands-on architect role that will lead cloud architecture and SRE initiatives. This position will scale existing applications, build tools and processes to monitor and improve reliability, and streamline the operational side of our platform through automation. You will be expected not only to define and document solutions and standards but also to lead their delivery.
You will work closely with developers to ensure the system design of new features and products is performant and reliable. You should be able to advise teams on building and deploying reliable services and to troubleshoot production incidents.
As a site reliability engineer, you will build software to automate manual tasks and reduce the overhead required to keep our services reliable in the face of continuous change and development.
New initiatives may require technology evaluation. The ability to quickly learn a new technology and become an expert in it is paramount to success in this position. You must love learning new technologies, patterns, and practices.
This role includes periodic on-call responsibilities for production services.
This role offers the possibility of taking on SRE Team Lead responsibilities, if desired.
This position reports to the Development Department Director.
Required Technical Experience
- Experience with modern monitoring and observability tools (Prometheus, Grafana, Splunk, New Relic, etc.)
- Experience with SQL/NoSQL, Git, caching, traffic routing, and disaster mitigation strategies
- Software development experience in web-based applications
- Intermediate experience working with and operating Kubernetes/Helm
- Intermediate or expert experience scaling existing architectures using cloud-based technologies (Kubernetes, cloud databases, global load balancing, etc.)
- Experience building, maintaining, and evolving DevSecOps pipelines
- Experience defining and documenting solutions and standards
- Experience with Java (Spring Boot), Vue.js, and Node.js
- Working knowledge of Elasticsearch and Kibana
- Experience using Terraform or other infrastructure as code tools
- Intermediate knowledge of networking in cloud-based architectures is preferred (understanding DNS, IP routing, load balancers, reverse proxies, application firewalls, etc.)
- Expert networking knowledge (experience managing VPNs, NGINX, firewalls, etc.)
- Experience with software architecture and its impact on cloud/infrastructure architecture
- Experience building or running software-as-a-service products
- Ability to quickly learn new technologies and become an expert
- Passionate about technology
- 5 years of experience in software development and operating services at scale
- 3 years of experience working with cloud platforms like GCP, AWS, or Azure
- 3 years of experience with containers and container orchestration (Docker, Kubernetes)
- 3 years of experience with common build and deployment (CI/CD) tools and practices
- Proven ability to solve complex problems and work autonomously
- Ability to mentor less experienced team members on best practices and code compliance
- Strong written and verbal communication skills
- Ability to work at a computer for extended periods; occasional weekend work may be required to meet hard deadlines