Site Reliability Engineer
Job Type
Full-time
Description
Engrain has transformed the way people find, lease, and manage properties. Engrain offers a holistic suite of mapping solutions, built specifically for the real estate industry, that deliver stunning unit-level map visualizations and integrate with countless websites and property tech applications. Our revolutionary unit-level map data and interactive visuals, found in our SightMap, TouchTour, and Asset Intelligence product lines, serve both property owners and prospective renters while helping owners ensure occupancy and drive revenue.
As a Site Reliability Engineer, you will be part of Engrain's growing, dynamic environment. You will work closely with Engrain's engineering and product teams and participate in projects across the company.
What you'll do...
- Monitor, analyze, and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of needs, and innovating to continually improve our monitoring tools and techniques
- Implement, maintain, and consult on the observability stack that supports the needs of multiple internal stakeholders
- Utilize your deep experience and problem-solving skills to help prevent and investigate production issues
- Participate in the design and implementation of new system layers for high-complexity compute environments
- Research and recommend specific systems, architectures, and applications for cloud infrastructure solutions
- Collaborate closely with software engineering and DevOps teams to ensure proper systems integration and seamless application deployment, providing reliable, predictable deployment and maintenance of distributed systems
- Stay up to date with industry best practices and emerging technologies related to cloud infrastructure, SRE, provider services, and security
- Proactively identify and resolve issues, troubleshoot problems, and conduct root cause analysis to prevent recurrence
- Develop and maintain automation tools and scripts to streamline operational processes and reduce manual intervention
- Implement and maintain effective disaster recovery strategies, backup systems, and business continuity plans
- Participate in the on-call rotation and respond to incidents promptly to minimize downtime and resolve critical issues
- Contribute to the documentation of system architecture, configurations, operational procedures, and incident response plans
- Identify opportunities for optimization and cost reduction in cloud resource utilization, and provide recommendations to improve infrastructure performance
- Conduct regular performance and capacity planning exercises to ensure the infrastructure can handle increasing workloads and scale as needed
Requirements
What you offer us...
- 3 or more years of relevant DevOps and/or Site Reliability Engineering experience
- Significant experience with deploying web apps to public cloud infrastructure (GCP/AWS - required, Azure desired)
- Deep understanding of GCP services such as Compute Engine, App Engine, Cloud Storage, BigQuery, and Cloud Pub/Sub
- Deep understanding of AWS services and best practices, including EC2, S3, RDS, Lambda, CloudWatch, CloudFormation, and IAM
- Proficiency in one or more programming languages, such as Python, Java, or Go, and experience with infrastructure-as-code tools (e.g., Terraform, Ansible)
- Strong knowledge of systems administration, networking, and security principles
- Experience with monitoring and observability tools (e.g., Stackdriver, Prometheus, Grafana) and logging frameworks (e.g., ELK stack)
- Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes)
- Proven ability to troubleshoot complex issues, conduct root cause analysis, and implement effective solutions
- Strong collaboration and communication skills, with the ability to work effectively in a team-oriented environment
- Experience with CI/CD pipelines and related tools (e.g., Jenkins, Bitbucket, GitLab CI/CD)
- Knowledge of database management systems (e.g., MySQL, PostgreSQL, MongoDB) and caching technologies (e.g., Redis, Memcached)
- Familiarity with microservices architecture and distributed systems
- Proactive mindset with a focus on automation, continuous improvement, and learning
- Understanding of compliance frameworks and security best practices in a cloud environment
What we offer you...
- Salary Disclosure for Colorado: base salary range of $90,000 to $120,000. The final offer amount is determined by factors including the years and depth of the candidate's experience, certifications, and skill set alignment with the job requirements
- Various health, dental and vision insurance plans to choose from
- 2-10 weeks of paid parental leave + additional paid and unpaid leave options
- Up to 18 days of PTO and 10 holidays per year
- Dog-friendly office
- 401(k) match up to 4 percent
- Annual stipend for personal growth through our Grow450 program
- On-site amenities include a professional fitness center, flexible & modern workspace, coffee bar, happy hour taps & team member lounge
All qualified applicants will receive consideration for employment without regard to race, color, sex, sexual orientation, gender identity, religion, national origin, disability, veteran status, age, marital status, pregnancy, genetic information, or any other legally protected status.