- Lead by example by addressing production issues, and provide technical mentoring and coaching to other team members.
- Develop playbooks to troubleshoot and address recurring issues.
- Perform and document root cause analyses for internal and external stakeholders.
- Develop and report metrics describing production system availability, uptime, and responsiveness.
- Develop and implement tools and processes to increase monitoring of production systems and applications.
Why Red Canary
Red Canary was founded to make security better for every business by protecting organizations around the world from cyber threats. Our combination of market-defining technology, processes, and expertise, delivered through an innovative SaaS model, is preventing breaches every day.
The Red Canary Engineering team builds and operates the platform that delivers unmatched threat detection and response. We process billions of events per day from hundreds of thousands of systems worldwide. We are on the front lines of cybersecurity, with unique opportunities to use new technologies and solve the hardest problems in cybersecurity.
Why You Matter
Operational systems and applications that provide security to customer environments require 24/7 availability. The organizations that provide the most reliable operational systems are driven by teams that are passionate about keeping production environments up and available beyond customer expectations. These teams are hyper-focused on monitoring production systems, identifying potential and realized issues as early as possible, attacking and resolving these issues in real-time, and improving tools and processes to prevent future recurrences.
Delivering that level of 24/7 operational availability is paramount to the Red Canary charter of being our customers’ top security ally. As a Senior Production Engineer, you will use your technical expertise to develop tools and processes that enhance the resiliency of our systems while also improving monitoring, troubleshooting, and resolution efforts. Your expertise with infrastructure technologies will allow you to support Red Canary’s response to, and resolution of, production incidents. You will evaluate issues that occur in production and update the infrastructure-as-code (IaC) baseline to introduce automation that eliminates the need for human operators to intervene. You will be the subject matter expert the Production Team relies on to resolve real-time operational issues and to understand the root cause of failures.
Who You Are
You are passionate about working in 24/7 operational environments that leverage cloud, container, and IaC technologies and services (AWS, Kubernetes, Terraform, Salt). You use your skills and experience to address production impacts, either directly or together with your team members. You use your software development, systems administration, and scripting skills, along with your knowledge of IaC, to improve the reliability, observability, and availability of production systems and applications. You understand configuration management best practices and how they enable reliable production systems.
You are diligent about capturing and sharing best practices for troubleshooting and resolving issues. You love using your skills and experience to coach other team members and grow their skills. You understand the urgency of resolving production issues in real time, but you also establish actions and plans to implement strategic fixes that address root problems.
You have experience performing and documenting root cause analysis. You have experience taking responsibility for maintaining applications and systems that were developed externally.
The ideal candidate has demonstrated success managing 24/7 operational systems. A strong technical skill set with infrastructure, cloud, container, and monitoring tools and technologies is required.
Individuals seeking employment at Red Canary are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.