Courtyard.io

Staff Software Engineer - Security and Reliability

Reposted 15 Days Ago

Remote

Hiring Remotely in United States

Senior level

The Staff Software Engineer will manage security, reliability, and observability for an e-commerce platform, implementing monitoring systems and conducting security audits.

About Courtyard

Courtyard.io is one of the fastest-growing collectibles startups. From cards to coins, we’re making it faster, easier, and more exciting than ever to discover, collect, and cash out instantly.
We’re not just another marketplace. With thrilling pack rips, instant liquidity, and seamless vaulting, Courtyard.io delivers the ultimate collecting experience. Whether you’re investing, trading, or curating your dream collection, we’ve built a platform that’s trusted, simple, and built for speed.
And we’re just getting started. We’re a remote-first company hiring across all functions to push the boundaries of what’s possible in collectibles and digital ownership.

About the role

We are actively recruiting a staff software engineer to own the security, reliability, and observability of one of the fastest-growing e-commerce startups. You will report directly to our Head of Engineering and work closely with many members of our engineering team. Your mission will include establishing and maintaining world-class observability, monitoring, and alerting systems; building systems that reduce operational toil for the entire engineering team; and conducting security audits, reviews, and mitigations across our entire platform. We take reliability and security seriously. Doing so prepared us to scale to $500M in volume in under a year. You will help us scale the next 100x while keeping our systems secure and reliable.

About You

You have exceptionally high agency and don't let yourself get stuck on problems: you find creative solutions to complex reliability and security challenges so the business never stops running. When systems fail, you build the automation and tooling that helps the entire team respond effectively, rather than heroically fixing things yourself.
You are a "professional hacker" in the best sense - someone who can operate without much guidance, exercise excellent judgment on when to build vs buy vs configure, and see security and reliability as fundamental enablers of business success rather than obstacles to overcome.
You have 8+ years of experience building, securing, and operating complex distributed systems at scale. You've been on-call, you've debugged production incidents, and you've built the monitoring and automation systems that reduced toil for entire engineering organizations.
You are passionate about making systems observable, reliable, and secure. You understand that the best reliability work multiplies the effectiveness of the entire team - better monitoring means faster debugging for everyone, better automation means less manual toil, and better incident response processes mean the whole team can handle issues confidently. We don't believe in heroes; we believe in systems that make heroics unnecessary.

You understand our specific technology stack and can hit the ground running:

Go microservices running on Google Cloud Run
PostgreSQL
Redis
Google Cloud Platform infrastructure (Cloud Run, Cloud Build, Pub/Sub, Cloud Storage)
Terraform for infrastructure as code
Blockchain indexing and transaction submission
External service integrations

You have deep expertise in several of these areas:

Building comprehensive observability platforms (metrics, logs, traces, dashboards)
Designing and implementing effective alerting strategies that minimize noise while catching real issues
Creating automation and tooling that reduces operational toil
Establishing incident response processes, runbooks, and postmortem practices
Conducting security audits and threat modeling for distributed systems
Implementing security controls, authentication/authorization systems, and secrets management
Performance optimization and capacity planning for high-throughput systems
Database reliability, backup/recovery strategies, and data integrity
API security, rate limiting, and DDoS mitigation
Compliance and audit logging for financial systems

You understand that sometimes the rocket must be launched and completed in flight. This means you're comfortable making pragmatic security and reliability tradeoffs when needed, while always having a plan to improve things incrementally. You know when "good enough for now with monitoring" is the right answer, and when "we need to fix this before we ship" is non-negotiable.

What You'll Own

Observability & Monitoring: Build and maintain comprehensive monitoring across our microservices architecture. Instrument our Go services with meaningful metrics. Create dashboards that tell the story of system health. Ensure every engineer can debug any issue in production with the data we collect.
Alerting & On-call Support: Design alerting strategies that wake people up for real problems, not noise. Every engineer is already in an on-call rotation - your job is to make their lives easier by building better alerts, better runbooks, and better automation. Reduce the toil so on-call is manageable and incidents are handled smoothly by whoever is on duty.
Security Audits & Reviews: Conduct regular security reviews of our codebase, infrastructure, and third-party integrations. Identify vulnerabilities before they become incidents. Work with the team to implement mitigations. Establish security best practices and ensure they're followed.
Incident Response Systems: Build the systems and processes that enable effective incident response across the team. Create runbooks, automate common remediation tasks, and establish postmortem practices that turn incidents into learning opportunities. Make it easy for any engineer to handle incidents confidently.
Reliability Engineering: Identify and eliminate single points of failure. Implement circuit breakers, retries, and graceful degradation. Build automation that reduces manual operational work. Ensure our systems can handle 100x growth without proportionally increasing operational burden.
Infrastructure Security: Secure our GCP infrastructure, manage secrets properly, implement least-privilege access controls, and ensure our Terraform configurations follow security best practices. Own the security of our CI/CD pipelines and deployment processes.

What You’ll Get In Return

A dynamic and engaging environment focused on fostering real growth and innovation
Opportunities to create amazing products that our customers truly love and value
Comprehensive health insurance packages with dependent coverage
Competitive salary with ample opportunities for career advancement and development
The flexibility of a fully remote work environment
Access to employee wellness programs designed to support your overall well-being
401(k) plan with a 4% employer match to help you plan for the future

Top Skills

Blockchain

External Service Integrations

Google Cloud Platform

Google Cloud Run

Postgres

Redis

Terraform
