Courtyard.io

Staff Software Engineer - Security and Reliability

Posted Yesterday
Remote
Hiring Remotely in United States
Senior level
The Staff Software Engineer will manage security, reliability, and observability for an e-commerce platform, implementing monitoring systems and conducting security audits.
The summary above was generated by AI

About Courtyard


Courtyard.io is one of the fastest-growing collectibles startups. From cards to coins, we’re making it faster, easier, and more exciting than ever to discover, collect, and cash out instantly.
We’re not just another marketplace. With thrilling pack rips, instant liquidity, and seamless vaulting, Courtyard.io delivers the ultimate collecting experience. Whether you’re investing, trading, or curating your dream collection, we’ve built a platform that’s trusted, simple, and built for speed.
And we’re just getting started. We’re a remote-first company hiring across all functions to push the boundaries of what’s possible in collectibles and digital ownership.

About the role


We are actively recruiting a Staff Software Engineer to own the security, reliability, and observability of one of the fastest-growing e-commerce startups. You will report directly to our Head of Engineering and work closely with engineers across the team. Your mission includes establishing and maintaining world-class observability, monitoring, and alerting systems; building systems that reduce operational toil for the entire engineering team; and conducting security audits, reviews, and mitigations across our entire platform. We take reliability and security seriously: doing so prepared us to scale to $500M in volume in under a year. You will help us scale the next 100x while keeping our systems secure and reliable.


About You


  • You have exceptionally high agency and don't let yourself get stuck on problems: you find creative solutions to complex reliability and security challenges so the business never stops running. When systems fail, you build the automation and tooling that helps the entire team respond effectively, rather than heroically fixing things yourself.
  • You are a "professional hacker" in the best sense - someone who can operate without much guidance, exercise excellent judgment on when to build vs buy vs configure, and see security and reliability as fundamental enablers of business success rather than obstacles to overcome.
  • 8+ years of experience building, securing, and operating complex distributed systems at scale. You've been on-call, you've debugged production incidents, and you've built the monitoring and automation systems that reduced toil for entire engineering organizations.
  • You are passionate about making systems observable, reliable, and secure. You understand that the best reliability work multiplies the effectiveness of the entire team - better monitoring means faster debugging for everyone, better automation means less manual toil, and better incident response processes mean the whole team can handle issues confidently. We don't believe in heroes; we believe in systems that make heroics unnecessary.

You understand our specific technology stack and can hit the ground running:

  •  Go microservices running on Google Cloud Run
  •  PostgreSQL
  •  Redis
  •  Google Cloud Platform infrastructure (Cloud Run, Cloud Build, Pub/Sub, Cloud Storage)
  •  Terraform for infrastructure as code
  •  Blockchain indexing and transaction submission
  •  External service integrations

You have deep expertise in several of these areas:

  •  Building comprehensive observability platforms (metrics, logs, traces, dashboards)
  •  Designing and implementing effective alerting strategies that minimize noise while catching real issues
  •  Creating automation and tooling that reduces operational toil
  •  Establishing incident response processes, runbooks, and postmortem practices
  •  Conducting security audits and threat modeling for distributed systems
  •  Implementing security controls, authentication/authorization systems, and secrets management
  •  Performance optimization and capacity planning for high-throughput systems
  •  Database reliability, backup/recovery strategies, and data integrity
  •  API security, rate limiting, and DDoS mitigation
  •  Compliance and audit logging for financial systems


You understand that sometimes the rocket must be launched and completed in flight. This means you're comfortable making pragmatic security and reliability tradeoffs when needed, while always having a plan to improve things incrementally. You know when "good enough for now with monitoring" is the right answer, and when "we need to fix this before we ship" is non-negotiable.


What You'll Own

  • Observability & Monitoring: Build and maintain comprehensive monitoring across our microservices architecture. Instrument our Go services with meaningful metrics. Create dashboards that tell the story of system health. Ensure every engineer can debug any issue in production with the data we collect.
  • Alerting & On-call Support: Design alerting strategies that wake people up for real problems, not noise. Every engineer is already in an on-call rotation; your job is to make their lives easier by building better alerts, better runbooks, and better automation. Reduce the toil so on-call is manageable and incidents are handled smoothly by whoever is on duty.
  • Security Audits & Reviews: Conduct regular security reviews of our codebase, infrastructure, and third-party integrations. Identify vulnerabilities before they become incidents. Work with the team to implement mitigations. Establish security best practices and ensure they're followed.
  • Incident Response Systems: Build the systems and processes that enable effective incident response across the team. Create runbooks, automate common remediation tasks, and establish postmortem practices that turn incidents into learning opportunities. Make it easy for any engineer to handle incidents confidently.
  • Reliability Engineering: Identify and eliminate single points of failure. Implement circuit breakers, retries, and graceful degradation. Build automation that reduces manual operational work. Ensure our systems can handle 100x growth without proportionally increasing operational burden.
  • Infrastructure Security: Secure our GCP infrastructure, manage secrets properly, implement least-privilege access controls, and ensure our Terraform configurations follow security best practices. Own the security of our CI/CD pipelines and deployment processes.

What You’ll Get In Return

  • A dynamic and engaging environment focused on fostering real growth and innovation
  • Opportunities to create amazing products that our customers truly love and value
  • Comprehensive health insurance packages with dependent coverage
  • Competitive salary with ample opportunities for career advancement and development
  • The flexibility of a fully remote work environment
  • Access to employee wellness programs designed to support your overall well-being
  • 401(k) plan with a 4% employer match to help you plan for the future

Top Skills

Blockchain
External Service Integrations
Go
Google Cloud Platform
Google Cloud Run
Postgres
Redis
Terraform
