Discogs

Senior Site Reliability Engineer (REMOTE)

Posted 12 Days Ago
In-Office or Remote
5 Locations
130K-140K Annually
Senior level
As a Senior Site Reliability Engineer, you will maintain infrastructure, automate deployments, mentor teams, handle incidents, and enhance system reliability.
The summary above was generated by AI
Description

The Discogs Platform team is focused on several objectives: building and supporting performant, cost-effective, reliable infrastructure; developer experience tooling and mentorship; and creating "golden paths" for organization-wide standards and velocity. As a Platform member, the Senior Site Reliability Engineer will contribute to the Platform team’s centralized infrastructure, including maintenance, monitoring, and automation of services ranging from databases to Kubernetes; lead incident response and postmortem efforts; and work closely with other engineering teams to understand their needs and drive improvements to both our technologies and processes.

Location

This is a remote position, open to candidates located in OR, WA, CA, CO, TX, or IL.

Compensation

Starting Base Salary Range: $130,000 - $140,000 yearly


Who We Are

We are dedicated to supporting a global community of music fans and collectors who share the value, culture, connection, and joy of record collecting. Fostering the exchange of knowledge, records, and curation, we help people help each other deepen their relationship with music. Leveraging the power of community, we are committed to enabling people to explore artists and their recorded works through the world's definitive music discography, stay informed with record collection and sales history data, get organized with specialized collection management tools, and stay connected to a global community of fellow record collectors and sellers. Providing this essential set of resources, tools, and access, we aim to unleash boundless opportunities for people to dig into the depths of their musical interests, build and fortify their record collections, cultivate and bridge communities, and elevate their connection to music and record collecting.

What You’ll Accomplish

Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

  • Maintaining the organization's cloud presence in AWS
  • Automating and deploying infrastructure configurations using Infrastructure as Code (IaC)
  • Mentoring engineering squads on Platform best practices for Kubernetes, MySQL, Kafka, and other software development lifecycle areas
  • Assisting engineering squads with capacity planning, infrastructure budgeting, and production readiness
  • Writing documentation and runbooks that contribute to the engineering organization’s knowledge base
  • Implementing monitoring and alerting systems with Discogs observability tools
  • Working in a containerized, orchestrated environment
  • Participating in the on-call rotation, responding to incidents, and troubleshooting data and other operational issues
  • Contributing to the reliability and design patterns of our Kafka, Kafka Connect, and database implementations
Requirements

What You’ll Contribute

Minimum Education and Experience

  • A Bachelor's Degree in Computer Science or similar area of focus, or equivalent relevant work experience.
  • 5+ years of experience in Ops, DevOps, Site Reliability, Platform, or other systems roles.

Required Skills & Abilities:

  • Infrastructure-as-code (Terraform)
  • CI/CD (GitHub Actions)
  • GitOps (ArgoCD)
  • Kubernetes (EKS, Kustomize, Karpenter, administration, application manifests)
  • AWS and cloud development (VPC, EKS, RDS, S3)
  • FinOps and cloud cost optimization
  • Observability (Datadog, Sentry)
  • Scripting (Shell, Python)
  • Track record of collaboration and mentorship
  • Excellent written communication and documentation skills
  • Continuous learning
  • Ownership and proactive approach to solving large problems

Preferred:

  • Kafka: Cluster administration (Strimzi), Kafka Connect (Debezium, JDBC)
  • Relational database administration and performance (MySQL, Percona Server, AWS RDS)
  • Elasticsearch (ECK administration, scaling, performance)
  • Python (SQLAlchemy, FastAPI)
  • GraphQL (schema design, Apollo federation)
  • REST API
  • HashiCorp Vault
  • Redis
  • Memcached

The Platform team covers a wide range of technical topics and we'd love to hear about your skills beyond this list!

Benefits

What We Provide

  • Competitive compensation: salary, plus performance-related bonus program
  • 401(k) with employer match
  • 100% company-paid medical and dental insurance benefits for you and your dependents
  • 4 weeks paid vacation, increasing based on tenure
  • 18 weeks paid leave for birth moms
  • 8 weeks paid parental leave, including for adoption
  • Monthly wellness allowance
  • Annual professional and personal development allowance
  • Work-from-home office setup and expense allowances
  • Flexible work location opportunities
  • Employer matching toward charitable contributions

What We Believe In

We're building a world idealized for record collectors, driven by community, and fueled by a shared passion for music. Through culture, information, and innovation, we strive to develop a complete ecosystem of resources to empower music lovers and entrepreneurs everywhere to engage more deeply in the joys and possibilities of record collecting. We foster a collaborative community dedicated to preserving the recording industry's past, present, and unfolding future by cataloging the world's complete, interconnected music discography. Leveraging the power of this dynamic knowledge base, we aim to innovate integrated technologies to empower music fans everywhere to embark on a boundless journey of music discovery and record collecting. We envision this to be the complete collecting journey.

Discogs is an Equal Opportunity Employer.

Applicants needing accommodation to apply should contact us at 503-597-6340.

Discogs does not promote job openings through text messaging. If you receive a text message claiming to offer a position at our company, please disregard it as fraudulent. For a list of our actively open positions and to apply, please visit the official Careers page on our website:

If you apply for this role, you will be required to upload a resume and cover letter and to answer a few questions about your application. Once you submit it, our hiring team will review your application and contact you if you are selected for an interview. Whether or not you are successful, we will store your application and data in our system for a maximum of one year from the application date in case another suitable role becomes available. If you have any questions or concerns about how we store this data or for how long, please contact us at [email protected] and we will respond within 30 days.

Top Skills

ArgoCD
AWS
Datadog
Elasticsearch
GitHub Actions
GraphQL
HashiCorp Vault
Kafka
Kubernetes
Memcached
MySQL
Postgres
Python
Redis
REST API
Sentry
Shell
Terraform

Similar Jobs

6 Days Ago
Remote
United States
Senior level
Big Data • Marketing Tech • Analytics
As a Senior Site Reliability Engineer, you'll ensure software system availability, develop reliability tools, and lead project improvements.
Top Skills: Airflow, Amazon Web Services, Atlantis, BigQuery, Cloud Composer, Cloud Functions, Cloud Run, Cloud SQL, GitHub Actions, GKE, Go, Google Cloud Logging, Google Cloud Platform, Harness, Helm, Kubeflow Pipelines, Kubernetes, Looker, Nexus, Pub/Sub, Python, Scala, Terraform
2 Days Ago
Easy Apply
Remote or Hybrid
United States
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer defines observability standards, builds infrastructure for monitoring services, and collaborates to improve system reliability and performance.
Top Skills: AWS, Fluent Bit, GCP, Jaeger, Kubernetes, Azure, MongoDB, Quickwit, Splunk, Vector, VictoriaMetrics
12 Days Ago
Remote
USA
Senior level
Automotive • Software
The Senior Site Reliability Engineer will optimize platform reliability, manage Kubernetes infrastructure, deploy monitoring solutions, and collaborate on system performance.
Top Skills: Android, ArgoCD, AWS, CircleCI, Crossplane, Docker, GCP, Git, Go, Grafana, Kafka, Kubernetes, Loki, New Relic, Objective-C, OpenTelemetry, Postgres, Prometheus, Python, React, Redis, Redshift, Redux, Ruby on Rails, Sentry, Swift, Terraform, Thanos

What you need to know about the Colorado Tech Scene

With a business-friendly climate and research universities like CU Boulder and Colorado State, Colorado has made a name for itself as a startup ecosystem. The state boasts a skilled workforce and high quality of life thanks to its affordable housing, vibrant cultural scene and unparalleled opportunities for outdoor recreation. Colorado is also home to the National Renewable Energy Laboratory, helping cement its status as a hub for renewable energy innovation.

Key Facts About Colorado Tech

  • Number of Tech Workers: 260,000; 8.5% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Lockheed Martin, CenturyLink, Comcast, BAE Systems, Level 3
  • Key Industries: Software, artificial intelligence, aerospace, e-commerce, fintech, healthtech
  • Funding Landscape: $4.9 billion in VC funding in 2024 (PitchBook)
  • Notable Investors: Access Venture Partners, Ridgeline Ventures, Techstars, Blackhorn Ventures
  • Research Centers and Universities: Colorado School of Mines, University of Colorado Boulder, University of Denver, Colorado State University, Mesa Laboratory, Space Science Institute, National Center for Atmospheric Research, National Renewable Energy Laboratory, Gottlieb Institute
