Design and implement solutions to enhance platform reliability. Lead projects, collaborate across teams, and mentor junior members while enforcing best practices and reliability metrics.
Attentive® is the AI marketing platform for 1:1 personalization redefining the way brands and people connect. We’re the only marketing platform that combines powerful technology with human expertise to build authentic customer relationships. By unifying SMS, RCS, email, and push notifications, our AI-powered personalization engine delivers bespoke experiences that drive performance, revenue, and loyalty through real-time behavioral insights.
Recognized as the #1 provider in SMS Marketing by G2, Attentive partners with more than 8,000 customers across 70+ industries. Leading global brands like Crate and Barrel, Urban Outfitters, and Carter’s work with us to enable billions of interactions that power tens of billions in revenue for our customers.
With a distributed global workforce and employee hubs in New York City, San Francisco, London, and Sydney, Attentive’s team has been consistently recognized for its performance and culture. We’re proud to be included in Deloitte’s Fast 500 (four years running!), LinkedIn’s Top Startups, Forbes’ Cloud 100 (five years running!), Inc.’s Best Workplaces, and the Human Rights Campaign Foundation's Corporate Equality Index!
About the Role
What You’ll Accomplish
- Design and deliver high-impact solutions: Design and implement systems that enhance reliability, observability, traceability, and incident management, ensuring the platform scales effectively
- Lead execution on key projects: Take ownership of projects, driving them from discovery through execution
- Partner across teams: Collaborate with engineers from AI/ML, Data, Platform, and Product teams to develop best-in-class platforms and services
- Establish standards and best practices: Define and enforce production standards, processes, and tools to ensure operational excellence
- Champion reliability goals: Advocate for and implement SLIs, SLOs, and other reliability-focused metrics across the engineering organization
- Mentor and knowledge share: Guide and mentor junior team members, fostering technical growth and helping to develop the next generation of engineering leaders
- Innovate and inspire: Drive continuous improvement by bringing creative ideas and challenging the status quo
Your Expertise
- 5+ years of experience in Production Engineering, SRE, Platform Engineering, DevOps, Backend Engineering, or similar roles
- Strong coding ability in at least one language (e.g., Golang, Python, Java, TypeScript) with the capability to solve complex issues through code
- Experience with cloud-native technologies and Infrastructure-as-Code (e.g., Kubernetes, Terraform, AWS)
- Demonstrated experience delivering medium to large-scale projects that drive meaningful improvements in platform reliability and scalability
- Deep understanding of production reliability concepts, including SLIs, SLOs, and incident management
- Proficient in designing and maintaining CI/CD pipelines, deployment strategies, and release automation to enable fast, safe delivery
- Fluency in frontier AI-assisted development tools and agents (Claude Code, Codex, Cursor, or similar)
- Excellent verbal and written communication skills with the ability to collaborate across technical and non-technical teams
- Familiarity with working in dynamic, reliability-focused production environments (preferred)
What We Use
- Our services run primarily in Kubernetes, hosted on AWS EKS
- Our tooling includes Terraform, Helm, ArgoCD, Istio, Cloudflare, Datadog, and Incident.io
- Our backend is primarily Java / Spring Boot microservices, built with Gradle, alongside DynamoDB, Kinesis, Airflow, Postgres, and Redis
- Our frontend is built with React and TypeScript, and uses tools like GraphQL, Storybook, Radix UI, Vite, esbuild, and Playwright
- Our automation is driven by custom and open source machine learning models and lots of data, and is built with Python, Metaflow, Hugging Face 🤗, PyTorch, TensorFlow, and Pandas
You'll get competitive perks and benefits, from health & wellness to equity, to help you bring your best self to work.
For US based applicants:
- The US base salary range for this full-time position is $220,000 - $275,000 annually + equity + benefits
- Our salary ranges are determined by role, level and location
If you apply for this position, your data will be processed in accordance with Attentive's Privacy Policy.
Attentive Company Values
Default to Action - Move swiftly and with purpose
Be One Unstoppable Team - Rally as each other’s champions
Champion the Customer - Our success is defined by our customers' success
Act Like an Owner - Take responsibility for Attentive’s success
Learn more about AWAKE, Attentive’s collective of employee resource groups.
If you do not meet all the requirements listed here, we still encourage you to apply! No job description is perfect, and we may also have another opportunity that closely matches your skills and experience.
At Attentive, we know that our Company's strength lies in the diversity of our employees. Attentive is an Equal Opportunity Employer and we welcome applicants from all backgrounds. Our policy is to provide equal employment opportunities for all employees, applicants and covered individuals regardless of protected characteristics. We prioritize and maintain a fair, inclusive and equitable workplace free from discrimination, harassment, and retaliation. Attentive is also committed to providing reasonable accommodations for candidates with disabilities. If you need any assistance or reasonable accommodations, please let your recruiter know.
Top Skills
Airflow
AWS
Cloudflare
Datadog
DynamoDB
esbuild
Go
Gradle
GraphQL
Incident.io
Java
Java Spring Boot
Kinesis
Kubernetes
Pandas
Playwright
Postgres
Python
PyTorch
Radix UI
React
Redis
Storybook
TensorFlow
Terraform
TypeScript
Vite