WHO WE ARE
Zeta Global (NYSE: ZETA) is the AI-Powered Marketing Cloud that leverages advanced artificial intelligence (AI) and trillions of consumer signals to make it easier for marketers to acquire, grow, and retain customers more efficiently. Through the Zeta Marketing Platform (ZMP), our vision is to make sophisticated marketing simple by unifying identity, intelligence, and omnichannel activation into a single platform – powered by one of the industry’s largest proprietary databases and AI. Our enterprise customers across multiple verticals are empowered to personalize experiences with consumers at an individual level across every channel, delivering better results for marketing programs. Zeta was founded in 2007 by David A. Steinberg and John Sculley and is headquartered in New York City with offices around the world. To learn more, go to www.zetaglobal.com.
Role Overview
We are seeking a Principal DevOps Engineer to serve as a transformative force in how Zeta Global builds, deploys, and operates software at scale. This is not a maintenance role. You will be a DevOps disruptor: someone who challenges the status quo, reimagines deployment pipelines, and empowers hundreds of developers across multiple teams to ship code to production safely, multiple times per day, concurrently.
Your primary responsibility is to enable true continuous integration and continuous deployment (CI/CD) directly to production using canary releases, blue/green deployments, incremental rollout pipelines, feature flag-driven releases, and any other proven strategy that delivers speed with safety. You will architect and operate these systems within a regulated, globally compliant environment spanning GDPR, CCPA, and SOC 2 requirements.
In addition, you will serve as a Site Reliability Engineer (SRE) leader, ensuring safe operations, incident readiness, and platform stability as we continue to scale. You will influence both DevOps/SRE practices and software architecture decisions, simplifying and streamlining operational management across the organization.
Key Responsibilities
CI/CD & Deployment Excellence
- Design, build, and operate production-grade CI/CD pipelines enabling multiple developers on multiple teams to deploy concurrently to production, multiple times daily, with zero-downtime guarantees.
- Implement and optimize advanced deployment strategies including canary releases, blue/green deployments, rolling updates, incremental rollouts, and feature flag-gated releases via Statsig.
- Build self-service deployment tooling that empowers developers to own their release process while enforcing safety guardrails, automated rollback triggers, and automated compliance gates.
- Establish deployment observability with real-time canary analysis, automated health scoring, and progressive delivery metrics integrated with Grafana, Prometheus, and Honeycomb.
- Champion CI/CD workflows using GitLab CI/CD, Helm charts, and Terraform to ensure infrastructure and application deployments are version-controlled, auditable, and reproducible.
Platform Reliability & SRE
- Define and enforce SLOs/SLIs/SLAs across services, establishing error budgets that balance velocity with reliability.
- Lead incident response processes, including on-call rotations, runbook development, blameless postmortems, and incident command structure.
- Design and implement robust observability stacks leveraging Grafana, Prometheus, Loki, and Honeycomb for metrics, logging, tracing, and alerting at scale.
- Proactively identify and eliminate reliability risks through chaos engineering, load testing, capacity planning, and failure mode analysis.
- Reduce operational toil through automation, self-healing infrastructure patterns, and intelligent alerting to minimize mean time to detection (MTTD) and mean time to recovery (MTTR).
Infrastructure & Architecture
- Manage and optimize AWS infrastructure spanning EC2, SQS, DynamoDB, and related services with Infrastructure as Code (Terraform) best practices.
- Design and operate Kafka-based event streaming infrastructure for high-throughput, low-latency data pipelines supporting real-time marketing and analytics workloads.
- Ensure robust networking across the platform, including DNS management, service mesh configuration, load balancing, TCP/IP optimization, routing policies, and VPC architecture.
- Manage containerization strategy using Docker, ensuring efficient image builds, vulnerability scanning, registry management, and runtime security.
- Support data infrastructure operations across Snowflake, MySQL, and other database platforms, collaborating with data engineering teams on reliability and performance.
Compliance, Security & Governance
- Embed compliance controls directly into CI/CD pipelines, ensuring automated enforcement of GDPR, CCPA, and SOC 2 requirements at every stage of the software delivery lifecycle.
- Implement audit trails, change management controls, and deployment approval workflows required by regulatory frameworks in the MarTech and AdTech domains.
- Collaborate with Security and Legal teams to ensure infrastructure and deployment processes meet global compliance obligations across all operating regions.
- Maintain awareness of evolving privacy regulations (ePrivacy, state-level US laws, international data residency requirements) and proactively adapt infrastructure accordingly.
Technical Leadership & Influence
- Serve as a technical leader and DevOps disruptor, challenging legacy processes and introducing modern practices that dramatically improve developer velocity and operational safety.
- Influence software architecture decisions to simplify and streamline operational management, advocating for patterns that are deployment-friendly, observable, and resilient by design.
- Clearly communicate complex technical strategies to engineering leadership, product stakeholders, and cross-functional teams to build alignment and drive adoption.
- Develop reference architectures, internal standards, and golden path templates that codify best practices and accelerate onboarding of new services and teams.
- Participate in on-call rotations and lead by example in incident response, demonstrating the operational discipline expected across the engineering organization.
Required Qualifications
- 10+ years of progressive experience in DevOps, SRE, Platform Engineering, or Infrastructure Engineering roles, with demonstrated impact at staff or principal level.
- Expert-level Kubernetes knowledge, including cluster administration, Helm chart authoring, custom controllers/operators, network policies, RBAC, and multi-cluster management on AWS EKS.
- Deep expertise in CI/CD pipeline architecture and advanced deployment strategies (canary, blue/green, progressive delivery, feature flag integration) at scale.
- Strong proficiency with Infrastructure as Code using Terraform, including module design, state management, and multi-environment orchestration.
- Expert knowledge of Docker containerization, including multi-stage builds, security hardening, image optimization, and container runtime management.
- Production experience with Apache Kafka, including cluster management, topic design, consumer group strategies, and operational monitoring for high-throughput streaming workloads.
- Strong networking fundamentals: DNS (Route 53, internal DNS), TCP/IP, routing, API Gateway, load balancing (ALB/NLB), service mesh, VPC peering, transit gateways, and network troubleshooting.
- Extensive AWS experience spanning EKS, EC2, SQS, DynamoDB, IAM, VPC, CloudWatch, and related services in production environments.
- Hands-on experience with observability platforms: Grafana (dashboards, alerting), Prometheus (metrics, PromQL), Loki (log aggregation), and Honeycomb (distributed tracing, BubbleUp analysis).
- Working familiarity with multiple language stacks including Node.js, React, Python, Java, and Ruby, sufficient to understand build systems, dependency management, and runtime characteristics.
- Experience operating within regulated environments, with practical knowledge of GDPR, CCPA, SOC 2, and compliance automation in MarTech or AdTech domains.
- Proven ability to influence engineering culture, drive adoption of new practices, and communicate complex technical strategies clearly to both technical and non-technical stakeholders.
- Demonstrated experience with GitLab CI/CD pipelines, including advanced pipeline features such as parent-child pipelines, dynamic environments, and security scanning integration.
Preferred Qualifications
- AWS certifications: Solutions Architect Professional, DevOps Engineer Professional, or Security Specialty.
- Experience with Statsig or similar feature flag and experimentation platforms for progressive delivery and A/B testing in production.
- Background in chaos engineering tools and practices (Gremlin, Litmus, Chaos Monkey) for proactive resilience validation.
- Experience building internal developer platforms (IDPs) or platform-as-a-product organizations.
- Familiarity with FinOps practices and cloud cost optimization strategies.
- Contributions to open-source DevOps/SRE tools or active participation in the broader infrastructure community.
- Experience with service mesh technologies (Istio, Linkerd) for advanced traffic management and security.
BENEFITS & PERKS
- Unlimited PTO
- Excellent medical, dental, and vision coverage
- Employee Equity
- Employee discounts, virtual wellness classes, pet insurance, and more
SALARY RANGE
The salary range for this role is $180,000 - $210,000, depending on location and experience.
PEOPLE & CULTURE AT ZETA
Zeta considers applicants for employment without regard to, and does not discriminate on the basis of, an individual’s sex, race, color, religion, age, disability, status as a veteran, or national or ethnic origin; nor does Zeta discriminate on the basis of sexual orientation, gender identity, or gender expression.
We’re committed to building a workplace culture of trust and belonging, so everyone feels invited to bring their whole selves to work. We provide a forum for employees to celebrate, support and advocate for one another. Learn more about our commitment to diversity, equity and inclusion here: https://zetaglobal.com/blog/a-look-into-zetas-ergs/
ZETA IN THE NEWS!
https://zetaglobal.com/press/?cat=press-releases

