Quantiphi

Infrastructure Architect (GCP)

Posted 12 Days Ago
Remote
Hiring Remotely in USA
Senior level
Design and implement hybrid infrastructure solutions supporting AI/GenAI workloads, collaborating with teams to optimize performance and cost across cloud and on-prem environments.
The summary above was generated by AI

While technology is the heart of our business, a global and diverse culture is the heart of our success. We love our people, and we take pride in offering them a culture built on transparency, diversity, integrity, learning, and growth.
If working in an environment that encourages you to innovate and excel, not just in your professional life but in your personal life too, interests you, you would enjoy your career with Quantiphi!

About Quantiphi:
Quantiphi is an award-winning Applied AI and Big Data software and services company, driven by a deep desire to solve transformational problems at the heart of businesses. Our signature approach combines groundbreaking machine learning research with disciplined cloud and data-engineering practices to create breakthrough impact at unprecedented speed.

Company Highlights:
Quantiphi has seen 2.5x year-over-year growth since its inception in 2013. We don’t just innovate; we lead. We are headquartered in Boston, with 4,000+ Quantiphi professionals across the globe. As an Elite/Premier Partner for Google Cloud, AWS, NVIDIA, Snowflake, and others, we’ve been recognized with:

  • 17x Google Cloud Partner of the Year awards in the last 8 years.
  • 3x AWS AI/ML award wins.
  • 3x NVIDIA Partner of the Year titles.
  • 2x Snowflake Partner of the Year awards.
  • We have also garnered top analyst recognitions from Gartner, ISG, and Everest Group.
  • We offer first-in-class industry solutions across Healthcare, Financial Services, Consumer Goods, Manufacturing, and more, powered by cutting-edge Generative AI and Agentic AI accelerators.
  • We have been certified as a Great Place to Work for the third year in a row: 2021, 2022, and 2023.

Be part of a trailblazing team that’s shaping the future of AI, ML, and cloud innovation. Your next big opportunity starts here!

For more details, visit our Website or LinkedIn Page.

Work Location: Dallas (preferred), but anywhere in the US works.

Role Overview:

  • We are seeking a seasoned Infrastructure Architect with deep expertise in both cloud platforms and on-premise infrastructure to design, implement, and manage robust hybrid environments that can support high-compute AI and GenAI workloads.
  • You will work onsite with one of our key enterprise clients to assess existing infrastructure, define scalable architectures, and ensure optimal performance for AI/ML and GenAI solutions.
  • You’ll play a critical role in bridging infrastructure, DevOps, and AI solution delivery, ensuring our client has the right foundational stack to scale advanced AI workloads across their enterprise.

Key Responsibilities:

Hybrid Infrastructure Design & Deployment:

  • Architect and implement secure, scalable, and cost-effective infrastructure solutions across on-prem and cloud (GCP, AWS, Azure) environments.
  • Evaluate existing systems and define migration or integration strategies for deploying AI/GenAI workloads in hybrid setups.
  • Design infrastructure supporting GPU-intensive workloads, distributed training, inferencing, and vector database storage.

Cloud & On-Prem Operations:

  • Manage provisioning, automation, and orchestration across virtual machines, containers, and Kubernetes clusters.
  • Implement and monitor high-availability, low-latency, and disaster recovery strategies.
  • Optimize infrastructure for latency-sensitive applications, including real-time GenAI agentic workflows.

Collaboration & Enablement:

  • Work closely with AI/ML engineers, data scientists, solution architects, and DevOps to ensure smooth deployment and scaling of models and GenAI agents.
  • Recommend best practices on hybrid infrastructure for LLM fine-tuning, RAG architecture, and multi-agent orchestration platforms.
  • Guide teams on infrastructure security, IAM policies, and governance frameworks for GenAI applications.

Performance & Cost Optimization:

  • Continuously benchmark, profile, and optimize infrastructure for performance and efficiency.
  • Monitor resource utilization and propose capacity planning strategies for AI workload peaks.

Key Qualifications & Experience:

  • Bachelor’s or Master’s degree in Computer Science, Information Systems, or related field.
  • 8–15 years of experience in enterprise infrastructure architecture, with significant experience in both on-prem and cloud-native environments.
  • Proven track record in designing and deploying AI/ML or GenAI-supporting infrastructure (e.g., GPU clusters, Kubernetes for ML workloads, hybrid vector databases).
  • Deep knowledge of cloud services (GCP preferred; AWS or Azure acceptable), on-prem virtualization, storage, networking, and container orchestration.
  • Experience supporting multi-agentic GenAI frameworks, including task orchestration, distributed agents, and workflow automation.
  • Hands-on experience in DevOps and IaC tools (Terraform, Helm, Ansible, CI/CD).
  • Familiarity with AI governance, data security, and compliance in hybrid environments.

Required Skills:

GCP Infrastructure Design & Deployment
Deep hands-on expertise in architecting and managing solutions on Google Cloud Platform, including:

  • VPC design, subnetting, firewall rules, Private Service Connect, and Cloud Interconnect for secure hybrid networking.
  • Identity & Access Management (IAM), Workload Identity Federation, and service accounts for secure access control across services.
  • Cloud Load Balancing, Cloud NAT, and Cloud Armor for high-availability, secure ingress/egress management.
  • Resource hierarchy and organization policies to manage large-scale enterprise GCP environments.

AI/GenAI-Centric Compute & Storage Architecture
Strong understanding of compute services tailored to GenAI:

  • Compute Engine for custom VM/GPU provisioning (A100/H100, T4).
  • GKE (Google Kubernetes Engine) for containerized model deployments, including support for GPU workloads and node auto-provisioning.
  • Vertex AI and Vertex AI Workbench for managing ML pipelines, training, model registry, and deployments.

Storage architecture experience with:

  • Cloud Storage (standard, nearline, coldline) for unstructured datasets.
  • Filestore, Local SSDs, and Persistent Disks for high-throughput model training and inferencing.
  • Integration with BigQuery and Spanner for structured data workloads supporting GenAI applications.

Containerization, Orchestration & IaC on GCP:

Advanced experience with GKE:

  • Cluster autoscaling, workload identity, taints/tolerations for GPU scheduling.
  • Helm-based deployments and integration with Artifact Registry.

Proficient in Infrastructure as Code using:

  • Terraform (with GCP provider modules) for declarative infrastructure deployment.
  • Cloud Build, Cloud Deploy, or integration with GitHub Actions for CI/CD pipelines.
  • Ability to automate infrastructure provisioning, policy enforcement, and environment standardization.

Support for GenAI Architectures:

Experience deploying and optimizing infrastructure for:

  • LLM hosting using Triton Inference Server, vLLM, or Text Generation Inference on GKE or Compute Engine.
  • Vector database integrations (Weaviate, ChromaDB, FAISS) with GCS and BigQuery.
  • RAG pipeline infrastructure including document ingestion (e.g., via Pub/Sub, Cloud Functions) and scalable retrieval.
  • Multi-agent frameworks like LangGraph, CrewAI, or AutoGen, with secure multi-service orchestration across GCP services.

Observability, Security, and Governance
Monitoring & observability stack:

  • Cloud Monitoring, Cloud Logging, Cloud Trace, Profiler, and Error Reporting for full-stack visibility.
  • Experience setting up custom dashboards, alerts, and uptime checks.

Security and compliance capabilities:

  • VPC Service Controls, Shielded VMs, Confidential Computing, and data encryption strategies (at rest and in transit).
  • Experience with cloud security posture management (CSPM) and compliance frameworks (e.g., HIPAA, SOC 2, FedRAMP).

Governance:

  • Experience setting up Organization Policies, Folder Hierarchies, and Cloud Asset Inventory for enterprise governance.

Cost Optimization & Resource Efficiency:
Proven ability to:

  • Monitor and optimize spend using Billing Reports, Cost Table Reports, Budgets, and Recommendations Hub.
  • Implement rightsizing recommendations, sustained use discounts, and committed use discounts (CUDs) for GPU workloads.
  • Design cost-aware architecture balancing performance, latency, and throughput for GenAI use cases.

Soft Skills & Personality Traits:

  • Strong problem-solving and debugging skills.
  • Ability to communicate technical concepts clearly to non-technical stakeholders.
  • Collaborative mindset with ability to work cross-functionally across AI, DevOps, and business teams.
  • Detail-oriented, with a focus on reliability, scalability, and security.

Preferred:

  • GCP Professional Cloud Architect, AWS Solutions Architect, or similar certifications.
  • Familiarity with GPUs (NVIDIA A100, H100), inference acceleration, and edge deployments.
  • Familiarity with AI/ML governance, compliance, and ethical AI frameworks.

What is in it for you:

  • Be part of a team and company that has won NVIDIA's AI Services Partner of the Year three times in a row, with an unparalleled track record of building production AI applications on DGX and Cloud GPUs.
  • Strong peer learning that will accelerate your learning curve across Applied AI, GPU Computing, and softer aspects such as technical communication.
  • Exposure to working with highly experienced AI leaders at Fortune 500 companies and innovative market disruptors looking to transform their business with Generative AI.
  • Access to state-of-the-art GPU infrastructure on the cloud and on-premise.
  • Be part of the fastest-growing AI-first digital transformation and engineering company in the world.

If you like wild growth and working with happy, enthusiastic over-achievers, you'll enjoy your career with us!

Top Skills

Ansible
AWS
Azure
CI/CD
Cloud Services
GCP
GPU Clusters
Helm
Kubernetes
Terraform
