Voltage Park

Infrastructure Operations Engineer

Posted Yesterday
Remote
Hiring Remotely in USA
140K-200K Annually
Senior level
The Infrastructure Operations Engineer is responsible for ensuring the stability and performance of AI compute infrastructure, collaborating with various teams, and deploying system updates while participating in an on-call rotation.
The summary above was generated by AI

Voltage Park is your enterprise AI factory. We offer scalable compute power and on-demand and reserved bare metal AI infrastructure using NVIDIA GPUs, with world-class service, performance, and value. Founded with the mission of making AI computing accessible to all, we offer flexible, affordable GPU solutions that power everyone from builders to enterprises.

We are seeking a highly skilled and proactive Infrastructure Operations Engineer to be part of our 24/7 Infrastructure Operations team responsible for the stability, scalability, and performance of compute, storage, and platform infrastructure. This role plays a key part in delivering always-on, high-performance environments that support AI/ML training, inference, and HPC workloads at scale. The ideal candidate combines technical depth with strong interpersonal skills and a passion for operational excellence. 

This position offers full remote flexibility, although candidates must be based in the continental US and available to work during PST hours. Unfortunately, we are unable to provide sponsorship for this role.

Responsibilities

  • At the direction of the Manager of Infrastructure Operations, design, build, and roll out new platforms and patterns to minimize incidents and enable customer-facing and internal features.

  • Deploy updates and improvements to support both Voltage Park’s internal and end-customer use cases.

  • Collaborate with colleagues in the Infrastructure Engineering, Network Operations, Customer Success, and Software and Platform Development teams.

  • Participate in the on-call rotation, which is evenly distributed across all team members in a primary/secondary pattern: you serve as primary, then move to the secondary position.

Qualifications

  • 8+ years working with Linux as a server/hosting platform; extra points for Ubuntu experience.

  • 5+ years of experience with AWS.

  • 2+ years of experience with Kubernetes and strong container fundamentals.

  • 2+ years of experience with Terraform and Ansible.

  • 2+ years with network-attached storage management (via NFS, Ceph, or other protocols). Extra points for experience with VAST storage systems.

  • Experience working in a Slack-first, asynchronous remote work environment.

  • Experience with monitoring systems (Prometheus, ELK stack).

  • Familiarity with the GitOps workflow.

  • Software development experience using Python, Go, Bash, or other languages for automation and for connecting systems and APIs.

  • Deep networking fundamentals; extra points for experience with datacenter-level networks, 400Gb Ethernet, and InfiniBand.

  • Experience building and delivering complex systems.

  • Effective at navigating tradeoffs between design, risk, cost, and outcomes.

  • Comfortable with navigating ambiguity.

  • Strong written and oral communication.

Ideal Experiences

  • Experience with bare metal hardware troubleshooting and provisioning; extra points for working with Dell hardware.

  • Experience with GPU servers, either in bare metal form or under virtualization.

  • Deep experience with network switches, routers, and firewalls, particularly SONiC switches, Palo Alto firewalls, and Juniper Networks equipment.

  • Experience with VAST storage systems.

Culture

  • You enjoy working with a small group of friendly, highly motivated, execution-focused colleagues.

  • You’re comfortable with a high degree of autonomy. We expect you to independently prioritize your work and understand how it maps to the overall needs and goals of the company.

  • You’re knowledgeable in your domain but also enjoy wearing multiple hats and venturing outside of your comfort zone when the need arises.

  • You value the ability to write well and understand the importance of good documentation.

Voltage Park is an equal opportunity employer and makes employment decisions on the basis of merit. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected under federal, state, or local law. If you require an accommodation during the job application process, please notify your recruiter.

Compensation Range: $140K - $200K

Top Skills

Ansible
AWS
Bash
Ceph
ELK Stack
Go
GPU
Kubernetes
Linux
NFS
Prometheus
Python
Terraform
VAST Storage
