
TRM Labs

Machine Learning Infrastructure Engineer

Posted Yesterday
Be an Early Applicant
Easy Apply
Remote
Hiring Remotely in United States
Senior level
Design, operate, and optimize GPU-backed ML/LLM inference infrastructure in cloud environments. Build scalable GPU clusters, implement high-throughput and distributed inference, integrate optimization stacks, schedule heterogeneous workloads, and add observability to improve performance and reliability.
Build a Safer World. 

TRM Labs provides blockchain analytics and AI solutions to help law enforcement and national security agencies, financial institutions, and cryptocurrency businesses detect, investigate, and disrupt crypto-related fraud and financial crime. TRM’s blockchain intelligence and AI platforms include solutions to trace the source and destination of funds, identify illicit activity, build cases, and construct an operating picture of threats. TRM is trusted by leading agencies and businesses worldwide who rely on TRM to enable a safer, more secure world for all.

At TRM, we’re on a mission to build a safer financial system for billions of people around the world. Our next-generation platform, which combines threat intelligence with machine learning, enables financial institutions and governments to detect cryptocurrency fraud and financial crime at an unprecedented scale.

As a Senior Software Engineer, ML Infrastructure at TRM Labs, you will collaborate with data scientists, engineers, and product managers to design and operate scalable GPU-backed infrastructure that powers TRM’s AI systems. You will work at the intersection of distributed systems, cloud infrastructure, GPU performance engineering, and applied machine learning — building the foundation that enables high-throughput, production-grade ML workloads.

The impact you’ll have here:
  • Design and operate GPU cluster infrastructure.

    Build and manage GPU-backed environments in cloud settings, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users.

  • Optimize high-throughput inference.

    Implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads.

  • Enable distributed inference strategies.

    Support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models.

  • Implement model optimization and compilation workflows.

    Integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost.

  • Schedule heterogeneous workloads.

    Design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand.

  • Build observability into ML infrastructure.

    Instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use data to continuously improve performance and reliability.

  • Partner across engineering teams.

    Work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services.
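The batching and throughput responsibilities above rest on a simple cost model: each forward pass pays a fixed launch overhead plus a per-token cost, so grouping requests into a batch amortizes the overhead and raises aggregate token throughput at some latency cost. A toy sketch of that trade-off (all constants are illustrative assumptions, not measurements of any real serving stack):

```python
# Toy model of batched inference cost. Each forward pass pays a fixed
# overhead (kernel launch, scheduling) plus a marginal per-token cost,
# so larger batches amortize the overhead across more tokens.
# The constants below are hypothetical, chosen only for illustration.

OVERHEAD_MS = 5.0      # fixed cost per forward pass, in milliseconds
PER_TOKEN_MS = 0.02    # marginal cost per token in the batch

def step_time_ms(batch_size: int, tokens_per_req: int) -> float:
    """Wall-clock time for one forward pass over `batch_size` requests."""
    return OVERHEAD_MS + PER_TOKEN_MS * batch_size * tokens_per_req

def throughput_tok_s(batch_size: int, tokens_per_req: int = 128) -> float:
    """Aggregate tokens/second when serving `batch_size` requests per pass."""
    total_tokens = batch_size * tokens_per_req
    return total_tokens / (step_time_ms(batch_size, tokens_per_req) / 1000.0)

if __name__ == "__main__":
    # Throughput climbs with batch size, while per-request latency
    # (step_time_ms) also grows -- the core latency/throughput trade-off.
    for bs in (1, 8, 32):
        print(f"batch={bs:3d}  step={step_time_ms(bs, 128):7.2f} ms  "
              f"throughput={throughput_tok_s(bs):10,.0f} tok/s")
```

Under this model throughput rises monotonically with batch size but saturates as the per-token term dominates, which is why production schedulers cap batch size rather than growing it without bound.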

What we’re looking for:
  • Bachelor’s degree (or equivalent) in Computer Science or related field.
  • 5+ years of experience building and operating distributed systems or infrastructure in production environments.
  • Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP).
  • Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost.
  • Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum.
  • Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems.
  • Familiarity with distributed inference strategies including model parallelism and tensor parallelism.
  • Experience working with Kubernetes or equivalent orchestration systems in cloud environments.
  • Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus.
  • CUDA familiarity and experience debugging GPU-related issues is a plus.
  • Adaptable. Goals can change fast. You anticipate and react quickly.
  • Autonomous. You own what you work on. You move fast and get things done.
  • Excellent communication. You communicate complex ideas effectively to both technical and non-technical audiences, verbally and in writing.
  • Collaborative. You work effectively in a cross-functional team and with people at all levels in an organization.
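Several of the items above (queue depth, token throughput, latency/throughput trade-offs) connect through Little's law, L = λ · W, which relates in-flight requests, arrival rate, and mean time in system. A minimal sketch of the arithmetic, offered as a generic back-of-envelope check rather than anything specific to TRM's stack:

```python
# Little's law: L = lambda * W, where L is the average number of
# requests in the system (queued + executing), lambda is the sustained
# arrival rate (req/s), and W is the mean time each request spends in
# the system (seconds). Any two of the three determine the third.

def mean_latency_s(arrival_rate_rps: float, in_flight: float) -> float:
    """W = L / lambda: mean end-to-end latency implied by a measured
    arrival rate and average concurrency."""
    return in_flight / arrival_rate_rps

def implied_concurrency(arrival_rate_rps: float, latency_s: float) -> float:
    """L = lambda * W: concurrency a server must sustain to hold a
    given arrival rate at a given mean latency."""
    return arrival_rate_rps * latency_s

if __name__ == "__main__":
    # Example: 200 req/s sustained with 50 requests in flight implies
    # a mean end-to-end latency of 0.25 s per request.
    print(mean_latency_s(200.0, 50.0))
```

This is useful for cross-checking dashboards: if measured queue depth, request rate, and latency do not approximately satisfy L = λ · W, one of the metrics is being collected or aggregated incorrectly.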
Life at TRM

We are building a safer world. That promise shows up in how we work every day.

TRM runs fast. Really fast. We’re a high‑velocity, high‑ownership team that expects clarity, follow‑through, and impact. People who thrive here are energized by hard problems, experimentation, and direct feedback. If something takes months elsewhere, it often ships here in days. 

That pace isn’t for everyone. If you are optimizing primarily for consistent work-life balance, use the interview process to pressure-test fit. We want teammates who thrive here, not just survive here.

AI Fluency at TRM

AI fluency is a baseline expectation at TRM.

We believe AI meaningfully changes how top performers operate. We expect every team member to use AI to accelerate and reimagine their craft, not just automate surface tasks.

At TRM, AI fluency means you are among the top 10 percent of operators in your function in how you apply AI to:

  • Accelerate repeatable workflows
  • Structure and solve problems
  • Improve output quality
  • Increase speed and leverage

You will be evaluated on applied AI fluency during the interview process.

Leadership Principles

We hire and grow against three leadership principles. They’re the standards for how we operate, treat each other, and make decisions.

  • Impact-Oriented Trailblazer: We put customers first and move with speed, focus, and adaptability. We treat every plan like an experiment – test, ship, measure, and iterate quickly.
  • Master Craftsperson: We care deeply about our craft. We balance speed with high standards, own outcomes end‑to‑end, and invest in getting better every day.
  • Inspiring Colleague: We add clarity and energy, not noise. We bring humility, candor, and a one‑team mindset — giving and receiving feedback to make the team stronger.

Learn more: Interviewing at TRM: How We Hire and What Success Looks Like

The impact you will have

This work has real stakes. Depending on your role at TRM, your week might look like:

  • Driving critical investigations that can’t wait for typical business hours.
  • Shipping products in days when others would schedule quarters.
  • Partnering with teams across time zones to deliver insights while the story is still unfolding.
  • Building new solutions from first principles when the playbook doesn’t yet exist.
  • Protecting victims and customers by tracing illicit activity and disrupting criminal networks.
Join our Mission

At TRM we care deeply about our craft. We are looking for individuals who want their work to matter, who experiment with speed and rigor, and who take pride in building a safer world for billions of people. If you’re excited by TRM’s mission but don’t check every box, we encourage you to apply — we hire for slope, judgment, and the will to learn fast.

TRM is a Series C company with $220M in total funding, backed by Blockchain Capital, Goldman Sachs, Bessemer, Y Combinator, Thoma Bravo, and others. Headquartered in San Francisco, TRM operates as a distributed-first company with hubs in Los Angeles, San Francisco, New York, Washington D.C., London, and Singapore.


Recruitment agencies

TRM Labs does not accept unsolicited agency resumes. Please do not forward resumes to TRM employees. TRM Labs is not responsible for any fees related to unsolicited resumes and will not pay fees to any third-party agency or company without a signed agreement.

Privacy Policy

By submitting your application, you are agreeing to allow TRM to process your personal information in accordance with the TRM Privacy Policy.

Learn More: Company Values | Interviewing | FAQs

Top Skills

AWS, GCP, Kubernetes, Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, Hugging Face Optimum, TensorRT, FlashAttention, CUDA, Inferentia, NVIDIA GPUs
