Featherless AI Logo

Featherless AI

Machine Learning Engineer — Inference Optimization

Reposted 23 Days Ago
In-Office or Remote
Hiring Remotely in World Golf Village, FL
Mid level
In-Office or Remote
Hiring Remotely in World Golf Village, FL
Mid level
Optimize inference latency and throughput for large-scale ML models, collaborating on performance tuning, and building inference-serving systems.
The summary above was generated by AI
About the Role

We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production—turning cutting-edge models into fast, reliable, and cost-efficient systems that serve real users.

This role is ideal for someone who enjoys deep technical work, profiling systems down to the kernel/GPU level, and translating research ideas into production-grade performance gains.

What You’ll Do
  • Optimize inference latency, throughput, and cost for large-scale ML models in production

  • Profile and bottleneck GPU/CPU inference pipelines (memory, kernels, batching, IO)

  • Implement and tune techniques such as:

    • Quantization (fp16, bf16, int8, fp8)

    • KV-cache optimization & reuse

    • Speculative decoding, batching, and streaming

    • Model pruning or architectural simplifications for inference

  • Collaborate with research engineers to productionize new model architectures

  • Build and maintain inference-serving systems (e.g. Triton, custom runtimes, or bespoke stacks)

  • Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups

  • Improve system reliability, observability, and cost efficiency under real workloads

What We’re Looking For
  • Strong experience in ML inference optimization or high-performance ML systems

  • Solid understanding of deep learning internals (attention, memory layout, compute graphs)

  • Hands-on experience with PyTorch (or similar) and model deployment

  • Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)

  • Experience scaling inference for real users (not just research benchmarks)

  • Comfortable working in fast-moving startup environments with ownership and ambiguity

Nice to Have
  • Experience with LLM or long-context model inference

  • Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)

  • Experience optimizing across different hardware vendors

  • Open-source contributions in ML systems or inference tooling

  • Background in distributed systems or low-latency services

Why Join Us
  • Real ownership over performance-critical systems

  • Direct impact on product reliability and unit economics

  • Close collaboration with research, infra, and product

  • Competitive compensation + meaningful equity at Series A

  • A team that cares about engineering quality, not hype

Top Skills

Cuda
Ml Inference Optimization
Onnx Runtime
PyTorch
Tensorrt
Triton

Similar Jobs

33 Minutes Ago
Easy Apply
Remote
USA
Easy Apply
244K-287K Annually
Senior level
244K-287K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Senior Manager, Adversary Management leads cyber threat intelligence strategy, overseeing operational processes, team management, and ensuring intelligence support for security operations at Coinbase.
Top Skills: AIBlockchainThreat IntelligenceThreat Research TechnologiesWeb Technologies
33 Minutes Ago
Easy Apply
Remote
USA
Easy Apply
145K-170K Annually
Mid level
145K-170K Annually
Mid level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Data Protection Engineer will implement and maintain data protection capabilities, ensuring security against threats while balancing speed in a decentralized tech environment. Responsibilities include expanding data loss prevention measures, collaborating across teams, and automating processes.
Top Skills: Agentic AiData Loss PreventionLlmsSecurity Information Event ManagementUser Behavioral Analytics
44 Minutes Ago
Remote
United States
Junior
Junior
Information Technology • Productivity • Professional Services • Software • Business Intelligence
The Business Development Representative will drive new customer acquisition, conduct outbound prospecting, and collaborate with sales teams to create pipeline growth.
Top Skills: Gong EngageLinkedInSalesforce

What you need to know about the Colorado Tech Scene

With a business-friendly climate and research universities like CU Boulder and Colorado State, Colorado has made a name for itself as a startup ecosystem. The state boasts a skilled workforce and high quality of life thanks to its affordable housing, vibrant cultural scene and unparalleled opportunities for outdoor recreation. Colorado is also home to the National Renewable Energy Laboratory, helping cement its status as a hub for renewable energy innovation.

Key Facts About Colorado Tech

  • Number of Tech Workers: 260,000; 8.5% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Lockheed Martin, Century Link, Comcast, BAE Systems, Level 3
  • Key Industries: Software, artificial intelligence, aerospace, e-commerce, fintech, healthtech
  • Funding Landscape: $4.9 billion in VC funding in 2024 (Pitchbook)
  • Notable Investors: Access Venture Partners, Ridgeline Ventures, Techstars, Blackhorn Ventures
  • Research Centers and Universities: Colorado School of Mines, University of Colorado Boulder, University of Denver, Colorado State University, Mesa Laboratory, Space Science Institute, National Center for Atmospheric Research, National Renewable Energy Laboratory, Gottlieb Institute

Sign up now Access later

Create Free Account

Please log in or sign up to report this job.

Create Free Account