Featherless AI

Machine Learning Engineer — Inference Optimization

Reposted 23 Days Ago

In-Office or Remote

Hiring Remotely in World Golf Village, FL

Mid level

Optimize inference latency and throughput for large-scale ML models, collaborate on performance tuning, and build inference-serving systems.

The summary above was generated by AI

About the Role

We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production—turning cutting-edge models into fast, reliable, and cost-efficient systems that serve real users.

This role is ideal for someone who enjoys deep technical work, profiling systems down to the kernel/GPU level, and translating research ideas into production-grade performance gains.

What You’ll Do

Optimize inference latency, throughput, and cost for large-scale ML models in production
Profile GPU/CPU inference pipelines and identify bottlenecks (memory, kernels, batching, I/O)
Implement and tune techniques such as:
- Quantization and reduced-precision inference (FP16, BF16, INT8, FP8)
- KV-cache optimization & reuse
- Speculative decoding, batching, and streaming
- Model pruning or architectural simplifications for inference
Collaborate with research engineers to productionize new model architectures
Build and maintain inference-serving systems (e.g. Triton, custom runtimes, or bespoke stacks)
Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups
Improve system reliability, observability, and cost efficiency under real workloads

What We’re Looking For

Strong experience in ML inference optimization or high-performance ML systems
Solid understanding of deep learning internals (attention, memory layout, compute graphs)
Hands-on experience with PyTorch (or similar) and model deployment
Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)
Experience scaling inference for real users (not just research benchmarks)
Comfort working in fast-moving startup environments with high ownership and ambiguity

Nice to Have

Experience with LLM or long-context model inference
Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)
Experience optimizing across different hardware vendors
Open-source contributions in ML systems or inference tooling
Background in distributed systems or low-latency services

Why Join Us

Real ownership over performance-critical systems
Direct impact on product reliability and unit economics
Close collaboration with research, infra, and product
Competitive compensation + meaningful equity at Series A
A team that cares about engineering quality, not hype

Top Skills

CUDA

ML Inference Optimization

ONNX Runtime

PyTorch

TensorRT

Triton
