Featherless AI

Machine Learning Engineer — Inference Optimization

Posted 15 Hours Ago
In-Office or Remote
Hiring Remotely in World Golf Village, FL
Mid level
Optimize inference latency and throughput for large-scale ML models, collaborate on performance tuning, and build inference-serving systems.
The summary above was generated by AI
About the Role

We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production—turning cutting-edge models into fast, reliable, and cost-efficient systems that serve real users.

This role is ideal for someone who enjoys deep technical work, profiling systems down to the kernel/GPU level, and translating research ideas into production-grade performance gains.

What You’ll Do
  • Optimize inference latency, throughput, and cost for large-scale ML models in production

  • Profile GPU/CPU inference pipelines and identify bottlenecks (memory, kernels, batching, I/O)

  • Implement and tune techniques such as:

    • Quantization (fp16, bf16, int8, fp8)

    • KV-cache optimization & reuse

    • Speculative decoding, batching, and streaming

    • Model pruning or architectural simplifications for inference

  • Collaborate with research engineers to productionize new model architectures

  • Build and maintain inference-serving systems (e.g. Triton, custom runtimes, or bespoke stacks)

  • Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups

  • Improve system reliability, observability, and cost efficiency under real workloads

What We’re Looking For
  • Strong experience in ML inference optimization or high-performance ML systems

  • Solid understanding of deep learning internals (attention, memory layout, compute graphs)

  • Hands-on experience with PyTorch (or similar) and model deployment

  • Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)

  • Experience scaling inference for real users (not just research benchmarks)

  • Comfortable working in fast-moving startup environments with ownership and ambiguity

Nice to Have
  • Experience with LLM or long-context model inference

  • Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)

  • Experience optimizing across different hardware vendors

  • Open-source contributions in ML systems or inference tooling

  • Background in distributed systems or low-latency services

Why Join Us
  • Real ownership over performance-critical systems

  • Direct impact on product reliability and unit economics

  • Close collaboration with research, infra, and product

  • Competitive compensation and meaningful equity at a Series A company

  • A team that cares about engineering quality, not hype

Top Skills

CUDA
ML Inference Optimization
ONNX Runtime
PyTorch
TensorRT
Triton
