Sully.ai Logo

Sully.ai

Senior AI Systems Engineer (LLM Inference & Infra Optimization)

Posted Yesterday
Remote
Hiring Remotely in US
Senior level
Remote
Hiring Remotely in US
Senior level
Lead efforts in deploying and optimizing large language models on GPU hardware, optimizing inference pipelines and managing multi-cloud infrastructures.
The summary above was generated by AI
About Us

At Sully.ai, we’re building cutting-edge AI-native infrastructure to power real-time, intelligent healthcare applications. Our team operates at the intersection of high-performance computing, ML systems, and cloud infrastructure — optimizing inference pipelines to support next-generation multimodal AI agents. We're looking for a deeply technical engineer who thrives at the systems level and loves building performant, scalable infrastructure.

The Role

We’re looking for a senior-level engineer to lead efforts in deploying and optimizing large language models on high-end GPU hardware and building the infrastructure that supports them. You'll work across the stack — from C++ and CUDA kernels to Python APIs — while also shaping our DevOps practices for scalable, multi-cloud deployments. This role blends systems performance, ML inference, and infrastructure-as-code to deliver low-latency, production-grade AI services.

What You’ll Do
  • LLM Inference Optimization: Develop and optimize inference pipelines using quantization, attention caching, speculative decoding, and memory-efficient serving.

  • Systems Programming: Build and maintain low-level modules in C++/CUDA/NCCL to squeeze the most out of GPUs and high-throughput architectures.

  • DevOps & Infrastructure Engineering: Stand up and manage multi-cloud environments using modern IaC frameworks such as Pulumi or Terraform. Automate infrastructure provisioning, deployment pipelines, and GPU fleet orchestration.

  • Real-Time Architectures: Design low-latency streaming and decision-support systems leveraging embedding models, VRAM token caches, and fast interconnects.

  • Developer Enablement: Build robust tooling, interfaces, and sandbox environments so that other engineers can contribute safely to the ML systems layer.

What We’re Looking For
  • Proficiency in C++, CUDA, and Python with experience in systems or ML infrastructure engineering.

  • Deep understanding of GPU architectures, inference optimization, and large model serving techniques.

  • Hands-on experience with multi-cloud environments (GCP, AWS, etc.) and infrastructure-as-code tools such as Pulumi, Terraform, or similar.

  • Familiarity with ML deployment frameworks (TensorRT, vLLM, DeepSpeed, Hugging Face Transformers, etc.).

  • Comfortable with DevOps workflows, containerization (Docker), CI/CD, and distributed system debugging.

  • (Bonus) Experience with streaming embeddings, semantic search, or hybrid retrieval architectures.

  • (Bonus) Interest in building tools that democratize high-performance systems for broader engineering teams.

Why Join Us
  • Collaborate with a highly technical team solving hard problems at the edge of AI and healthcare.

  • Work with bleeding-edge GPU infrastructure and build systems that push what's possible.

  • Be a foundational part of shaping AI-native infrastructure for real-time, mission-critical applications.

  • Help accelerate a meaningful product that improves how clinicians work and patients are cared for.

Sully.ai is an equal opportunity employer. In addition to EEO being the law, it is a policy that is fully consistent with our principles. All qualified applicants will receive consideration for employment without regard to status as a protected veteran or a qualified individual with a disability, or other protected status such as race, religion, color, national origin, sex, sexual orientation, gender identity, genetic information, pregnancy or age. Sully.ai prohibits any form of workplace harassment. 

Top Skills

C++
Cuda
Deepspeed
Docker
Hugging Face Transformers
Pulumi
Python
Tensorrt
Terraform
Vllm

Similar Jobs

57 Minutes Ago
Easy Apply
In-Office or Remote
San Francisco, CA, USA
Easy Apply
Mid level
Mid level
Healthtech • Information Technology • Mobile • Productivity • Software • Analytics • Telehealth
The Client Success Manager at Doximity will manage hospital accounts, execute marketing campaigns, and partner with clients to achieve success. Responsibilities include reporting on campaign performance, coordinating information across projects, and collaborating with sales teams.
Top Skills: AirtableSalesforce
An Hour Ago
Easy Apply
In-Office or Remote
San Francisco, CA, USA
Easy Apply
130K-185K
Mid level
130K-185K
Mid level
Healthtech • Information Technology • Mobile • Productivity • Software • Analytics • Telehealth
The role involves overseeing the physician user experience on the Amion platform, enhancing UI, collaborating with cross-functional teams, prioritizing product opportunities, and leading the product development process.
Top Skills: SQL
An Hour Ago
Easy Apply
In-Office or Remote
2 Locations
Easy Apply
Mid level
Mid level
Healthtech • Information Technology • Mobile • Productivity • Software • Analytics • Telehealth
The Regional Vice President, Pharma is responsible for driving sales through strategic digital marketing solutions, client relationship management, and data-informed decision making to exceed sales goals.
Top Skills: Business AnalyticsGoogle Workspace

What you need to know about the Colorado Tech Scene

With a business-friendly climate and research universities like CU Boulder and Colorado State, Colorado has made a name for itself as a startup ecosystem. The state boasts a skilled workforce and high quality of life thanks to its affordable housing, vibrant cultural scene and unparalleled opportunities for outdoor recreation. Colorado is also home to the National Renewable Energy Laboratory, helping cement its status as a hub for renewable energy innovation.

Key Facts About Colorado Tech

  • Number of Tech Workers: 260,000; 8.5% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Lockheed Martin, Century Link, Comcast, BAE Systems, Level 3
  • Key Industries: Software, artificial intelligence, aerospace, e-commerce, fintech, healthtech
  • Funding Landscape: $4.9 billion in VC funding in 2024 (Pitchbook)
  • Notable Investors: Access Venture Partners, Ridgeline Ventures, Techstars, Blackhorn Ventures
  • Research Centers and Universities: Colorado School of Mines, University of Colorado Boulder, University of Denver, Colorado State University, Mesa Laboratory, Space Science Institute, National Center for Atmospheric Research, National Renewable Energy Laboratory, Gottlieb Institute

Sign up now Access later

Create Free Account

Please log in or sign up to report this job.

Create Free Account