Runpod
Runpod Innovation, Technology & Agility
Runpod Employee Perspectives
What project are you most excited to work on in 2025? What is particularly compelling about this work for you?
Right now, I am focused on a novel project at RunPod called MultiNode. With the rise of massive AI models, developers have turned to distributed graphics processing unit (GPU) clusters for their training and fine-tuning needs. MultiNode bridges the GPUs in these clusters with a high-performance network that starts at a baseline of 200 gigabits per second and can scale to multi-terabit-per-second bandwidth, enabling GPUs to communicate and work together seamlessly.
The key challenge when developing a public cloud is isolation, as each customer requires a completely sealed environment. One workload must not be able to interfere with another, even at the networking level. I have been working with a small team to develop a hardware-accelerated isolation engine, which enables us to create a software-defined overlay network that scales to millions of customers.
This project has bridged my personal interests in scientific and high-performance computing. A favorite saying of mine is that “a supercomputer is a device for turning compute-bound problems into input/output-bound problems.” That idea is all the more relevant in the age of AI. Combining my experience with general-purpose computing on graphics processing units with learning how industrial-grade networking works deep under the hood has been an amazing experience.
What does the roadmap for this project look like? Who will you collaborate with, and what challenges will you need to overcome in the process?
I’m most excited about going to market and launching this at massive scale, and we’re collaborating with the entire organization to get there. Sales, marketing, hardware supply and leadership are all involved and supportive in making this project a success. Seeing beta customers try out this feature on large, highly networked Nvidia H100 clusters and run their demanding workloads at scale is nothing short of thrilling. The biggest challenge I foresee is scaling to meet demand: these networks are complex and expensive, and ensuring both cluster quality and quantity will define our success.
What in your past projects, education or work history best prepares you to tackle this project? What do you hope to learn from this work to apply in the future?
My background spans AI, machine learning, mathematics, real-time LiDAR processing pipelines and scientific computing accelerated with Compute Unified Device Architecture (CUDA). At RunPod, we’re building a public cloud that serves over 100,000 customers across 21 data centers around the world. This means that every engineer on the team owns a major scope of work, and these projects are critical to the success of the company. There’s incredible latitude to build the best products with the best technologies, and we’re constantly pushing what’s possible. I personally look forward to diving deeper into the performance engineering side of my background and applying it to RunPod’s products and vision, bridging AI and cloud infrastructure.
