Hewlett Packard Enterprise

NCCL/RCCL SW Engineer

Posted Yesterday
In-Office
4 Locations
109K-207K Annually
Junior
The NCCL/RCCL SW Engineer tests and validates high-performance computing applications, executing test plans, debugging issues, optimizing performance, and documenting processes.

This role has been designed as ‘Hybrid’ with an expectation that you will work on average 2 days per week from an HPE office.

Who We Are:

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

Job Description:

An NCCL/RCCL engineer plays a crucial role in ensuring the quality and performance of NVIDIA Collective Communication Library (NCCL) and Radeon Collective Communication Library (RCCL) based applications and middleware, particularly in High-Performance Computing (HPC) environments. They are responsible for testing, debugging, and validating parallel programming frameworks and their implementations to meet established standards and specifications. This involves working with both the hardware and software aspects of GPU-accelerated HPC systems, ensuring optimal functionality and efficiency for communication middleware.

Key responsibilities

  • Test plan development and execution: Designing and executing comprehensive test plans to validate MPI and SHMEM features, functionality, and performance.

  • Debugging and Root Cause Analysis: Identifying, analyzing, and resolving issues found during validation and testing, collaborating with development teams to implement corrective actions.

  • Performance Evaluation and Optimization: Evaluating and optimizing the performance of MPI and SHMEM based applications and middleware, including communication collective algorithms like AllReduce.

  • Automation and Infrastructure Development: Developing and maintaining post-silicon validation infrastructure including software, hardware, and automation environments.

  • Collaboration: Working closely with hardware teams, software developers, architects, and various stakeholders to ensure seamless integration and validation of systems.

  • Documentation: Generating and maintaining detailed documentation of validation activities, test results, and compliance reports.

  • Troubleshooting: Providing technical expertise and support for troubleshooting and resolving technical issues related to MPI and SHMEM.

  • Staying updated with technology: Maintaining knowledge of validation trends, industry standards, and new technologies in high-performance computing, parallel programming, and communication middleware. 

This position will support government accounts. Therefore, due to federal export-control regulations, the selected candidate must hold U.S. citizenship.

Required skills

  • Parallel Programming and Communication: Strong understanding of parallel programming models, and of the development, validation, and performance analysis of GPU communication libraries (NCCL/RCCL) in distributed AI/HPC systems.

  • HPC Architectures: Knowledge of high-performance memory subsystems, SoC/ASIC memory architecture, high-speed I/O interfaces, and their interaction with parallel programming models.

  • Programming and Scripting: Proficiency in programming languages like C/C++, Python, and potentially others like Perl, for developing validation tests, scripts, and tools.

  • Validation Methodologies: Experience with various validation methodologies, including formal analysis and runtime instrumentation, for detecting MPI bugs and ensuring correctness.

  • Debugging Tools and Techniques: Expertise in utilizing debugging tools, methodologies, and techniques for identifying and resolving hardware and software issues at various levels.

  • Test Automation: Experience with test automation frameworks and methodologies for developing and maintaining automated regression tests and scripts.

  • Analytical and Problem-Solving Skills: Excellent analytical and problem-solving abilities to dissect complex systems, identify issues, and propose innovative solutions.

  • Communication and Collaboration: Strong communication and interpersonal skills for effective collaboration with cross-functional teams and stakeholders.

  • Attention to Detail: Meticulous attention to detail to catch discrepancies and ensure thorough validation of systems and processes. 

Education and experience

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field is required.

  • 2+ years of experience in SoC/ASIC validation and debug, particularly within the context of high-performance computing and parallel programming, is highly beneficial. 

Additional Skills:

Cloud Architectures, Cross Domain Knowledge, Design Thinking, Development Fundamentals, DevOps, Distributed Computing, Microservices Fluency, Full Stack Development, Security-First Mindset, Solutions Design, Testing & Automation, User Experience (UX)

What We Can Offer You:

Health & Wellbeing

We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.

Personal & Professional Development

We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have — whether you want to become a knowledge expert in your field or apply your skills to another division.

Unconditional Inclusion

We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.

Let's Stay Connected:

Follow @HPECareers on Instagram to see the latest on people, culture and tech at HPE.

Job:

Engineering

Job Level:

TCP_03

States with Pay Range Requirement

The expected salary/wage range for a U.S.-based hire filling this position is provided below. Actual offer may vary from this range based upon geographic location, work experience, education/training, and/or skill level. If this is a sales role, then the listed salary range reflects combined base salary and target-level sales compensation pay. If this is a non-sales role, then the listed salary range reflects base salary only. Variable incentives may also be offered. Information about employee benefits offered can be found at https://myhperewards.com/main/new-hire-enrollment.html.

USD Annual Salary: $108,500.00 - $206,500.00

The estimated job application period closure is November 29, 2025; this timeline is provided for transparency and internal planning purposes.

HPE is an Equal Employment Opportunity/Veterans/Disabled/LGBT employer. We do not discriminate on the basis of race, gender, or any other protected category, and all decisions we make are made on the basis of qualifications, merit, and business need. Our goal is to be one global team that is representative of our customers, in an inclusive environment where we can continue to innovate and grow together. Please click here: Equal Employment Opportunity.

Hewlett Packard Enterprise is EEO Protected Veteran/Individual with Disabilities.

HPE will comply with all applicable laws related to employer use of arrest and conviction records, including laws requiring employers to consider for employment qualified applicants with criminal histories.

Top Skills

C/C++
MPI
Nvidia Collective Communication Library
Python
Radeon Collective Communication Library
SHMEM
