Optimizing HPC For Enhanced
Performance & Experience
A leading Data Science and Modeling platform empowers Explorers and Builders to create cutting-edge models. We upgraded their HPC and Extended Compute bare metal services to unlock next-level capabilities.
The Problem
Post-upgrade, critical quantum chemistry software experienced severe performance issues on HPC Workers due to throttling effects.
Additionally, the Compute service posed confusion for admins during app installations, requiring unnecessary user switches that slowed down operations.
The Solution
- Selected appropriate Azure File Share tiers based on bandwidth, IOPS, and throttling limits.
- Integrated OneDrive and enabled Condor_ssh on nodes for efficient job monitoring.
- Leveraged grid computing software to initiate jobs on spot instances for significant cost reduction.
Key Results
Delivering tangible improvements across the infrastructure.
Optimized Storage
Implementation of cost-effective long-term storage solutions, reducing overhead while maintaining data accessibility.
UX Improved
Integration of OneDrive and direct conversion of "user" to user accounts, eliminating unnecessary user switches during installations.
Cost & Speed
Condor_ssh enabled for efficient job monitoring and grid computing utilized for spot instances, significantly reducing costs.
Objectives
- • Enhance HPC worker performance, addressing throttling on quantum chemistry apps.
- • Streamline user experience on Compute Services by removing admin switches.
Achievements
- • Successfully achieved enhanced performance.
- • Streamlined user experiences and optimized costs for HPC and Compute Service.