Case Study

Optimizing HPC For Enhanced
Performance & Experience

A leading Data Science and Modeling platform empowers Explorers and Builders to create cutting-edge models. We upgraded their HPC and Extended Compute bare metal services to unlock next-level capabilities.

K8s Kubernetes
Azure Azure File Share
HTCondor / Grid
Python Python

The Problem

Post-upgrade, critical quantum chemistry software experienced severe performance issues on HPC Workers due to throttling effects.

Additionally, the Compute service posed confusion for admins during app installations, requiring unnecessary user switches that slowed down operations.

The Solution

  • Selected appropriate Azure File Share tiers based on bandwidth, IOPS, and throttling limits.
  • Integrated OneDrive and enabled Condor_ssh on nodes for efficient job monitoring.
  • Leveraged grid computing software to initiate jobs on spot instances for significant cost reduction.

Key Results

Delivering tangible improvements across the infrastructure.

Optimized Storage

Implementation of cost-effective long-term storage solutions, reducing overhead while maintaining data accessibility.

UX Improved

Integration of OneDrive and direct conversion of "user" to user accounts, eliminating unnecessary user switches during installations.

Cost & Speed

Condor_ssh enabled for efficient job monitoring and grid computing utilized for spot instances, significantly reducing costs.

Objectives

  • • Enhance HPC worker performance, addressing throttling on quantum chemistry apps.
  • • Streamline user experience on Compute Services by removing admin switches.

Achievements

  • • Successfully achieved enhanced performance.
  • • Streamlined user experiences and optimized costs for HPC and Compute Service.