black and gray motherboard

High Performance Computing (HPC) on Azure

 Introduction to HPC High Performance Computing (HPC), also called “Big Compute”, uses a large number of CPU or GPU-based computers to solve complex mathematical tasks. Many industries use HPC to solve some of their most difficult problems.

Source: High Performance Computing (HPC) on Azure

How is HPC different on the cloud?

One of the primary differences between an on-premises HPC system and one in the cloud is the ability for resources to dynamically be added and removed as they're needed. Dynamic scaling removes compute capacity as a bottleneck and instead allow customers to right size their infrastructure for the requirements of their jobs.

The following articles provide more detail about this dynamic scaling capability.

Implementation checklist

As you're looking to implement your own HPC solution on Azure, ensure you're reviewed the following topics:

  • Choose the appropriate architecture based on your requirements
  • Know which compute options is right for your workload
  • Identify the right storage solution that meets your needs
  • Decide how you're going to manage all your resources
  • Optimize your application for the cloud
  • Secure your Infrastructure

Infrastructure

There are a number of infrastructure components necessary to build an HPC system. Compute, Storage, and Networking provide the underlying components, no matter how you choose to manage your HPC workloads.

Example HPC architectures

There are a number of different ways to design and implement your HPC architecture on Azure. HPC applications can scale to thousands of compute cores, extend on-premises clusters, or run as a 100% cloud-native solution.

The following scenarios outline a few of the common ways HPC solutions are built.

  • Computer-aided engineering services on AzureProvide a software-as-a-service (SaaS) platform for computer-aided engineering (CAE) on Azure.
  • Computational fluid dynamics (CFD) simulations on AzureExecute computational fluid dynamics (CFD) simulations on Azure.
  • 3D video rendering on AzureRun native HPC workloads in Azure using the Azure Batch service

Compute

Azure offers a range of sizes that are optimized for both CPU & GPU intensive workloads.

CPU-based virtual machines

GPU-enabled virtual machines

N-series VMs feature NVIDIA GPUs designed for compute-intensive or graphics-intensive applications including artificial intelligence (AI) learning and visualization.

Storage

Large-scale Batch and HPC workloads have demands for data storage and access that exceed the capabilities of traditional cloud file systems. There are a number of solutions to manage both the speed and capacity needs of HPC applications on Azure

For more information comparing Lustre, GlusterFS, and BeeGFS on Azure, review the Parallel Files Systems on Azure e-book and the Lustre on Azure blog.

Networking

H16r, H16mr, A8, and A9 VMs can connect to a high throughput back-end RDMA network. This network can improve the performance of tightly coupled parallel applications running under Microsoft MPI or Intel MPI.

Management

Do-it-yourself

Building an HPC system from scratch on Azure offers a significant amount of flexibility, but is often very maintenance intensive.

  1. Set up your own cluster environment in Azure virtual machines or virtual machine scale sets.
  2. Use Azure Resource Manager templates to deploy leading workload managers, infrastructure, and applications.
  3. Choose HPC and GPU VM sizes that include specialized hardware and network connections for MPI or GPU workloads.
  4. Add high performance storage for I/O-intensive workloads.

Hybrid and cloud Bursting

If you have an existing on-premises HPC system that you'd like to connect to Azure, there are a number of resources to help get you started.

First, review the Options for connecting an on-premises network to Azure article in the documentation. From there, you may want information on these connectivity options:

Once network connectivity is securely established, you can start using cloud compute resources on-demand with the bursting capabilities of your existing workload manager.

Marketplace solutions

There are a number of workload managers offered in the Azure Marketplace.

Azure Batch

Azure Batch is a platform service for running large-scale parallel and high-performance computing (HPC) applications efficiently in the cloud. Azure Batch schedules compute-intensive work to run on a managed pool of virtual machines, and can automatically scale compute resources to meet the needs of your jobs.

SaaS providers or developers can use the Batch SDKs and tools to integrate HPC applications or container workloads with Azure, stage data to Azure, and build job execution pipelines.

Azure CycleCloud

SaleBestseller No. 1
HP Elite Desktop PC Computer Intel Core i5 3.1-GHz, 8 gb Ram, 1 TB Hard Drive, DVDRW, 19 Inch LCD Monitor, Keyboard, Mouse, Wireless WiFi, Windows 10 (Renewed)
  • This Certified Refurbished product is tested and...
  • HP Elite 6200 Small Form Factor Desktop PC, Intel...
  • Includes: USB Keyboard & Mouse, WiFi Adapter,...
  • Ports: USB 2.0, DisplayPort, VGA, PS/2 keyboard,...
  • Operating System: Windows 10 64 Bit –...
SaleBestseller No. 2
HP 2022 Newest All-in-One Desktop, 21.5" FHD Display, Intel Celeron J4025 Processor, 16GB RAM, 512GB PCIe SSD, Webcam, HDMI, RJ-45, Wired Keyboard&Mouse, WiFi, Windows 11 Home, White
  • 【High Speed RAM And Enormous Space】16GB DDR4...
  • 【Processor】Intel Celeron J4025 processor (2...
  • 【Display】21.5" diagonal FHD VA ZBD anti-glare...
  • 【Tech Specs】2 x SuperSpeed USB Type-A 5Gbps...
  • 【Authorized KKE Mousepad】Include KKE Mousepad

Azure CycleCloud Provides the simplest way to manage HPC workloads using any scheduler (like Slurm, Grid Engine, HPC Pack, HTCondor, LSF, PBS Pro, or Symphony), on Azure

CycleCloud allows you to:

  • Deploy full clusters and other resources, including scheduler, compute VMs, storage, networking, and cache
  • Orchestrate job, data, and cloud workflows
  • Give admins full control over which users can run jobs, as well as where and at what cost
  • Customize and optimize clusters through advanced policy and governance features, including cost controls, Active Directory integration, monitoring, and reporting
  • Use your current job scheduler and applications without modification
  • Take advantage of built-in autoscaling and battle-tested reference architectures for a wide range of HPC workloads and industries

Workload managers

The following are examples of cluster and workload managers that can run in Azure infrastructure. Create stand-alone clusters in Azure VMs or burst to Azure VMs from an on-premises cluster.

Containers

Containers can also be used to manage some HPC workloads. Services like the Azure Kubernetes Service (AKS) makes it simple to deploy a managed Kubernetes cluster in Azure.

Cost management

Managing your HPC cost on Azure can be done through a few different ways. Ensure you've reviewed the Azure purchasing options to find the method that works best for your organization.

Security

For an overview of security best practices on Azure, review the Azure Security Documentation.

In addition to the network configurations available in the Cloud Bursting section, you may want to implement a hub/spoke configuration to isolate your compute resources:

HPC applications

Run custom or commercial HPC applications in Azure. Several examples in this section are benchmarked to scale efficiently with additional VMs or compute cores. Visit the Azure Marketplace for ready-to-deploy solutions.

 Note

Check with the vendor of any commercial application for licensing or other restrictions for running in the cloud. Not all vendors offer pay-as-you-go licensing. You might need a licensing server in the cloud for your solution, or connect to an on-premises license server.

Engineering applications

Graphics and rendering

AI and deep learning

MPI providers

Remote visualization

New
HP Stream 14 inch Laptop for Student and Business, Intel Quad-Core Processor, 16GB RAM, 64GB eMMC, 1-Year Office 365, Webcam, 12H Long Battery Life, Lightweight & Slim Laptop, Wi-Fi, Win 11 H in S
  • 【Processor】Intel Celeron N4120, 4 Cores & 4...
  • 【Display】14.0-inch diagonal, HD (1366 x 768),...
  • 【Storage】16GB high-bandwidth DDR4 Memory (2400...
  • 【Connectivity】1 x USB 3.1 Type-C ports, 2 x...
  • 【System】Windows 11 Home in S mode operating...
New
HAJAAN SuperX Gaming PC | Liquid Cooled | GeForce RTX 4060 8GB | AMD Ryzen 5 5600G | 32GB DDR4 | 1TB SSD | Windows 11 Pro | WiFi | Bluetooth - Black
  • Configured with AMD Ryzen 5 5600G Processor and...
  • 8GB GeForce RTX 4060 GDDR6 dedicated graphics card...
  • Liquid cooling system keeps internal components at...
  • Integrated PCIE Wi-Fi provides excellent wireless...
  • Includes USB Gaming RGB Mechanical Keyboard, Mouse...
New
Lenovo 2023 IdeaPad 1i Essential Laptop Computer, Intel Core i5-1235U 12th Gen, 15.6" FHD Anti-Glare Display, (16GB DDR4 RAM, 512GB SSD), HDMI, Bluetooth, Windows 11, Cloud Grey, W/GaLiMu
  • ✔【Display】 15.6" FHD (1920x1080) TN 220nits...
  • ✔【Memory & Storage】RAM Size 16GB 3200MHz...
  • ✔【Connectivity】 1x USB 2.0, 1x USB 3.2 Gen...
  • ✔【Processor & Graphics】 12th Generation...
  • ✔【Operating System】 Windows 11

Run GPU-powered virtual machines in Azure in the same region as the HPC output for the lowest latency, access, and to visualize remotely through Azure Virtual Desktop, Citrix, or VMware Horizon.

Performance benchmarks

Customer stories

There are a number of customers who have seen great success by using Azure for their HPC workloads. You can find a few of these customer case studies below:

Other important information

  • Ensure your vCPU quota has been increased before attempting to run large-scale workloads.

Next steps

For the latest announcements, see:

Microsoft Batch Examples

These tutorials will provide you with details on running applications on Microsoft Batch