39. HPC

HPC = High Performance Compute
Different services we can use to achieve HPC are:
i. Data Transfer
ii. Compute & Networking
iii. Storage
iv. Orchestration & Automation

Ways to get our data into AWS (Data Transfer):
i. Snowball, snowmobile (terabytes/ petabytes worth of data) (large amounts of data)
ii. AWS DataSync to store on S3, EFS, FSx for Windows etc
iii. Direct Connect: A cloud service solution that makes it easy to establish a dedicated n/w connection from your premises to AWS. We can establish private connectivity between AWS and your data center, office or co-location environment. In many cases we can reduce n/w costs, increase bandwidth throughput & provide a more consistent n/w experience than internet based connections.

Compute & networking services that allow to achieve HPC:

Compute Services Networking Services
EC2 instances that are GPU or CPU optimized Enhanced networking (EN)
EC2 fleets (spot instances or spot fleets) Elastic network adapters (ENA)
Placement groups (cluster placement groups) Elastic fabric adapters (EFA)
Storage services that allow to achieve HPC:
Instance attached storage:
i. EBS: Scale up to 64000 IOPS with provisioned IOPS (PIOPS)
ii. Instance store: Scale up to millions of IOPS with low latency
Network storage:
i. S3: Distributed object based storage. Not a file system
ii. EFS: File system. Scale IOPS based on total size, or use PIOPS
iii. FSx for lustre: HPC optimized distributed file system, millions of IOPS, also backed by S3.

Orchestration & automation services that allow to achieve HPC:
AWS Batch:
i. Enables developers, scientists & engineers to easily & efficiently run hundereds of thousands of batch computing jobs on AWS.
ii. AWS batch supports multinode parallel jobs, which allows us to run a single job that spans multiple EC2 instances.
iii. We can easily schedule jobs & launch EC2 instances according to needs.
AWS Parallel cluster:
i. Open source cluster management tool that makes it easy for you to deploy & manage HPS clusters on AWS.
ii. Parallel cluster uses a simple text file to model & provision all the resources needed for your HPC application in an automated & secure manner
iii. Automate creation of VPC, subnet, cluster type & instance type

Question 1:
An ivy-league university is assisting NASA to find potential landing sites for exploration vehicles of unmanned missions to our neighboring planets. The university uses High Performance Computing (HPC) driven application architecture to identify these landing sites. Which of the following EC2 instance topologies should this application be deployed on?
Answer: The EC2 instances should be deployed in a cluster placement group so that the underlying workload can benefit from low n/w latency and high n/w throughput

Question 2:
An ivy-league university is assisting NASA to find potential landing sites for exploration vehicles of unmanned missions to our neighboring planets. The university uses High Performance Computing (HPC) driven application architecture to identify these landing sites.
Which of the following EC2 instance topologies should this application be deployed on?
Options:
A. The EC2 instances should be deployed in a partition placement group so that distributed workloads can be handled effectively
B. The EC2 instances should be deployed in a cluster placement group so that the underlying workload can benefit from low network latency and high network throughput
C. The EC2 instances should be deployed in a spread placement group so that there are no correlated failures
D. The EC2 instances should be deployed in an Auto Scaling group so that application meets high availability requirements
Answer: B
Explanation
Correct option:
The EC2 instances should be deployed in a cluster placement group so that the underlying workload can benefit from low network latency and high network throughput
The key thing to understand in this question is that HPC workloads need to achieve low-latency network performance necessary for tightly-coupled node-to-node communication that is typical of HPC applications. Cluster placement groups pack instances close together inside an Availability Zone. These are recommended for applications that benefit from low network latency, high network throughput, or both. Therefore this option is the correct answer.
Incorrect options:
The EC2 instances should be deployed in a partition placement group so that distributed workloads can be handled effectively – A partition placement group spreads your instances across logical partitions such that groups of instances in one partition do not share the underlying hardware with groups of instances in different partitions. This strategy is typically used by large distributed and replicated workloads, such as Hadoop, Cassandra, and Kafka. A partition placement group can have a maximum of seven partitions per Availability Zone. Since a partition placement group can have partitions in multiple Availability Zones in the same region, therefore instances will not have low-latency network performance. Hence the partition placement group is not the right fit for HPC applications.
The EC2 instances should be deployed in a spread placement group so that there are no correlated failures – A spread placement group is a group of instances that are each placed on distinct racks, with each rack having its own network and power source. The instances are placed across distinct underlying hardware to reduce correlated failures. You can have a maximum of seven running instances per Availability Zone per group. Since a spread placement group can span multiple Availability Zones in the same Region, therefore instances will not have low-latency network performance. Hence spread placement group is not the right fit for HPC applications.
The EC2 instances should be deployed in an Auto Scaling group so that application meets high availability requirements – An Auto Scaling group contains a collection of Amazon EC2 instances that are treated as a logical grouping for the purposes of automatic scaling. You do not use Auto Scaling groups per se to meet HPC requirements.

Question 3:
A biotechnology company has multiple High Performance Computing (HPC) workflows that quickly and accurately process and analyze genomes for hereditary diseases. The company is looking to migrate these workflows from their on-premises infrastructure to AWS Cloud.
As a solutions architect, which of the following networking components would you recommend on the EC2 instances running these HPC workflows?
• Elastic Network Interface
• Elastic Fabric Adapter
• Elastic Network Adapter
• Elastic IP Address
Answer: B
Explanation
Correct option:
Elastic Fabric Adapter
An Elastic Fabric Adapter (EFA) is a network device that you can attach to your Amazon EC2 instance to accelerate High Performance Computing (HPC) and machine learning applications. It enhances the performance of inter-instance communication that is critical for scaling HPC and machine learning applications. EFA devices provide all Elastic Network Adapter (ENA) devices functionalities plus a new OS bypass hardware interface that allows user-space applications to communicate directly with the hardware-provided reliable transport functionality.
Incorrect options:
Elastic Network Interface – An Elastic Network Interface (ENI) is a logical networking component in a VPC that represents a virtual network card. You can create a network interface, attach it to an instance, detach it from an instance, and attach it to another instance. The ENI is the simplest networking component available on AWS and is insufficient for HPC workflows.
Elastic Network Adapter – Elastic Network Adapter (ENA) devices support enhanced networking via single root I/O virtualization (SR-IOV) to provide high-performance networking capabilities. Although enhanced networking provides higher bandwidth, higher packet per second (PPS) performance, and consistently lower inter-instance latencies, still EFA is a better fit for the given use-case because the EFA device provides all the functionality of an ENA device, plus hardware support for applications to communicate directly with the EFA device without involving the instance kernel (OS-bypass communication) using an extended programming interface.
Elastic IP Address – An Elastic IP address is a static IPv4 address associated with your AWS account. An Elastic IP address is a public IPv4 address, which is reachable from the internet. It is not a networking device that can be used to facilitate HPC workflows.