38. EC2 Placement Groups

Three types of placement groups

Clustered Placement Group (CPG)
Grouping of instances within a single AZ; the EC2 instances are packed very close to each other.
Recommended for applications that need low network latency, high network throughput, or both.
Only certain instance types can be launched into a CPG.

Spread Placement Group (SPG)
Group of instances that are each placed on distinct underlying hardware. The opposite of a CPG.
The instances sit on separate racks, each with its own network and power source, so if one rack fails it affects only that one EC2 instance.
Recommended for applications that have a small number of critical instances that should be kept separate from each other.
Think of single instances: individual critical EC2 instances that we need on separate pieces of hardware.

Partitioned Placement Group (PPG)
EC2 divides each group into logical segments called partitions and ensures that each partition within a placement group has its own set of racks. Each rack has its own network and power source. No two partitions within a placement group share the same racks, which isolates the impact of a hardware failure within the application.
Think of multiple EC2 instances per partition.
Use cases: HDFS, HBase, Cassandra.
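The three strategies above map onto the `Strategy` parameter of the EC2 CreatePlacementGroup call. The following is a minimal sketch, assuming `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`); the group names and the partition count of 3 are hypothetical values chosen for illustration.

```python
VALID_STRATEGIES = ("cluster", "spread", "partition")


def placement_group_request(name, strategy, partition_count=None):
    """Build keyword arguments for the EC2 CreatePlacementGroup call."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    params = {"GroupName": name, "Strategy": strategy}
    if partition_count is not None:
        # PartitionCount is only meaningful for the partition strategy.
        if strategy != "partition":
            raise ValueError("partition_count requires strategy='partition'")
        params["PartitionCount"] = partition_count
    return params


def create_example_groups(ec2):
    """Create one group per strategy.

    `ec2` is assumed to be a boto3 EC2 client; the names are hypothetical.
    """
    requests = [
        placement_group_request("notes-cluster-pg", "cluster"),
        placement_group_request("notes-spread-pg", "spread"),
        placement_group_request("notes-partition-pg", "partition", partition_count=3),
    ]
    for params in requests:
        ec2.create_placement_group(**params)
    return requests
```

Keeping the request building separate from the API call makes the parameters easy to inspect before anything is created in the account.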
General rules for placement groups:
i. A CPG can't span multiple AZs.
ii. Spread and Partitioned placement groups can span multiple AZs, but the AZs have to be within the same region.
iii. The name we specify for a placement group must be unique within the AWS account for the region.
iv. Only certain types of instances can be launched in a placement group (compute optimized, GPU, memory optimized & storage optimized).
v. AWS recommends homogeneous instances within clustered placement groups.
vi. We can't merge placement groups.
vii. We can move an existing instance into a placement group. Before the move, the instance must be in the stopped state. We can move or remove an instance using the AWS CLI or an AWS SDK; the console did not support this when these notes were written.
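The move described in the last point (stop, change placement, start) can be sketched with an AWS SDK as follows. This is an illustration under assumptions, not a definitive procedure: `ec2` is assumed to be a boto3 EC2 client, and the function name, instance ID and group name are hypothetical.

```python
def move_instance_to_group(ec2, instance_id, group_name):
    """Stop the instance, change its placement group, then start it again.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Passing group_name="" removes the instance from its placement group.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # Placement can only be modified while the instance is stopped.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_placement(InstanceId=instance_id, GroupName=group_name)
    ec2.start_instances(InstanceIds=[instance_id])
```

A call such as `move_instance_to_group(ec2, "i-0123456789abcdef0", "notes-spread-pg")` would move that (hypothetical) instance; error handling and capacity failures on restart are left out of the sketch.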

Question 1:
A big-data consulting firm is working on a client engagement where the ETL workloads are currently handled via a Hadoop cluster deployed in the on-premises data center. The client wants to migrate their ETL workloads to AWS Cloud. The AWS Cloud solution needs to be highly available with about 50 EC2 instances per Availability Zone.
As a solutions architect, which of the following EC2 placement groups would you recommend for handling the distributed ETL workload?
A• Both Spread placement group and Partition placement group
B• Spread placement group
C• Cluster placement group
D• Partition placement group
Answer: D
Explanation
Correct option:
Partition placement group
You can use placement groups to influence the placement of a group of interdependent instances to meet the needs of your workload. Depending on the type of workload, you can create a placement group using one of the following placement strategies:
Partition – spreads your instances across logical partitions such that groups of instances in one partition do not share the underlying hardware with groups of instances in different partitions. This strategy is typically used by large distributed and replicated workloads, such as Hadoop, Cassandra, and Kafka. Therefore, this is the correct option for the given use-case.
Incorrect options:
Cluster placement group
Cluster – packs instances close together inside an Availability Zone. This strategy enables workloads to achieve the low-latency network performance necessary for tightly-coupled node-to-node communication that is typical of HPC applications. This is not suited for distributed and replicated workloads such as Hadoop.
Spread placement group
Spread – strictly places a small group of instances across distinct underlying hardware to reduce correlated failures. This is not suited for distributed and replicated workloads such as Hadoop.
Both Spread placement group and Partition placement group – As mentioned earlier, the spread placement group is not suited for distributed and replicated workloads such as Hadoop. So this option is also incorrect.
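The reasoning behind the answer can be checked with a small toy model. It is a sketch under assumptions: the two limits below are the per-AZ limits AWS documents for partition and spread placement groups (assumed still current), and the round-robin assignment only approximates EC2's default behaviour of distributing instances evenly across partitions. The function names and instance IDs are hypothetical.

```python
MAX_PARTITIONS_PER_AZ = 7        # documented partition-group limit (assumed current)
MAX_SPREAD_INSTANCES_PER_AZ = 7  # documented spread-group limit (assumed current)


def assign_partitions(instance_ids, partition_count):
    """Toy round-robin assignment of instances to partitions 1..partition_count."""
    if not 1 <= partition_count <= MAX_PARTITIONS_PER_AZ:
        raise ValueError("partition_count must be between 1 and 7 per AZ")
    partitions = {p: [] for p in range(1, partition_count + 1)}
    for index, instance_id in enumerate(instance_ids):
        partitions[index % partition_count + 1].append(instance_id)
    return partitions


def fits_spread_group(instances_per_az):
    """True if one spread placement group could hold this many instances in an AZ."""
    return instances_per_az <= MAX_SPREAD_INSTANCES_PER_AZ


# The scenario from the question: about 50 instances in one Availability Zone.
instances = [f"i-{n:02d}" for n in range(50)]
layout = assign_partitions(instances, 7)
largest = max(len(members) for members in layout.values())
```

With 50 instances and 7 partitions, `largest` is 8, so a rack failure confined to one partition touches at most 8 of the 50 nodes, while `fits_spread_group(50)` is false because 50 instances per AZ exceed what a single spread group allows.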