16. DataSync Overview

DataSync moves large amounts of data into AWS, typically from an on-premises data center.
On-premises Data Center [The DataSync agent is deployed on a server and connects to a NAS or file system to copy data to AWS and write data from AWS] >>
Network Transfers [DataSync automatically encrypts data and accelerates transfer over the WAN. DataSync performs automatic data integrity checks in transit and at rest] >>
AWS Region [DataSync seamlessly and securely connects to Amazon S3, Amazon EFS, or Amazon FSx for Windows File Server to copy data and metadata to and from AWS]

Recap:
i) Used to move large amounts of data from on-premises to AWS
ii) Used with NFS- and SMB-compatible file systems
iii) Replication can be done hourly, daily or weekly
iv) Install the DataSync agent to start the replication
v) Can be used to replicate EFS to EFS
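
The scheduling point in the recap can be sketched as a request for the DataSync CreateTask API. This is a minimal sketch, not a definitive setup: the helper name, the task names, and the ARNs are hypothetical placeholders, and the rate expressions are one assumed way to express hourly, daily, and weekly runs. The code only builds the request and does not call AWS.

```python
# Hypothetical mapping from the recap's intervals to schedule expressions.
# Assumption: rate expressions are used; cron expressions are an alternative.
SCHEDULES = {
    "hourly": "rate(1 hour)",
    "daily": "rate(1 day)",
    "weekly": "rate(7 days)",
}


def build_task_request(source_arn, destination_arn, interval):
    """Return keyword arguments shaped for the DataSync CreateTask API."""
    if interval not in SCHEDULES:
        raise ValueError(f"unsupported interval: {interval}")
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": f"replicate-{interval}",  # hypothetical naming scheme
        "Schedule": {"ScheduleExpression": SCHEDULES[interval]},
    }


# Placeholder ARNs for illustration only.
request = build_task_request(
    "arn:aws:datasync:us-east-1:111122223333:location/loc-0123456789abcdef0",
    "arn:aws:datasync:us-east-1:111122223333:location/loc-0fedcba9876543210",
    "daily",
)
print(request)

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("datasync").create_task(**request)
```

Keeping the request as a plain dictionary makes it easy to review the schedule before anything is created in an account.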

Question 1:
An organization has a large amount of data on Windows (SMB) file shares in their on-premises data center. The organization would like to move data into Amazon S3. They would like to automate the migration of data over their AWS Direct Connect link.
Which AWS service can assist them?
Options:
A. AWS Snowball
B. AWS DataSync
C. AWS CloudFormation
D. AWS Database Migration Service (DMS)
Answer: B
Explanation
AWS DataSync can be used to move large amounts of data online between on-premises storage and Amazon S3 or Amazon Elastic File System (Amazon EFS). DataSync eliminates or automatically handles many of the tasks involved in data transfers, including scripting copy jobs, scheduling and monitoring transfers, validating data, and optimizing network utilization. The source datastore can be a Server Message Block (SMB) file server.
CORRECT: “AWS DataSync” is the correct answer.
INCORRECT: “AWS Database Migration Service (DMS)” is incorrect. AWS Database Migration Service (DMS) is used for migrating databases, not data on file shares.
INCORRECT: “AWS CloudFormation” is incorrect. AWS CloudFormation is used for automating infrastructure provisioning. It is not the right tool for moving data, whereas DataSync is designed specifically for this scenario.
INCORRECT: “AWS Snowball” is incorrect. AWS Snowball is a hardware device that is used for migrating data into AWS. The organization plans to use its Direct Connect link for migrating data rather than sending it in via a physical device. Also, Snowball will not automate the migration.
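
The SMB-to-S3 scenario in this question could be set up roughly as follows. This is a hedged sketch: the two helper functions are hypothetical, and the host name, share path, user, bucket, role, and agent ARN are placeholders, not values from the question. The dictionaries are shaped for the DataSync CreateLocationSmb and CreateLocationS3 APIs, and nothing is sent to AWS.

```python
import os


def build_smb_source(server_hostname, share_path, user, password, agent_arn):
    """Keyword arguments shaped for the DataSync CreateLocationSmb API."""
    return {
        "ServerHostname": server_hostname,
        "Subdirectory": share_path,
        "User": user,
        "Password": password,
        "AgentArns": [agent_arn],  # the on-premises DataSync agent
    }


def build_s3_destination(bucket_arn, access_role_arn, prefix="/"):
    """Keyword arguments shaped for the DataSync CreateLocationS3 API."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": prefix,
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


# All values below are placeholders for illustration.
smb_source = build_smb_source(
    server_hostname="fileserver.example.internal",
    share_path="/share/projects",
    user="svc-datasync",
    password=os.environ.get("SMB_PASSWORD", "placeholder-only"),
    agent_arn="arn:aws:datasync:us-east-1:111122223333:agent/agent-0123456789abcdef0",
)
s3_destination = build_s3_destination(
    bucket_arn="arn:aws:s3:::example-migration-bucket",
    access_role_arn="arn:aws:iam::111122223333:role/ExampleDataSyncS3Role",
)

# With boto3 and credentials available, these could be passed as:
#   client = boto3.client("datasync")
#   client.create_location_smb(**smb_source)
#   client.create_location_s3(**s3_destination)
```

The two resulting location ARNs would then be the source and destination of a DataSync task.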

Question 2:
A company runs an application in an on-premises data center that collects environmental data from production machinery. The data consists of JSON files stored on network-attached storage (NAS), and around 5 TB of data is collected each day. The company must upload this data to Amazon S3 where it can be processed by an analytics application. The data must be transferred securely.
Which solution offers the MOST reliable and time-efficient data transfer?
Options:
A. AWS Database Migration Service over the Internet
B. Multiple AWS Snowcone devices
C. AWS DataSync over AWS Direct Connect
D. Amazon S3 Transfer Acceleration over the Internet
Answer: C
Explanation
The most reliable and time-efficient solution that keeps the data secure is to use AWS DataSync and synchronize the data from the NAS device directly to Amazon S3. This should take place over an AWS Direct Connect connection to ensure reliability, speed, and security.
AWS DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB) shares, self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, and Amazon FSx for Windows File Server file systems.
CORRECT: “AWS DataSync over AWS Direct Connect” is the correct answer.
INCORRECT: “AWS Database Migration Service over the Internet” is incorrect. DMS is for migrating databases, not files.
INCORRECT: “Amazon S3 Transfer Acceleration over the Internet” is incorrect. The Internet does not offer the reliability, speed or performance that this company requires.
INCORRECT: “Multiple AWS Snowcone devices” is incorrect. This is not a time-efficient approach as it can take time to ship these devices in both directions.
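
To see why link capacity matters for this question, the arithmetic behind 5 TB per day can be worked through. This is a rough sketch that assumes decimal terabytes (10^12 bytes) and ignores protocol overhead and compression. The 1 Gbps link speed is a hypothetical figure, since the question does not state the Direct Connect capacity.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BITS_PER_TB = 8 * 10**12  # assumption: decimal terabytes


def required_mbps(tb_per_day):
    """Average sustained rate in Mbps needed to keep up with a daily volume."""
    return tb_per_day * BITS_PER_TB / SECONDS_PER_DAY / 10**6


def transfer_hours(tb, link_gbps):
    """Best-case hours to move a volume over a fully used link."""
    return tb * BITS_PER_TB / (link_gbps * 10**9) / 3600


daily_rate = required_mbps(5)      # roughly 463 Mbps sustained
hours_on_1g = transfer_hours(5, 1)  # roughly 11.1 hours on a hypothetical 1 Gbps link

print(f"Sustained rate needed: {daily_rate:.0f} Mbps")
print(f"Best-case time at 1 Gbps: {hours_on_1g:.1f} hours")
```

A sustained rate of several hundred Mbps every day is where a dedicated connection is more dependable than a shared Internet path, and shipping devices daily would not keep pace.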

Question 3:
A global pharmaceutical company wants to move most of its on-premises data into Amazon S3, Amazon EFS, and Amazon FSx for Windows File Server easily, quickly, and cost-effectively.
As a solutions architect, which of the following solutions would you recommend as the BEST fit to automate and accelerate online data transfers to these AWS storage services?
Options:
A. Use File Gateway to automate and accelerate online data transfers to the given AWS storage services
B. Use AWS Transfer Family to automate and accelerate online data transfers to the given AWS storage services
C. Use AWS Snowball Edge Storage Optimized device to automate and accelerate online data transfers to the given AWS storage services
D. Use AWS DataSync to automate and accelerate online data transfers to the given AWS storage services
Answer: D
Explanation
Correct option:
Use AWS DataSync to automate and accelerate online data transfers to the given AWS storage services
AWS DataSync is an online data transfer service that simplifies, automates, and accelerates copying large amounts of data to and from AWS storage services over the internet or AWS Direct Connect.
AWS DataSync fully automates and accelerates moving large active datasets to AWS, up to 10 times faster than command-line tools. It is natively integrated with Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, Amazon CloudWatch, and AWS CloudTrail, which provides seamless and secure access to your storage services, as well as detailed monitoring of the transfer.
DataSync uses a purpose-built network protocol and scale-out architecture to transfer data. A single DataSync agent is capable of saturating a 10 Gbps network link.
DataSync fully automates the data transfer. It comes with retry and network resiliency mechanisms, network optimizations, built-in task scheduling, monitoring via the DataSync API and Console, and CloudWatch metrics, events, and logs that provide granular visibility into the transfer process. DataSync performs data integrity verification both during the transfer and at the end of the transfer.
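
The idea of verifying data at the end of a transfer can be illustrated with a generic checksum comparison. This is only a conceptual sketch using local files and SHA-256. It is not DataSync's actual verification mechanism, and the function and file names are made up for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file, then confirm source and destination checksums match."""
    shutil.copyfile(source, destination)
    return sha256_of(source) == sha256_of(destination)


with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "source.json"
    destination = Path(workdir) / "destination.json"
    source.write_text('{"sensor": "example", "value": 1}')
    verified = copy_and_verify(source, destination)

print("verified:", verified)
```

A managed service performs this kind of check automatically, which is one of the tasks a hand-written copy script would otherwise have to handle.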
Incorrect options:
Use AWS Snowball Edge Storage Optimized device to automate and accelerate online data transfers to the given AWS storage services – Snowball Edge Storage Optimized is the optimal choice if you need to securely and quickly transfer dozens of terabytes to petabytes of data to AWS. It provides up to 80 TB of usable HDD storage, 40 vCPUs, 1 TB of SATA SSD storage, and up to 40 Gb network connectivity to address large-scale data transfer and pre-processing use cases. Because each Snowball Edge Storage Optimized device can handle 80 TB of data, larger datasets require ordering multiple devices. The original Snowball devices were transitioned out of service, and Snowball Edge Storage Optimized devices are now the primary devices used for data transfer. You may still see the Snowball device on the exam; just remember that the original Snowball device had 80 TB of storage space.
AWS Snowball Edge is suitable for offline data transfers, for customers who are bandwidth constrained or transferring data from remote, disconnected, or austere environments. Therefore, it cannot support automated and accelerated online data transfers.
Use AWS Transfer Family to automate and accelerate online data transfers to the given AWS storage services – The AWS Transfer Family provides fully managed support for file transfers directly into and out of Amazon S3 and Amazon EFS. Therefore, it cannot support migration into the other AWS storage services mentioned in the given use-case (Amazon FSx for Windows File Server).
Use File Gateway to automate and accelerate online data transfers to the given AWS storage services – AWS Storage Gateway’s file interface, or file gateway, offers you a seamless way to connect to the cloud to store application data files and backup images as durable objects on Amazon S3 cloud storage. File gateway offers SMB or NFS-based access to data in Amazon S3 with local caching. It can be used for on-premises applications, and for Amazon EC2-based applications that need file protocol access to S3 object storage. Therefore, it cannot support migration into the other AWS storage services mentioned in the given use-case (such as EFS and Amazon FSx for Windows File Server).
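
The elimination reasoning for this question can be summarized as a small lookup. This is a study-aid sketch, not an authoritative feature matrix: the table encodes only what the explanation above states, plus the assumption that Snowball Edge imports land in Amazon S3. The function name is hypothetical.

```python
# Capabilities as described in the explanation above (simplified).
SERVICES = {
    "AWS DataSync": {
        "online": True,
        "targets": {"S3", "EFS", "FSx for Windows File Server"},
    },
    "AWS Transfer Family": {"online": True, "targets": {"S3", "EFS"}},
    "File Gateway": {"online": True, "targets": {"S3"}},
    "AWS Snowball Edge Storage Optimized": {
        "online": False,  # offline, device-based transfer
        "targets": {"S3"},  # assumption: imports land in S3
    },
}


def online_services_covering(required_targets):
    """Return services that transfer online and reach every required target."""
    return [
        name
        for name, info in SERVICES.items()
        if info["online"] and required_targets <= info["targets"]
    ]


required = {"S3", "EFS", "FSx for Windows File Server"}
candidates = online_services_covering(required)
print(candidates)
```

Only the option that is both online and covers all three storage services remains, which matches answer D.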