11. S3 Select and Glacier Select
S3 Select enables applications to retrieve only a subset of data from an object by using simple SQL expressions. We can filter the data by rows, by columns, or both. Because S3 Select returns only the data the application actually needs, it can improve performance drastically: AWS cites improvements of as much as 400% in many cases.
Assume that your data is stored in S3 as compressed CSV files (for CSV, S3 Select supports GZIP and BZIP2 compression). Without S3 Select, we need to download, decompress, and process the entire file to get the data we need.
With S3 Select, we can use a simple SQL expression to return only the data we intend to retrieve from the object, instead of retrieving the entire object. This means we may be dealing with an order of magnitude less data, which improves the performance of the applications that consume it.
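As an illustration, here is a minimal sketch of an S3 Select query through the boto3 `select_object_content` call. It assumes a gzip-compressed CSV object whose first line is a header row. The helper names `build_select_request` and `select_csv_rows` are hypothetical, and the S3 client is passed in rather than created, so the logic can be exercised without AWS credentials.

```python
from typing import Any, Dict


def build_select_request(bucket: str, key: str, sql: str) -> Dict[str, Any]:
    """Build keyword arguments for the S3 select_object_content call.

    Assumption: the object is a GZIP-compressed CSV file with a header row.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        "InputSerialization": {
            # "USE" lets the SQL refer to columns by their header names.
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "GZIP",
        },
        "OutputSerialization": {"CSV": {}},
    }


def select_csv_rows(s3_client: Any, bucket: str, key: str, sql: str) -> str:
    """Run an S3 Select query and return only the matching rows as CSV text."""
    response = s3_client.select_object_content(
        **build_select_request(bucket, key, sql)
    )
    chunks = []
    # The payload is an event stream; only "Records" events carry result data.
    # Other events (Stats, Progress, End) are skipped here.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join the raw bytes before decoding so a multi-byte character split
    # across two chunks is still decoded correctly.
    return b"".join(chunks).decode("utf-8")
```

With a real client, a call might look like `select_csv_rows(boto3.client("s3"), "my-bucket", "sales.csv.gz", "SELECT s.customer_id, s.amount FROM S3Object s WHERE CAST(s.amount AS FLOAT) > 100")`, where the bucket, key, and column names are hypothetical. Only the matching rows cross the network, which is where the performance gain comes from.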
Some companies in highly regulated industries like financial services, healthcare and others write data directly to Amazon Glacier to satisfy compliance needs like SEC Rule 17a-4 or HIPAA. Many S3 users have lifecycle policies designed to save on storage costs by moving their data into Glacier when they no longer need to access it on a regular basis.
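Glacier Select allows us to run SQL queries directly against data archived in Glacier, without first restoring the entire archive. The query runs as an asynchronous job, and its results are written to an S3 bucket.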
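For objects that lifecycle policies have moved into the Glacier storage class, one way to start such a query is a SELECT-type restore through the boto3 `restore_object` call. The sketch below is a minimal illustration under stated assumptions: the archived object is an uncompressed CSV file with a header row, and `build_glacier_select_restore` and `start_glacier_select` are hypothetical helper names. As before, the client is injected so the code runs without AWS.

```python
from typing import Any, Dict


def build_glacier_select_restore(
    sql: str, output_bucket: str, output_prefix: str, tier: str = "Standard"
) -> Dict[str, Any]:
    """Build a RestoreRequest that runs a SQL query against an archived object.

    Assumption: the archived object is an uncompressed CSV file with a header row.
    """
    return {
        "Type": "SELECT",  # query the archive instead of restoring all of it
        "Tier": tier,      # retrieval speed/cost tier, e.g. "Expedited" or "Bulk"
        "SelectParameters": {
            "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
            "ExpressionType": "SQL",
            "Expression": sql,
            "OutputSerialization": {"CSV": {}},
        },
        # Results are written asynchronously under this S3 bucket/prefix.
        "OutputLocation": {
            "S3": {"BucketName": output_bucket, "Prefix": output_prefix},
        },
    }


def start_glacier_select(
    s3_client: Any, bucket: str, key: str, sql: str,
    output_bucket: str, output_prefix: str,
) -> Dict[str, Any]:
    """Start a Glacier Select job; the call returns before the query finishes."""
    return s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest=build_glacier_select_restore(sql, output_bucket, output_prefix),
    )
```

Because the job is asynchronous, the application later reads the result files from the output prefix instead of waiting on the call itself.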