AWS Cheat Sheet: S3 Storage
Different types of storage provided by Amazon AWS
Amazon provides several different types of storage, and this post will cover everything you need to know about them to pass your AWS Associate Architect certification.
AWS provides 3 different types of storage:
- Object storage, like S3, which manages data as objects
- File system storage, like EFS, which manages data as files within a file hierarchy
- Block-level storage, like EBS, which manages data as blocks within sectors and tracks
Object Storage S3
Source: AWS S3 FAQ
S3 - Simple Storage Service - is cloud object storage. You can upload any kind of data in any format, and it has unlimited storage capacity.
It's secure, extremely durable, highly available, and infinitely scalable storage.
It's extremely durable as it provides 99.999999999% durability (11 nines). Amazon S3 redundantly stores objects on multiple devices across a minimum of 3 Availability Zones (3 AZs) in an AWS Region, so AWS can sustain the loss of 2 AZs without any loss of data!
There is only one exception to this rule: S3 One Zone-IA (Infrequent Access) stores data on multiple devices in one AZ only.
You can add extra protection to prevent deletion or loss of data:
- Versioning
- CRR - Cross-Region Replication
- MFA Delete (requires an MFA code for deletion)
- S3 Object Lock
S3 has unlimited storage capacity. Objects can be from 0 bytes to 5 TB, but the largest object you can upload in a single PUT is 5 GB. If you want to upload more than 100 MB, it's better to use the Multipart Upload tools, as sketched below.
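A minimal sketch with the AWS CLI (the bucket and file names are hypothetical): the high-level `aws s3 cp` command switches to multipart upload automatically once a file crosses the configured threshold.

```bash
# Upload a large file; the CLI uses multipart upload above the threshold.
aws s3 cp ./backup.tar.gz s3://my-example-bucket/backup.tar.gz

# Tune when multipart kicks in, and the part size (default threshold: 8 MB):
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 25MB
```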
You can integrate it with a lot of AWS services, and even use it for static website hosting, like the site you are reading now.
All the data is managed as objects using a REST API (Application Programming Interface), through standard HTTP(S) requests (CRUD - see the sketch after this list):
- CREATE: PUT (and POST)
- READ: GET
- UPDATE: POST
- DELETE: DELETE
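Each CRUD verb maps to an s3api subcommand; a minimal sketch, with hypothetical bucket and key names:

```bash
aws s3api put-object --bucket my-example-bucket --key notes.txt --body ./notes.txt  # CREATE / UPDATE (overwrite)
aws s3api get-object --bucket my-example-bucket --key notes.txt ./notes-copy.txt    # READ
aws s3api delete-object --bucket my-example-bucket --key notes.txt                  # DELETE
```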
As it uses a REST API and the system replicates data to different locations, S3 is an eventually consistent system: it can take time for changes to propagate in the case of:
- PUTs of existing objects (overwrites)
- DELETEs of objects
For all the other operations, S3 storage is a consistent system.
Amazon S3 is a simple key-based object store.
When you store data, you assign a unique object key that can later be used to retrieve the data.
All your data is available via a web URL, as follows:
`http://bucketname.s3.amazonaws.com/file.doc`
and if the object is in a folder:
`http://bucketname.s3.amazonaws.com/folder/file.doc`
You can see why each bucket name has to be unique.
S3 objects consist of:
- Key: the name of the object
- Value: the data itself, made of a sequence of bytes
- Version ID: the version of the object, when versioning is enabled
- Metadata: additional information attached to the object, including system metadata used by AWS (MD5 digest, size, ...)
Each object is identified by a combination of 3 elements:
- bucket
- key
- Version ID (optional)
An S3 bucket holds objects and can also have folders holding objects. A bucket's name has to be unique globally (think of it like a domain name), and names have to follow these rules:
- Names have to be from 3 to 63 characters
- Names cannot be formatted as an IP address
- Only lowercase letters, numbers, dots (.), and hyphens (-)
- Bucket names cannot begin with xn--
- Bucket names must begin and end with a letter or number.
For best compatibility, AWS recommends that you avoid using dots (.) in bucket names, except for buckets that are used only for static website hosting.
By default, you can create up to 100 buckets in each of your AWS accounts.
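Creating a bucket is a one-liner with the CLI; the name and region below are hypothetical placeholders:

```bash
# Bucket names are global, so this fails if anyone already took the name.
aws s3 mb s3://my-example-bucket-2024 --region eu-west-3
```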
S3 buckets and data are regional, and data is automatically stored in a minimum of 3 AZs. Objects stored in the S3 One Zone-IA storage class are stored redundantly within a single Availability Zone in the AWS Region you select, but on the other hand, it's a lot cheaper.
Data stored in this storage class is susceptible to loss in an AZ destruction event.
You will choose the region for your S3 storage following a few criteria:
- Proximity to your customers and your other AWS resources
- Specific legal and regulatory requirements (casino regulations, for example)
- Storage costs: you can choose a lower-priced region to save money, as the price is not the same in all regions.
Costs
With Amazon S3, you pay only for what you use. Amazon S3 charges you for the following types of usage:
- Storage used, based on "TimedStorage-ByteHrs", an average for the month.
- Network data transferred in (data writes/uploads): the amount of data sent to your Amazon S3 buckets.
- Network data transferred out: data reads.
- Data requests: since it's a REST API, each PUT, GET, LIST, SELECT, and POST request is billed. DELETE and CANCEL requests are free.
- Data retrieval: ONLY for the S3 Standard-Infrequent Access (S3 Standard-IA), S3 One Zone-IA, and Glacier storage classes.
Security on S3
On creation, only the resource owners have access to the Amazon S3 resources they create. All new buckets are private by default.
Amazon S3 supports user authentication to control access to data.
You can use access control mechanisms such as bucket policies and Access Control Lists (ACLs) to selectively grant permissions to users and groups of users.
You can securely upload/download your data to Amazon S3 via SSL endpoints using the HTTPS protocol, so data is encrypted in transit.
If you need extra security you can use the Server-Side Encryption (SSE) option to encrypt data stored at rest.
For Server-Side Encryption (SSE), you can use these different encryption options:
- SSE-S3 (S3-Managed Keys): Amazon manages all the keys and encrypts with the AES-256 algorithm
- SSE-KMS: envelope encryption; AWS KMS manages the keys together with you
- SSE-C: customer-provided keys; you manage the keys yourself
You can configure your Amazon S3 buckets to automatically encrypt objects before storing them if the incoming storage requests do not have any encryption information. Alternatively, you can use your own encryption libraries to encrypt data before storing it in Amazon S3.
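For example, default bucket encryption can be turned on from the CLI. A minimal sketch, assuming SSE-KMS; the bucket name and KMS key ARN are hypothetical placeholders:

```bash
aws s3api put-bucket-encryption \
  --bucket my-example-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "arn:aws:kms:eu-west-3:123456789012:key/1111-2222"
      }
    }]
  }'
```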
Customers may use four mechanisms for controlling access to Amazon S3 resources:
- Identity and Access Management (IAM) policies: grant IAM users fine-grained control over their Amazon S3 buckets or objects, while also retaining full control over everything the users do.
- Bucket policies (JSON policies - more complex, fine-grained): with bucket policies, customers can define complex rules which apply broadly across all requests to their Amazon S3 resources, such as granting write privileges to a subset of Amazon S3 resources. With bucket policies, you can configure who can access which data, and even WHEN and from WHERE (IP or CIDR) - an example policy is sketched after this list.
- Access Control Lists (ACLs) - legacy and simple: with ACLs, customers can grant specific permissions (i.e. READ, WRITE, FULL_CONTROL) to specific users for an individual bucket or object.
- Query String Authentication, or pre-signed URLs: customers can create a URL to an Amazon S3 object which is only valid for a limited time. Think of it as a time-limited URL to share an object with others. It's very useful for providing temporary access to a private object. You can use the AWS CLI or an SDK to generate a pre-signed URL:
aws s3 presign s3://bucket/object --expires-in 300 (seconds)
The command will give you a temporary URL that you can share.
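A minimal bucket-policy sketch matching the WHERE example above; the bucket name and CIDR range are hypothetical placeholders:

```bash
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowReadFromOfficeIPs",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-example-bucket/*",
    "Condition": { "IpAddress": { "aws:SourceIp": "203.0.113.0/24" } }
  }]
}
EOF
aws s3api put-bucket-policy --bucket my-example-bucket --policy file://policy.json
```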
Logging and monitoring in Amazon S3
Monitoring is an important part of maintaining the reliability, availability, and performance of Amazon S3 and your AWS solutions.
AWS provides tools for monitoring your S3 resources:
- Amazon CloudWatch Alarms - watch metrics and get notified via SNS
- AWS CloudTrail logs - CloudTrail provides a record of actions taken by a user, a role, or an AWS service in Amazon S3.
- Amazon S3 access logs, to see all API requests made on your buckets
- AWS Trusted Advisor, to provide you with best practices for using S3 and give you recommendations
Features
Versioning - or version control
Versioning gives you the ability to keep different versions of the same object, and it protects data against accidental or malicious deletion. Each version is identified by a unique ID on the object.
It's fully integrated with S3 lifecycle rules.
⚠️ Once versioning is enabled, it's not possible to disable it; you can only suspend it.
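Enabling (or later suspending) versioning is a single CLI call; the bucket name is a hypothetical placeholder:

```bash
aws s3api put-bucket-versioning \
  --bucket my-example-bucket \
  --versioning-configuration Status=Enabled
# To suspend later, use: --versioning-configuration Status=Suspended
```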
MFA Delete
It provides extra protection against deletion of your data: for DELETE requests, an MFA code is required. The bucket must have versioning enabled. MFA Delete can be turned on only from the AWS CLI, and the root account is the only one allowed to delete objects.
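A sketch of turning it on with the root account's credentials; the MFA device ARN, code, and bucket name are hypothetical placeholders:

```bash
aws s3api put-bucket-versioning \
  --bucket my-example-bucket \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"
```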
S3 Object Lock
With S3 Object Lock, you can store objects using a write-once-read-many (WORM) model. You can use it to prevent an object from being deleted or overwritten for a fixed amount of time or indefinitely. Object Lock helps you meet regulatory requirements that require WORM storage, or simply add another layer of protection against object changes and deletion.
Object Lock provides two ways to manage object retention: retention periods and legal holds.
- A retention period specifies a fixed period of time during which an object remains locked. During this period, your object is WORM-protected and can't be overwritten or deleted.
- A legal hold provides the same protection as a retention period, but it has no expiration date. Instead, a legal hold remains in place until you explicitly remove it. Legal holds are independent from retention periods.
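A minimal sketch of both mechanisms; the bucket, key, and date are hypothetical placeholders, and note that Object Lock has to be enabled when the bucket is created:

```bash
# Create a bucket with Object Lock enabled.
aws s3api create-bucket \
  --bucket my-locked-bucket \
  --region eu-west-3 \
  --create-bucket-configuration LocationConstraint=eu-west-3 \
  --object-lock-enabled-for-bucket

# Retention period: WORM-protect one object until a fixed date.
aws s3api put-object-retention \
  --bucket my-locked-bucket --key report.pdf \
  --retention '{"Mode":"COMPLIANCE","RetainUntilDate":"2026-01-01T00:00:00Z"}'

# Legal hold: stays until explicitly removed (Status=OFF).
aws s3api put-object-legal-hold \
  --bucket my-locked-bucket --key report.pdf \
  --legal-hold Status=ON
```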
CRR - Cross-Region Replication
If enabled, any object uploaded will be replicated to another region, giving you higher durability and potential disaster recovery for objects. Replication enables automatic, asynchronous copying of objects across Amazon S3 buckets. It has to be enabled at the bucket level, and you must have versioning enabled on both source and destination buckets. You can replicate to another AWS account or region.
⚠️ AWS CRR and SRR will not replicate data already in the bucket; they replicate only new data, or data changed after replication is enabled on both sides.
You can replicate objects from a source bucket to only one destination bucket.
After Amazon S3 replicates an object, the object can't be replicated again.
If you want to replicate existing objects, CRR is not fit for that; you need to contact AWS Support.
us-east-1-bucket --> CRR --> eu-paris-1-bucket
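A sketch of a replication configuration; the bucket names and IAM role ARN are hypothetical placeholders, and versioning must already be enabled on both buckets:

```bash
cat > replication.json <<'EOF'
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [{
    "Status": "Enabled",
    "Priority": 1,
    "Filter": { "Prefix": "" },
    "DeleteMarkerReplication": { "Status": "Disabled" },
    "Destination": { "Bucket": "arn:aws:s3:::eu-paris-1-bucket" }
  }]
}
EOF
aws s3api put-bucket-replication \
  --bucket us-east-1-bucket \
  --replication-configuration file://replication.json
```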
SRR - Same-Region Replication
Same-Region replication (SRR) is used to copy objects across Amazon S3 buckets in the same AWS Region.
Why use Replication
- Replicate objects while retaining metadata
- Replicate objects into different storage classes
- Maintain object copies under different ownership
- Replicate objects within 15 minutes
When to use CRR
- Meet compliance requirements
- Minimize latency
- Increase operational efficiency
When to use SRR
- Aggregate logs into a single bucket
- Configure live replication between production and test accounts
- Abide by data sovereignty laws
Amazon S3 Transfer Acceleration
Fast and secure transfer of files over long distances between your end users and your S3 bucket. Think of it like CloudFront for S3: it makes use of CloudFront and its edge locations to give customers the lowest latency. As the data arrives at an edge location, it is routed to Amazon S3 over an optimized network path.
⚠️ When using Transfer Acceleration, additional data transfer charges may apply.
Users/customers use a distinct URL that resolves to a nearby edge location:
`bucketname.s3-accelerate.amazonaws.com`
`bucketname.s3-accelerate.dualstack.amazonaws.com` to connect to the enabled bucket over IPv6.
⚠️ The name of the bucket used for Transfer Acceleration must be DNS-compliant and must not contain periods (".").
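Turning acceleration on is a one-line bucket setting; the bucket name is a hypothetical placeholder:

```bash
aws s3api put-bucket-accelerate-configuration \
  --bucket my-example-bucket \
  --accelerate-configuration Status=Enabled
```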
Amazon S3 LifeCycle Management
Objects can be moved between storage classes or deleted (expired) automatically, based on a schedule.
Ex:
Store data on S3 Standard --> 30 days --> S3 IA --> 90 days --> S3 Glacier --> 2 years --> Delete (expire)
Lifecycle rules are attached to a bucket, and you can apply them to all objects or only to objects specified by a prefix or tags. Lifecycle is integrated with versioning, so you can move/transition the current version or previous versions of an object.
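The example timeline above could be expressed as a lifecycle configuration like this sketch (hypothetical bucket name):

```bash
cat > lifecycle.json <<'EOF'
{
  "Rules": [{
    "ID": "archive-then-expire",
    "Status": "Enabled",
    "Filter": { "Prefix": "" },
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER" }
    ],
    "Expiration": { "Days": 730 }
  }]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-example-bucket \
  --lifecycle-configuration file://lifecycle.json
```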
Static website hosting
For static content only, like this website (a site which doesn't need server-side processing or a backend), you can host your website on S3.
It can still be dynamic and interactive using client-side scripts like JavaScript.
It's a fast, secure, and scalable way to deploy a website.
This is the process to deploy a website on AWS S3 (sketched with the CLI after this list):
- Create a bucket with the same name as the desired website
- Upload the static files to the bucket
- Make all the files public
- Enable static website hosting for the bucket, specifying the index and error documents
- The website will be available from this URL: `http://bucketname.s3-website-awsregion.amazonaws.com`
- Optionally, create a DNS name on AWS with a CNAME or alias record that points to this AWS S3 URL.
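A sketch of those steps with the CLI; the bucket/site names are hypothetical, and newer accounts may also need the bucket's Block Public Access settings relaxed before objects can be public:

```bash
aws s3 mb s3://www.example-site.com --region eu-west-3
aws s3 sync ./site s3://www.example-site.com --acl public-read
aws s3 website s3://www.example-site.com \
  --index-document index.html \
  --error-document error.html
```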
I'm sure you will ask why I didn't use S3 to host my website. Simply because I like GitLab: I want to use GitLab's pipeline after each commit to test and deploy my website, and moreover I was trying to keep it free.
S3 Batch Operations
You can use S3 Batch Operations to perform large-scale batch operations on Amazon S3 objects. S3 Batch Operations can execute a single operation on lists of Amazon S3 objects that you specify.
S3 Batch Operations supports several different operations (see the sketch after this list), like:
- Invoking a Lambda function
- PUT object copy - copy objects to another bucket in the same region or a different region
- Initiate restore object
- Adding tags
- Adding and changing ACLs
- Managing Object Lock
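A rough sketch of creating a copy job from a CSV manifest; every account ID, ARN, and ETag below is a hypothetical placeholder:

```bash
aws s3control create-job \
  --account-id 123456789012 \
  --priority 10 \
  --role-arn arn:aws:iam::123456789012:role/batch-ops-role \
  --operation '{"S3PutObjectCopy":{"TargetResource":"arn:aws:s3:::destination-bucket"}}' \
  --manifest '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"arn:aws:s3:::manifest-bucket/manifest.csv","ETag":"example-etag"}}' \
  --report '{"Bucket":"arn:aws:s3:::report-bucket","Format":"Report_CSV_20180820","Enabled":true,"ReportScope":"AllTasks"}'
```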
Amazon S3 analytics – Storage Class Analysis
By using Amazon S3 analytics Storage Class Analysis you can analyze storage access patterns to help you decide when to transition the right data to the right storage class. You can use the analysis results to help you improve your lifecycle policies.
Amazon S3 event notifications
The Amazon S3 notification feature enables you to receive notifications when certain events happen in your bucket. You can be alerted for events like:
- New object created events
- Object removal events
- Restore object events
- Reduced Redundancy Storage (RRS) object lost events
- Replication events
The destination for the notification can be (see the sketch after this list):
- AWS Lambda - AWS Lambda is a compute service that makes it easy for you to build applications that respond quickly to new information.
- AWS SQS - an Amazon Simple Queue Service (Amazon SQS) queue
- AWS SNS - an Amazon Simple Notification Service (Amazon SNS) topic
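A sketch sending "object created" events to an SQS queue; the bucket name and queue ARN are hypothetical placeholders (the queue's access policy must also allow S3 to send messages):

```bash
cat > notification.json <<'EOF'
{
  "QueueConfigurations": [{
    "QueueArn": "arn:aws:sqs:eu-west-3:123456789012:uploads-queue",
    "Events": ["s3:ObjectCreated:*"]
  }]
}
EOF
aws s3api put-bucket-notification-configuration \
  --bucket my-example-bucket \
  --notification-configuration file://notification.json
```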
Cross-origin resource sharing (CORS)
The CORS specification gives you the ability to build web applications that make requests to domains other than the one which supplied the primary content. Think of it as a way for two web services to state that they trust each other: CORS is a mechanism for web services to announce that they will listen to certain requests from web applications not hosted on their own servers.
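A sketch of a CORS configuration allowing GETs from one external origin; the origin and bucket name are hypothetical placeholders:

```bash
cat > cors.json <<'EOF'
{
  "CORSRules": [{
    "AllowedOrigins": ["https://www.example-site.com"],
    "AllowedMethods": ["GET"],
    "AllowedHeaders": ["*"],
    "MaxAgeSeconds": 3000
  }]
}
EOF
aws s3api put-bucket-cors --bucket my-example-bucket --cors-configuration file://cors.json
```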
Different S3 Storage Classes (from most expensive to cheapest)
AWS provides different storage classes depending on your needs.
- S3 Standard: the default storage class. If you don't specify the storage class when you upload an object, Amazon S3 assigns the S3 Standard storage class. It's used for frequently accessed data, and it's replicated across a minimum of 3 AZs in a region.
- S3 Reduced Redundancy (RRS): the same class of storage, but replicated in only 1 AZ. It's designed for noncritical, reproducible data that can be stored with less redundancy. Durability of 99.99%.
- S3 Intelligent-Tiering: the smart storage class; it automatically moves data to the most cost-effective storage tier. There are no retrieval fees when using the S3 Intelligent-Tiering storage class. If an object in the infrequent access tier is accessed, it is automatically moved back to the frequent access tier.
⚠️ The S3 Intelligent-Tiering storage class is suitable for objects larger than 128 KB that you plan to store for at least 30 days. If the size of an object is less than 128 KB, it is not eligible for auto-tiering. If you delete an object before the end of the 30-day minimum storage duration period, you are charged for 30 days.
- The S3 Standard-IA (Infrequent Access) and S3 One Zone-IA storage classes are designed for long-lived and infrequently accessed data. S3 Standard-IA and S3 One Zone-IA (only one AZ) objects are available for millisecond access, but you will pay a retrieval fee for these objects, and you will pay for a minimum of 30 days as well.
You might choose the S3 Standard-IA and S3 One Zone-IA storage classes:
- For storing backups.
- For older data that is accessed infrequently, but that still requires millisecond access.
- The S3 Glacier and S3 Glacier Deep Archive storage classes are designed for low-cost data archiving.
- S3 Glacier: use for archives where portions of the data might need to be retrieved in minutes.
- Data can be accessed in as little as 1-5 minutes using expedited retrieval.
- You are charged for a 90-day minimum.
- S3 Glacier Deep Archive: use for archiving data that rarely needs to be accessed.
- Default retrieval time of 12 hours.
- Minimum charge of 180 days.
- Lowest cost.
So basically, if you choose an Infrequent Access storage class (IA or Glacier), you will pay a retrieval fee and a minimum number of days of storage, but overall at a lower cost.