Introduction
In today’s digital world, the demand for storing and managing large files, such as videos, high-resolution images, and data backups, keeps rising. Amazon Simple Storage Service (Amazon S3) is a popular choice for this task. A frequent challenge is transferring large files between Amazon S3 buckets, where a reliable upload in a single operation can be hard to achieve. To address this, AWS provides the Multipart Upload feature, which lets you upload large objects in smaller parts for improved efficiency and reliability. This tutorial walks you through each step of using Multipart Upload to transfer a large file from one Amazon S3 bucket to another.
Why Multipart Upload?
There are several reasons to use Multipart Upload:
- Optimal for Large Files: Uploading large files as a single object can be problematic, as network interruptions or upload failures could lead to data loss. Multipart Upload mitigates this risk by breaking the object into smaller parts.
- Parallel Processing: Multipart Upload enables parallel processing, which can significantly improve upload speed, especially when dealing with large files.
- Pause and Resume: With this feature, you can pause and resume the process later, eliminating the need to restart from the beginning. This capability proves especially convenient for handling extensive data transfers on less dependable connections.
- Resilience: Multipart Upload increases the chances of a successful upload. If a part fails to upload, you can retry that part without re-uploading the entire file (see the sketch after this list).
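Because Amazon S3 keeps track of the parts that have already been uploaded, you can inspect an in-progress upload and resume it where it left off. A minimal sketch using the s3api CLI (the bucket, key, and ID values are placeholders):

# aws s3api list-multipart-uploads --bucket [bucket name]
# aws s3api list-parts --bucket [bucket name] --key [file name] --upload-id [id]

The first command shows uploads that were started but never completed; the second shows which parts of a given upload have already succeeded, so only the missing parts need to be re-sent.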
Prerequisites
Before you get started, make sure you have the following:
- AWS Account: You need an AWS account with appropriate permissions to use Amazon S3 and Amazon EC2 instances.
- Amazon S3 Bucket: Create an Amazon S3 bucket where you want to upload your large files. Note down the bucket name, as you’ll need it in the upcoming steps.
- Amazon EC2 Instance: Launch an Amazon EC2 instance as an intermediary for the data transfer.
- PuTTY: Download and install PuTTY to establish an SSH connection with the Amazon EC2 instance.
Step-by-Step Guide
Step 1: First, we must create an IAM role for Amazon EC2 with the AmazonS3FullAccess policy attached.
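If you prefer the AWS CLI to the console for this step, here is a minimal sketch (the role name EC2S3Role and the file name trust.json are assumptions; the console role-creation wizard achieves the same result). First, save the standard EC2 trust policy as trust.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then create the role, attach the AmazonS3FullAccess managed policy, and wrap the role in an instance profile (the console does this last part automatically):

# aws iam create-role --role-name EC2S3Role --assume-role-policy-document file://trust.json
# aws iam attach-role-policy --role-name EC2S3Role --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
# aws iam create-instance-profile --instance-profile-name EC2S3Role
# aws iam add-role-to-instance-profile --instance-profile-name EC2S3Role --role-name EC2S3Role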
Step 2: Make sure you have two Amazon S3 buckets: one containing the large file and another to which you want to upload it.
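If the buckets do not exist yet, they can be created from the CLI as well; a quick sketch (multipartupload-bucket1 matches the source bucket used later in this post, and multipartupload-bucket2 is an assumed name for the destination):

# aws s3 mb s3://multipartupload-bucket1
# aws s3 mb s3://multipartupload-bucket2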
Step 3: Create an Amazon EC2 instance with a Linux OS, attach the role created in Step 1, create a key pair with a .ppk extension for PuTTY, and use the following script to copy the large file onto the instance.
#!/bin/bash
# update packages, create a working directory, and copy the large file from the source bucket
sudo yum update -y
mkdir /home/ec2-user/Tech/
aws s3 cp s3://multipartupload-bucket1/video27.mp4 /home/ec2-user/Tech/
(Note: Replace the bucket name and directory with the ones created above.)
Step 4: Once you SSH into the Amazon EC2 instance, run these commands to confirm the newly created “Tech” directory and move into it.
# sudo su
# ls
# cd Tech
Step 5: Split the file into segments
- The split command will split a large file into many segments based on the specified option.
- Here, we segment a 127 MB file into 50 MB parts; the -b option specifies the size of each part.
# split -b 50M video27.mp4
- To view the chunked files, run the command below.
# ls -lh
- Info: Here, “xaa”, “xab”, and “xac” are the chunked files; split assigns these names automatically, in alphabetical order.
- Each file is 50 MB in size, except for the final one. The number of segments depends on the size of your original file and the byte value used to split it. For example, a 127 MB file is divided into parts of 50 MB, 50 MB, and 27 MB, with part numbers 1, 2, and 3, respectively.
Step 6: Create a Multipart Upload
- We initiate the multipart upload with an AWS CLI command; this generates an UploadId that we’ll use in subsequent steps.
# aws s3api create-multipart-upload --bucket [bucket name] --key [original file name]
(Note: Replace the bucket name above with your bucket name and the key with the original video file name.)
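The command returns a small JSON document; the UploadId field is the value to note down. A sketch of the output with placeholder values:

{
    "Bucket": "multipartupload-bucket2",
    "Key": "video27.mp4",
    "UploadId": "EXAMPLE-UPLOAD-ID"
}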
Step 7: Uploading the File Segments
- Upload each file segment individually, specifying the part number. Part numbers are assigned based on the alphabetical order of the files.
# aws s3api upload-part --bucket [bucket name] --key [original file name] --part-number [number] --body [segment file name] --upload-id [id]
(Note: Be sure to substitute the placeholder values with your actual information and keep a record of the ETag value and part number for future reference.)
- Execute the CLI command above once for each individual file segment, replacing the --part-number and --body values (a loop that automates this is sketched after this list).
- Make sure to save the ETag value each time you upload a segment for future reference.
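Running the command once per segment by hand works, but a small loop keeps the part numbers, segment files, and ETags in sync. A sketch, assuming the three segments from Step 5 and the assumed destination bucket name multipartupload-bucket2 (substitute your own values):

UPLOAD_ID=[id]   # the UploadId returned in Step 6
PART=1
for SEGMENT in xaa xab xac; do
  # each call prints the ETag for that part; record it for list.json in Step 8
  aws s3api upload-part --bucket multipartupload-bucket2 --key video27.mp4 \
    --part-number $PART --body $SEGMENT --upload-id $UPLOAD_ID
  PART=$((PART+1))
done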
Step 8: Create a JSON file for Multipart Upload
- Create a JSON file with all part numbers and their ETag values.
# nano list.json
- Paste a JSON script like the one sketched below into the list.json file.
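The file lists every part number together with the ETag recorded in Step 7, in the structure that complete-multipart-upload expects; a sketch with placeholder ETag values (ETags are quoted strings, so the inner quotes are escaped):

{
  "Parts": [
    { "PartNumber": 1, "ETag": "\"etag-of-part-1\"" },
    { "PartNumber": 2, "ETag": "\"etag-of-part-2\"" },
    { "PartNumber": 3, "ETag": "\"etag-of-part-3\"" }
  ]
}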
Step 9: Complete the Multipart Upload
- Next, we will merge all the segments/chunks with the help of the JSON file created in the previous step.
# aws s3api complete-multipart-upload --multipart-upload [JSON file link] --bucket [upload bucket name] --key [original file name] --upload-id [upload id]
(Note: Replace the placeholders above with your bucket name, the JSON file link (file://list.json), the original video file name as the key, and your upload ID as the --upload-id value.)
- Finally, check whether the video file was uploaded to the destination bucket. If it appears there, the multipart upload is complete; a quick check is sketched below.
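A minimal way to verify from the same instance (replace the bucket name with your destination bucket):

# aws s3 ls s3://[upload bucket name]/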
Conclusion
Multipart upload to Amazon S3 is a powerful feature that can significantly improve the efficiency of uploading large files and objects. By following the steps above, you can take full advantage of it and ensure a smooth, reliable transfer of your data to Amazon S3. Whether you are dealing with large media files, backups, or other substantial data, multipart upload should be your go-to method for moving them.
Drop a query if you have any questions regarding Multipart uploads to Amazon S3, and we will get back to you quickly.
About CloudThat
CloudThat is an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, and Microsoft Gold Partner, helping people develop knowledge of the cloud and helping businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.
To get started, go through our Consultancy page and Managed Services Package to explore CloudThat’s offerings.
FAQs
1. Is there a limit to the size of files that can be uploaded using multipart upload in Amazon S3?
ANS: – Amazon S3 supports objects up to 5 terabytes, which can be uploaded using multipart upload. Each part can be between 5 MB and 5 GB (the final part can be smaller), and an upload can have up to 10,000 parts, so choose a part size that suits your file sizes and network conditions.
2. What happens if a part of the multipart upload fails?
ANS: – If a part fails to upload, you can retry the failed part. Amazon S3 retains successfully uploaded parts, so you don’t have to re-upload them.
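If you decide not to finish an interrupted upload, it is worth aborting it so the parts already stored stop accruing storage charges; a sketch with placeholder values:

# aws s3api abort-multipart-upload --bucket [bucket name] --key [file name] --upload-id [id]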
WRITTEN BY Rajeshwari B Mathapati
Rajeshwari B Mathapati is working as a Research Associate (WAR and Media Services) at CloudThat. She is Google Cloud Associate certified. She is interested in learning new technologies and writing technical blogs.