Overview
This guide walks through automating the deletion of .webm files from an Amazon S3 bucket using a bash script and the AWS CLI. With this approach, system administrators and DevOps engineers can remove outdated or unnecessary files based on their age, which reduces storage costs and keeps the bucket easier to manage.
Introduction
Managing storage on Amazon S3 can become cumbersome over time as unused or outdated files pile up, driving up storage costs and reducing operational efficiency. One common scenario is dealing with old .webm files (video format), which might be irrelevant after a certain period. In such cases, automating the deletion of these files can save time, effort, and costs.
Using the AWS CLI combined with a bash script, you will learn how to efficiently clean up Amazon S3 storage while controlling the type and age of removed files.
Why Automate File Deletion?
In scenarios where specific file types (e.g., .webm files) are no longer relevant, deleting them periodically can help maintain storage hygiene and reduce costs.
To automate the deletion of these files, we can use the AWS CLI combined with a bash script to delete objects older than a certain age.
Prerequisites
Before diving into the script and execution steps, ensure that you have the following:
- Operating System: Linux, macOS, or WSL (Windows Subsystem for Linux)
- AWS CLI: Installed and configured (Step-by-step guide provided below)
- IAM User: An AWS IAM user with the necessary permissions to delete objects in the S3 bucket
- Permissions: s3:ListBucket and s3:DeleteObject
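The two permissions listed above can be expressed as a small IAM policy. The following is a minimal sketch, assuming a placeholder bucket name and a local file called policy.json (both are illustrative, not part of the original setup). Note that s3:ListBucket applies to the bucket itself, while s3:DeleteObject applies to the objects inside it.

```shell
# Placeholder bucket name (assumption); replace with your own.
BUCKET_NAME="your-s3-bucket-name"

# Write a least-privilege policy document to policy.json.
cat > policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${BUCKET_NAME}"
    },
    {
      "Effect": "Allow",
      "Action": "s3:DeleteObject",
      "Resource": "arn:aws:s3:::${BUCKET_NAME}/*"
    }
  ]
}
EOF
```

The resulting file can then be attached to the IAM user, for example through the IAM console.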
Step-by-Step Guide
Step 1: Install AWS CLI
The AWS Command Line Interface (CLI) lets you interact with AWS services using command-line tools. The first step is to install the AWS CLI on your system.
Installation Guide
Refer to the installation guide in the official AWS documentation for in-depth instructions.
For Linux systems, you can install the AWS CLI using the following commands:
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    unzip awscliv2.zip
    sudo ./aws/install
For Windows Server or macOS, please follow the official AWS documentation for installation instructions tailored to your OS.
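After installing, it is worth confirming that the aws command is actually on your PATH before moving on. The snippet below is a small sketch of such a check; the have_cmd helper is a hypothetical name introduced here for illustration.

```shell
# Succeed if the named command can be found on PATH.
# (have_cmd is an illustrative helper, not part of the AWS CLI.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd aws; then
    # Print the installed AWS CLI version.
    aws --version
else
    echo "aws CLI not found on PATH" >&2
fi
```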
Step 2: Configure AWS CLI
Once the AWS CLI is installed, configure it with the credentials of an AWS IAM user that has programmatic access. Creating an access key for that user gives you an Access Key ID and a Secret Access Key.
Start the configuration process by executing the command provided below:
    aws configure
You will be prompted to enter:
- AWS Access Key ID: The access key ID belonging to the IAM user
- AWS Secret Access Key: The secret access key for the same IAM user
- Default region: e.g., us-east-1 (choose the region where your S3 bucket is located)
- Default output format: json, or another supported format such as text

Here’s an example of the prompts with sample values filled in:
    AWS Access Key ID [None]: ABCDEFGHIJKLMNOPQRST
    AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    Default region name [None]: us-west-2
    Default output format [None]: json
Step 3: Writing the Bash Script
Now that you have AWS CLI installed and configured, the next step is to create a bash script that will delete .webm files older than a certain number of days from your Amazon S3 bucket.
Create the Script
Open a terminal and create a new file called delete_old_webm_files.sh to begin writing the script:
    nano delete_old_webm_files.sh
Next, copy and paste the following script into the file:
    #!/bin/bash

    # Bucket name (no leading or trailing spaces)
    BUCKET_NAME="your-s3-bucket-name"

    # Number of days to filter (change this value if needed)
    DAYS=60

    # Calculate the date threshold (GNU date syntax;
    # on macOS use: date -v-"${DAYS}"d +%Y-%m-%d)
    date_threshold=$(date -d "-$DAYS days" +%Y-%m-%d)

    # List all objects recursively, filter for '.webm' files
    aws s3 ls --recursive "s3://$BUCKET_NAME/" | grep '\.webm$' > webm_files.txt

    # Initialize counter
    count=0

    # Check if the file is not empty
    if [ -s webm_files.txt ]; then
        # Read each line as date, time, size, and key;
        # the last field keeps any spaces in the key
        while read -r date time size file; do
            # Check if the file's date is older than the specified threshold
            if [[ "$date" < "$date_threshold" ]]; then
                # Count the file only if the delete succeeds
                if aws s3 rm "s3://$BUCKET_NAME/$file"; then
                    count=$((count + 1))
                    echo "Deleted: $file | Date: $date $time | Size: $size bytes"
                fi
            fi
        done < webm_files.txt

        if [ "$count" -gt 0 ]; then
            echo "All '.webm' files older than $DAYS days have been deleted."
            echo "Total '.webm' files deleted: $count"
        else
            echo "No '.webm' files found older than $DAYS days."
        fi
    else
        echo "No '.webm' files found."
    fi

    # Clean up
    rm -f webm_files.txt
The script above performs the following tasks:
- Defines the Amazon S3 bucket name and the age threshold in days; the file suffix (in this case, .webm) is set in the grep pattern.
- Lists objects in the Amazon S3 bucket and filters them based on the specified suffix.
- Compares the last modified date of each file with a threshold date (the current date minus the specified number of days) to determine if the file is old enough to delete.
- Deletes the files that meet the criteria.
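The date check works because dates in YYYY-MM-DD form sort the same way as plain strings as they do chronologically, so a string comparison is enough. The sketch below illustrates this on a made-up listing line in the format printed by aws s3 ls --recursive; the is_older_than helper, the variable names, and the sample data are assumptions for illustration only.

```shell
# Sample line in the format printed by `aws s3 ls --recursive` (made-up data).
sample_line="2024-01-15 10:30:00    1048576 recordings/team call.webm"

# Succeed when the first YYYY-MM-DD date sorts strictly before the second.
is_older_than() {
    [[ "$1" < "$2" ]]
}

# Split into date, time, and size; the last variable receives the rest of
# the line, so keys containing spaces stay intact.
read -r obj_date obj_time obj_size obj_key <<< "$sample_line"

if is_older_than "$obj_date" "2024-03-01"; then
    echo "old: $obj_key ($obj_size bytes, modified $obj_date $obj_time)"
fi
```

A file modified exactly on the threshold date is not treated as older, since the comparison is strict.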
Save and Exit the Editor
In nano, save the file by pressing CTRL + O and exit the editor by pressing CTRL + X.
Step 4: Make the Script Executable
To ensure the script can be executed, you need to make it executable:
    chmod +x delete_old_webm_files.sh
Step 5: Run the Script
You’re now ready to run the script. Execute it by running:
    ./delete_old_webm_files.sh
The script will iterate through the .webm files in the Amazon S3 bucket, checking their age and deleting those older than the specified number of days.
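Because deletions cannot be undone on an unversioned bucket, you may want to preview what would be removed before the first real run. One possible approach, sketched here under the assumption of a hypothetical DRY_RUN variable and delete_object helper (neither is part of the script above), is to wrap the delete call:

```shell
# Set DRY_RUN=0 to really delete; the default of 1 only prints the action.
# (DRY_RUN and delete_object are illustrative names, not AWS CLI features.)
DRY_RUN=${DRY_RUN:-1}

# Delete one object, or just report it when DRY_RUN is 1.
# Arguments: bucket name, object key.
delete_object() {
    local bucket=$1 key=$2
    if [ "$DRY_RUN" = "1" ]; then
        echo "(dry run) would delete s3://$bucket/$key"
    else
        aws s3 rm "s3://$bucket/$key"
    fi
}

delete_object "your-s3-bucket-name" "recordings/old.webm"
```

The AWS CLI also offers a --dryrun option on aws s3 rm, which prints the operation without performing it.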
Step 6: Automating the Script
You can schedule this script to run at regular intervals using cron jobs in Linux. For example, to run the script daily, add it to your crontab:
    crontab -e
Then, add the following line:
    0 0 * * * /path/to/delete_old_webm_files.sh
This will cause the script to execute every day at midnight.
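When run from cron, the script typically starts in the user's home directory, so webm_files.txt is written there rather than next to the script. A more robust pattern, sketched below as one possible adjustment (the list_file name and the sample line are assumptions), is to use a private temporary file that is removed automatically on exit:

```shell
# Create a private temporary file instead of writing webm_files.txt into
# whatever directory the job happens to start in.
list_file=$(mktemp)

# Remove the temporary file when the script exits, even after an error.
trap 'rm -f "$list_file"' EXIT

# Stand-in for the listing step (made-up data); in the real script this
# would be the output of `aws s3 ls --recursive ... | grep '\.webm$'`.
printf '%s\n' "2024-01-15 10:30:00 1048576 recordings/a.webm" > "$list_file"

# Count the listed objects.
wc -l < "$list_file"
```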
Conclusion
In this blog, we walked through setting up a bash script to delete .webm files older than a specified number of days from an Amazon S3 bucket. This is a handy automation technique for anyone managing large volumes of data on Amazon S3, as it helps control costs and maintain storage hygiene.
FAQs
1. Why should I delete old .webm files from my Amazon S3 bucket?
ANS: – Old .webm files, or any unused files, can accumulate over time and take up unnecessary space, increasing storage costs. Deleting them helps to reduce these costs and maintain a clean storage environment.
2. Can I modify the script to delete other file types?
ANS: – Yes! Change the file suffix in the script's grep pattern (e.g., '\.webm$' to '\.log$' or '\.tmp$') to target different file types, and update the status messages to match.
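If you switch file types often, the suffix can be made a parameter instead of being hard-coded. The following is a minimal sketch of that idea; the filter_by_suffix function and the sample keys are hypothetical and not part of the original script.

```shell
# Keep only input lines that end with the given suffix.
# The suffix is matched literally, so the dot needs no escaping.
# (filter_by_suffix is an illustrative helper name.)
filter_by_suffix() {
    local suffix=$1 line
    while IFS= read -r line; do
        case $line in
            *"$suffix") printf '%s\n' "$line" ;;
        esac
    done
}

# Example with made-up keys: only the key ending in .log is printed.
printf '%s\n' "logs/app.log" "video/a.webm" "logs/app.log.bak" |
    filter_by_suffix ".log"
```

In the script, this function could stand in for the grep step in the listing pipeline.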
3. Is this script compatible with Windows?
ANS: – The script is designed to run on Linux, macOS, or WSL (Windows Subsystem for Linux) on Windows. If you’re using a Windows server, you can install WSL or adapt the script to PowerShell.
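One portability detail: the date -d syntax used for the threshold is specific to GNU date (Linux and WSL), while macOS ships BSD date, which uses -v instead. A possible way to support both, sketched here with a hypothetical threshold_date helper, is to try the GNU form first and fall back to the BSD form:

```shell
# Print the date N days ago as YYYY-MM-DD, trying GNU date first and
# falling back to the BSD syntax used on macOS.
# (threshold_date is an illustrative helper name.)
threshold_date() {
    local days=$1
    date -d "-${days} days" +%Y-%m-%d 2>/dev/null ||
        date -v-"${days}"d +%Y-%m-%d
}

threshold_date 60
```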
WRITTEN BY Shaikh Mohammed Fariyaj Najam
Mohammed Fariyaj Shaikh works as a Research Associate at CloudThat. He has strong analytical thinking and problem-solving skills, knowledge of AWS Cloud Services, migration, infrastructure setup, and security, as well as the ability to adopt new technology and learn quickly.