Voiced by Amazon Polly |
Overview
In data analytics, Amazon Redshift has established itself as a cornerstone for organizations seeking powerful and scalable data warehousing solutions. A pivotal aspect of leveraging the full potential of Redshift lies in the efficient loading of data, and at the heart of this process is the COPY command. This blog delves into the significance and capabilities of Amazon Redshift’s COPY command, highlighting its role in simplifying and optimizing the data loading journey.
Empowering organizations to become ‘data driven’ enterprises with our Cloud experts.
- Reduced infrastructure costs
- Timely data-driven decisions
Introduction
Amazon Redshift is a fully managed data warehouse service known for its speed, scalability, and simplicity. The COPY command is one key feature that lets users efficiently load data into Amazon Redshift. In this blog, we’ll explore the COPY command and look into the advanced capabilities Amazon Redshift COPY jobs offer.
COPY Command
The COPY command in Amazon Redshift plays a pivotal role in seamlessly loading large volumes of data from various sources into Redshift tables. Whether you’re dealing with CSV, JSON, or other supported file formats, the COPY command simplifies the data ingestion.
Features of COPY Command:
Direct Loading from Amazon S3:
The COPY command allows for seamless loading of data directly from Amazon S3, eliminating the need for intermediate staging tables. This direct integration makes data loading easy.
File conversion:
The COPY command supports various file conversions, such as timestamp formatting and file format conversion. This change ensures the loaded data stays consistent with the target table’s schema.
Error handling:
COPY provides a powerful error-handling mechanism. It logs errors and allows users to identify and resolve problems effectively. It also supports automatic rejection, where invalid rows can be sent to a separate table for further analysis.
COPY Command when loading data from Amazon S3 into Amazon Redshift:
Prerequisites:
- Amazon S3 bucket
- Amazon Redshift Cluster
- The AWS IAM role will be attached to the Amazon Redshift cluster with s3 access.
Steps:
- Table Creation: Create a table in Amazon Redshift using the below schema.
1 2 3 4 5 |
Create table sales( Customerid int Name varchar(100) Sales decimal ); |
- COPY Command: Run the below copy command in the Amazon Redshift query editor.
1 2 3 4 5 6 |
COPY sales FROM ‘s3://<bucket-name>/<folder-name>’ IAM_ROLE ‘<arn of the role>’ FORMAT ‘<file-format>’ IGNOREHEADER <number> DELIMITER ‘,’; |
- table_name:The name of the target table where data will be
FROM: Specifies the source location of the data, often an Amazon S3 bucket.
IAM_Role: Provides the AWS IAM role for accessing the data in Amazon S3.
FORMAT: Specifies the data format (e.g., CSV, JSON).
DELIMITER: Defines the delimiter used in the source file.
IGNOREHEADER: Specifies the number of header rows to skip.
COPY JOB
The COPY Job command is an extension to the COPY command that automatically loads data from an Amazon S3 bucket. When you create a COPY Job, Amazon Redshift detects when new Amazon S3 files are created in the specified path and uploads them without your intervention. Manage COPY commands that load data into tables.
LIMITATIONS OF COPY JOB:
- COPY Job doesn’t support other credentials like access key or secret key; it only supports copy command with AWS IAM role.
- COPY Job gets the data from the empty Amazon S3 bucket.
- COPY Job doesn’t get already existing data.
- You cannot specify an Amazon S3 file with the following columnar data formats: ORC and Parquet.
COPY JOB when loading data from Amazon S3 into Amazon Redshift:
- COPY JOB: Run the below COPY job command to get the incremental data into Amazon Redshift from Amazon S3.
1 2 3 4 5 6 7 |
COPY sales FROM ‘s3://<bucket-name>/<folder-name>’ IAM_ROLE ‘<arn of the role>’ FORMAT ‘<file-format>’ IGNOREHEADER <number> DELIMITER ‘,’ JOB CREATE sales_job AUTO ON; |
- AUTO ON|OFF:
When ON, Amazon Redshift monitors the source Amazon S3 path for newly created files, and if found, a COPY command is run with the COPY parameters in the job definition. This is the default.
When OFF, Amazon Redshift does not run the COPY JOB automatically.
Monitoring COPY JOB:
- When you want to see the job, run the below command:
1 |
SELECT * FROM sys_copy_job; |
- When you want to check the history of the job, run the below command:
1 |
SELECT * FROM sys_load_history where copy_job_id = <id>; |
- When you want to see exceptions, run the below command:
1 |
SELECT * FROM sys_load_errors where copy_job_id = <id>; |
Conclusion
By understanding and using these capabilities, users can optimize data loads in Amazon Redshift storage and increase performance and data integrity.
Drop a query if you have any questions regarding Amazon Redshift and we will get back to you quickly.
Pioneers in Cloud Consulting & Migration Services
- Reduced infrastructural costs
- Accelerated application deployment
About CloudThat
CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.
CloudThat is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, Microsoft Gold Partner, AWS Microsoft Workload Partners, Amazon EC2 Service Delivery Partner, and many more.
To get started, go through our Consultancy page and Managed Services Package, CloudThat’s offerings.
FAQs
1. Does Amazon Redshift Serverless support COPY Command?
ANS: – Yes, COPY Command is supported by Amazon Redshift Serverless.
2. Does Amazon Redshift Serverless support COPY Job?
ANS: – Yes, COPY Job is supported by Amazon Redshift Serverless.
WRITTEN BY Lakshmi P Vardhini
Click to Comment