Data Loading in Amazon Redshift

Overview

In data analytics, Amazon Redshift has established itself as a cornerstone for organizations seeking powerful and scalable data warehousing solutions. A pivotal aspect of leveraging the full potential of Redshift lies in the efficient loading of data, and at the heart of this process is the COPY command. This blog delves into the significance and capabilities of Amazon Redshift’s COPY command, highlighting its role in simplifying and optimizing the data loading journey.

Empowering organizations to become ‘data driven’ enterprises with our Cloud experts.

Reduced infrastructure costs
Timely data-driven decisions

Get Started

Introduction

Amazon Redshift is a fully managed data warehouse service known for its speed, scalability, and simplicity. The COPY command is one key feature that lets users efficiently load data into Amazon Redshift. In this blog, we’ll explore the COPY command and look into the advanced capabilities Amazon Redshift COPY jobs offer.

COPY Command

The COPY command in Amazon Redshift plays a pivotal role in seamlessly loading large volumes of data from various sources into Redshift tables. Whether you’re dealing with CSV, JSON, or other supported file formats, the COPY command simplifies the data ingestion.

Features of COPY Command:

Direct Loading from Amazon S3:

The COPY command allows for seamless loading of data directly from Amazon S3, eliminating the need for intermediate staging tables. This direct integration makes data loading easy.

File conversion:

The COPY command supports various file conversions, such as timestamp formatting and file format conversion. This change ensures the loaded data stays consistent with the target table’s schema.

Error handling:

COPY provides a powerful error-handling mechanism. It logs errors and allows users to identify and resolve problems effectively. It also supports automatic rejection, where invalid rows can be sent to a separate table for further analysis.

COPY JOB

The COPY Job command is an extension to the COPY command that automatically loads data from an Amazon S3 bucket. When you create a COPY Job, Amazon Redshift detects when new Amazon S3 files are created in the specified path and uploads them without your intervention. Manage COPY commands that load data into tables.

LIMITATIONS OF COPY JOB:

COPY Job doesn’t support other credentials like access key or secret key; it only supports copy command with AWS IAM role.
COPY Job gets the data from the empty Amazon S3 bucket.
COPY Job doesn’t get already existing data.
You cannot specify an Amazon S3 file with the following columnar data formats: ORC and Parquet.

COPY JOB when loading data from Amazon S3 into Amazon Redshift:

COPY JOB: Run the below COPY job command to get the incremental data into Amazon Redshift from Amazon S3.

COPY sales
FROM ‘s3://<bucket-name>/<folder-name>’
IAM_ROLE ‘<arn of the role>’
FORMAT ‘<file-format>’
IGNOREHEADER <number>
DELIMITER ‘,’
JOB CREATE sales_job AUTO ON;

COPY sales

FROM ‘s3://<bucket-name>/<folder-name>’

IAM_ROLE ‘<arn of the role>’

FORMAT ‘<file-format>’

IGNOREHEADER <number>

DELIMITER ‘,’

JOB CREATE sales_job AUTO ON;

AUTO ON|OFF:

When ON, Amazon Redshift monitors the source Amazon S3 path for newly created files, and if found, a COPY command is run with the COPY parameters in the job definition. This is the default.

When OFF, Amazon Redshift does not run the COPY JOB automatically.

Monitoring COPY JOB:

When you want to see the job, run the below command:

SELECT * FROM sys_copy_job;

1	SELECT * FROM sys_copy_job;

When you want to check the history of the job, run the below command:

SELECT * FROM sys_load_history where copy_job_id = <id>;

1	SELECT * FROM sys_load_history where copy_job_id = <id>;

When you want to see exceptions, run the below command:

SELECT * FROM sys_load_errors where copy_job_id = <id>;

1	SELECT * FROM sys_load_errors where copy_job_id = <id>;

Conclusion

In the world of Amazon Redshift, the COPY Command and the COPY Job are essential tools for loading and managing large data sets. While the COPY command provides a simple yet powerful way to transfer data, the COPY Job increases functionality by providing advanced orchestration, monitoring, and job management.

By understanding and using these capabilities, users can optimize data loads in Amazon Redshift storage and increase performance and data integrity.

Drop a query if you have any questions regarding Amazon Redshift and we will get back to you quickly.

Pioneers in Cloud Consulting & Migration Services

Reduced infrastructural costs
Accelerated application deployment

Get Started

About CloudThat

CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.

CloudThat is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, Microsoft Gold Partner, AWS Microsoft Workload Partners, Amazon EC2 Service Delivery Partner, and many more.

To get started, go through our Consultancy page and Managed Services Package, CloudThat’s offerings.