Streamlining Analytics with Amazon’s Zero-ETL Integration for Amazon DynamoDB and Amazon Redshift

Introduction

Amazon recently announced the general availability (GA) of its zero-ETL integration between Amazon DynamoDB and Amazon Redshift. This integration allows users to run analytics on Amazon DynamoDB data within Amazon Redshift without building and maintaining complex data pipelines. With zero-ETL (no Extract, Transform, Load pipeline to manage), data written to an Amazon DynamoDB table automatically becomes available in Amazon Redshift, enabling analytics with minimal impact on Amazon DynamoDB's performance.

Zero-ETL Integration

Zero-ETL integration transfers data directly from one system to another without requiring the traditional ETL process. In the case of Amazon DynamoDB and Amazon Redshift, this integration automates the movement of data from Amazon DynamoDB tables to Amazon Redshift for analytics.

Once the data is in Amazon Redshift, it can be used for high-performance SQL queries, machine learning, data sharing, and cross-database joins. Because there is no custom ETL pipeline to build or maintain, analytics becomes more efficient and less prone to operational issues.

Benefits of Zero-ETL Integration

This integration replicates data from Amazon DynamoDB to Amazon Redshift, eliminating the need for manually built data pipelines. The initial data transfer is a full load; subsequent changes are captured incrementally and applied to Amazon Redshift every 15-30 minutes. Data moves point-to-point without affecting Amazon DynamoDB performance. Multiple Amazon DynamoDB tables can be integrated into a single Redshift cluster or serverless workgroup, providing a unified view of data from various sources.

How It Works

Data replication happens with little to no performance impact on Amazon DynamoDB, and no additional read capacity units are consumed. Because the integration is fully managed, users can continue using Amazon DynamoDB for operational workloads while the data is replicated to Amazon Redshift for analytics. The integration can be configured through the AWS Management Console, AWS CLI, SDKs, or APIs.

Prerequisites for Setting Up the Integration

Before setting up zero-ETL integration, certain prerequisites must be met:

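  1. Enable Point-in-Time Recovery (PITR): The source Amazon DynamoDB table must have PITR enabled, because the integration relies on Amazon DynamoDB exports, which require it.
  2. Enable Case Sensitivity for Amazon Redshift: The target Amazon Redshift data warehouse must have case-sensitive identifiers turned on (the enable_case_sensitive_identifier parameter).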
  3. Configure AWS IAM Policies: Attach necessary resource-based policies for both Amazon DynamoDB and Amazon Redshift, ensuring proper permissions for data replication.
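The first prerequisite above can also be handled programmatically. The following is a minimal sketch, assuming the boto3 SDK and a hypothetical table named Orders; the client is passed in as an argument so the helper can be tried without AWS credentials. It covers PITR only, since the exact resource policy statements should be taken from the AWS documentation.

```python
def pitr_request(table_name: str) -> dict:
    """Build the arguments for DynamoDB's UpdateContinuousBackups call."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }


def enable_pitr(dynamodb_client, table_name: str) -> dict:
    """Enable PITR on the given table and return the service response."""
    return dynamodb_client.update_continuous_backups(**pitr_request(table_name))


# Example usage (assumption: boto3 is installed and credentials are configured;
# "Orders" is a hypothetical table name):
#   import boto3
#   enable_pitr(boto3.client("dynamodb"), "Orders")
```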

Creating the Integration

The integration can be created from either the Amazon DynamoDB console or the Amazon Redshift console. The steps are:

  1. Selecting a Source Table: Choose the Amazon DynamoDB table for replication. Each table requires a separate integration.
  2. Configuring Amazon Redshift as the Target: Select the target Amazon Redshift data warehouse, which can be in the same or a different AWS account.
  3. Handling Prerequisite Configurations Automatically: The console provides options to enable PITR or update resource policies if they are not already configured.
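Outside the console, the same steps can be sketched with the SDK. The example below assumes boto3's Amazon Redshift client and its create_integration operation; the parameter names reflect the Redshift CreateIntegration API as I understand it and should be checked against the current SDK reference. The integration name and both ARNs are hypothetical placeholders.

```python
def integration_request(name: str, source_table_arn: str, target_namespace_arn: str) -> dict:
    """Build the arguments for a zero-ETL integration from one DynamoDB table."""
    return {
        "IntegrationName": name,
        "SourceArn": source_table_arn,      # ARN of the source DynamoDB table
        "TargetArn": target_namespace_arn,  # ARN of the target Redshift namespace
    }


def create_integration(redshift_client, name: str, source_table_arn: str,
                       target_namespace_arn: str) -> dict:
    """Create the integration and return the service response."""
    request = integration_request(name, source_table_arn, target_namespace_arn)
    return redshift_client.create_integration(**request)


# Example usage (assumption: boto3 is installed; all values are placeholders):
#   import boto3
#   create_integration(
#       boto3.client("redshift"),
#       "orders-to-redshift",
#       "<source-table-arn>",
#       "<target-namespace-arn>",
#   )
```

Because each table requires a separate integration, a script like this would be called once per source table.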

Data Structure in Amazon Redshift

Once the integration is active, a new database is created in Amazon Redshift, and the source table is replicated into its default schema. The replicated table mirrors the Amazon DynamoDB key structure: one column for the partition key, one for the sort key (if the table has one), and a SUPER column that holds all other attributes in Amazon DynamoDB JSON format. The partition key serves as the distribution key, and the partition and sort keys together form the sort key in Amazon Redshift. Users can change the sort key settings as needed.
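To make the table layout concrete, here is a small illustrative sketch that reshapes a single item, written in Amazon DynamoDB JSON, into the row shape described above. It is not how the service performs replication; the column name value for the SUPER column and the attribute names customerId, orderDate, and total are assumptions chosen for illustration.

```python
from typing import Optional


def _unwrap(attribute: dict):
    """Return the bare value of a single-entry DynamoDB JSON attribute, e.g. {"S": "abc"}."""
    (value,) = attribute.values()
    return value


def to_replicated_row(item: dict, partition_key: str, sort_key: Optional[str] = None) -> dict:
    """Split an item into key columns plus one column holding the remaining attributes."""
    key_names = [partition_key] + ([sort_key] if sort_key else [])
    row = {name: _unwrap(item[name]) for name in key_names}
    # Everything that is not a key stays in DynamoDB JSON, as it would in the SUPER column.
    row["value"] = {name: attr for name, attr in item.items() if name not in key_names}
    return row


example_item = {
    "customerId": {"S": "c-1"},
    "orderDate": {"S": "2024-10-01"},
    "total": {"N": "42.5"},
}
example_row = to_replicated_row(example_item, "customerId", "orderDate")
```

Here example_row has the two key columns as plain values and a value entry that still carries the type descriptor for total.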

Querying and Validating Data

Replicated data can be queried in Amazon Redshift using standard SQL. The SUPER data type holds semi-structured data, so specific attributes can be extracted with Amazon Redshift's PartiQL support. Incremental changes, such as inserting, modifying, or deleting items in Amazon DynamoDB, are automatically reflected in Amazon Redshift after the next replication cycle and can be verified by re-running the same query.
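As a hedged illustration of attribute extraction, the helper below builds a query that navigates into the SUPER column with dot notation and casts the result to a string. The table name orders, the key column customerId, the attribute status, and the SUPER column name value are all assumptions; identifiers are double-quoted because mixed-case names depend on case-sensitive identifiers being enabled.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def attribute_query(table: str, key_column: str, attribute: str,
                    ddb_type: str = "S", super_column: str = "value") -> str:
    """Build a SELECT that pulls one attribute out of the SUPER column.

    ddb_type is the DynamoDB JSON type descriptor (for example "S" or "N")
    that wraps the attribute's value.
    """
    path = ".".join(quote_ident(part) for part in (super_column, attribute, ddb_type))
    return (
        f"SELECT t.{quote_ident(key_column)}, "
        f"t.{path}::varchar AS {quote_ident(attribute)} "
        f"FROM public.{quote_ident(table)} AS t;"
    )


status_sql = attribute_query("orders", "customerId", "status")
```

The resulting string can be pasted into the query editor or submitted through any Amazon Redshift SQL client.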

Materialized Views for Analytics

For analytics, materialized views can be created on the replicated tables. These views provide optimized data access by automatically refreshing with changes in the underlying data, thus reducing query execution times. They are particularly useful for dashboards and reports that require frequent data aggregation or transformation.
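A sketch of such a view definition follows. It assumes a hypothetical integration database named ddb_db, a replicated table orders, and a status attribute inside a SUPER column named value. Whether automatic refresh applies to a particular view depends on Amazon Redshift's materialized view rules, so treat the AUTO REFRESH clause as something to verify rather than a guarantee.

```python
def materialized_view_sql(view_name: str, select_sql: str, auto_refresh: bool = True) -> str:
    """Wrap a SELECT statement in a CREATE MATERIALIZED VIEW statement."""
    refresh = "YES" if auto_refresh else "NO"
    body = select_sql.strip().rstrip(";")
    return f"CREATE MATERIALIZED VIEW {view_name} AUTO REFRESH {refresh} AS {body};"


# Hypothetical aggregation: number of orders per status value.
orders_per_status = (
    'SELECT t."value"."status"."S"::varchar AS status, COUNT(*) AS order_count '
    'FROM ddb_db.public."orders" AS t '
    "GROUP BY 1"
)

view_sql = materialized_view_sql("order_counts", orders_per_status)
```

A dashboard can then read from order_counts instead of re-aggregating the SUPER column on every request.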

Monitoring and Metrics

Users can monitor the integration's performance through the Amazon Redshift console or Amazon CloudWatch. Available metrics include data transfer rates, lag times, and table statistics. System views such as SVV_INTEGRATION and SYS_INTEGRATION_ACTIVITY provide detailed insight into the integration's configuration and activity.
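The system views can also be polled from a script. The sketch below assumes the Amazon Redshift Data API through boto3 (the redshift-data client) and a serverless workgroup; the workgroup and database names are placeholders. It selects every column from SVV_INTEGRATION rather than naming specific ones, so the column list should be checked in the Amazon Redshift documentation before building alerts on it.

```python
INTEGRATION_STATUS_SQL = "SELECT * FROM svv_integration;"


def submit_status_query(redshift_data_client, workgroup: str, database: str) -> str:
    """Submit the status query and return the statement ID for later retrieval."""
    response = redshift_data_client.execute_statement(
        WorkgroupName=workgroup,
        Database=database,
        Sql=INTEGRATION_STATUS_SQL,
    )
    return response["Id"]


# Example usage (assumption: boto3 is installed; names are placeholders):
#   import boto3
#   client = boto3.client("redshift-data")
#   statement_id = submit_status_query(client, "<workgroup-name>", "<database-name>")
#   # The Data API is asynchronous: once the statement has finished,
#   # fetch the rows with client.get_statement_result(Id=statement_id).
```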

Pricing Considerations

There are no additional charges specifically for the zero-ETL integration. However, costs associated with Amazon DynamoDB PITR, data exports, Amazon Redshift storage, and compute resources still apply.

Cleaning Up

Users can delete the zero-ETL integration from the Amazon Redshift console to stop data replication. This action stops future data transfers but does not remove existing data from Amazon DynamoDB or Amazon Redshift.
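For scripted teardown, a minimal sketch follows. It assumes boto3's Amazon Redshift client and a delete_integration operation that takes the integration's ARN; the ARN shown in the usage comment is a placeholder.

```python
def delete_integration(redshift_client, integration_arn: str) -> dict:
    """Delete a zero-ETL integration so that no further changes are replicated."""
    return redshift_client.delete_integration(IntegrationArn=integration_arn)


# Example usage (assumption: boto3 is installed; the ARN is a placeholder):
#   import boto3
#   delete_integration(boto3.client("redshift"), "<integration-arn>")
```

As noted above, previously replicated data stays in place, so any cleanup of the Amazon Redshift database is a separate step.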

Conclusion

The zero-ETL integration simplifies data analytics by automating data transfer from Amazon DynamoDB to Amazon Redshift, eliminating traditional ETL complexities. This streamlined approach allows organizations to gain insights across multiple applications and reduce operational overhead while improving cost efficiency.

Drop a query if you have any questions regarding Amazon DynamoDB, Amazon Redshift or Zero-ETL and we will get back to you quickly.

FAQs

1. How does the Zero-ETL integration benefit data engineers and analysts?

ANS: – Zero-ETL integration saves time and effort by automating data replication between Amazon DynamoDB and Amazon Redshift. Data engineers can focus on building analytics solutions rather than managing complex ETL workflows, and data analysts get timely access to current data, enabling more accurate, near real-time analysis.

2. Can Zero-ETL integration handle large-scale data replication?

ANS: – Yes, Zero-ETL integration is designed to handle large-scale data replication. It supports automatic scaling to manage high volumes of data and frequent updates, ensuring that even large Amazon DynamoDB tables can be efficiently synchronized with Amazon Redshift.

WRITTEN BY Rachana Kampli
