Software Development
Amazon S3, AWS Glue, AWS Lambda, Amazon MSK
Streamlined data processing with AWS Glue, AWS Lambda, and Amazon Athena, deployed in Oregon and replicated in Singapore.
CustomFit.ai is an AI-powered personalization platform for B2B websites. Established in 2019, it offers a no-code website personalization solution. The platform uses artificial intelligence to identify and understand individual visitors, then dynamically modifies website content based on their preferences.
Manual Intervention Reduced
Reduction in Amazon Athena query cost
Streamlined access to large datasets, saving time and cost
The client faced data management challenges, in particular difficulty applying complex filters that involve joining multiple tables. To resolve this, we built a robust data pipeline that automates the extract, transform, and load (ETL) process. The pipeline converts JSON data into a structured format suitable for querying and analysis, handles increasing volumes of data, and accommodates future growth.
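To illustrate what converting JSON into a structured format can look like, here is a minimal sketch in plain Python. It is not the client's actual Glue job: the payload fields (`event_type`, `visitor`, `ts`) and the underscore-joined column naming are assumptions made only for this example.

```python
import json
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into a single-level dict of column name -> value."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing child keys with the parent name.
            flat.update(flatten(value, column, sep))
        else:
            # Scalars and lists are kept as-is; a real job would decide how to model arrays.
            flat[column] = value
    return flat


# Hypothetical event payload; the field names are illustrative only.
raw = (
    '{"event_type": "page_view", '
    '"visitor": {"id": "v-1", "geo": {"country": "IN"}}, '
    '"ts": "2023-05-01T10:00:00Z"}'
)
row = flatten(json.loads(raw))
print(row)
```

Once every record has the same flat set of columns, it can be queried as a table, and joins and filters no longer have to reach into nested JSON.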
• The solution is deployed in the Oregon region, with Amazon S3 buckets replicated in Singapore.
• Data is sourced from Amazon MSK via an Amazon MSK sink connector for Amazon S3 and stored in a staging bucket as a single file per day-wise partition.
• An AWS Glue crawler runs on this bucket and creates a raw table in the database.
• Five AWS Glue jobs run daily on the raw table, extracting data and storing files in five separate partitions in a processed data bucket. The extraction is based on keys taken from the payload.
• Sub-partitions are created within the five main partitions, and these are further partitioned by day.
• Every day, crawlers run on the main partitions and populate the AWS Glue Data Catalog. The crawlers are created dynamically using AWS Lambda.
• Event notifications on the processed data bucket trigger the AWS Lambda function, which creates the crawlers if required.
• Amazon Athena creates the dataset from the tables in the AWS Glue Data Catalog, one table per main data partition key (that is, n tables for n partitions).
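The crawler-creation step in the list above can be sketched as follows. This is an assumption-laden illustration, not the deployed Lambda: the database name, role ARN, crawler naming scheme, bucket name, and the `dt=` day partition layout are all hypothetical. The Glue client is passed in as a parameter so that the logic can be dry-run with an in-memory stand-in; in AWS it would be `boto3.client("glue")`, whose `get_crawler` and `create_crawler` calls are used with the same keyword arguments.

```python
from typing import Any, Dict, List
from urllib.parse import unquote_plus

# Hypothetical names; the case study does not state the real ones.
DATABASE = "processed_db"
CRAWLER_ROLE = "arn:aws:iam::123456789012:role/example-glue-crawler-role"


def main_partition_prefix(object_key: str) -> str:
    """Return the top-level prefix of a key, e.g. 'event_a/' for 'event_a/sub_x/...'."""
    return object_key.split("/", 1)[0] + "/"


def ensure_crawlers(event: Dict[str, Any], glue: Any) -> List[str]:
    """Create one crawler per main partition seen in the S3 event, if it is missing."""
    created: List[str] = []
    seen = set()
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        prefix = main_partition_prefix(key)
        if prefix in seen:
            continue
        seen.add(prefix)
        name = f"{prefix.rstrip('/')}-crawler"
        try:
            glue.get_crawler(Name=name)  # crawler already exists, nothing to do
        except glue.exceptions.EntityNotFoundException:
            glue.create_crawler(
                Name=name,
                Role=CRAWLER_ROLE,
                DatabaseName=DATABASE,
                Targets={"S3Targets": [{"Path": f"s3://{bucket}/{prefix}"}]},
            )
            created.append(name)
    return created


class _NotFound(Exception):
    pass


class FakeGlue:
    """In-memory stand-in for boto3.client('glue'), just enough for a dry run."""

    class exceptions:
        EntityNotFoundException = _NotFound

    def __init__(self) -> None:
        self.crawlers: Dict[str, Dict[str, Any]] = {}

    def get_crawler(self, Name: str) -> Dict[str, Any]:
        if Name not in self.crawlers:
            raise _NotFound(Name)
        return {"Crawler": self.crawlers[Name]}

    def create_crawler(self, **kwargs: Any) -> None:
        self.crawlers[kwargs["Name"]] = kwargs


# Hypothetical event: two new objects under the same main partition.
event = {"Records": [
    {"s3": {"bucket": {"name": "processed-bucket"},
            "object": {"key": "event_a/sub_x/dt%3D2023-05-01/part-0000"}}},
    {"s3": {"bucket": {"name": "processed-bucket"},
            "object": {"key": "event_a/sub_x/dt%3D2023-05-02/part-0000"}}},
]}
glue = FakeGlue()
created = ensure_crawlers(event, glue)
print(created)
```

Checking for the crawler before creating it keeps the handler idempotent, so repeated notifications for the same main partition do not fail or create duplicates.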
Automated AWS Glue jobs streamline data processing, partitioning reduces scans by 40%, and the migration pipeline is fully automated.
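Partitioning helps because a query that filters on a partition column only needs to read the matching prefixes in Amazon S3 instead of the whole table. The snippet below is a small sketch of building such a query string; the table name `event_a`, the columns, and the day partition column `dt` are hypothetical and not taken from the client's schema.

```python
from datetime import date
from typing import Sequence


def daily_query(table: str, day: date, columns: Sequence[str] = ("*",)) -> str:
    """Build a SQL query restricted to a single day-wise partition.

    The table and column names must come from trusted code, not user input,
    because they are interpolated directly into the SQL text.
    """
    cols = ", ".join(columns)
    return f"SELECT {cols} FROM {table} WHERE dt = '{day.isoformat()}'"


print(daily_query("event_a", date(2023, 5, 1), ("visitor_id", "ts")))
```

Because `dt` is a partition column in this sketch, the `WHERE` clause limits the scan to one day's files, which is where the saving on scanned data comes from.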