

A Guide to Install Pandas Library on Amazon MWAA


Introduction

Amazon Managed Workflows for Apache Airflow (MWAA) is a fully managed service that simplifies deploying and running Apache Airflow, a popular tool for orchestrating complex data workflows, on AWS.

Extending MWAA with custom Python libraries like Pandas can be challenging when the environment is hosted in a private network, because restricted internet access means the libraries cannot simply be downloaded during installation.

This blog post provides a step-by-step guide to installing the Pandas library on MWAA hosted in a private network while complying with the network restrictions.

Prerequisites

Before you start, ensure you have the following:

  • An AWS account with appropriate permissions to create and manage MWAA environments and Amazon S3 buckets.
  • An existing MWAA environment hosted in a private network.
  • Basic knowledge of Amazon S3, AWS IAM roles, and Amazon VPC configurations.


Step-by-Step Guide

  1. Prepare a Custom Requirements File

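Create a requirements.txt file that lists Pandas and any other dependencies you need. MWAA installs the packages in this file with pip when the environment is created or updated. By default, pip downloads packages from PyPI, so if your private network has no outbound internet access, pip needs another source: either a route to a package repository (for example, through a NAT gateway or a private repository) or the wheel (.whl) files for Pandas and its dependencies, packaged in plugins.zip and referenced from requirements.txt.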
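
A minimal Python sketch of this step follows. It writes requirements.txt and, for an environment without internet access, adds the `--find-links /usr/local/airflow/plugins` and `--no-index` lines that the MWAA documentation describes for installing from wheels shipped in plugins.zip. It also includes a helper that packages a folder of already-downloaded wheels into plugins.zip. The helper names, the `wheels` folder, and any version pin you pass in are hypothetical; the wheels themselves must match your environment's Python version and platform.

```python
import zipfile
from pathlib import Path

# Directory where MWAA extracts plugins.zip on its containers, as given in the
# MWAA documentation for installing Python wheels.
MWAA_PLUGINS_DIR = "/usr/local/airflow/plugins"


def render_requirements(packages: list[str], offline: bool = False) -> str:
    """Return the text of a requirements.txt file for the given packages.

    With offline=True, pip is told to look for wheels in the plugins folder
    (--find-links) and never to contact PyPI (--no-index).
    """
    lines: list[str] = []
    if offline:
        lines.append(f"--find-links {MWAA_PLUGINS_DIR}")
        lines.append("--no-index")
    lines.extend(packages)
    return "\n".join(lines) + "\n"


def write_requirements(path: str, packages: list[str], offline: bool = False) -> Path:
    """Write requirements.txt to disk and return its path."""
    target = Path(path)
    target.write_text(render_requirements(packages, offline), encoding="utf-8")
    return target


def build_plugins_zip(wheel_dir: str, zip_path: str) -> list[str]:
    """Put every .whl file from wheel_dir at the root of a plugins.zip archive.

    Returns the names of the wheel files that were added.
    """
    wheels = sorted(Path(wheel_dir).glob("*.whl"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for wheel in wheels:
            # Flat layout, so the wheels land directly in MWAA_PLUGINS_DIR.
            archive.write(str(wheel), arcname=wheel.name)
    return [wheel.name for wheel in wheels]
```

For an environment that can reach PyPI, `write_requirements("requirements.txt", ["pandas==2.1.4"])` is enough (the pin is only an example). For an offline environment, call it with `offline=True` and run `build_plugins_zip("wheels", "plugins.zip")` on a folder of wheels downloaded on a machine with internet access.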

  2. Set Up an Amazon S3 Bucket for Dependencies

Use Your Environment's Amazon S3 Bucket: Every MWAA environment is associated with an Amazon S3 bucket (the one that holds its DAGs folder), and MWAA reads the requirements file from that bucket. Store your requirements file there rather than in a separate bucket.

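Upload the Requirements File: Upload the requirements.txt file to the Amazon S3 bucket. MWAA requires versioning to be enabled on this bucket, so each upload creates a new version of the file.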
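
The upload can also be scripted. The sketch below assumes boto3 and a bucket that already has versioning enabled; the bucket name, key, and helper names are placeholders. It returns the object version ID because MWAA points at a specific version of the requirements file.

```python
def s3_uri(bucket: str, key: str) -> str:
    """Build the s3:// URI for an object, as entered in the MWAA console."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def upload_requirements(s3, bucket: str, local_path: str = "requirements.txt",
                        key: str = "requirements.txt") -> tuple[str, str]:
    """Upload requirements.txt and return (S3 URI, object version ID).

    `s3` is a boto3 S3 client, for example boto3.client("s3").
    """
    # MWAA requires versioning on its bucket, so check before uploading.
    versioning = s3.get_bucket_versioning(Bucket=bucket)
    if versioning.get("Status") != "Enabled":
        raise RuntimeError(f"Versioning is not enabled on bucket {bucket!r}")

    with open(local_path, "rb") as body:
        response = s3.put_object(Bucket=bucket, Key=key, Body=body)
    # A versioned bucket returns the new object's VersionId.
    return s3_uri(bucket, key), response["VersionId"]
```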

  3. Configure the MWAA Environment
  • Navigate to the MWAA Console: Open the Amazon MWAA console and select your MWAA environment.
  • Update the Environment Configuration: Choose Edit to change the environment's settings.

In the DAG code in Amazon S3 section, set the Requirements file field to the Amazon S3 path of your requirements.txt file, for example: s3://your-bucket-name/requirements.txt. Then select the version of the file you uploaded.

  • Save Changes: Save the changes and wait for the environment status to return to Available. The update can take several minutes.
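
The same configuration change can be made with the AWS SDK. This sketch uses boto3's MWAA client (`update_environment` and `get_environment`); the environment name, keys, and polling intervals are placeholders, and the set of statuses treated as "finished" is an assumption based on the documented environment statuses. The plugins arguments are only needed if you ship wheels in plugins.zip.

```python
import time
from typing import Optional

# Environment statuses taken to mean an update has finished, successfully or not.
FINISHED_STATUSES = {"AVAILABLE", "UPDATE_FAILED", "UNAVAILABLE"}


def build_update_args(env_name: str, requirements_version: str,
                      requirements_key: str = "requirements.txt",
                      plugins_version: Optional[str] = None,
                      plugins_key: str = "plugins.zip") -> dict:
    """Keyword arguments for UpdateEnvironment.

    Paths are relative to the environment's S3 bucket, not s3:// URIs.
    """
    args = {
        "Name": env_name,
        "RequirementsS3Path": requirements_key,
        "RequirementsS3ObjectVersion": requirements_version,
    }
    if plugins_version is not None:  # only for the wheel-based (offline) setup
        args["PluginsS3Path"] = plugins_key
        args["PluginsS3ObjectVersion"] = plugins_version
    return args


def is_finished(status: str) -> bool:
    return status in FINISHED_STATUSES


def update_and_wait(mwaa, args: dict, poll_seconds: float = 60,
                    timeout_seconds: float = 3600) -> str:
    """Start the update, then poll until the environment reaches a finished status.

    `mwaa` is a boto3 MWAA client, for example boto3.client("mwaa").
    """
    mwaa.update_environment(**args)
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        # Sleep first, since the status may not change immediately after the call.
        time.sleep(poll_seconds)
        status = mwaa.get_environment(Name=args["Name"])["Environment"]["Status"]
        if is_finished(status):
            return status
    raise TimeoutError(f"Environment {args['Name']!r} is still updating")
```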

  4. Verify Pandas Installation

To verify that Pandas has been installed successfully, create a simple DAG that imports and uses Pandas.
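
A minimal verification DAG might look like the following. It assumes Airflow 2.4 or later (for the `schedule` argument); the DAG ID and task name are hypothetical. Pandas is imported inside the task function so that DAG parsing stays fast, and the guarded Airflow import only exists so the task function can be tried on a machine without Airflow.

```python
from datetime import datetime


def check_pandas() -> str:
    """Import pandas, run a small DataFrame operation, and return its version."""
    import pandas as pd  # imported inside the task to keep DAG parsing light

    df = pd.DataFrame({"value": [1, 2, 3]})
    print(f"pandas {pd.__version__}: sum of values = {df['value'].sum()}")
    return pd.__version__  # pushed to XCom by PythonOperator


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:  # lets check_pandas be tested locally without Airflow
    DAG = None

if DAG is not None:
    with DAG(
        dag_id="verify_pandas_install",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule=None,  # run only when triggered manually
        catchup=False,
    ) as dag:
        PythonOperator(task_id="check_pandas", python_callable=check_pandas)
```

If the task succeeds, its log shows the installed Pandas version; an ImportError in the log means the requirements were not installed.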

  • Deploy the DAG: Upload the DAG file to the DAGs folder in the Amazon S3 bucket associated with your MWAA environment (for example, s3://your-bucket-name/dags/).
  • Trigger the DAG: Trigger the DAG from the Airflow UI to ensure it runs successfully.

Handling Private Network Restrictions

If your MWAA environment is hosted in a private network, make sure the necessary endpoints and permissions are in place so that the environment can access the Amazon S3 bucket and other required services:

Amazon VPC Endpoints: Ensure that your Amazon VPC has the following endpoints configured:

Amazon S3 Endpoint: To allow access to the Amazon S3 bucket.

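Other Service Endpoints: Interface endpoints for the other AWS services that MWAA uses, such as Amazon CloudWatch Logs, Amazon SQS, and AWS KMS. The exact set depends on your configuration.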
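
A boto3 sketch of creating these endpoints follows. The S3 endpoint is created as a gateway endpoint attached to the route tables of the environment's private subnets. The interface endpoint list (`logs`, `monitoring`, `sqs`, `kms`) is an illustrative assumption, so check the MWAA documentation for the exact set your environment needs. All IDs are placeholders.

```python
# Illustrative interface endpoints; verify the required list in the MWAA docs.
INTERFACE_SERVICES = ["logs", "monitoring", "sqs", "kms"]


def service_name(region: str, service: str) -> str:
    """Return a VPC endpoint service name such as com.amazonaws.us-east-1.s3."""
    return f"com.amazonaws.{region}.{service}"


def create_mwaa_endpoints(ec2, region: str, vpc_id: str, route_table_ids: list[str],
                          subnet_ids: list[str], security_group_ids: list[str]) -> list[str]:
    """Create an S3 gateway endpoint plus interface endpoints; return their IDs.

    `ec2` is a boto3 EC2 client, for example boto3.client("ec2", region_name=region).
    """
    endpoint_ids = []
    # S3 uses a gateway endpoint, which adds routes to the given route tables.
    response = ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,
        ServiceName=service_name(region, "s3"),
        RouteTableIds=route_table_ids,
    )
    endpoint_ids.append(response["VpcEndpoint"]["VpcEndpointId"])

    # Other services are reached through interface endpoints in the private subnets.
    for service in INTERFACE_SERVICES:
        response = ec2.create_vpc_endpoint(
            VpcEndpointType="Interface",
            VpcId=vpc_id,
            ServiceName=service_name(region, service),
            SubnetIds=subnet_ids,
            SecurityGroupIds=security_group_ids,
            PrivateDnsEnabled=True,
        )
        endpoint_ids.append(response["VpcEndpoint"]["VpcEndpointId"])
    return endpoint_ids
```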

AWS IAM Roles and Policies: The AWS IAM role associated with your MWAA environment should have the necessary permissions to access the Amazon S3 bucket and any other AWS resources required.
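
The S3 part of these permissions can be sketched as an inline policy on the execution role. The actions below are modeled on the S3 statement in the sample execution role policy from the MWAA documentation; the role name, policy name, and bucket are placeholders, and the other permissions the execution role needs (for example, CloudWatch Logs and SQS) are not shown.

```python
import json


def s3_read_policy(bucket: str) -> dict:
    """IAM policy document granting read access to the MWAA bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject*", "s3:GetBucket*", "s3:List*"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket-level actions (List*, GetBucket*)
                    f"arn:aws:s3:::{bucket}/*",  # object-level actions (GetObject*)
                ],
            }
        ],
    }


def attach_s3_read_policy(iam, role_name: str, bucket: str,
                          policy_name: str = "mwaa-bucket-read") -> None:
    """Attach the policy inline to the execution role.

    `iam` is a boto3 IAM client, for example boto3.client("iam").
    """
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(s3_read_policy(bucket)),
    )
```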

Conclusion

Installing the Pandas library on MWAA hosted in a private network requires a few additional steps compared to a public network setup. By preparing a custom requirements file, configuring your Amazon S3 bucket, and ensuring the necessary network endpoints and permissions are in place, you can successfully extend your MWAA environment’s functionality with Pandas. This guide provides a comprehensive overview to help you navigate the installation process smoothly.

By following these steps, you can take full advantage of the powerful data manipulation capabilities of Pandas within your MWAA workflows, enabling more efficient and effective data processing.

Drop a query if you have any questions regarding Pandas, and we will get back to you quickly.


About CloudThat

CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.

CloudThat is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partners, Amazon EC2 Service Delivery Partner, and many more.

To get started, explore CloudThat's offerings on our Consultancy page and Managed Services Package.

FAQs

1. What if my MWAA environment doesn't update with the new requirements file?

ANS: – Ensure the Amazon S3 path is correct, the selected file version is the one you uploaded, and your requirements.txt file is accessible to the environment's execution role. Also check the file for syntax errors. If the problem persists, check the MWAA logs for errors.
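
As a sketch of where to look, the boto3 code below reads the error recorded on the environment's last update and pulls the pip output from CloudWatch Logs. It assumes scheduler logging is enabled for the environment; the log group name pattern (`airflow-<EnvironmentName>-Scheduler`) and the `requirements_install` stream prefix are assumptions to confirm in your account.

```python
from typing import Optional


def scheduler_log_group(env_name: str) -> str:
    """Assumed CloudWatch log group name for the environment's scheduler logs."""
    return f"airflow-{env_name}-Scheduler"


def last_update_error(mwaa, env_name: str) -> Optional[str]:
    """Return the error recorded for the environment's last update, if any.

    `mwaa` is a boto3 MWAA client, for example boto3.client("mwaa").
    """
    env = mwaa.get_environment(Name=env_name)["Environment"]
    error = env.get("LastUpdate", {}).get("Error")
    if not error:
        return None
    return f"{error.get('ErrorCode')}: {error.get('ErrorMessage')}"


def requirements_install_logs(logs, env_name: str, limit: int = 100) -> list[str]:
    """Fetch log lines from the requirements installation log streams.

    `logs` is a boto3 CloudWatch Logs client, for example boto3.client("logs").
    """
    response = logs.filter_log_events(
        logGroupName=scheduler_log_group(env_name),
        logStreamNamePrefix="requirements_install",  # assumed stream prefix
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]
```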

2. How do I specify a specific version of Pandas in the requirements file?

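ANS: – To pin a specific Pandas version, use the == operator, for example pandas==1.3.3. Choose a version that is compatible with the Python and Apache Airflow versions of your MWAA environment.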
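
To confirm from inside the environment (for example, in a DAG task) that pip installed the version you pinned, you can compare the pin with the installed package metadata. This sketch uses only the standard library's `importlib.metadata`; the helper names are hypothetical, and it handles only exact `==` pins.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pin(requirement: str) -> tuple[str, str]:
    """Split a 'name==version' pin into (name, version)."""
    name, sep, expected = requirement.partition("==")
    if not sep or not name.strip() or not expected.strip():
        raise ValueError(f"expected a 'name==version' pin, got {requirement!r}")
    return name.strip(), expected.strip()


def pin_is_installed(requirement: str) -> bool:
    """Return True if the installed distribution matches the pinned version."""
    name, expected = parse_pin(requirement)
    try:
        return version(name) == expected
    except PackageNotFoundError:
        return False  # the package is not installed at all
```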

3. How long does it take for MWAA to update after changing the requirements file?

ANS: – The update process can take several minutes, depending on the size of the requirements file and the current load on your MWAA environment.

WRITTEN BY Sunil H G

Sunil H G is a highly skilled and motivated Research Associate at CloudThat. He is an expert in working with popular data analysis and visualization libraries such as Pandas, Numpy, Matplotlib, and Seaborn. He has a strong background in data science and can effectively communicate complex data insights to both technical and non-technical audiences. Sunil's dedication to continuous learning, problem-solving skills, and passion for data-driven solutions make him a valuable asset to any team.
