Overview
Debezium is an innovative open-source platform for Change Data Capture (CDC). It captures real-time data changes from various databases and transforms them into a stream of change events. Debezium supports popular databases like MySQL, PostgreSQL, and MongoDB and seamlessly integrates with Apache Kafka for efficient data streaming. With Debezium, organizations can unlock the power of real-time analytics, data integration, and event-driven architectures. Its resilience, low latency, and scalability make it an indispensable tool for capturing and leveraging database changes.
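For context, each change event is a structured record that pairs the row's state before and after the change with metadata about its origin. Below is a trimmed, illustrative sketch of the event envelope for an insert; the field values are made up for this example:

{
  "payload": {
    "before": null,
    "after": { "id": 2, "name": "cloudthat" },
    "source": { "connector": "mysql", "db": "mysqldb", "table": "sample" },
    "op": "c",
    "ts_ms": 1690000000000
  }
}

Here "op": "c" denotes a create (insert); updates and deletes are marked "u" and "d".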
This solution is explained in three parts. In Part 1, we created a VPC and launched a private Amazon EC2 instance that can be reached over SSH without a bastion host, using an Amazon EC2 Instance Connect (EIC) Endpoint. In this second part, we will launch Amazon RDS MySQL, install Apache Kafka, and configure Debezium on the private Amazon EC2 instance.
Apache Kafka
Apache Kafka is an open-source distributed streaming platform. It has become the standard choice for building scalable, real-time data pipelines and event-driven architectures. Debezium will be integrated with Apache Kafka for efficient data streaming.
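As a quick illustration of the Kafka model we rely on below, here is a sketch of creating a topic, publishing one message, and reading it back. The commands assume the kafka directory and broker configured later in this post, and the demo topic name is made up:

# Create a topic, publish one message, and read it back
./kafka/bin/kafka-topics.sh --create --topic demo --bootstrap-server PrivateIP:9092
echo 'hello' | ./kafka/bin/kafka-console-producer.sh --topic demo --bootstrap-server PrivateIP:9092
./kafka/bin/kafka-console-consumer.sh --topic demo --from-beginning --bootstrap-server PrivateIP:9092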
Amazon RDS MySQL
Amazon RDS MySQL is a popular managed relational database service from Amazon Web Services (AWS). It offers a simplified, scalable way to deploy and manage MySQL databases in the cloud, automating time-consuming administrative tasks such as hardware provisioning, software patching, and database backups, and it supports high availability through replication options. Known for its reliability, performance, and ease of use, it is an excellent choice for applications that need a robust and scalable MySQL database. We will connect the Debezium connector to Amazon RDS MySQL.
Steps to Launch Amazon RDS MySQL
Step 1: Create a DB parameter group
Go to the Amazon RDS service -> Parameter Groups -> Create, as follows
Step 2: Once the parameter group is created, open it and click Edit parameters
Step 3: Set binlog_format to ROW and click Save changes
Step 4: Now let's create the Amazon RDS MySQL database
Step 5: Select the VPC created earlier
Step 6: Enter the initial database name, select the DB parameter group created earlier, and enable automated backups for at least 1 day; binary logs are retained only when backups are enabled, so CDC will not work without this.
Keep the other settings as default (edit if required) and click Create; the database takes a few minutes to become available. If you prefer the CLI, see the sketch after these steps.
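For readers who prefer the AWS CLI, here is a minimal sketch of the same steps. The parameter group name, engine family, instance identifier, instance class, and password below are assumptions; match them to your console choices:

# Create a parameter group and set binlog_format to ROW
aws rds create-db-parameter-group \
  --db-parameter-group-name logbin \
  --db-parameter-group-family mysql8.0 \
  --description "ROW binlog format for Debezium CDC"
aws rds modify-db-parameter-group \
  --db-parameter-group-name logbin \
  --parameters "ParameterName=binlog_format,ParameterValue=ROW,ApplyMethod=immediate"

# Launch the instance with backups enabled so binlogs are retained
aws rds create-db-instance \
  --db-instance-identifier debezium-mysql \
  --engine mysql \
  --db-instance-class db.t3.micro \
  --allocated-storage 20 \
  --master-username admin \
  --master-user-password 'YourStrongPassword' \
  --db-name mysqldb \
  --db-parameter-group-name logbin \
  --backup-retention-period 1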
Steps to Install Apache Kafka and configure Debezium on Private Amazon EC2
Step 1: Install Apache Kafka and configure Debezium on the private Amazon EC2 instance
In Part 1, we connected to the Amazon EC2 instance over SSH using the Amazon EC2 Instance Connect Endpoint. Now run the following commands; the NAT Gateway provides the outbound internet access needed for the downloads.
sudo su
apt-get update
apt-get install -y wget net-tools netcat tar openjdk-8-jdk
wget https://archive.apache.org/dist/kafka/2.7.0/kafka_2.12-2.7.0.tgz
tar -xzf kafka_2.12-2.7.0.tgz
mv kafka_2.12-2.7.0 kafka
cd ./kafka/config/
vi zookeeper.properties        # press i for insert mode
# replace the existing dataDir line with the following, then press Escape and :wq to save and quit
dataDir=/root/zookeeper
rm -rf server.properties
vi server.properties           # press i, paste the block below, replace PrivateIP with the instance's private IP, then Escape and :wq

# starting of the code
broker.id=0
# listeners=PLAINTEXT://your.host.name:9092
advertised.listeners=PLAINTEXT://PrivateIP:9092
zookeeper.connect=PrivateIP:2181
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
auto.create.topics.enable=true
log.dirs=/home/ubuntu/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connection.timeout.ms=6000
# end of the code

cd /home/ubuntu
wget https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/1.3.1.Final/debezium-connector-mysql-1.3.1.Final-plugin.tar.gz
tar -xvzf debezium-connector-mysql-1.3.1.Final-plugin.tar.gz
cd kafka
mkdir connect
cd ..
sudo mv debezium-connector-mysql ./kafka/connect
vi ./kafka/config/connect-standalone.properties   # edit these two lines
bootstrap.servers=EC2PrivateIPAddress:9092
plugin.path=/home/ubuntu/kafka/connect/           # add this line
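A quick sanity check before moving on, with paths following the commands above:

# Confirm Java is installed and the connector jars sit under plugin.path
java -version
ls /home/ubuntu/kafka/connect/debezium-connector-mysql/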
Step 2: Connect to MySQL
sudo apt install mysql-server
mysql -h RDS-MySQL-Endpoint -u admin -p
Password: enter the password you set while creating the database.
Verify that binary logging is enabled and set to ROW:

show global variables like 'log_bin';
show global variables like 'binlog_format';
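If the parameter group and backups were configured correctly, the output should look roughly like this (illustrative):

+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+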
use mysqldb;
create table sample (id int, name varchar(20));
insert into sample values(2,'cloudthat');
select * from sample;
GRANT ALL PRIVILEGES ON mysqldb.* TO 'admin'@'%';
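The admin user works here, but if you later switch to a dedicated Debezium user, the MySQL connector also needs replication-related privileges. A minimal sketch (the debezium user name and password are hypothetical):

CREATE USER 'debezium'@'%' IDENTIFIED BY 'dbzpass';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium'@'%';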
Keep this tab open and duplicate it for the next steps.
Step 3: Debezium configuration
vi ./kafka/config/connect-debezium-mysql.properties

name=test-connector
connector.class=io.debezium.connector.mysql.MySqlConnector
database.hostname=RDS-ENDPOINT
database.port=3306
database.user=admin
database.password=adminpass
database.server.id=1
database.server.name=mysql
database.include.list=mysqldb
table.include.list=mysqldb.sample
database.history.kafka.bootstrap.servers=EC2PrivateIPAddress:9092
database.history.kafka.topic=dbhistory.test
include.schema.changes=true
tombstones.on.delete=false
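If you ever run Kafka Connect in distributed mode instead of standalone, the same connector can be registered through the Connect REST API on port 8083. The JSON below simply mirrors the properties above; it is a sketch and not part of this walkthrough:

curl -s -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "test-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "RDS-ENDPOINT",
    "database.port": "3306",
    "database.user": "admin",
    "database.password": "adminpass",
    "database.server.id": "1",
    "database.server.name": "mysql",
    "database.include.list": "mysqldb",
    "table.include.list": "mysqldb.sample",
    "database.history.kafka.bootstrap.servers": "EC2PrivateIPAddress:9092",
    "database.history.kafka.topic": "dbhistory.test",
    "include.schema.changes": "true",
    "tombstones.on.delete": "false"
  }
}'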
Step 4: Topic creation with Debezium and listing
sudo systemctl restart mysql
./kafka/bin/zookeeper-server-start.sh -daemon ./kafka/config/zookeeper.properties
./kafka/bin/kafka-server-start.sh -daemon ./kafka/config/server.properties
./kafka/bin/kafka-topics.sh --list --bootstrap-server EC2PrivateIPAddress:9092
The above command doesn't display any topics yet because the Debezium connector hasn't been started.
Run the following as a single command:

./kafka/bin/connect-standalone.sh ./kafka/config/connect-standalone.properties ./kafka/config/connect-debezium-mysql.properties
Open a duplicate tab and check whether the topics were created:
./kafka/bin/kafka-topics.sh --list --bootstrap-server EC2PrivateIPAddress:9092
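If everything is wired up correctly, the list should include topics along these lines; Debezium names table topics serverName.databaseName.tableName, so the output below is illustrative:

# Illustrative output:
# dbhistory.test
# mysql
# mysql.mysqldb.sample

# To watch change events for the sample table:
./kafka/bin/kafka-console-consumer.sh --topic mysql.mysqldb.sample --from-beginning --bootstrap-server EC2PrivateIPAddress:9092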
We will continue in Part 3, where we will perform CRUD operations on the MySQL database to verify that Debezium is working.
Conclusion
In the above process, we installed Apache Kafka on Amazon EC2 and used the Debezium connector to capture CDC data into Kafka topics. Multiple tables, across multiple databases, can be configured in the Debezium configuration file; a topic is created for each table specified there, and the CDC data is sent to the respective topics.
Drop a query if you have any questions regarding Apache Kafka, and we will get back to you quickly.
About CloudThat
CloudThat is an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all stakeholders in the cloud computing sphere.
To get started, go through our Consultancy page and Managed Services Package, CloudThat's offerings.
FAQs
1. What if I get a port 8083 error?
ANS: – Use lsof -i :8083 to identify the process holding the port, kill it, and run the command again. See the sketch below.
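A minimal sketch of that fix (the PID is whatever lsof reports):

sudo lsof -i :8083        # find the process holding the port
sudo kill -9 <PID>        # stop it, then rerun connect-standalone.sh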
2. Can we have more than one table?
ANS: – Yes, multiple tables can be listed in table.include.list, separated by commas, as shown below.
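For example, in connect-debezium-mysql.properties (the extra database and table names here are hypothetical):

database.include.list=mysqldb,inventorydb
table.include.list=mysqldb.sample,mysqldb.orders,inventorydb.products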
WRITTEN BY Suresh Kumar Reddy
Yerraballi Suresh Kumar Reddy is working as a Research Associate - Data and AI/ML at CloudThat. He is a self-motivated and hard-working Cloud Data Science aspirant who is adept at using analytical tools for analyzing and extracting meaningful insights from data.