Voiced by Amazon Polly |
Introduction
Convolutional neural networks (CNNs) are deep learning algorithm that has revolutionized image and video recognition tasks.
These filters can detect the input image’s edges, corners, or other patterns. As we move deeper into the network, each layer combines and recombines features extracted by previous layers to form more complex representations. Finally, these representations are fed into fully connected layers for classification. CNNs have been used successfully in various applications, such as object detection, facial recognition, and natural language processing.
Their ability to learn complex features from raw data has made them an essential tool for machine learning practitioners working with image or video data. Convolutional neural networks (CNNs) architecture is designed to process images and other types of multidimensional data effectively. A typical CNN consists of multiple layers, including convolutional, pooling, and fully connected layers. Convolutional layers are the backbone of CNNs and use a set of learnable filters to extract features from input images.
Fully connected layers are used at the end of a CNN to perform classification or regression tasks based on the extracted features. These layers connect every neuron in one layer to every neuron in the next layer. Overall, the architecture and components of CNNs allow for efficient processing and analysis of complex visual data such as images and videos. Convolutional neural networks (CNNs) are a type of deep learning algorithm that has revolutionized image and video recognition.
Pioneers in Cloud Consulting & Migration Services
- Reduced infrastructural costs
- Accelerated application deployment
A Sample Python Code for Implementing CNN
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 |
import mnist import numpy as np from conv import Conv3x3 from maxpool import MaxPool2 from softmax import Softmax train_images = mnist.train_images()[:1000] train_labels = mnist.train_labels()[:1000] test_images = mnist.test_images()[:1000] test_labels = mnist.test_labels()[:1000] conv = Conv3x3(8) # 28x28x1 -> 26x26x8 pool = MaxPool2() # 26x26x8 -> 13x13x8 softmax = Softmax(13 * 13 * 8, 10) # 13x13x8 -> 10 def forward(image, label): ''' Completes a forward pass of the CNN and calculates the accuracy and cross-entropy loss. - image is a 2d numpy array - label is a digit ''' # We transform the image from [0, 255] to [-0.5, 0.5] to make it easier # to work with. This is standard practice. out = conv.forward((image / 255) - 0.5) out = pool.forward(out) out = softmax.forward(out) # Calculate cross-entropy loss and accuracy. np.log() is the natural log. loss = -np.log(out[label]) acc = 1 if np.argmax(out) == label else 0 return out, loss, acc def train(im, label, lr=.005): ''' Completes a full training step on the given image and label. Returns the cross-entropy loss and accuracy. - image is a 2d numpy array - label is a digit - lr is the learning rate ''' # Forward out, loss, acc = forward(im, label) # Calculate initial gradient gradient = np.zeros(10) gradient[label] = -1 / out[label] # Backprop gradient = softmax.backprop(gradient, lr) gradient = pool.backprop(gradient) gradient = conv.backprop(gradient, lr) return loss, acc print('MNIST CNN initialized!') # Train the CNN for 3 epochs for epoch in range(3): print('--- Epoch %d ---' % (epoch + 1)) # Shuffle the training data permutation = np.random.permutation(len(train_images)) train_images = train_images[permutation] train_labels = train_labels[permutation] # Train! loss = 0 num_correct = 0 for i, (im, label) in enumerate(zip(train_images, train_labels)): if i % 100 == 99: print( '[Step %d] Past 100 steps: Average Loss %.3f | Accuracy: %d%%' % (i + 1, loss / 100, num_correct) ) loss = 0 num_correct = 0 l, acc = train(im, label) loss += l num_correct += acc # Test the CNN print('\n--- Testing the CNN ---') loss = 0 num_correct = 0 for im, label in zip(test_images, test_labels): _, l, acc = forward(im, label) loss += l num_correct += acc num_tests = len(test_images) print('Test Loss:', loss / num_tests) print('Test Accuracy:', num_correct / num_tests) |
Applications
- CNNs are used to identify objects on the road, such as other vehicles, pedestrians, traffic lights, etc. This helps the vehicle make informed decisions about its surroundings and navigate safely through traffic. Overall, object recognition and classification using CNNs have various applications across various industries, such as security surveillance systems, healthcare diagnostics, the retail industry, etc.
- CNNs can identify patterns in medical images that are difficult for humans to detect. This is particularly useful in identifying tumors and other abnormalities. For instance, by analyzing mammograms, CNNs can help radiologists detect early signs of breast cancer. In addition to identifying diseases, CNNs can also help doctors plan treatments. They can analyze CT scans to determine the size and location of a tumor, which helps doctors plan radiation therapy or surgery.
- By providing accurate diagnoses and treatment plans, these algorithms have significantly improved patient outcomes while reducing the burden on healthcare providers. In recent years, convolutional neural networks (CNNs) have proven to be very effective for analyzing and classifying natural language data. CNNs are used in natural language processing (NLP) applications such as sentiment analysis and automatic translation.
- Sentiment analysis involves identifying the emotional tone of a piece of text or speech, which is useful for businesses to understand their customer’s feedback. The automatic translation uses NLP techniques to translate from one language to another, which is essential for global communication. CNNs are also used for text classification tasks such as spam filtering and topic modeling. Topic modeling automatically discovers hidden topics in large collections of documents.
Conclusion
CNNs are designed to mimic how the human brain processes visual information, making them well-suited for object detection, facial recognition, and image classification. One of the most important applications of CNNs in image recognition is object detection. By analyzing an image with multiple layers of filters, a CNN can identify specific objects within an image and locate them within the frame.
Empowering organizations to become ‘data driven’ enterprises with our Cloud experts.
- Reduced infrastructure costs
- Timely data-driven decisions
About CloudThat
CloudThat is an official AWS (Amazon Web Services) Advanced Consulting Partner and Training partner and Microsoft Gold Partner, helping people develop knowledge of the cloud and help their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.
Drop a query if you have any questions regarding CNN, I will get back to you quickly.
To get started, go through our Consultancy page and Managed Services Package, CloudThat’s offerings.
FAQs
1. Why do we use a Pooling Layer in a CNN?
ANS: –
- CNN uses pooling layers to reduce the size of the input image to speed up the computation of the network.
- It is applied after convolution and RELU operations.
- It reduces the dimension of each feature map by retaining the most important information.
- Since the number of hidden layers required to learn the complex relations present in the image would be large.
- As a result of pooling, even if the picture were a little tilted, the largest number in a certain region of the feature map would have been recorded.
2. What is the feature map size for a given input size image, Filter Size, Stride, and Padding amount?
ANS: – Stride tells us how many pixels we will jump when convolving filters. If our input image has a size of n x n and filters size f x f and p is the Padding amount, and s is the Stride, then the dimension of the feature map is given by: Dimension = floor[ ((n-f+2p)/s)+1] x floor[ ((n-f+2p)/s)+1]
WRITTEN BY Neetika Gupta
Neetika Gupta works as a Senior Research Associate in CloudThat has the experience to deploy multiple Data Science Projects into multiple cloud frameworks. She has deployed end-to-end AI applications for Business Requirements on Cloud frameworks like AWS, AZURE, and GCP and Deployed Scalable applications using CI/CD Pipelines.
Click to Comment