Overview
In Part 1, we saw what the Mphasis DeepInsights Text Summarizer is and its applications in a real-world scenario. Now we will implement its algorithm with the steps below:
To run the Text Summarizer Algorithm, we need to access the following AWS Services:
- Access to AWS SageMaker and the model package.
- An S3 bucket to specify input/output.
- A role for AWS SageMaker to access input/output from S3.
Implementation of Text Summarizer Algorithm
Usage Information
Usage Methodology for the algorithm:
- Input should have a ‘.txt’ extension with ‘UTF-8’ encoding.
- Note: model inference will fail if the ‘.txt’ file is not ‘UTF-8’ encoded.
- To ensure that the input data is ‘UTF-8’ encoded, use ‘Save As’ with the encoding set to ‘UTF-8’.
- The input can have a maximum of 512 words (the SageMaker limit).
- Input should contain a minimum of 3 sentences (model restriction).
- Supported content types: text/plain.
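The constraints above can be checked locally before invoking the model. A minimal sketch, assuming nothing about the product itself — the function name and the naive sentence-splitting heuristic are purely illustrative:

```python
# Hypothetical pre-flight check for the usage rules above: UTF-8 encoding,
# a maximum of 512 words, and at least 3 sentences.
import re

def validate_input(raw_bytes):
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    try:
        text = raw_bytes.decode('utf-8')
    except UnicodeDecodeError:
        return ["file is not UTF-8 encoded"]
    if len(text.split()) > 512:
        problems.append("input exceeds the 512-word limit")
    # Naive sentence split on ., ! and ? -- good enough for a sanity check.
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    if len(sentences) < 3:
        problems.append("input has fewer than 3 sentences")
    return problems
```

Run this on the raw bytes of your ‘.txt’ file before uploading it to S3, and fix any reported problems first.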
Invoking Endpoint
```python
# Real-time inference via a batch transform job:
sample = 'input text file url location'
transformer = model.transformer(1, 'ml.m5.xlarge')
transformer.transform(sample, content_type='text/plain')
transformer.wait()
print("Batch Transform final output " + transformer.output_path)
```
Set up the Environment
- Update the Boto client and AWS SDK.
- Initialize the API in AWS SageMaker; the following cells set it up to invoke the launched API.
Private Beta Setup
The private beta is limited to the us-east-2 region, so the client we set up is hard-coded to the us-east-2 endpoint.
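To make the region pinning concrete: the SageMaker runtime endpoint follows a region-specific naming scheme, and with boto3 you would simply pass `region_name='us-east-2'` when creating the client. The helper below is an illustrative sketch, not part of any SDK:

```python
# Build the region-specific SageMaker runtime endpoint URL. The private beta
# only works against the us-east-2 endpoint.
def sagemaker_runtime_endpoint(region):
    """Return the SageMaker runtime endpoint URL for a given AWS region."""
    return f"https://runtime.sagemaker.{region}.amazonaws.com"

# boto3 equivalent (requires boto3 and AWS credentials):
# import boto3
# runtime = boto3.client('sagemaker-runtime', region_name='us-east-2')
```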
Sample input data
```python
with open('./self_driving_test.txt', 'rb') as file_stream:
    input_text = file_stream.read().decode('utf-8')
print(input_text)
```
Output:
Create the session
The session remembers our connection parameters to SageMaker. We will use it to perform all of our SageMaker operations.
```python
import sagemaker as sage
from time import gmtime, strftime
from sagemaker import get_execution_role

sess = sage.Session()
role = get_execution_role()
```
Create Model
Now we use the Model Package to create a model.
```python
import sagemaker as sage
from sagemaker import ModelPackage, get_execution_role

model_package_arn = 'arn:aws:sagemaker:us-east-2:786796469737:model-package/marketplace-text-summarizer-11-4'
role = get_execution_role()
sagemaker_session = sage.Session()
model = ModelPackage(model_package_arn=model_package_arn,
                     role=role,
                     sagemaker_session=sagemaker_session)
```
Input File
Now we pull a sample input file for testing the model.
```python
sample_txt = "s3://aws-marketplace-mphasis-assets/Text Summarizer/self_driving.txt"
```
Batch Transform Job
Now let’s use the model we created to run a batch inference job and verify that it works.
```python
import json
import uuid

transformer = model.transformer(1, 'ml.m5.xlarge')
transformer.transform(sample_txt, content_type='text/plain')
transformer.wait()
#transformer.output_path
print("Batch Transform complete")
```
Output from Batch Transform
Note: the following snippet requires the boto3 package to be installed on the local system.
```python
import boto3

print(transformer.output_path)
bucketFolder = transformer.output_path.rsplit('/')[3]
#print(s3bucket, s3prefix)
s3_conn = boto3.client("s3")
bucket_name = "sagemaker-us-east-2-786796469737"
with open('result.txt', 'wb') as f:
    s3_conn.download_fileobj(bucket_name, bucketFolder + '/self_driving.txt.out', f)
    print("Output file loaded from bucket")
```
Output:
s3://sagemaker-us-east-2-786796469737/marketplace-text-summarizer-11-4-2020-0-2020-04-11-17-47-35-070
Output file loaded from the bucket
```python
with open('./result.txt', 'rb') as file_stream:
    output_text = file_stream.read().decode('utf-8')
print(output_text)
```
Output:
Invoking through Endpoint
This is another way of deploying the model, which returns results as real-time inference. Here’s a sample endpoint for reference.
```python
import json
import uuid
import boto3
import sagemaker as sage
from sagemaker import ModelPackage, get_execution_role

role = get_execution_role()
sagemaker_session = sage.Session()
bucket = sagemaker_session.default_bucket()

content_type = 'text/plain'
model_name = 'summarizer-model'
real_time_inference_instance_type = 'ml.c4.2xlarge'
model_package_arn = 'arn:aws:sagemaker:us-east-2:786796469737:model-package/marketplace-text-summarizer-11-4'

def predict_wrapper(endpoint, session):
    return sage.RealTimePredictor(endpoint, session, content_type=content_type)

# Create a deployable model from the model package.
model = ModelPackage(role=role,
                     model_package_arn=model_package_arn,
                     sagemaker_session=sagemaker_session,
                     predictor_cls=predict_wrapper)
predictor = model.deploy(1, real_time_inference_instance_type, endpoint_name=model_name)

# Invoke the endpoint through Python code.
with open('./self_driving_test.txt', mode='r') as f:
    data = f.read()
prediction = predictor.predict(data)

from io import StringIO
s = str(prediction, 'utf-8')
data = StringIO(s)
print(data.read())
```
Output:
Conclusion
We have seen how to extract useful information from long text using the AWS Text Summarizer. The intention is to produce a coherent and fluent summary containing only the main points of the document. Applying text summarization reduces reading time, accelerates the search for information, and increases the amount of information that can fit in a given space. The basic idea is to count the frequency of the words occurring in the text, assume that the highest-occurring words are important given an occurrence threshold, and summarize the text based on that.
About CloudThat
CloudThat is an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.
Drop a query if you have any questions regarding Amazon Text Summarizer and I will get back to you quickly.
To get started, go through our Consultancy page and Managed Services Package, CloudThat’s offerings.
FAQs
1. Which algorithm is used in text summarization?
ANS: – Frequency-based text summarization. In this method, we find the frequency of all the words in our text data and store each word and its frequency in a dictionary. After that, we tokenize the text into sentences. Sentences containing more high-frequency words are kept in the final summary.
```python
# Requires NLTK with the 'punkt' and 'stopwords' corpora downloaded:
# import nltk; nltk.download('punkt'); nltk.download('stopwords')
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords

data = ("My name is Neetika Gupta. It's my pleasure to get the opportunity "
        "to write an article for ABC related to NLP.")

def solve(text):
    stop_words = set(stopwords.words("english"))
    words = word_tokenize(text)

    # Frequency table of non-stopword tokens.
    freqTable = {}
    for word in words:
        word = word.lower()
        if word in stop_words:
            continue
        freqTable[word] = freqTable.get(word, 0) + 1

    # Score each sentence by the summed frequency of the words it contains.
    sentences = sent_tokenize(text)
    sentenceValue = {}
    for sentence in sentences:
        for word, freq in freqTable.items():
            if word in sentence.lower():
                sentenceValue[sentence] = sentenceValue.get(sentence, 0) + freq

    # Keep sentences scoring well above the average sentence score.
    average = int(sum(sentenceValue.values()) / len(sentenceValue))
    summary = ''
    for sentence in sentences:
        if sentence in sentenceValue and sentenceValue[sentence] > (1.2 * average):
            summary += " " + sentence
    return summary
```
2. How is automatic summarization of text helpful?
ANS: – Automatic text summarization is an exciting research area with several applications in industry. By condensing large amounts of information into short summaries, it can aid numerous downstream applications, like creating news digests, report generation, news summarization, and headline generation. Summarization is the task of compressing text into a shorter version, reducing the size of the source text while preserving important elements of the information and the meaning of the content. Since manual text summarization is time-consuming and often tedious, automated approaches are gaining popularity, providing a strong impetus for academic research. Text summarization has important uses in various NLP-related tasks, such as text classification, question answering, legal text summarization, news summarization, and headline production. Furthermore, these systems can be integrated with summary creation as an intermediate step, which helps reduce document length.
In the era of big data, the amount of text data from various sources is exploding. This text is an invaluable source of information that must be effectively summarized to be useful. The increase in document availability calls for extensive research in the field of NLP for automatic text summarization. Automatic text summarization is creating concise and fluent summaries without human intervention while preserving the meaning of the original text document. This is very challenging because, as humans, when we summarize a text, we usually read it in its entirety to deepen our understanding and then write a key-point summary. Since computers lack human knowledge and language skills, automatic text summarization becomes difficult and non-trivial. Various machine learning-based models have been proposed for this task. Most of these methods model this problem as a classification problem, returning whether a sentence should be included in the summary.
Other methods use topic information, Latent Semantic Analysis (LSA), sequence-to-sequence models, reinforcement learning, and adversarial procedures.
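As a rough illustration of the LSA approach mentioned above, a sentence can be scored by its loading on the dominant latent topic of a term-sentence matrix. The sketch below uses NumPy's SVD; the function name, the regex sentence splitter, and the single-topic scoring rule are simplifications for illustration — real LSA summarizers typically weight terms (e.g. TF-IDF) and use several topics:

```python
# Minimal LSA-style extractive summarizer: build a term-sentence count matrix,
# take its SVD, and keep the sentences with the largest weight in the first
# latent topic.
import re
import numpy as np

def lsa_summary(text, n_sentences=1):
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    vocab = sorted({w.lower() for s in sentences for w in re.findall(r'\w+', s)})
    # Term-sentence matrix A[i, j] = count of word i in sentence j.
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in re.findall(r'\w+', s.lower()):
            A[vocab.index(w), j] += 1
    # Columns of Vt give each sentence's loading on the latent topics.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.abs(Vt[0])  # strength in the dominant topic
    top = sorted(np.argsort(scores)[::-1][:n_sentences])
    return ' '.join(sentences[i] for i in top)
```

For example, on a paragraph where three sentences share a common theme and one is unrelated, the dominant topic captures the shared theme and the unrelated sentence is scored near zero.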
WRITTEN BY Neetika Gupta
Neetika Gupta works as a Senior Research Associate at CloudThat and has experience deploying multiple data science projects across cloud frameworks. She has deployed end-to-end AI applications for business requirements on cloud frameworks like AWS, Azure, and GCP, and deployed scalable applications using CI/CD pipelines.