1. Overview
Audio-to-text is the process of converting audio into text. Software cannot readily search or analyze raw audio files to extract meaningful data from them, so audio usually has to be converted to text before it can be analyzed.
Many software providers now offer speech-to-text as a service, built on their own models and algorithms. In this blog, we will walk through one such service from AWS: Amazon Transcribe (often referred to as AWS Transcribe).
2. Introduction to Amazon Transcribe
Amazon Transcribe is a fully managed and continuously trained automatic speech recognition service that automatically generates time-stamped text transcripts from audio files. Amazon Transcribe makes it easy for developers to add speech-to-text capabilities to their applications. Audio data is virtually impossible for computers to search and analyze. Therefore, recorded speech needs to be converted to text before being used in applications. Historically, customers had to work with transcription providers that required them to sign expensive contracts and were hard to integrate into their technology stacks to accomplish this task. Many of these providers use outdated technology that does not adapt well to different scenarios, like low-fidelity phone audio standards in contact centers, which results in poor accuracy.
3. AWS Transcribe
We will make use of S3 triggers that will make it possible to automate transcribing from start to end. Below is a detailed overview of what we will accomplish in this article.
- Create a Lambda role with access to S3, CloudWatch, and Amazon Transcribe.
- Create an input S3 bucket for the audio files and an output bucket for the transcribed text.
- Create a Lambda function with Python as the runtime to trigger Amazon Transcribe whenever a new .mp3 file is uploaded to the input S3 bucket.
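As a rough sketch of what the Lambda role in the first step might allow, the policy below grants read access to the input bucket, write access to the output bucket, the two Transcribe calls used later in this post, and CloudWatch Logs. The bucket names and the `build_lambda_policy` helper are placeholders for illustration, not values from this walkthrough, and a real policy should be narrowed to your own resources.

```python
import json

# Hypothetical bucket names used only for illustration.
INPUT_BUCKET = "my-audio-input-bucket"
OUTPUT_BUCKET = "my-transcript-output-bucket"


def build_lambda_policy(input_bucket, output_bucket):
    """Return an IAM policy document (as a dict) for the transcription Lambda."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read the uploaded audio file.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{input_bucket}/*",
            },
            {   # Write the resulting text file.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{output_bucket}/*",
            },
            {   # Start and poll transcription jobs.
                "Effect": "Allow",
                "Action": [
                    "transcribe:StartTranscriptionJob",
                    "transcribe:GetTranscriptionJob",
                ],
                "Resource": "*",
            },
            {   # Let Lambda write its logs to CloudWatch.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }


policy = build_lambda_policy(INPUT_BUCKET, OUTPUT_BUCKET)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy for the role.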
4. Setting up a Trigger on S3
In the Lambda console, click the ‘Add Trigger’ option, select ‘S3’ as the source, and set the Event Type to ‘PUT’. The prefix filter restricts the trigger to a folder, and the suffix filter restricts it to a file type. For this demo, we set the suffix to .mp3 so that only MP3 files trigger the function.
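The same trigger can also be set up programmatically. The sketch below builds the notification configuration that boto3's `put_bucket_notification_configuration` expects. The function ARN, the `audio/` prefix, and the helper names are made up for illustration. Unlike the console, this call does not grant S3 permission to invoke the function, so that permission would have to be added separately.

```python
def build_s3_trigger_config(lambda_arn, prefix="", suffix=".mp3"):
    """Build the notification configuration for an S3 -> Lambda trigger."""
    rules = [{"Name": "suffix", "Value": suffix}]
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                # Fire only on PUT uploads, matching the console setting.
                "Events": ["s3:ObjectCreated:Put"],
                "Filter": {"Key": {"FilterRules": rules}},
            }
        ]
    }


def apply_s3_trigger(s3_client, bucket, config):
    """Attach the configuration to a bucket; s3_client is a boto3 S3 client."""
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )


# Hypothetical ARN and prefix, for illustration only.
config = build_s3_trigger_config(
    "arn:aws:lambda:us-east-1:123456789012:function:transcribe-audio",
    prefix="audio/",
)
print(config)
```

To apply it, call `apply_s3_trigger(boto3.client("s3"), "your-input-bucket", config)` with your own bucket name.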
5. Lambda Code for Transcribing the Audio and Storing the Text File in S3
- First, we import the required libraries: boto3, requests, and json. Note that requests is a third-party package, so it must be packaged with the function or supplied through a Lambda layer.
- Increase the Lambda timeout under Configuration settings. It defaults to 3 seconds, which is too short because the function waits for the transcription job to finish.
- The code reads the event and extracts the bucket name and file name from it.
- Next, we build the S3 location of the uploaded file, which the transcription job needs as its media source.
- We start the transcription job and then fetch its details.
- Starting the job and fetching its details are each handled by a helper function, shown after the main handler.
- From the JSON response, we read the transcript file URL and any other details we need.
- We use requests to download the transcript from that URL and extract the transcribed text.
- We then write the text to a file and upload that file to S3.
- After the code runs, a text file appears in S3, and a transcription job is listed in the Amazon Transcribe console.
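The event-reading step from the list above can be sketched in isolation. S3 URL-encodes object keys in the event it sends, so a file name containing a space arrives with a `+` in its place and has to be decoded before use. The sample event below is trimmed down and uses made-up names, and `extract_bucket_and_key` is a hypothetical helper.

```python
from urllib.parse import unquote_plus


def extract_bucket_and_key(event):
    """Return (bucket, key) for the first record of an S3 event."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Object keys are URL-encoded in the event payload; decode before use.
    key = unquote_plus(s3_info["object"]["key"])
    return bucket, key


# Trimmed-down sample event with made-up names.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-audio-input-bucket"},
                "object": {"key": "audio/team+meeting.mp3"},
            }
        }
    ]
}

bucket, key = extract_bucket_and_key(sample_event)
print(bucket, key)  # my-audio-input-bucket audio/team meeting.mp3
```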
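    import json
    import os
    from urllib.parse import unquote_plus

    import boto3
    import requests  # third-party: bundle it with the function or add a Lambda layer

    s3 = boto3.client('s3')
    transcribe = boto3.client('transcribe')

    OUTPUT_BUCKET = "test-bucket-transcribe"


    def lambda_handler(event, context):
        print(json.dumps(event))  # log the incoming event to CloudWatch

        # Read the bucket name and object key from the S3 event.
        record = event['Records'][0]['s3']
        file_bucket = record['bucket']['name']
        # Keys arrive URL-encoded in S3 events (a space becomes '+').
        file_name = unquote_plus(record['object']['key'])
        object_url = 's3://{0}/{1}'.format(file_bucket, file_name)

        # Start the job, then wait until it has completed or failed.
        startTranscriptionJob(file_name, object_url)
        status = getTranscriptionJob(file_name)
        job = status['TranscriptionJob']
        if job['TranscriptionJobStatus'] == 'FAILED':
            raise RuntimeError(job.get('FailureReason', 'Transcription job failed'))

        # Download the transcript JSON and pull out the plain text.
        url = job['Transcript']['TranscriptFileUri']
        text_data = requests.get(url).json()['results']['transcripts'][0]['transcript']

        # Write to /tmp, dropping any folder prefix from the key so the path is valid.
        local_path = f"/tmp/{os.path.basename(file_name)}.txt"
        with open(local_path, "w") as file:
            file.write(text_data)

        s3.upload_file(Filename=local_path, Bucket=OUTPUT_BUCKET, Key=f"{file_name}.txt")
        return text_data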
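The handler above reads only `results.transcripts[0].transcript` from the transcript file, but the same file also carries per-word timing. As a minimal sketch, assuming the standard batch output layout, the snippet below walks `results.items` to list each word with its start and end time. The sample data is hand-written and `words_with_timestamps` is a hypothetical helper name.

```python
def words_with_timestamps(transcript_json):
    """Yield (start_seconds, end_seconds, word) for each spoken word."""
    for item in transcript_json["results"]["items"]:
        # Punctuation items carry no start_time/end_time, so skip them.
        if item["type"] != "pronunciation":
            continue
        word = item["alternatives"][0]["content"]
        yield float(item["start_time"]), float(item["end_time"]), word


# Hand-written sample in the shape of a Transcribe result file.
sample = {
    "results": {
        "transcripts": [{"transcript": "Hello world."}],
        "items": [
            {"type": "pronunciation", "start_time": "0.0", "end_time": "0.42",
             "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
            {"type": "pronunciation", "start_time": "0.42", "end_time": "0.9",
             "alternatives": [{"confidence": "0.98", "content": "world"}]},
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": "."}]},
        ],
    }
}

full_text = sample["results"]["transcripts"][0]["transcript"]
timed_words = list(words_with_timestamps(sample))
print(full_text)
for start, end, word in timed_words:
    print(f"{start:6.2f}-{end:6.2f}  {word}")
```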
This function starts the transcription job. It calls the Transcribe API with IdentifyLanguage set to True, so the language of the audio file is detected automatically.
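    def startTranscriptionJob(file_name, object_url):
        # Job names must be unique; this one is derived from the first
        # 10 characters of the object key with slashes removed.
        response = transcribe.start_transcription_job(
            TranscriptionJobName=file_name.replace('/', '')[:10],
            IdentifyLanguage=True,  # let Transcribe detect the spoken language
            MediaFormat='mp3',
            Media={'MediaFileUri': object_url},
        )
        return response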
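One caveat with the naming scheme above: transcription job names must be unique, and as far as the API reference describes, they may contain only letters, digits, periods, underscores, and hyphens, up to 200 characters. Truncating the key to 10 characters means two files that share their first 10 characters, or the same file uploaded twice, would collide. One possible alternative, sketched here with a hypothetical `make_job_name` helper, is to sanitize the key and append a short random suffix.

```python
import re
import uuid


def make_job_name(object_key, max_length=200):
    """Derive a unique, valid transcription job name from an S3 object key."""
    # Replace every character outside the allowed set with a hyphen.
    safe = re.sub(r"[^0-9a-zA-Z._-]", "-", object_key)
    # A short random suffix keeps repeated uploads of the same key distinct.
    suffix = uuid.uuid4().hex[:8]
    # Leave room for the separator and suffix within the length limit.
    return f"{safe[: max_length - len(suffix) - 1]}-{suffix}"


name = make_job_name("audio/team meeting.mp3")
print(name)
```

Because the name is random, it would have to be generated once and passed to both the start and the get call rather than re-derived from the file name in each function.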
This function fetches the transcription job details. It polls the job until it has either completed or failed and returns the JSON response, from which we read the results we need.
    import time


    def getTranscriptionJob(file_name):
        # Poll until the job reaches a terminal state.
        while True:
            status = transcribe.get_transcription_job(
                TranscriptionJobName=file_name.replace('/', '')[:10]
            )
            if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
                break
            time.sleep(5)  # pause between calls instead of polling in a tight loop
        return status
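The polling loop above keeps going until the job finishes, with no upper bound of its own. Since a Lambda invocation can run for at most 15 minutes, it may help to cap the wait explicitly. The sketch below is one way to do that. `wait_for_job` is a hypothetical helper, and the fake status function stands in for `transcribe.get_transcription_job` so the example runs without AWS.

```python
import time


def wait_for_job(get_status, delay_seconds=5, max_attempts=60):
    """Call get_status() until it returns a terminal state or attempts run out."""
    for _ in range(max_attempts):
        response = get_status()
        state = response["TranscriptionJob"]["TranscriptionJobStatus"]
        if state in ("COMPLETED", "FAILED"):
            return response
        time.sleep(delay_seconds)
    raise TimeoutError(f"Job still running after {max_attempts} checks")


# Stand-in for transcribe.get_transcription_job, so the sketch runs offline.
_fake_states = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])


def fake_get_status():
    return {"TranscriptionJob": {"TranscriptionJobStatus": next(_fake_states)}}


result = wait_for_job(fake_get_status, delay_seconds=0)
print(result["TranscriptionJob"]["TranscriptionJobStatus"])
```

Inside the Lambda, the callable would be something like `lambda: transcribe.get_transcription_job(TranscriptionJobName=job_name)`.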
7. Conclusion
Now, when we upload an MP3 file to our input S3 bucket, the Lambda function is triggered, and once it finishes, a text file containing the transcribed audio appears in the output S3 bucket. This text can then be processed and analyzed further as the business requires. It can also be translated into other languages using Amazon Translate.
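As a rough illustration of the translation step mentioned above, the sketch below passes transcribed text to the `translate_text` call of Amazon Translate, with the source language set to `auto` so that it is detected automatically. The helper name and the fake client are assumptions made so the example runs offline. In a real function, the client would be `boto3.client('translate')`, the Lambda role would need permission for the Translate call, and long transcripts would have to be split to fit the per-request size limit.

```python
def translate_transcript(translate_client, text, target_language="es"):
    """Translate text with Amazon Translate, letting it detect the source language."""
    response = translate_client.translate_text(
        Text=text,
        SourceLanguageCode="auto",
        TargetLanguageCode=target_language,
    )
    return response["TranslatedText"]


class FakeTranslateClient:
    """Stand-in for boto3.client('translate') so the sketch runs offline."""

    def translate_text(self, Text, SourceLanguageCode, TargetLanguageCode):
        # Echo the input with the target language code instead of translating.
        return {
            "TranslatedText": f"[{TargetLanguageCode}] {Text}",
            "SourceLanguageCode": "en",
            "TargetLanguageCode": TargetLanguageCode,
        }


translated = translate_transcript(FakeTranslateClient(), "Hello world.", "es")
print(translated)
```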
8. Use Cases
- Get insights from customer conversations
With Amazon Transcribe, we can quickly gather insights from customer conversations. In addition, AWS Contact Center Intelligence partners and Contact Lens for Amazon Connect offer ready-made solutions to improve customer engagement and increase agent productivity.
- Search and analyze media content
Content producers and media distributors can use Amazon Transcribe to automatically convert audio and video assets into fully searchable archives for content discovery, highlight generation, content moderation, and monetization.
- Create subtitles and meeting notes
Subtitle your on-demand and streaming content to increase reach and improve the customer experience. Amazon Transcribe can also improve productivity by accurately capturing the meetings and discussions that matter to you.
- Improve clinical documentation
Physicians and clinicians can use Amazon Transcribe Medical to quickly and efficiently document clinical conversations in electronic health record (EHR) systems for analysis. The service is HIPAA-eligible and trained to understand medical terminology.
FAQs
1. Does Amazon Transcribe support real-time transcriptions?
ANS: – Yes, Amazon Transcribe lets users open a bidirectional stream over HTTP/2. Users can send an audio stream to the service and receive a text stream in return, in real time.
2. Are there size restrictions on the audio content that Amazon Transcribe can process?
ANS: – For the batch service, Amazon Transcribe calls are limited to four hours (or 2 GB) of audio per API call. The streaming service can accommodate open connections up to four hours long.
3. What languages can Amazon Transcribe automatically identify?
ANS: – Amazon Transcribe can identify any of the languages supported by the batch and streaming APIs.
4. Does Amazon Transcribe identify multiple languages in the same audio file?
ANS: – Amazon Transcribe only identifies the dominant language in an audio file.
WRITTEN BY Sanket Gaikwad