Introduction
In the vast realm of cinematic experiences, finding that one movie you vaguely remember but can’t quite recall the name of is a common predicament. Perhaps you remember specific scenes, such as a man with a hat amidst flames, a woman driving a pink car, or a raccoon on an alien planet, but the movie title remains elusive. AWS has introduced a groundbreaking solution: Amazon Titan Multimodal Embeddings. This advanced technology combines textual and visual data to create rich representations of multimedia content, revolutionizing how we search for movies.
Understanding Multimodal Embeddings
The Problem: Lost in Movie Limbo
Imagine you watched a captivating movie but failed to jot down its title. All you remember is a distinct scene—a man wearing a hat against a backdrop of flames. Without the title, finding the movie amidst thousands of options seems daunting. This is where Amazon Titan comes to the rescue.
The Solution: Multimodal Search
We can now search for movies based on textual descriptions or visual cues using Amazon Titan Multimodal Embeddings. Suppose you vaguely recall a movie featuring a man with a hat and fire in the background. When you input this description into the search engine, Amazon Titan analyzes its semantic meaning and visual attributes and retrieves relevant matches such as “Oppenheimer” or “V for Vendetta.”
Similarly, if you remember a scene with a woman driving a pink car, you can input this description to find movies like “Barbie” or “Legally Blonde.” Even abstract descriptions like “a raccoon and a tree with a face on an alien planet” can lead to relevant movie suggestions, such as “Guardians of the Galaxy.”
Implementation with MovieLens Data
To demonstrate the power of Amazon Titan, we utilized data from MovieLens, a platform that provides movie recommendations based on user preferences. We collected information on well-known movies released in 2024, including titles, posters, genres, and plot summaries.
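As an illustration, each collected movie can be held in a simple record like the one below. The field names are hypothetical (chosen to mirror the fields indexed later), not the actual MovieLens schema:

```python
# A hypothetical record for one collected movie; field names are illustrative
# and mirror the metadata stored in the search index, not the MovieLens schema.
movie = {
    "movieId": "1",
    "title": "Oppenheimer",
    "imdbMovieId": "tt15398776",
    "posterPath": "posters/oppenheimer.jpg",
    "genres": ["Biography", "Drama", "History"],
    "plotSummary": "The story of J. Robert Oppenheimer and the Manhattan Project.",
}
```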
Generating Embeddings
We created embeddings for movie posters and titles using the Amazon Bedrock API. The API converts images and text into embeddings, capturing their semantic meaning and visual characteristics. We obtained comprehensive representations of each movie by combining textual and visual embeddings.
```python
import base64
import json

import boto3
from botocore.config import Config

# Configure the AWS region and retry behavior
my_config = Config(
    region_name='us-east-1',  # Update with your desired region
    signature_version='v4',
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    }
)

# Create a Boto3 client for the Bedrock Runtime service
bedrock_runtime = boto3.client(service_name="bedrock-runtime", config=my_config)

def get_embedding_for_poster_and_title(image_path, title):
    # Read the poster image and encode it to base64
    with open(image_path, "rb") as image_file:
        input_image = base64.b64encode(image_file.read()).decode('utf8')

    # Prepare the request body containing the image and title
    body = json.dumps({
        "inputImage": input_image,
        "inputText": title
    })

    # Invoke the Titan Multimodal Embeddings model
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json"
    )

    # Decode the response and extract the embedding
    vector_json = json.loads(response['body'].read().decode('utf8'))
    image_name = image_path.split("/")[-1].split(".")[0]
    return vector_json, image_name, title
```
Building the Search Index
We leveraged Amazon OpenSearch Service, a managed search and analytics suite, to index the embeddings. The search index stores each embedding alongside metadata such as the movie title, plot summary, and genres. By enabling k-NN search on the index, we can retrieve the most similar movies for a given query vector.
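As a sketch, the index can be created with a `knn_vector` mapping like the following. The endpoint, credentials, and index settings are assumptions for illustration; the dimension matches Titan Multimodal Embeddings' default 1024-dimensional output:

```python
# Mapping for the k-NN index; "dimension" matches the Titan Multimodal
# Embeddings model's default output size of 1024.
index_body = {
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "titan_multimodal_embedding": {"type": "knn_vector", "dimension": 1024},
            "movieId": {"type": "keyword"},
            "title": {"type": "text"},
            "imdbMovieId": {"type": "keyword"},
            "posterPath": {"type": "keyword"},
            "plotSummary": {"type": "text"},
        }
    },
}

def create_index(base_url, username, password):
    # Imported here so the mapping above can be inspected without the
    # dependency installed; call this against your own OpenSearch endpoint.
    import requests
    from requests.auth import HTTPBasicAuth

    return requests.put(
        base_url + "/multi-modal-embedding-index",
        auth=HTTPBasicAuth(username, password),
        json=index_body,
    )
```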
Retrieving Results
To retrieve movie recommendations, we first convert the search query into an embedding vector using the same API to generate embeddings. We then query the search index using the KNN algorithm to find movies with similar embeddings. The results include movie titles, posters, IMDb IDs, and plot summaries, providing users with comprehensive information to make informed choices.
```python
import json

import requests
from requests.auth import HTTPBasicAuth

def get_embedding_for_text(text):
    # The Titan Multimodal Embeddings model also accepts text-only input,
    # so the query lands in the same vector space as the poster embeddings
    body = json.dumps({
        "inputText": text
    })
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json"
    )
    vector_json = json.loads(response['body'].read().decode('utf8'))
    return vector_json, text

def query(text, n=5):
    # Embed the free-text query, then run a k-NN search against the index
    text_embedding = get_embedding_for_text(text)
    query_body = {
        "size": n,
        "query": {
            "knn": {
                "titan_multimodal_embedding": {
                    "vector": text_embedding[0]['embedding'],
                    "k": n
                }
            }
        },
        "_source": ["movieId", "title", "imdbMovieId", "posterPath", "plotSummary"]
    }
    response = requests.get(
        base_url + "/multi-modal-embedding-index/_search",
        auth=HTTPBasicAuth(username, password),
        verify=False,  # for demos only; verify TLS certificates in production
        json=query_body
    )
    return response.json()
```
Now, with the query function in place, we can execute searches based on textual descriptions and retrieve relevant movie recommendations. This enhanced search functionality further improves the movie discovery experience for users.
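For illustration, here is how the titles can be pulled out of a typical OpenSearch k-NN response. The response below is a hand-built sample of the expected shape, not real output:

```python
# A hand-built sample of the OpenSearch response shape, trimmed to the
# "_source" fields requested in the query.
sample_results = {
    "hits": {
        "hits": [
            {"_score": 0.91, "_source": {"movieId": "1", "title": "Oppenheimer"}},
            {"_score": 0.87, "_source": {"movieId": "2", "title": "V for Vendetta"}},
        ]
    }
}

def titles_from_results(results):
    # Each hit carries the stored metadata under "_source"
    return [hit["_source"]["title"] for hit in results["hits"]["hits"]]

print(titles_from_results(sample_results))  # ['Oppenheimer', 'V for Vendetta']
```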
Conclusion
Amazon Titan Multimodal Embeddings revolutionize the movie-finding process by fusing textual and visual data into rich, precise embeddings. This enables users to locate films from hazy memories of scenes, language, or imagery, providing a smooth and user-friendly movie search experience. Titan's capabilities make rediscovering films accurate and straightforward, transforming how we find and enjoy cinematic material.
Drop a query if you have any questions regarding Amazon Titan and we will get back to you quickly.
About CloudThat
CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.
CloudThat is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partners, Amazon EC2 Service Delivery Partner, and many more.
To get started, go through CloudThat's Consultancy page and Managed Services Package offerings.
FAQs
1. What are Amazon Titan Multimodal Embeddings?
ANS: – Amazon Titan Multimodal Embeddings is an innovative technology developed by Amazon Web Services (AWS) that combines textual and visual data to create rich representations of multimedia content. These embeddings capture semantic meaning and relationships between different data types, enabling advanced search capabilities.
2. What kind of data does Amazon Titan utilize for movie discovery?
ANS: – Amazon Titan utilizes both textual and visual data for movie discovery. Textual data includes movie titles, plot summaries, and descriptions, while visual data comprises movie posters and images. By combining these modalities, Amazon Titan creates comprehensive representations of movies, enabling accurate and personalized search results.
WRITTEN BY Aayushi Khandelwal
Aayushi, a dedicated Research Associate pursuing a Bachelor's degree in Computer Science, is passionate about technology and cloud computing. Her fascination with cloud technology led her to a career in AWS Consulting, where she finds satisfaction in helping clients overcome challenges and optimize their cloud infrastructure. Committed to continuous learning, Aayushi stays updated with evolving AWS technologies, aiming to impact the field significantly and contribute to the success of businesses leveraging AWS services.