Introducing Amazon Bedrock Intelligent Prompt Routing and Prompt Caching (Preview) to minimize costs and latency

Voiced by Amazon Polly

On 4^th December 2024, AWS announced, two Amazon Bedrock features in preview to minimize costs and latency for Generative AI applications.

Transform Your Career with AWS Certifications

Advanced Skills
AWS Official Curriculum
10+ Hand-on Labs

Enroll Now

1. Amazon Bedrock Intelligent Prompt Routing

Amazon Bedrock’s Intelligent Prompt Routing optimizes quality and cost by using different foundation models (FMs) from the same family based on prompt complexity. For example, it can switch between Claude 3.5 Sonnet and Claude 3 Haiku, or Meta Llama 3.1 70B and 8B. Intelligent Prompt Routing leverages advanced prompt matching and model understanding techniques to predict each model’s performance for a given request. It then dynamically routes the request to the model most likely to deliver the desired response at the lowest cost. This system identifies the optimal model for each request, making it perfect for applications such as customer service. Simpler queries are handled by smaller, faster models, while more complex queries are directed to more capable models. This method can cut costs by up to 30% without compromising accuracy.

Benefits of Intelligent prompt routing:

Optimizes response quality and cost by using different foundation models.
Enhances overall performance by leveraging strengths of multiple models.
Simplifies management without complex orchestration.
Future-proofs by easily integrating new models.

2. Amazon Bedrock Prompt Caching

Amazon Bedrock now supports caching frequently used context in prompts across multiple model invocations. This is particularly useful for applications like document Q&A systems or coding assistants that need to maintain context over multiple interactions. Cached context remains available for up to 5 minutes after each access, potentially reducing costs by up to 90% and latency by up to 85% for supported models. These features help improve performance and cost efficiency in applications.

Prompt caching is now available for Claude 3.5 Haiku and Claude 3.5 Sonnet v2 in the US West (Oregon) and US East (N. Virginia) regions through cross-region inference. It is also available for Nova Micro, Nova Lite, and Nova Pro models in the US East (N. Virginia) region. Currently, only a limited number of customers have access to Amazon Bedrock’s prompt caching feature.

3. Working with Amazon Bedrock Prompt Routing

Steps to work with intelligent prompt routing:

Select the desired model family.
Intelligent prompt routing predicts each model’s performance for incoming requests.
Amazon Bedrock dynamically selects the model with the best response quality and cost.
The request is sent to the chosen model for processing.
Receive the response, including details about the selected model.

Select Meta Prompt Router and click on Open in Playground. Foe a simple prompt like “describe purpose of Amazon Bedrock in one line”, response is generated by Llama 3.1 70B.

When a more complex prompt is given, response is generated by Llama 3.1 8B model.

4. Things to note

Amazon Bedrock Intelligent Prompt Routing is now available in preview in the US East (N. Virginia) and US West (Oregon) AWS Regions. During this preview period, you can utilize the default prompt routers at no additional cost, only paying for the selected model. Prompt routers can be used alongside other Amazon Bedrock features, including performing evaluations, using knowledge bases, and configuring agents.
Intelligent prompt routing only supports prompts in English language.
Amazon Bedrock’s prompt caching support is now available in preview in the US West (Oregon) region for Anthropic’s Claude 3.5 Sonnet V2 and Claude 3.5 Haiku. Additionally, prompt caching is available in the US East (N. Virginia) region for Amazon Nova Micro, Amazon Nova Lite, and Amazon Nova Pro.
With prompt caching, cache reads are 90% cheaper than non-cached input tokens. There are no extra infrastructure charges for cache storage. For Anthropic models, there is an additional cost for tokens written to the cache. However, there are no extra costs for cache writes when using Amazon Nova models.
With prompt caching, content is stored for up to 5 minutes, and each cache hit resets this timer. This feature supports cross-Region inference transparently, allowing your applications to benefit from cost optimization and reduced latency while maintaining the flexibility of cross-Region inference.
These features simplify building cost-effective, high-performing generative AI applications by intelligently routing requests and caching frequently used content, reducing costs while enhancing performance.

5. Conclusion

Amazon Bedrock’s prompt routing and prompt caching features significantly enhance the efficiency and cost-effectiveness of AI applications. By enabling the reuse of frequently accessed context, these features reduce latency and operational costs, making it easier for developers to build responsive and scalable solutions. Initially available to a select group of customers, these innovations represent a promising step forward in optimizing AI performance and accessibility.

Earn Multiple AWS Certifications for the Price of Two

AWS Authorized Instructor led Sessions
AWS Official Curriculum

Get Started Now

About CloudThat

CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.

CloudThat is the first Indian Company to win the prestigious Microsoft Partner 2024 Award and is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, AWS GenAI Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partners, Amazon EC2 Service Delivery Partner, Amazon ECS Service Delivery Partner, AWS Glue Service Delivery Partner, Amazon Redshift Service Delivery Partner, AWS Control Tower Service Delivery Partner, AWS WAF Service Delivery Partner and many more.

To get started, go through our Consultancy page and Managed Services Package, CloudThat’s offerings.

Amazon Bedrock

WRITTEN BY Rashmi D

Rashmi Dhumal is working as a Subject Matter Expert in AWS Team at CloudThat, India. Being a passionate trainer, “technofreak and a quick learner”, is what aptly describes her. She has an immense experience of 20+ years as a technical trainer, an academician, mentor, and active involvement in curriculum development. She trained many professionals and student graduates pan India.