Enhancing AI Efficiency with Amazon Bedrock’s Intelligent Prompt Routing

Introduction

Amazon Bedrock offers two features aimed at the efficiency and cost-effectiveness of AI applications: Intelligent Prompt Routing and Prompt Caching. Intelligent Prompt Routing lets an application use multiple foundation models (FMs) from the same family, matching each prompt’s complexity to the most suitable model. Simple prompts can go to a smaller, cheaper model while harder ones go to a more capable model, which keeps response quality high while reducing costs. This makes it a good fit for applications like customer service assistants. Complementing this, Prompt Caching stores frequently used context for repeated queries, cutting costs and latency for tasks such as document-based Q&A or coding assistance. Together, these features help businesses deliver faster and more cost-efficient AI-driven solutions without compromising accuracy or quality.

Accessing Amazon Bedrock Intelligent Prompt Routing through the Console

Amazon Bedrock Intelligent Prompt Routing uses prompt matching and model understanding techniques to determine the optimal model for each request, balancing response quality and cost. During the preview phase, default prompt routers are available for the Anthropic Claude and Meta Llama model families.
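To make the quality-versus-cost trade-off concrete, here is a minimal arithmetic sketch of how the average cost per request changes as more traffic is routed to the smaller model. The prices are hypothetical placeholders in arbitrary units, not AWS pricing, and the function names are made up for illustration.

```python
def blended_cost(share_to_small: float, small_price: float, large_price: float) -> float:
    """Average cost per request when a share of traffic goes to the smaller model."""
    if not 0.0 <= share_to_small <= 1.0:
        raise ValueError("share_to_small must be between 0 and 1")
    return share_to_small * small_price + (1.0 - share_to_small) * large_price


def savings_vs_large_only(share_to_small: float, small_price: float, large_price: float) -> float:
    """Fractional saving compared with sending every request to the larger model."""
    return 1.0 - blended_cost(share_to_small, small_price, large_price) / large_price


# Hypothetical per-request prices in arbitrary units (assumption, not AWS pricing).
SMALL_PRICE = 1.0
LARGE_PRICE = 10.0

if __name__ == "__main__":
    for share in (0.0, 0.5, 0.9):
        saving = savings_vs_large_only(share, SMALL_PRICE, LARGE_PRICE)
        print(f"{share:.0%} routed to the smaller model: {saving:.0%} saved")
```

The actual saving depends on real model prices, token counts, and how the router splits your traffic, so treat this only as a way to reason about the effect.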

You can interact with Intelligent Prompt Routing via the AWS Management Console, AWS Command Line Interface (CLI), or AWS SDK. To access it through the Amazon Bedrock console, navigate to the Foundation Models section and select Prompt Routers from the navigation pane.

Choose the default Anthropic Prompt Router to see more information about it.

To use the prompt router for chat, choose Open in Playground and type the following prompt:

Alice has M sisters in addition to N brothers. What is the number of sisters that Alice’s brothers have?

I chose the new Router metrics icon on the right to see which model the prompt router selected. In this case, Anthropic Claude 3.5 Sonnet was used because of the complexity of the question.

Next, I send a simple question to the same prompt router:

In one line, explain the goal of a “hello world” application.

This time, the prompt router selected Anthropic Claude 3 Haiku.

Choose the Meta Prompt Router to view its settings. It uses the cross-region inference profiles for Llama 3.1 70B and 8B, with the 70B model as the fallback.

To use a prompt router in an application, set the prompt router’s Amazon Resource Name (ARN) as the model ID in the Amazon Bedrock API. Let’s see how this works with the AWS CLI and an AWS SDK.

Using Amazon Bedrock Intelligent Prompt Routing with the AWS CLI

The Amazon Bedrock API has been extended to support prompt routers. For instance, I can use ListPromptRouters to list the prompt routers available in an AWS Region:

aws bedrock list-prompt-routers

Output:

{
    "promptRouterSummaries": [
        {
            "promptRouterName": "Anthropic Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.26
            },
            "description": "Routes requests among models in the Claude family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/anthropic.claude:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        },
        {
            "promptRouterName": "Meta Prompt Router",
            "routingCriteria": {
                "responseQualityDifference": 0.0
            },
            "description": "Routes requests among models in the LLaMA family",
            "createdAt": "2024-11-20T00:00:00+00:00",
            "updatedAt": "2024-11-20T00:00:00+00:00",
            "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
                }
            ],
            "fallbackModel": {
                "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
            },
            "status": "AVAILABLE",
            "type": "default"
        }
    ]
}
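The same listing can be done from Python. The following is a minimal sketch, assuming the boto3 "bedrock" client exposes list_prompt_routers and returns the same shape as the CLI output above. The helper names (short_model_name, summarize_routers, fetch_router_summaries) are hypothetical and only condense the response into the fields most useful when choosing a router.

```python
def short_model_name(model_arn: str) -> str:
    """Return the part of an ARN after the last '/', e.g. the inference profile ID."""
    return model_arn.rsplit("/", 1)[-1]


def summarize_routers(response: dict) -> list:
    """Reduce a ListPromptRouters response to name, ARN, fallback, and candidate models."""
    summaries = []
    for router in response.get("promptRouterSummaries", []):
        summaries.append({
            "name": router["promptRouterName"],
            "arn": router["promptRouterArn"],
            "fallback": short_model_name(router["fallbackModel"]["modelArn"]),
            "models": [short_model_name(m["modelArn"]) for m in router["models"]],
        })
    return summaries


def fetch_router_summaries(region_name: str = "us-east-1") -> list:
    """Call the Bedrock control-plane API; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    bedrock = boto3.client("bedrock", region_name=region_name)
    return summarize_routers(bedrock.list_prompt_routers())
```

Calling fetch_router_summaries() in an account with access to the preview should return one entry per router, each with the ARN to pass as the model ID.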

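I can use GetPromptRouter to get information about a specific prompt router by providing its ARN. For example, for the Meta Llama model family:

aws bedrock get-prompt-router --prompt-router-arn arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1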

JSON:

{
    "promptRouterName": "Meta Prompt Router",
    "routingCriteria": {
        "responseQualityDifference": 0.0
    },
    "description": "Routes requests among models in the LLaMA family",
    "createdAt": "2024-11-20T00:00:00+00:00",
    "updatedAt": "2024-11-20T00:00:00+00:00",
    "promptRouterArn": "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1",
    "models": [
        {
            "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
        },
        {
            "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
        }
    ],
    "fallbackModel": {
        "modelArn": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-70b-instruct-v1:0"
    },
    "status": "AVAILABLE",
    "type": "default"
}
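Before wiring a router into an application, it can help to check its details programmatically. This is a minimal sketch, assuming the boto3 "bedrock" client provides get_prompt_router with a promptRouterArn parameter and returns the fields shown in the JSON above. The helper names is_available, non_fallback_models, and fetch_router are hypothetical.

```python
def is_available(router: dict) -> bool:
    """True when the router reports the AVAILABLE status shown in the output above."""
    return router.get("status") == "AVAILABLE"


def non_fallback_models(router: dict) -> list:
    """Model ARNs the router can choose from, excluding the fallback model."""
    fallback_arn = router["fallbackModel"]["modelArn"]
    return [m["modelArn"] for m in router["models"] if m["modelArn"] != fallback_arn]


def fetch_router(prompt_router_arn: str, region_name: str = "us-east-1") -> dict:
    """Fetch one router's details; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    bedrock = boto3.client("bedrock", region_name=region_name)
    return bedrock.get_prompt_router(promptRouterArn=prompt_router_arn)
```

For the Meta router above, non_fallback_models would return only the Llama 3.1 8B inference profile, since the 70B profile is the fallback.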

Using Amazon Bedrock Intelligent Prompt Routing with an AWS SDK

Using an AWS SDK with a prompt router is similar to the previous command line experience. When I invoke a model, I set the model ID to the prompt router ARN. For example, in this Python code, I’m using the Meta Llama router with the ConverseStream API:

Python Code:

import json

import boto3

# Runtime client used for model invocation.
bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

# The prompt router ARN goes where a model ID would normally be passed.
MODEL_ID = "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1"

user_message = "Describe the purpose of a 'hello world' program in one line."
messages = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

streaming_response = bedrock_runtime.converse_stream(
    modelId=MODEL_ID,
    messages=messages,
)

for chunk in streaming_response["stream"]:
    # Print each piece of generated text as it arrives.
    if "contentBlockDelta" in chunk:
        text = chunk["contentBlockDelta"]["delta"]["text"]
        print(text, end="")
    # End the output line once the message is complete.
    if "messageStop" in chunk:
        print()
    # The trace in the metadata event shows which model the router invoked.
    if "metadata" in chunk:
        if "trace" in chunk["metadata"]:
            print(json.dumps(chunk["metadata"]["trace"], indent=2))

This script prints the response text followed by the content of the trace in the response metadata. For this straightforward request, the prompt router chose the faster and less expensive model:

Usually used to confirm that a development environment is configured successfully, a “Hello World” program is a straightforward example that illustrates a programming language’s basic syntax and capabilities.

{
  "promptRouter": {
    "invokedModelId": "arn:aws:bedrock:us-east-1:123412341234:inference-profile/us.meta.llama3-1-8b-instruct-v1:0"
  }
}
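Traces like the one above can be collected to see how a router splits real traffic between models. The sketch below is illustrative only: it assumes the non-streaming Converse API returns the same promptRouter trace under a top-level "trace" key, and the helper names (invoked_model, tally_invoked_models, ask) are hypothetical.

```python
from collections import Counter
from typing import Optional

ROUTER_ARN = "arn:aws:bedrock:us-east-1:123412341234:default-prompt-router/meta.llama:1"


def invoked_model(trace: dict) -> Optional[str]:
    """Return the invoked model ARN from a prompt router trace, or None if absent."""
    return trace.get("promptRouter", {}).get("invokedModelId")


def tally_invoked_models(traces) -> Counter:
    """Count how often each model (by the ID after the last '/') was invoked."""
    counts = Counter()
    for trace in traces:
        model_arn = invoked_model(trace)
        if model_arn is not None:
            counts[model_arn.rsplit("/", 1)[-1]] += 1
    return counts


def ask(prompt: str, region_name: str = "us-east-1"):
    """Send one prompt through the router; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    client = boto3.client("bedrock-runtime", region_name=region_name)
    response = client.converse(
        modelId=ROUTER_ARN,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    text = response["output"]["message"]["content"][0]["text"]
    # Assumption: the trace sits under the "trace" key of the Converse response.
    return text, response.get("trace", {})
```

Running ask over a sample of representative prompts and passing the returned traces to tally_invoked_models gives a rough picture of how often the smaller model is chosen for your workload.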

Conclusion

Amazon Bedrock Intelligent Prompt Routing and Prompt Caching provide practical ways to improve the cost-effectiveness and performance of AI applications.

By intelligently matching prompts with the most suitable foundation models and leveraging cached context for repetitive tasks, these features empower developers to build smarter and faster solutions while reducing operational costs and latency. Whether accessed via the AWS Management Console, CLI, or SDKs, these innovations simplify AI deployments and enhance productivity across diverse use cases.

As businesses embrace these advancements, they can deliver high-quality, scalable AI-driven applications with improved efficiency and precision.

Drop a query if you have any questions regarding Amazon Bedrock Intelligent Prompt Routing and we will get back to you quickly.

FAQs

1. What is Amazon Bedrock Intelligent Prompt Routing?

ANS: – Intelligent Prompt Routing optimizes prompt processing by selecting the most suitable model within a foundation model family, ensuring high-quality responses while reducing costs.

2. How does Prompt Caching improve performance?

ANS: – Prompt Caching stores frequently used context for repeated queries, minimizing costs and latency for tasks like Q&A or coding assistance.

WRITTEN BY Aayushi Khandelwal

Aayushi, a dedicated Research Associate pursuing a Bachelor's degree in Computer Science, is passionate about technology and cloud computing. Her fascination with cloud technology led her to a career in AWS Consulting, where she finds satisfaction in helping clients overcome challenges and optimize their cloud infrastructure. Committed to continuous learning, Aayushi stays updated with evolving AWS technologies, aiming to impact the field significantly and contribute to the success of businesses leveraging AWS services.