Introduction
As companies build more complex models, training and running machine learning models becomes more challenging. As part of a series of custom instances designed to reduce costs, AWS announced the new Inf2 instance for Amazon EC2.
Adam Selipsky, AWS CEO, announced it at the Las Vegas AWS re:Invent conference.
At AWS re:Invent, Selipsky told the audience that Inf1 instances are suitable for models of simple to moderate complexity.
However, for more complex models, customers have often turned to higher-powered instances, which lack an optimal resource configuration for inference workloads.
About EC2 Inf2 Instances
The Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances are designed for the most demanding deep learning (DL) inference applications and deliver high performance at the lowest cost in Amazon EC2.
Inf2 instances are powered by up to 12 AWS Inferentia2 chips, the third DL accelerator designed by AWS.
Compared with Inf1, Inf2 instances deliver up to three times the compute performance, four times the throughput, and up to ten times lower latency.
They offer improved price performance for deploying increasingly complex models, such as large language models (LLMs) and vision transformers, at scale.
To support ultra-large models with 100B+ parameters, Inf2 instances are the first inference-optimized instances in Amazon EC2 to offer scale-out distributed inference with ultra-high-speed networking between accelerators.
Inf2 Instance Offerings
An Inf2 instance can produce up to 2.3 petaflops of DL performance, be equipped with up to 384 GB of accelerator memory, and be connected to the host via NeuronLink, an intra-instance, high-speed, nonblocking interconnect.
Additionally, Inf2 instances provide up to 50% better performance per watt than comparable GPU-based instances on Amazon EC2 and help you achieve your sustainability goals.
With only a few lines of code, you can deploy DL applications on Inf2 using the AWS Neuron SDK natively integrated with popular machine learning frameworks.
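As a hedged illustration of those "few lines of code," the sketch below traces a small PyTorch model with torch_neuronx.trace, the PyTorch entry point of the Neuron SDK. The tiny MLP is an illustrative stand-in for a real network, and exact flows may vary between Neuron SDK releases.

```python
import torch
import torch_neuronx  # PyTorch front end of the AWS Neuron SDK

# Any eval-mode PyTorch model; a tiny MLP stands in for a real network.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

example = torch.rand(1, 784)

# Compile the model for the Inferentia2 NeuronCores.
neuron_model = torch_neuronx.trace(model, example)

# The compiled model is TorchScript; save, reload, and run as usual.
torch.jit.save(neuron_model, "model_neuron.pt")
restored = torch.jit.load("model_neuron.pt")
print(restored(example))
```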
Models too large to fit on a single accelerator can be sharded across multiple accelerators working simultaneously, allowing Inf2 instances to serve models with hundreds of billions of parameters.
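As a toy illustration of the idea behind this kind of scale-out (tensor-parallel) inference, the plain-PyTorch sketch below splits a weight matrix column-wise across two hypothetical devices and shows that the concatenated partial results match the single-device computation; it uses no Neuron-specific APIs.

```python
import torch

torch.manual_seed(0)
x = torch.rand(1, 8)   # activation
W = torch.rand(8, 6)   # full weight matrix, "too big" for one device

shards = torch.chunk(W, 2, dim=1)           # one column shard per "device"
partials = [x @ shard for shard in shards]  # each device computes its shard
y_sharded = torch.cat(partials, dim=1)      # gather the partial outputs

assert torch.allclose(y_sharded, x @ W)     # same result as a single device
```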
Inf2 instances support a wide range of data types, including FP32, TF32, BF16, FP16, UINT8, and configurable FP8 (cFP8).
With AWS Neuron, you can auto-cast high-precision FP32 models to lower-precision data types while preserving accuracy and optimizing performance.
By eliminating the need to retrain models for lower precision, auto-casting reduces time to market.
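A minimal sketch of what enabling auto-cast can look like when compiling with torch-neuronx: the flags below are passed through to the neuronx-cc compiler to cast FP32 operations down to BF16 at compile time. Both the flag values and the toy model are assumptions to be checked against the Neuron documentation for your SDK version.

```python
import torch
import torch_neuronx

# An FP32 model; no retraining is required to run it in lower precision.
model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU()).eval()
example = torch.rand(1, 128)  # FP32 example input

# Assumed neuronx-cc flags: cast all FP32 ops to BF16 during compilation.
neuron_model = torch_neuronx.trace(
    model,
    example,
    compiler_args=["--auto-cast", "all", "--auto-cast-type", "bf16"],
)
```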
Inf2 Performance
Deep learning inference workloads run on EC2 Inf2 instances powered by the AWS Inferentia2 chip, a custom-designed accelerator built to support large-scale machine learning models that can deliver up to 16 teraflops of mixed-precision performance.
In benchmarks, Inf2 delivered 2.6x higher throughput and 8.1x lower latency, making Inf2 instances a more powerful and useful resource.
Inf2 also provides up to 50% better performance per watt compared to G5 instances.
Throughput and latency have been benchmarked for the Inferentia2 chip that powers EC2 Inf2 instances. Latency is the time it takes to process a single inference, while throughput is the number of inferences the system can process in a given time period.
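To make these two metrics concrete, here is a small, framework-agnostic timing sketch; it is a generic illustration, not an official AWS benchmark harness. `model` can be any callable, such as the compiled Neuron model from the earlier sketch, and `example` a tuple of its inputs.

```python
import time

def benchmark(model, example, n_iters=1000):
    # Warm up so one-time initialization does not skew the numbers.
    for _ in range(10):
        model(*example)

    start = time.perf_counter()
    for _ in range(n_iters):
        model(*example)
    elapsed = time.perf_counter() - start

    latency_ms = elapsed / n_iters * 1000  # avg time per single inference
    throughput = n_iters / elapsed         # inferences per second
    return latency_ms, throughput
```

For example, `benchmark(neuron_model, (example,))` would report both numbers for the model compiled above.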
The Inferentia2 chip has been demonstrated to deliver up to 3.5 times the performance of comparable instances. Its high throughput makes it ideal for real-time image and video processing applications that require many inferences per second.
The Inferentia2 chip has also been shown to deliver inference results in as little as 0.7 milliseconds. This is especially important for applications that require real-time responses, such as autonomous vehicles, where even a slight delay can have serious consequences.
Conclusion
Until now, workloads this large could not be processed cost-effectively by any other solution.
The Inf2 instance powered by the new Inferentia2 chip is perfect for those seeking a customized solution based on their specific needs.
Compared to Inf1, Inf2 instances deliver faster compute performance, higher throughput, and much lower latency.
EC2 Inf2 instances based on Inferentia2 provide organizations and businesses with a reliable and efficient way to reduce costs while accelerating their machine learning workloads.
About CloudThat
CloudThat is an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping businesses aim for higher goals using industry-best cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all stakeholders in the cloud computing sphere.
Drop a query if you have any questions regarding Amazon EC2 Inf2, and we will get back to you quickly.
To get started, go through our Consultancy page and Managed Services Package, CloudThat's offerings.
FAQs
1. Is Amazon Search faster with Inf2?
ANS: – Yes. Inf2 enables Amazon Search to deliver low latency and high performance, making Amazon Search up to 2x faster.
2. Which technology is most targeted by Amazon EC2 Inf2 instances?
ANS: – Amazon EC2 Inf2 instances are designed specifically for deep learning (DL) inference, delivering the best performance at the lowest cost for your most demanding DL applications.
WRITTEN BY Swaraj Sirsat