THE FUTURE IS HERE

Using Software + Hardware Optimization to Enhance AI Inference Acceleration on Arm NPU

Many techniques have been proposed to both accelerate and compress trained Deep Neural Networks (DNNs) for deployment on resource-constrained edge devices.

Software-oriented approaches such as pruning and quantization have become commonplace, and several optimized hardware designs have been proposed to improve inference performance. An emerging question for developers is: how can these optimizations be combined and automated in a single workflow?

In this session, we examine a real-world use case in which DNN design space exploration was combined with the Arm Ethos-U55 NPU to leverage software and hardware optimizations in one workflow.

We will show how to automatically produce optimized TensorFlow Lite CNN model architectures and speed up the dev-to-deployment process. We'll present insights from testing Arm's Vela compiler, Fixed Virtual Platform (FVP), and configurable NPU to boost throughput by 1.7x and reduce cycle count by 60% on image recognition tasks, enabling complex models that are typically too demanding to run on edge devices.
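For context (not part of the session materials), below is a minimal sketch of how a TensorFlow Lite CNN is commonly prepared for the Ethos-U55: full-integer post-training quantization, followed by compilation with the Vela compiler. The model, file names, calibration data, and Vela flags are illustrative assumptions, not Deeplite's actual workflow.

```python
import numpy as np
import tensorflow as tf

def representative_data_gen():
    # Yield a handful of calibration samples shaped like the model input.
    # (Random data as a placeholder; real calibration should use training images.)
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

# Assume `model` is a trained Keras CNN (here a stand-in architecture).
model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# The Ethos-U55 executes int8 operators, so force full-integer quantization.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)

# The quantized model can then be compiled for the NPU with the Vela compiler, e.g.:
#   vela model_int8.tflite --accelerator-config ethos-u55-128 --output-dir ./vela_out
# (exact command-line flags may differ between Vela releases)
```

The resulting Vela output can then be run on the Corstone/Ethos-U FVP to inspect estimated cycle counts before deploying to hardware.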

Tech talk resources: https://github.com/Deeplite

#ArmDevSummit #Deeplite #MachineLearning