THE FUTURE IS HERE

Using Software + Hardware Optimization to Enhance AI Inference Acceleration on Arm NPU

Many techniques have been proposed to both accelerate and compress trained Deep Neural Networks (DNNs) for deployment on resource-constrained edge devices.

Software-oriented approaches such as pruning and quantization have become commonplace, and several optimized hardware designs have been proposed to improve inference performance. An emerging question for developers is: how can these optimizations be combined and automated in a single workflow?

In this session, we examine a real-world use case in which DNN design space exploration was combined with the Arm Ethos-U55 NPU to leverage software and hardware optimizations in one workflow.

We will show how to automatically produce optimized TensorFlow Lite CNN model architectures and speed up the dev-to-deployment process. We'll present insights from testing Arm's Vela compiler, Fixed Virtual Platform (FVP), and configurable NPU to boost throughput by 1.7x and reduce cycle count by 60% on image recognition tasks, enabling complex models that are typically too demanding to run on edge devices.
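For context (not part of the session materials), below is a minimal sketch of how a TensorFlow Lite CNN is commonly prepared for the Ethos-U55: full-integer post-training quantization, followed by compilation with the Vela compiler. The model, file names, calibration data, and Vela flags are illustrative assumptions, not Deeplite's actual workflow.

```python
import numpy as np
import tensorflow as tf

def representative_data_gen():
    # Yield a handful of calibration samples shaped like the model input.
    # (Random data as a placeholder; real calibration should use training images.)
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

# Assume `model` is a trained Keras CNN (here a stand-in architecture).
model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# The Ethos-U55 executes int8 operators, so force full-integer quantization.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)

# The quantized model can then be compiled for the NPU with the Vela compiler, e.g.:
#   vela model_int8.tflite --accelerator-config ethos-u55-128 --output-dir ./vela_out
# (exact command-line flags may differ between Vela releases)
```

The resulting Vela output can then be run on the Corstone/Ethos-U FVP to inspect estimated cycle counts before deploying to hardware.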

Tech talk resources: https://github.com/Deeplite

#ArmDevSummit #Deeplite #MachineLearning