Using Software + Hardware Optimization to Enhance AI Inference Acceleration on Arm NPU
Many techniques have been proposed to both accelerate and compress trained Deep Neural Networks (DNNs) for deployment on resource-constrained edge devices.
Software-oriented approaches such as pruning and quantization have become commonplace, and several optimized hardware designs have been proposed to improve inference performance. An emerging question for developers is: how can we combine and automate these optimizations in a single workflow?
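To make the two software techniques concrete, here is a minimal NumPy sketch of magnitude pruning (zeroing the smallest-magnitude weights) and symmetric per-tensor int8 quantization, the full-integer format the Ethos-U55 expects. The weight matrix and function names are illustrative, not part of any particular toolchain.

```python
import numpy as np

# Hypothetical weight matrix standing in for a trained layer's parameters.
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8)).astype(np.float32)

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w).astype(w.dtype)

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
dequantized = q.astype(np.float32) * scale
print("sparsity:", float(np.mean(pruned == 0)))
print("max quantization error:", float(np.abs(pruned - dequantized).max()))
```

In practice these steps are applied (and automated) by frameworks such as TensorFlow Model Optimization rather than hand-rolled, but the arithmetic is the same.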
In this session, we examine a real-world use case in which DNN design space exploration was combined with Arm's Ethos-U55 NPU to leverage software and hardware optimizations in one workflow.
We will show how to automatically produce optimized TensorFlow Lite CNN model architectures and speed up the dev-to-deployment process. We'll present insights from testing Arm's Vela compiler, FVP and configurable NPU to boost throughput by 1.7x and reduce cycle count by 60% for image recognition tasks, enabling complex models that are typically impractical to run on edge devices.
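The hardware-facing step of the workflow compiles a quantized TensorFlow Lite model for the NPU with Arm's Vela compiler. A minimal sketch, assuming a Python environment and a fully int8-quantized model; the filename and the 128-MAC accelerator variant are illustrative:

```shell
# Install Arm's open-source Vela compiler.
pip install ethos-u-vela

# Compile an int8 TensorFlow Lite model for an Ethos-U55 configuration.
# "model_int8.tflite" is a placeholder for your quantized model.
vela model_int8.tflite --accelerator-config ethos-u55-128
```

Vela writes an optimized `*_vela.tflite` into its output directory along with a performance estimate (cycle counts), which is the basis for comparisons like those quoted above.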
Tech talk resources: https://github.com/Deeplite
#ArmDevSummit #Deeplite #MachineLearning