Energy and Cost Considerations for GPU Accelerated AI Inference Workloads

Abstract

Recent advances in AI have motivated hardware manufacturers to design deep-learning-friendly accelerators to keep pace with ever-growing model sizes and computational requirements. While early accelerators were used primarily for model training, newer accelerators are capable of running deep neural network (DNN) inference and are increasingly deployed in robotics, vision, and edge applications. In this paper, we compare several popular embedded and desktop GPUs with respect to their performance and energy efficiency. Our results show that although larger devices provide higher throughput, they are not always the most energy-efficient. To aid a system designer in selecting hardware, we use our experimental results to design a recommendation algorithm that chooses the ideal hardware accelerator under cost, power, and performance constraints.
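The recommendation algorithm described above can be sketched as a constrained selection: filter out devices that exceed the cost or power budget, then pick the best remaining one by an efficiency metric. This is a minimal illustration only; the device names, numbers, and the throughput-per-watt objective are assumptions for the example, not measurements or details from the paper.

```python
from dataclasses import dataclass

@dataclass
class Accelerator:
    name: str
    cost_usd: float        # purchase cost
    power_w: float         # sustained power draw during inference
    throughput_ips: float  # inferences per second

def recommend(devices, max_cost, max_power):
    """Return the most energy-efficient device (throughput per watt)
    among those that satisfy the cost and power budgets."""
    feasible = [d for d in devices
                if d.cost_usd <= max_cost and d.power_w <= max_power]
    if not feasible:
        return None
    return max(feasible, key=lambda d: d.throughput_ips / d.power_w)

# Illustrative (made-up) numbers, not results from this paper:
devices = [
    Accelerator("EdgeGPU-A",    cost_usd=99,  power_w=10,  throughput_ips=120),
    Accelerator("DesktopGPU-B", cost_usd=699, power_w=250, throughput_ips=2200),
]
best = recommend(devices, max_cost=500, max_power=50)
```

A designer could swap the objective for raw throughput or cost-normalized performance depending on which constraint dominates.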