Cooperative Inference with Interleaved Operator Partitioning for CNNs

Authors: Zhibang Liu, Chaonong Xu, Zhizhuo Liu, Lekai Huang, Jiachen Wei, Chao Li
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 368-377
Keywords: deep learning, distributed inference, parallel computing

Abstract

Deploying deep learning models on IoT devices is often hampered by limited memory and computing capability. Cooperative inference is an important way to address this challenge: a model is partitioned and then deployed distributively across devices. To perform horizontal partitions, existing cooperative inference methods take either the output channels of operators or the height and width of feature maps as the partition dimensions. In this manner, since the activations of an operator are distributed across devices, they must be concatenated before being fed to the next operator, which adds delay to cooperative inference. In this paper, we propose the Interleaved Operator Partitioning (IOP) strategy for CNN models. By partitioning one operator along its output channel dimension and its successive operator along the input channel dimension, activation concatenation becomes unnecessary, which reduces the number of communication connections and consequently the cooperative inference delay. Based on IOP, we further present a model segmentation algorithm that minimizes cooperative inference time by greedily selecting operators for IOP pairing according to the inference-delay benefit obtained. Experimental results demonstrate that, compared with the state-of-the-art partition approaches used in CoEdge and AlexNet, the IOP strategy achieves 14.97% ~ 16.97% faster acceleration and reduces peak memory usage by 21.22% ~ 49.98% for three classical image classification models.
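The core idea can be sketched numerically: if one operator is split along its output channels and its successor along the matching input channels, each device can carry its activation slice straight into the next operator, and the per-device partial outputs are simply summed rather than concatenated. The sketch below uses plain matrix multiplies as a stand-in for (linear) CNN operators; all names and shapes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_mid, C_out, n_dev = 8, 6, 4, 2
x = rng.standard_normal((1, C_in))
W1 = rng.standard_normal((C_in, C_mid))   # first operator
W2 = rng.standard_normal((C_mid, C_out))  # its successive operator

# Reference: full, unpartitioned inference.
ref = (x @ W1) @ W2

# IOP-style split: W1 by output channels, W2 by the matching input channels.
splits = np.array_split(np.arange(C_mid), n_dev)
partials = []
for idx in splits:
    a_k = x @ W1[:, idx]               # device k's activation slice
    partials.append(a_k @ W2[idx, :])  # fed directly onward, no concatenation
out = np.sum(partials, axis=0)         # element-wise sum replaces concat + matmul

assert np.allclose(out, ref)
```

Note that the element-wise sum reproduces the full result only for linear operators (convolutions, fully connected layers) before a nonlinearity is applied; handling activations between paired operators is where the paper's segmentation algorithm comes in.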