VNet: a versatile network to train real-time semantic segmentation models on a single GPU

Published in Science China Information Sciences, 2022

Recommended citation: VNet: a versatile network to train real-time semantic segmentation models on a single GPU. Wenxing Li, Ning Lin, Mingzhe Zhang, Hang Lu, Xiaoming Chen, Xiaowei Li. Science China Press. Sci China Inf Sci, 2022, 65(3): 139105, https://doi.org/10.1007/s11432-020-2971-8.

Abstract

Modern semantic segmentation, which has important applications such as medical image analysis, image editing, and video surveillance, has made remarkable progress using deep convolutional neural network models. Recently, efficient real-time semantic segmentation has received considerable attention, as intelligent edge devices not only require faster inference from semantic segmentation models but also cannot rely on the cloud services of data centers. There are two feasible approaches to developing an efficient semantic segmentation model. The first is efficient model design: designing and developing the model architecture from scratch (e.g., ENet [1]). The second, which is less common but increasingly popular, is network compression: developing lightweight models (e.g., ICNet [2]) with pruning methods [3] that are widely used in image classification tasks. However, with either approach it is difficult to develop lightweight and fast semantic segmentation models without compromising their accuracy.

In this study, we investigate a salient question: whether other important factors can yield better accuracy, and why previous studies have disregarded these factors.