MESS: Multi-Exit Semantic Segmentation Networks

03 Jul 2022 in Projects

Semantic segmentation arises as the backbone of many vision systems, spanning from self-driving cars and robot navigation to augmented reality and teleconferencing. Frequently operating under stringent latency constraints within a limited resource envelope, optimising for efficient execution becomes important. At the same time, the heterogeneous capabilities of the target platforms and diverse constraints of different applications require the design and training of multiple target-specific segmentation models, leading to excessive maintenance costs. To this end, we propose a framework for converting state-of-the-art segmentation CNNs to Multi-Exit Semantic Segmentation (MESS) networks: specially trained models that employ parametrised early exits along their depth to i) dynamically save computation during inference on easier samples and ii) save training and maintenance cost by offering a post-training customisable speed-accuracy trade-off. Designing and training such networks naively can hurt performance. Thus, we propose novel two-staged training scheme for multi-exit networks. Furthermore, the parametrisation of MESS enables co-optimising the number, placement and architecture of the attached segmentation heads along with the exit policy, upon deployment via exhaustive search in <1GPUh. This allows MESS to rapidly adapt to the device capabilities and application requirements for each target use-case, offering a train-once-deploy-everywhere solution. MESS variants achieve latency gains of up to 2.83x with the same accuracy, or 5.33 pp higher accuracy for the same computational budget, compared to the original backbone network. Lastly, MESS delivers orders of magnitude faster architecture selection, compared to state-of-the-art techniques.

Authors: A. Kouris, S. I. Venieris*, S. Laskaridis*, N. D. Lane

Published at: European Conference in Computer Vision (ECCV’22) (full paper) and International Conference on Mobile Systems, Applications and Services (MobiSys’22) (poster)

Overview

DynO

We extended early-exit ideas to dense prediction by developing MESS, a framework for multi-exit semantic segmentation. Starting from strong segmentation backbones, we added parameterized exits that allow inference to terminate early on easy inputs.

We used a two-stage training process and a fast deployment-time co-optimization step that selects exit placement, head architecture, and exit policies in under an hour of GPU time. MESS delivers large latency gains at equal accuracy or substantial accuracy improvements under fixed compute budgets—while keeping deployment simple and flexible.

Reference

@inproceedings{kouris2022multi,
  title={Multi-exit semantic segmentation networks},
  author={Kouris, Alexandros and Venieris, Stylianos I and Laskaridis, Stefanos and Lane, Nicholas},
  booktitle={European Conference on Computer Vision},
  pages={330--349},
  year={2022},
  organization={Springer}
}

MESS: Multi-Exit Semantic Segmentation Networks

Overview

Links

Reference

Stefanos Laskaridis

Error

Overview

Links

Reference

Templates (for web app):

Error