Making DensePose Fast and Light

Ruslan Rakhimov¹, Emil Bogomolov¹, Alexandr Notchenko¹, Fung Mao², Alexey Artemov¹, Denis Zorin³,¹, Evgeny Burnaev¹

¹Skolkovo Institute of Science and Technology, ²Huawei Moscow Research Center (Russia), ³New York University

arXiv 2020

Qualitative comparison of different models. We depict contours with color-coded U and V coordinates as an output of the model.


The DensePose estimation task is a significant step forward for enhancing user experience in computer vision applications ranging from augmented reality to cloth fitting. Existing neural network models capable of solving this task are heavily parameterized and a long way from being transferred to an embedded or mobile device. To enable DensePose inference on the end device with current models, one needs to support an expensive server-side infrastructure and have a stable internet connection. To make things worse, mobile and embedded devices do not always have a powerful GPU inside. In this work, we target the problem of redesigning the DensePose R-CNN model's architecture so that the final network retains most of its accuracy but becomes more lightweight and fast. To achieve that, we tested and incorporated many deep learning innovations from recent years, specifically performing an ablation study on 23 efficient backbone architectures, multiple two-stage detection pipeline modifications, and custom model quantization methods. As a result, we achieved a 17-fold reduction in model size and a 2-fold improvement in latency compared to the baseline model.
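To give a rough feel for why quantization shrinks a model, here is a minimal sketch of generic 8-bit affine (min/max) quantization of a weight list. It is not the custom quantization method used in this work, whose details are not given on this page; the function names `quantize_uint8` and `dequantize_uint8` are hypothetical and chosen only for illustration.

```python
from array import array


def quantize_uint8(weights):
    """Map floats to 8-bit codes using a per-tensor scale and offset (illustrative only)."""
    lo, hi = min(weights), max(weights)
    # Fall back to a scale of 1.0 when all weights are equal, to avoid dividing by zero.
    scale = (hi - lo) / 255.0 or 1.0
    # Round to the nearest code and clamp to the valid uint8 range.
    codes = bytes(min(255, max(0, round((w - lo) / scale))) for w in weights)
    return codes, scale, lo


def dequantize_uint8(codes, scale, lo):
    """Recover approximate float weights from the 8-bit codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, lo = quantize_uint8(weights)
restored = dequantize_uint8(codes, scale, lo)

float32_bytes = len(weights) * array("f").itemsize  # storage as 32-bit floats
uint8_bytes = len(codes)                            # storage as 8-bit codes
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))
```

In this sketch each weight takes one byte instead of the four bytes of a 32-bit float, and the reconstruction error per weight is bounded by half of `scale`. Real deployments typically quantize per layer or per channel and also handle activations, which this example leaves out.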





If you have any questions about this work, please contact us under