Fusion in dissimilarity space for RGB-D person re-identification
Hisato Fukuda1, Antony Lam2, Yoshinori Kuno3, Md Kamal Uddin3, Yoshinori Kobayashi4
[1] Corresponding author. Graduate School of Science and Engineering, Saitama University, Saitama, Japan; Noakhali Science and Technology University, Noakhali, Bangladesh; Graduate School of Science and Engineering, Saitama University, Saitama, Japan; Mercari, Inc., Tokyo, Japan
Keywords: Re-identification; RGB-D sensors; Dissimilarity space; Triplet loss
Source: DOAJ
Abstract
Person re-identification (Re-id) is the task of recognizing people across different non-overlapping sensors of a camera network. Despite recent advances in deep learning (DL) models for multi-modal fusion, state-of-the-art Re-id approaches fail to leverage depth-guided contextual information to dynamically select the most discriminative convolutional filters for better feature embedding and inference. Thanks to low-cost modern RGB-D sensors (e.g., Microsoft Kinect and Intel RealSense Depth Camera), different modalities, such as illumination-invariant high-quality depth images, RGB images, and skeleton information, can be obtained simultaneously. State-of-the-art Re-id approaches perform multi-modal fusion in feature space, where there is a high chance that fused noisy features dominate the final recognition process. In this paper, we address this issue by using an effective fusion technique in dissimilarity space. Two CNNs are separately trained with 3-channel RGB and 4-channel RGB-D images to produce two different feature embeddings. Given a query RGB-D image of an individual, each embedding is matched pair-wise against the embeddings of the reference images, and the dissimilarity scores with respect to the reference images from both modalities are fused for the final ranking. Additionally, the lack of a suitable RGB-D Re-id dataset prompted us to contribute a new one, named SUCVL RGBD-ID, which includes RGB and depth images of 58 identities captured by three cameras: one installed in poor illumination conditions and the other two installed in two different indoor locations with different lighting environments. Extensive experimental analysis on our dataset and two publicly available datasets shows the effectiveness of the proposed method. Moreover, the proposed method is general and can be applied to many other RGB-D based applications.
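The abstract describes fusion at the score level rather than the feature level: the query is matched against the reference (gallery) images separately in the RGB and RGB-D embedding spaces, and only the resulting dissimilarity scores are combined before ranking. The sketch below illustrates that idea under stated assumptions: the embeddings are given (standing in for the outputs of the two CNNs), dissimilarity is Euclidean distance, each modality's scores are min-max normalized, and the fusion rule is a weighted sum with a hypothetical `weight` parameter. The abstract does not specify the paper's actual distance, normalization, or fusion rule, so this is an illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale one modality's dissimilarities to [0, 1] so that the
    two modalities are on a comparable scale before fusion (assumption)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fused_ranking(
    query_rgb: Sequence[float],
    query_rgbd: Sequence[float],
    gallery_rgb: List[Sequence[float]],
    gallery_rgbd: List[Sequence[float]],
    weight: float = 0.5,  # hypothetical: share of the RGB stream in the fused score
) -> Tuple[List[int], List[float]]:
    """Fuse per-modality dissimilarities to the gallery and rank gallery indices."""
    # Pair-wise matching of the query against every reference, per modality.
    d_rgb = min_max_normalize([euclidean(query_rgb, g) for g in gallery_rgb])
    d_rgbd = min_max_normalize([euclidean(query_rgbd, g) for g in gallery_rgbd])
    # Fusion happens in dissimilarity space, not by concatenating features.
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(d_rgb, d_rgbd)]
    # Lower fused dissimilarity means a better match, so sort ascending.
    order = sorted(range(len(fused)), key=lambda i: fused[i])
    return order, fused


# Toy 2-D embeddings for one query and three reference identities.
order, fused = fused_ranking(
    [0.0, 0.0], [1.0, 1.0],
    [[1.0, 0.0], [0.0, 3.0], [2.0, 0.0]],
    [[1.0, 2.0], [1.0, 1.5], [1.0, 5.0]],
)
print(order)  # → [0, 1, 2]
```

Normalizing each modality before fusing keeps one stream's distance scale from dominating the fused score. In this toy example the RGB stream alone would rank the references as 0, 2, 1 and the RGB-D stream alone as 1, 0, 2, while the fused ranking is 0, 1, 2.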
License
Unknown