A person re-identification network based on Transformer

Mo Jianwen (莫建文); Mo Lunlin (莫伦麟)

Abstract

Horizontal-slicing methods for person re-identification are limited in the number of blocks they can use, because the receptive fields of the block features overlap. To address this limitation, a Transformer-based person re-identification network, CNN with INOUT_Transformer (CIT), is proposed. First, a CNN extracts an intermediate feature map from the input image; the feature map is divided into blocks, and each block is further split into pixel-level token vectors. Next, the pixel-level token vectors are flattened, combined with positional encodings and a global token vector, and fed into the TransformerIN encoder. The resulting global token vectors are then combined with a classification token vector and positional encodings and fed into the TransformerOUT encoder, which produces the final encoder output. Finally, the classification token vector is passed through a fully connected layer, and pedestrians are classified using softmax and cross-entropy loss. Experimental results on the Market-1501 and DukeMTMC-reID datasets show that the method extracts finer-grained features and, by exploiting the Transformer's ability to capture global context, increases both the number of slices that can be used and the classification accuracy.
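The token layout described in the abstract can be illustrated with a small shape-bookkeeping sketch. It is not the authors' implementation: the function names are hypothetical, the two encoders are omitted entirely, and it assumes one token per spatial position of the feature map, one global token per block, and a single classification token, none of which the abstract specifies in detail. Only the splitting step and the final softmax cross-entropy are shown.

```python
import math


def split_into_pixel_tokens(feature_map, num_strips):
    """Split a C x H x W feature map (nested lists) into horizontal strips.

    Assumption of this sketch: every spatial position inside a strip
    becomes one pixel-level token holding its C channel values.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    if height % num_strips != 0:
        raise ValueError("height must be divisible by num_strips")
    rows_per_strip = height // num_strips

    strips = []
    for s in range(num_strips):
        tokens = []
        for y in range(s * rows_per_strip, (s + 1) * rows_per_strip):
            for x in range(width):
                tokens.append([feature_map[c][y][x] for c in range(channels)])
        strips.append(tokens)
    return strips


def sequence_lengths(strips):
    """Sequence lengths the inner and outer encoders would see here."""
    inner = len(strips[0]) + 1  # pixel tokens of one strip + its global token
    outer = len(strips) + 1     # one global token per strip + class token
    return inner, outer


def softmax_cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


# Toy feature map: 4 channels, 6 rows, 2 columns, with recognisable values.
C, H, W = 4, 6, 2
fmap = [[[c * 100 + y * 10 + x for x in range(W)] for y in range(H)]
        for c in range(C)]

strips = split_into_pixel_tokens(fmap, num_strips=3)
inner_len, outer_len = sequence_lengths(strips)
loss = softmax_cross_entropy([0.0, 0.0, 0.0, 0.0], target=2)
```

Under these assumptions the inner sequence length depends only on the size of one strip, while the outer sequence length grows by one for each additional strip.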
