Remote Sensing
MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer
Wei Yuan [1]; Wenbo Xu [2]
[1] School of Architecture and Civil Engineering, Chengdu University, Chengdu 610106, China; [2] School of Resources and Environment, University of Electronic Science and Technology of China, Chengdu 611731, China
Keywords: deep learning; remote sensing; transformer; semantic segmentation; multi-scale adaptive
DOI: 10.3390/rs13234743
Source: DOAJ
【 Abstract 】
Deep-learning-based segmentation is the main method for interpreting remote sensing images. However, segmentation models built on convolutional neural networks do not capture global features well. A transformer, whose self-attention mechanism supplies each pixel with global context, makes up for this deficiency. This paper therefore proposes MSST-Net, a multi-scale adaptive segmentation network based on the Swin Transformer. First, a Swin Transformer serves as the backbone that encodes the input image. Second, the feature maps from different levels are decoded separately. Third, the decoded results are fused by convolution, so that the network automatically learns the weight given to each level. Finally, a convolution with a 1 × 1 kernel adjusts the number of channels to produce the final prediction map. Compared with other segmentation network models on the WHU building data set, MSST-Net improves all three evaluation metrics: mIoU, F1-score, and accuracy. The proposed model is thus a multi-scale adaptive network for remote sensing segmentation that pays more attention to global features.
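The last two steps of the abstract (convolutional fusion of the per-level decoding results, then a 1 × 1 convolution to set the output channels) can be illustrated with a small sketch. This is not the authors' implementation: it uses plain Python instead of a deep learning framework, it assumes the per-level outputs have already been upsampled to a common resolution, it assumes the fusion convolution is also 1 × 1 (the abstract does not state its kernel size), and every name, shape, and weight value below is hypothetical. In a trained network the weights would be learned rather than hand-picked.

```python
# Illustrative sketch only: a plain-Python stand-in for the fusion head.
# All names, shapes, and weight values are assumptions, not taken from the paper.

def conv1x1(channels, weights, biases):
    """Apply a 1 x 1 convolution, i.e. a per-pixel linear map across channels.

    channels: list of C_in maps, each an H x W nested list of floats.
    weights:  C_out x C_in nested list; biases: list of C_out floats.
    Returns a list of C_out maps of size H x W.
    """
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            [
                bias + sum(w * ch[y][x] for w, ch in zip(row, channels))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for row, bias in zip(weights, biases)
    ]


def predict(level_maps, fusion_w, fusion_b, head_w, head_b):
    """Fuse per-level decoder outputs, then map to class scores and labels."""
    fused = conv1x1(level_maps, fusion_w, fusion_b)  # weighting of the levels
    scores = conv1x1(fused, head_w, head_b)          # channel adjustment to classes
    height, width = len(scores[0]), len(scores[0][0])
    # Per-pixel argmax over class channels (0 = background, 1 = building).
    return [
        [max(range(len(scores)), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Three hypothetical decoder outputs (one channel each), already at 2 x 2.
level_maps = [
    [[0.9, 0.1], [0.8, 0.2]],  # shallow level
    [[0.7, 0.3], [0.4, 0.1]],  # middle level
    [[0.8, 0.2], [0.1, 0.0]],  # deep level
]
fusion_w = [[0.5, 0.3, 0.2]]   # one fused channel; stand-in for learned weights
fusion_b = [0.0]
head_w = [[-1.0], [1.0]]       # two output classes: background, building
head_b = [0.5, -0.5]

labels = predict(level_maps, fusion_w, fusion_b, head_w, head_b)
```

With these hand-picked weights the left column of the 2 × 2 map is labeled as building and the right column as background. The point of the sketch is only that the per-level weights live in the fusion convolution, so training can adjust how much each scale contributes.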
【 License 】
Unknown