IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
MSFusion: Multistage for Remote Sensing Image Spatiotemporal Fusion Based on Texture Transformer and Convolutional Neural Network
Hui Liu1, Guangqi Yang2, Yurong Qian2, Bochuan Tang2, Ranran Qi2, Yi Lu2, Jun Geng3
[1] Key Laboratory of Software Engineering, Urumqi, China; School of Software, Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China; School of Software, Xinjiang University, Urumqi, China
Keywords: Multistage feature fusion; multitemporal remote sensing data; remote sensing; self-attention; spatiotemporal fusion; transformer
DOI: 10.1109/JSTARS.2022.3179415
Source: DOAJ
[Abstract]
Because of current technological and budgetary limits, a single satellite sensor cannot acquire remote sensing images with both high spatial and high temporal resolution. Spatiotemporal fusion of remote sensing images is therefore regarded as an effective solution and has attracted extensive attention. In deep learning, the fixed receptive field of a convolutional neural network prevents it from modeling correlations among global features, so features extracted by convolution alone capture long-distance dependencies poorly. At the same time, complex fusion methods do not integrate temporal and spatial features well. To address these problems, we propose a multistage spatiotemporal fusion model for remote sensing images based on a texture transformer and a convolutional neural network. The model combines the advantages of the transformer and the convolutional network: a lightweight convolutional network extracts spatial features and temporal discrepancy features, a transformer learns global temporal correlation, and the temporal features are finally fused with the spatial features. To make full use of the features obtained at different stages, we design a cross-stage adaptive fusion module (CSAFM). The module uses a self-attention mechanism to adaptively integrate features of different scales while taking both temporal and spatial characteristics into account. To test the robustness of the model, experiments are carried out on three datasets: CIA, LGC, and DX. Compared with five typical spatiotemporal fusion algorithms, the model achieves excellent results, which demonstrates the superiority of MSFusion.
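The abstract does not give the internals of CSAFM, so the following is only a minimal sketch of the general idea of attention-weighted fusion across stages, not the authors' module. It assumes each stage has already been brought to the same size and flattened to a list of floats, and it uses the mean activation of a stage as a stand-in for a learned attention score. The names `softmax` and `fuse_stages` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_stages(stage_features):
    """Fuse equally sized feature vectors from several stages.

    Assumption: the mean activation of each stage stands in for a learned
    attention score. The scores are normalized with softmax and the stages
    are combined as a weighted sum.
    """
    if not stage_features:
        raise ValueError("need at least one stage")
    length = len(stage_features[0])
    if any(len(f) != length for f in stage_features):
        raise ValueError("all stages must have the same length")

    scores = [sum(f) / length for f in stage_features]
    weights = softmax(scores)
    fused = [
        sum(w * f[i] for w, f in zip(weights, stage_features))
        for i in range(length)
    ]
    return fused, weights


# Two identical stages receive equal weights, so the fused vector
# matches either input.
stages = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
fused, weights = fuse_stages(stages)
```

In a trained network the scores would come from learned projections of the features rather than from a fixed statistic; the weighted-sum structure is the only part this sketch is meant to illustrate.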
[License]
Unknown