Journal Article Details
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Transferring Transformer-Based Models for Cross-Area Building Extraction From Remote Sensing Images
Michael Schmitt [1]; Wenyue Guo [2]; Anzhu Yu [2]; He Li [2]; Xiaochong Tong [2]; Chunping Qiu [2]; Xin Chen [2]
[1] Bundeswehr University Munich, Neubiberg, Germany; [2] PLA Strategic Support Force Information Engineering University, Zhengzhou, China
Keywords: Benchmark dataset; building extraction (BE); convolutional neural networks (CNNs); generalization ability; remote sensing (RS); transformer
DOI: 10.1109/JSTARS.2022.3175200
Source: DOAJ
【 Abstract 】

Extracting buildings from remote sensing (RS) images is an important task with a variety of applications. Considerable attention has focused on achieving new state-of-the-art (SOTA) accuracy with increasingly advanced deep learning (DL) models. However, the resulting models still generalize poorly across geographical areas, which hinders the practical use of SOTA approaches. To address this problem, we established a baseline for cross-area model generalization using available datasets for building extraction (BE). In addition to two popular fully convolutional neural network (FCN) based models, we first adapted two novel transformer-based models, the shifted windows (Swin) transformer and SegFormer. All of these models achieve SOTA accuracy, with no large differences among them, when tested within one area. However, experimental results show that all of them fail to generalize to a different area. We therefore propose fine-tuning models pretrained on one area using a small subset of an unseen area; the effectiveness of this approach depends on the model choice and the amount of data used for tuning. By combining the transfer learning idea with the multiscale feature learning ability of SegFormer, a distinct improvement was achieved over the Swin transformer and FCN-based models trained on the same amount of data. The commonly used Intersection over Union (IoU) metric increased from 38.97% to 70.86% and from 48.36% to 74.51% when using 10% and 30% subsets of the target area, respectively. The influence of model choice and data size for tuning was also investigated. Our work complements algorithm development and within-area model evaluation in the active field of BE from RS images.
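
The abstract reports its results as Intersection over Union (IoU). As a minimal sketch of how that metric is usually computed for a binary building mask, the snippet below counts the pixels marked as building in both the prediction and the reference, and divides by the pixels marked as building in either. The function name `binary_iou`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; they are not taken from the paper.

```python
def binary_iou(pred, target):
    """Intersection over Union of two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")

    intersection = 0  # pixels labeled building in both masks
    union = 0         # pixels labeled building in at least one mask
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += int(p and t)
            union += int(p or t)

    if union == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return intersection / union


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are building in either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou = binary_iou(pred, target)  # 2 / 4 = 0.5
```

In practice the counts are typically accumulated over all test tiles before dividing, so that large and small tiles contribute in proportion to their pixel counts; whether the paper does this is not stated in the abstract.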

【 License 】

Unknown   
