ECTI Transactions on Computer and Information Technology
Machine Reading Comprehension Using Multi-Passage BERT with Dice Loss on Thai Corpus
Article
Theerit Lapchaicharoenkit [1], Peerapon Vateekul [1]
[1] Chulalongkorn University
Keywords: Machine Reading Comprehension; Natural Language Processing; Deep Learning; BERT
DOI : 10.37936/ecti-cit.2022162.247799 | |
Subject classification: Medicine (General)
Source: Electrical Engineering/Electronics, Computer, Communications and Information Technology Association
[Abstract]
Machine reading comprehension (MRC) has advanced considerably with the invention of large-scale pre-trained language models such as BERT. However, performance is still limited when the context is long and contains many passages. BERT can embed only as much of a passage as fits within its input size; sliding windows must therefore be used, which fragments the information when the passage is long. In this paper, we propose a BERT-based MRC framework tailored to long passage contexts in the Thai corpus. Our framework employs multi-passage BERT together with self-adjusting dice loss, which helps the model focus more on the answer region of the context passage. We also show that performance improves when an auxiliary task is used. The experiments were conducted on the Thai Question Answering (QA) dataset used in the Thailand National Software Competition. The results show that our method improves the model's performance over a traditional BERT framework.
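The sliding-window limitation mentioned in the abstract can be illustrated with a minimal Python sketch. This is a generic illustration of overlapping windows, not the authors' implementation; the function name `sliding_windows` and the parameters `window_size` and `stride` are hypothetical, and the toy values below are far smaller than a real BERT input size.

```python
def sliding_windows(tokens, window_size, stride):
    """Split a token list into overlapping windows.

    Hypothetical helper for illustration only. Returns a list of
    (start_offset, window_tokens) pairs that together cover every token.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if stride > window_size:
        # A stride larger than the window would skip tokens entirely.
        raise ValueError("stride must not exceed window_size")

    windows = []
    start = 0
    while True:
        end = min(start + window_size, len(tokens))
        windows.append((start, tokens[start:end]))
        if end == len(tokens):
            break  # the last window reached the end of the passage
        start += stride
    return windows


# Toy passage of 10 "tokens"; assumed window of 4 tokens, stride of 2.
passage = list(range(10))
chunks = sliding_windows(passage, window_size=4, stride=2)
offsets = [start for start, _ in chunks]  # [0, 2, 4, 6]
```

Each window is encoded on its own, so a model looking at one window sees none of the text outside it. That is the discontinuity the abstract refers to when the passage is long.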
[License]
CC BY-NC-ND