BMC Bioinformatics
Combining word embeddings to extract chemical and drug entities in biomedical literature
Manuel Carlos Díaz-Galiano1, Pilar López-Úbeda1, L. Alfonso Ureña-López1, M. Teresa Martín-Valdivia1
[1] Department of Computer Science, Advanced Studies Center in Information and Communication Technologies (CEATIC), Universidad de Jaén, Campus Las Lagunillas s/n, 23071, Jaén, Spain
Keywords: Natural language processing; Named entity recognition; Concept indexing; Neural network; Word embeddings; SNOMED-CT
DOI: 10.1186/s12859-021-04188-3
Source: Springer
Abstract
Background: Natural language processing (NLP) and text mining technologies for the extraction and indexing of chemical and drug entities are key to improving access to, and integration of, information from unstructured data such as the biomedical literature.

Methods: In this paper we address two important NLP tasks: named entity recognition (NER) and entity indexing using the SNOMED-CT terminology. For this purpose, we propose a combination of word embeddings in order to improve the results obtained in the PharmaCoNER challenge.

Results: For the NER task we present a neural network composed of a BiLSTM with a CRF sequential layer, where different word embeddings are combined as input to the architecture. A hybrid method combining supervised and unsupervised models is used for the concept indexing task. The supervised model uses the training set to find previously trained concepts, while the unsupervised model is based on a 6-step architecture that uses a dictionary of synonyms and the Levenshtein distance to assign the correct SNOMED-CT code.

Conclusion: On the one hand, the combination of word embeddings helps to improve the recognition of chemicals and drugs in the biomedical literature. We achieved 91.41% precision, 90.14% recall, and a 90.77% F1-score using micro-averaging. On the other hand, our indexing system achieves 92.91% precision, 92.44% recall, and a 92.67% F1-score. With these results, we would have placed first in the final challenge ranking.
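The abstract states that the unsupervised indexing step assigns SNOMED-CT codes by matching mentions against a dictionary of synonyms with the Levenshtein distance. The sketch below illustrates that idea only in outline; the levenshtein helper, the toy synonym dictionary, the placeholder SCTID-* codes, and the max_distance threshold are illustrative assumptions, not the authors' actual 6-step pipeline.

```python
from typing import Dict, Optional, Tuple

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]

# Toy synonym dictionary: surface form -> SNOMED-CT code (placeholder IDs,
# not real SNOMED-CT identifiers).
SYNONYM_DICT: Dict[str, str] = {
    "paracetamol": "SCTID-0001",
    "acetaminophen": "SCTID-0001",
    "ibuprofen": "SCTID-0002",
}

def index_mention(mention: str, max_distance: int = 2) -> Optional[Tuple[str, str]]:
    """Return the (synonym, code) pair closest to the mention, provided the
    edit distance does not exceed max_distance; otherwise return None."""
    mention = mention.lower().strip()
    best = min(SYNONYM_DICT, key=lambda syn: levenshtein(mention, syn))
    if levenshtein(mention, best) <= max_distance:
        return best, SYNONYM_DICT[best]
    return None

if __name__ == "__main__":
    print(index_mention("Paracetamoll"))  # ('paracetamol', 'SCTID-0001')
    print(index_mention("aspirin"))       # None: no sufficiently close synonym
```

In the paper this dictionary lookup complements a supervised step over the training set; the distance threshold governs the usual precision/recall trade-off when tolerating spelling variants.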
License
CC BY