Frontiers in Digital Health
Markup: A Web-Based Annotation Tool Powered by Active Learning
W. Owen Pickrell [2], Carys Jones [3], Beata Fonferko-Shadrach [3], Huw Strafford [3], Simon Thompson [3], Arron Lacey [3], Samuel Dobbie [3], Ashley Akbari [3]
[1] Health Data Research UK, Swansea University Medical School, Swansea University, Swansea, United Kingdom
[2] Neurology Department, Morriston Hospital, Swansea Bay University Health Board, Swansea, United Kingdom
[3] Swansea University Medical School, Swansea University, Swansea, United Kingdom
Keywords: natural language processing; active learning; unstructured text; annotation; sequence-to-sequence learning
DOI: 10.3389/fdgth.2021.598916
Source: DOAJ
Abstract
Across various domains, such as health and social care, law, news, and social media, increasing quantities of unstructured text are being produced. These potential data sources often contain rich information that could be used for domain-specific and research purposes. However, the unstructured nature of free-text data poses a significant challenge for its utilisation, because substantial manual intervention from domain experts is needed to label the embedded information. Annotation tools can assist with this process by providing functionality that enables the accurate capture and transformation of unstructured text into structured annotations, which can be used individually or as part of larger Natural Language Processing (NLP) pipelines. We present Markup (https://www.getmarkup.com/), an open-source, web-based annotation tool that is undergoing continued development for use across all domains. Markup incorporates NLP and Active Learning (AL) technologies to enable rapid and accurate annotation using custom user configurations, predictive annotation suggestions, and automated mapping suggestions to both domain-specific ontologies, such as the Unified Medical Language System (UMLS), and custom, user-defined ontologies. We demonstrate a real-world use case in which Markup was used in a healthcare setting to annotate structured information from unstructured clinic letters, where the captured annotations were used to build and test NLP applications.
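The abstract does not say which active learning strategy Markup uses. As a hedged illustration of the general idea only, the following sketch shows pool-based uncertainty sampling: a model assigns class probabilities to unlabelled items, and the items whose predictions are most uncertain (highest entropy) are offered to the annotator first. The function names `entropy` and `select_for_annotation`, the item ids, and the probabilities are hypothetical and are not taken from Markup.

```python
import math
from typing import Dict, List, Tuple


def entropy(probs: List[float]) -> float:
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(
    predictions: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k unlabelled items with the most uncertain predictions.

    `predictions` maps a hypothetical item id (e.g. one sentence from a
    clinic letter) to the model's class probabilities for that item.
    """
    scored = [(item, entropy(p)) for item, p in predictions.items()]
    # Highest entropy first: these are the items the model is least sure about,
    # so a human label for them is expected to be most informative.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical model outputs for three unlabelled sentences.
    pool = {
        "s1": [0.98, 0.02],  # confident prediction
        "s2": [0.55, 0.45],  # near-uniform, very uncertain
        "s3": [0.70, 0.30],  # moderately uncertain
    }
    for item, score in select_for_annotation(pool, k=2):
        print(item, round(score, 3))
```

In a full loop, the selected items would be annotated by a user, added to the training set, and the model retrained before the next round of selection; entropy is used here only as one common uncertainty measure, alongside alternatives such as least-confidence or margin sampling.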
License
Unknown