International Journal of Engineering Pedagogy
Text Mining: Design of Interactive Search Engine Based Regular Expressions of Online Automobile Advertisements
ARTICLE
Ahmed Adeeb Jalal [1]
[1] Al-Iraqia University
Keywords: Information Extraction; Information Retrieval; Natural Language Processing; Text Mining; Web Crawler
DOI: 10.3991/ijep.v10i3.12419
Source: International Society for Engineering Education (IGIP), Kassel University Press
【 Abstract 】
The world of technology has evolved greatly over the past decades, which has inflated the volume of data. This digital progress has scattered texts across millions of web pages. Unstructured texts contain a vast amount of textual data, and discovering useful and interesting relations in them requires further processing by computers. Text mining and information extraction have therefore become an active research field for obtaining structured and valuable information. This paper focuses on text pre-processing in the domain of automotive advertisements in order to build a structured database. The structured database was created by extracting information from unstructured automotive advertisements, a task within natural language processing. Information extraction deals with finding factual information in text, here using regular expressions. We manually craft domain-specific, rule-based approaches to extract structured information from unstructured web pages. The structured information is served through a user-friendly search engine designed for topic-specific knowledge. Consequently, the information extracted from these advertisements is used to perform a structured search over certain attributes of interest. The tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries.
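To make the rule-based extraction step concrete, the following is a minimal sketch of how hand-written regular expressions could turn one free-text advertisement into a structured record. The abstract does not give the paper's actual rules, so the attribute names (`year`, `price`, `mileage`), the patterns, the `extract_attributes` helper, and the sample advertisement are all hypothetical and serve only as an illustration.

```python
import re

# Hypothetical hand-crafted patterns; the paper's real rules are not
# stated in the abstract, so these are illustrative assumptions only.
PATTERNS = {
    # Four-digit model year between 1980 and 2029.
    "year": re.compile(r"\b(19[89]\d|20[0-2]\d)\b"),
    # Dollar amount, with or without thousands separators.
    "price": re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)"),
    # Number followed by a distance unit.
    "mileage": re.compile(
        r"(\d{1,3}(?:,\d{3})+|\d+)\s?(?:km|miles|mi)\b", re.IGNORECASE
    ),
}


def extract_attributes(ad_text):
    """Return a dict of attributes found in one advertisement.

    Attributes with no match are set to None so every record has
    the same keys, which keeps later indexing straightforward.
    """
    record = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(ad_text)
        record[name] = int(match.group(1).replace(",", "")) if match else None
    return record


# Made-up sample advertisement, not taken from the paper's data.
ad = "For sale: 2015 Toyota Camry, 82,000 km, one owner, asking $9,500."
print(extract_attributes(ad))
```

Records of this shape could then be stored and filtered by attribute, which is the kind of structured search the abstract describes. The probability assignment mentioned there is not modeled in this sketch.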
【 License 】
Unknown