Web Information Retrieval using Web Document Structures.
Word relevance;Information Extraction;Web Mining;Information Retrieval;Word Measures
Namjoshi, Nihar ; Dr. Robert StAmant, Committee Chair ; Dr. Christopher Healey, Committee Member ; Dr. James Lester, Committee Member
Information domains such as the World Wide Web hold an enormous amount of content. Extracting the information relevant to a particular topic, or predicting what sort of information a user is seeking, is not a trivial task. For a user, finding information relevant to a particular area of interest can be inconvenient and sometimes frustrating. Studies have shown that users faced with such a task may quickly become bored and leave a Web site. Traditional Information Retrieval techniques rely on measures such as the frequency of a word in a given document or the hyperlink connectivity of that Web document. These measures do not necessarily bring out the important words or terms in a document, and so can be less effective at returning search results for queries. Our approach relies not only on the actual text of the document but also on the inherent formatting elements of Web pages, derived from HyperText Markup Language (HTML) syntax, to support information extraction. We use rules to assign measures to important terms in a document in order to facilitate the extraction of relevant information. We evaluated our system by asking users to test it and by comparing its results with those of a conventional search engine.
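As a rough illustration of the idea of combining text with HTML formatting, the following minimal Python sketch scores each term by the markup that encloses it. It is not the rule set used in the thesis, which the abstract does not specify: the tag weights, the names `TAG_WEIGHTS`, `WeightedTermParser`, and `score_terms`, and the choice to take the highest-weighted enclosing tag are all assumptions made for this example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights, chosen only for illustration; the abstract does not
# state which formatting elements the thesis uses or how it weights them.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0,
               "b": 2.0, "strong": 2.0, "em": 1.5}
DEFAULT_WEIGHT = 1.0
IGNORED_TAGS = {"script", "style"}  # text inside these is not page content


class WeightedTermParser(HTMLParser):
    """Accumulates a per-term score based on the enclosing HTML tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []      # stack of currently open tag names
        self.scores = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching tag, tolerating unbalanced markup.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        if IGNORED_TAGS.intersection(self.open_tags):
            return
        # Assumed rule: a term takes the weight of its strongest enclosing tag.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.scores[word] += weight


def score_terms(html):
    """Return a Counter mapping each term to its formatting-weighted score."""
    parser = WeightedTermParser()
    parser.feed(html)
    parser.close()
    return parser.scores
```

Calling `score_terms(page_html).most_common(10)` would list the ten highest-scoring terms of a page. With every weight set to 1.0 the score reduces to plain term frequency, which makes it easy to see how much the formatting cues alone change the ranking.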