Conference Paper Details
2019 International Conference on Advanced Electronic Materials, Computers and Materials Engineering
Research and Design of Theme Image Crawler Based on Difference Hash Algorithm
Radio Electronics; Computer Science; Materials Science
Wang, De-Zhi^1; Liang, Jun-Yan^2
Department of Computer Engineering, North China Institute of Science and Technology, Yanjiao 065201, China^1
Department of Library, North China Institute of Science and Technology, Yanjiao 065201, China^2
Keywords: Correlation algorithm; Function module; Hash algorithm; High repetition rate; Image resources; Image similarity; PageRank algorithm; Web resources
Full text (PDF): https://iopscience.iop.org/article/10.1088/1757-899X/563/4/042080/pdf
DOI: 10.1088/1757-899X/563/4/042080
Source: IOP
【 Abstract 】

General theme crawlers tend to collect image resources with a high repetition rate. To address this problem, a theme image crawler system is designed that reduces the similarity of the collected images. The design covers the crawler's main function modules, the system workflow, and the implementation of the key modules. The difference hash algorithm is used to handle image similarity effectively. By combining a cosine correlation algorithm over Web page text with the PageRank algorithm over links, the paper comprehensively evaluates the relevance of Web resources to the topic. Experimental results show that the theme image crawler effectively reduces the similarity of the collected images and improves the efficiency with which the crawler acquires image resources.
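The abstract names the difference hash (dHash) algorithm but does not give the paper's implementation details. The sketch below illustrates the generic dHash technique and one way a crawler might use it to skip near-duplicate images; it is not the authors' code. Assumptions: images are already decoded into a 2D list of grayscale values (no image library is used, so `resize_box` is a simple hypothetical box-average downscaler), a hash bit is 1 when a pixel is brighter than its right-hand neighbor, and `DuplicateImageFilter` with a Hamming-distance threshold of 10 is a hypothetical helper chosen only for illustration.

```python
from typing import List, Sequence

# Grayscale image as rows of pixel intensities (0-255); decoding is out of scope here.
Gray = Sequence[Sequence[float]]


def resize_box(img: Gray, out_w: int, out_h: int) -> List[List[float]]:
    """Hypothetical helper: downscale by averaging the source pixels that map to each target cell."""
    in_h, in_w = len(img), len(img[0])
    out: List[List[float]] = []
    for y in range(out_h):
        y0 = y * in_h // out_h
        y1 = max((y + 1) * in_h // out_h, y0 + 1)
        row: List[float] = []
        for x in range(out_w):
            x0 = x * in_w // out_w
            x1 = max((x + 1) * in_w // out_w, x0 + 1)
            cell = [img[yy][xx] for yy in range(y0, y1) for xx in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def dhash(img: Gray, hash_size: int = 8) -> int:
    """Difference hash: shrink to (hash_size + 1) x hash_size, then compare horizontal neighbors.

    Each row of hash_size + 1 pixels yields hash_size bits, so the default gives a 64-bit hash.
    """
    small = resize_box(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            # Assumed convention: bit is 1 when brightness falls from left to right.
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class DuplicateImageFilter:
    """Hypothetical crawler component: keep an image only if no kept image has a close hash."""

    def __init__(self, max_distance: int = 10) -> None:
        self.max_distance = max_distance  # assumed threshold, not taken from the paper
        self.seen: List[int] = []

    def accept(self, img: Gray) -> bool:
        h = dhash(img)
        # Linear scan for clarity; a large crawl would need an index such as a BK-tree.
        if any(hamming(h, s) <= self.max_distance for s in self.seen):
            return False
        self.seen.append(h)
        return True


if __name__ == "__main__":
    gradient = [[x * 8 for x in range(32)] for _ in range(32)]
    brighter = [[v + 20 for v in row] for row in gradient]
    reversed_gradient = [[255 - x * 8 for x in range(32)] for _ in range(32)]
    f = DuplicateImageFilter()
    print(f.accept(gradient), f.accept(brighter), f.accept(reversed_gradient))
    # prints: True False True
```

Because each bit records only whether brightness rises or falls between neighboring pixels, a uniform brightness change leaves the hash unchanged, so the brightened copy is rejected as a duplicate while the mirrored gradient, whose every bit differs, is kept. A real crawler would decode and convert downloaded images with an imaging library before hashing.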
