Journal Article Details
Journal of Computer Science
Combined Heuristic Technique for Optimization of Bloom Filter in Spam Filtering
Arulanand Natarajan, S. Subramanian, K. Premalatha
Keywords: spam word; hash function; genetic algorithm; simulated annealing; static set; premature convergence; long vector; bit array
DOI: 10.3844/jcssp.2011.1439.1447
Subject classification: Computer Science (General)
Source: Science Publications
【 Abstract 】

Problem statement: Spam is an irrelevant or inappropriate message sent over the internet to a large number of newsgroups or users. A spam word list is a set of well-known words that often appear in spam mail. A Bloom Filter (BF) is used to identify spam words. Approach: A BF is a simple but powerful data structure that can test membership in a static set. The trade-off of using a BF is a certain configurable risk of false positives. The odds of a false positive can be made very low if the hash bitmap is sufficiently large. A Bin Bloom Filter (BBF) consists of a number of BFs that assign groups of words to bins with different false positive rates, based on the weights of the spam words. A Genetic Algorithm (GA) was employed to minimize the total membership invalidation cost of the BBF. The GA suffered from premature convergence. Simulated Annealing (SA) was combined with the GA to prevent premature convergence effectively. Results: The total membership invalidation cost was analyzed experimentally for various numbers of bins. The results showed that the combined GA-SA model outperforms the SA and GA models. Conclusion: The GA converges prematurely because its genetic operators are unable to generate offspring that are superior to their parents, so a large number of similar chromosomes accumulate in the population. When the GA is combined with SA, new genes are introduced, which creates diversity in the population and prevents premature convergence. The combined GA-SA outperforms both GA and SA.
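
The membership test and the false-positive trade-off described in the abstract can be illustrated with a minimal sketch of a single Bloom filter. This is not the authors' implementation: the class name `BloomFilter`, the SHA-256 double-hashing scheme, the parameter names, and the sample words are all assumptions made for illustration. The helper `expected_false_positive_rate` uses the standard approximation (1 - e^(-kn/m))^k for m bits, k hash functions, and n stored items.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative sketch only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # the bit array ("hash bitmap")

    def _positions(self, word: str):
        # Assumed hashing scheme: derive k positions by double hashing
        # two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word: str) -> bool:
        # True means "possibly in the set"; False means "definitely not".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(word)
        )


def expected_false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k of the false positive rate."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


# Hypothetical spam word list, for demonstration only.
spam_filter = BloomFilter(num_bits=1024, num_hashes=4)
for spam_word in ["free", "lottery", "winner"]:
    spam_filter.add(spam_word)
```

A stored word always tests as present, while a word that was never added may occasionally test as present too; enlarging `num_bits` lowers the expected rate of such false positives. A bin-based variant along the lines of the BBF would keep several such filters with different sizes, one per weight group of spam words.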

【 License 】

Unknown
