Thesis Details
A reformulated approach to attribute-aware sampling on large networks
Shang, Charles; Sundaram, Hari
Keywords: Sampling; Networks; Data Mining
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/104885/SHANG-THESIS-2019.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Sampling has long been an important tool for extracting subsets of data for data mining tasks. As the scale of information produced has increased, efficient sampling has only become more important. Uniform sampling is often the preferred technique because of its simplicity and speed. However, many network-based data sources prevent random access, necessitating a different way to sample. Algorithms such as breadth-first search, random walk, expansion sampling, and related strategies currently fill this role. These algorithms, however, focus mainly on ensuring properties based on the structure of the graph, without considering the attributes of each node. In this study, we take an existing attribute-aware sampler and propose a natural reformulation of the algorithm. We present a new surprise function that avoids some drawbacks of previous work, take advantage of the submodularity property to reduce the computation needed when selecting a node, and make some arguments about the efficiency and effectiveness of such a strategy. We test our algorithm on some real-world data sets and find that it increases sample attribute coverage by up to 4 times compared with techniques like random walk, while still taking time approximately linear in the size of the sample.
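
The abstract's idea of using submodularity to cut down the work of choosing the next node can be illustrated with a small sketch. This is not the thesis's algorithm or its surprise function: as a stand-in, the gain of a node is assumed here to be the number of attributes it would add to the sample, which is a submodular coverage function. The function name `lazy_attribute_sample` and the inputs `adj` (adjacency lists) and `attrs` (attribute sets per node) are hypothetical. Because a node's gain can only shrink as the sample grows, a cached gain is an upper bound, so only the top candidate needs to be re-scored.

```python
import heapq


def lazy_attribute_sample(adj, attrs, seed, k):
    """Grow a sample of up to k nodes from `seed`, visiting only neighbors
    of already-sampled nodes (no random access to the graph is assumed).

    Gain of a node = number of attributes it adds to the covered set.
    This is an assumed stand-in, not the thesis's surprise function.
    """
    sample = [seed]
    covered = set(attrs[seed])
    queued = {seed}  # nodes already sampled or waiting in the heap
    heap = []        # entries are (-cached_gain, node)

    def push_neighbors(u):
        for v in adj[u]:
            if v not in queued:
                queued.add(v)
                heapq.heappush(heap, (-len(attrs[v] - covered), v))

    push_neighbors(seed)
    while heap and len(sample) < k:
        _, v = heapq.heappop(heap)
        fresh = len(attrs[v] - covered)  # re-score only the top candidate
        # Cached gains are upper bounds (submodularity), so if the fresh
        # gain still matches or beats the best cached gain, v is the argmax.
        if not heap or fresh >= -heap[0][0]:
            sample.append(v)
            covered |= attrs[v]
            push_neighbors(v)
        else:
            heapq.heappush(heap, (-fresh, v))  # stale: reinsert with new gain
    return sample, covered
```

For example, on a toy graph with `adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}` and `attrs = {0: {"a"}, 1: {"a"}, 2: {"b", "c"}, 3: {"d"}}`, calling `lazy_attribute_sample(adj, attrs, seed=0, k=3)` skips node 1, whose only attribute is already covered, and picks nodes 2 and 3 instead.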

【 Preview 】
Attachments
Files | Size | Format | View
A reformulated approach to attribute-aware sampling on large networks | 576 KB | PDF | download
  Document metrics
  Downloads: 4  Views: 19