Thesis details
Algorithm design on multicore processors for massive-data analysis
Keywords: Graph algorithms; Parallel computing; Massive data; Financial market data; Streaming data; Keyword scanning
Agarwal, Virat; Computational Science and Engineering
University: Georgia Institute of Technology
Department: Computational Science and Engineering
Others  :  https://smartech.gatech.edu/bitstream/1853/34839/1/agarwal_virat_201008_phd.pdf
United States | English
Source: SMARTech Repository
【 Abstract 】

Analyzing massive data sets and streams is computationally very challenging. Data sets in systems biology, network analysis, and security use network abstraction to construct large-scale graphs. Graph algorithms such as traversal and search are memory-intensive and typically require very little computation, with access patterns that are irregular and fine-grained. The increasing streaming data rates in domains such as security, mining, and finance leave algorithm designers with only a handful of clock cycles (with current general-purpose computing technology) to process every incoming byte of data in-core in real time. This, along with the increasing complexity of mining patterns and other analytics, puts further pressure on already high computational requirements. Processing streaming data in finance comes with the additional constraint of low latency, which restricts the algorithm from using common techniques such as batching to obtain high throughput. The primary contributions of this dissertation are the design of novel parallel data analysis algorithms for graph traversal on large-scale graphs, pattern recognition and keyword scanning on massive streaming data, financial market data feed processing and analytics, and data transformation. These algorithms capture the machine-independent aspects, to guarantee portability with performance to future processors, and come with high-performance implementations on multicore processors that embed processor-specific optimizations. Our breadth-first search graph traversal algorithm demonstrates a capability to process massive graphs with billions of vertices and edges on commodity multicore processors at rates that are competitive with supercomputing results in the recent literature.
We also present high-performance, scalable keyword scanning on streaming data using a novel automata compression algorithm, a model of computation based on small software content addressable memories (CAMs), and a unique data layout that forces data re-use and minimizes memory traffic. Using a high-level algorithmic approach to process financial feeds, we present a solution that decodes and normalizes option market data at rates an order of magnitude higher than the current needs of the market, yet is portable and flexible to other feeds in this domain. In this dissertation we discuss in detail the algorithm design challenges of processing massive data and present solutions and techniques that we believe can be used and extended to solve future research problems in this domain.

【 Preview 】
Attachments
Files Size Format
Algorithm design on multicore processors for massive-data analysis 6163 KB PDF