Technical Report Details
Time-Based Data Streams: Fundamental Concepts for a Data Resource for Streams
Beth A. Plale
Keywords: distributed systems; data management
DOI  :  10.2172/966043
RP-ID  :  DOE/ER/25600-1 Final Report
PID  :  OSTI ID: 966043
Subject classification: Mathematics (General)
United States | English
Source: SciTech Connect
[Abstract]

Real-time data, which we call data streams, are readings from instruments and from environmental, bodily, or building sensors that are generated at regular intervals and that often, because of their volume, need to be processed in real time. Often only a single pass can be made over the data, and the decision to keep or discard an instance is made on the spot. Moreover, the stream is for all practical purposes indefinite, so decisions must be made on incomplete knowledge. This notion of a data stream raises a different set of issues than, for instance, a file that is byte-streamed to a reader: the file is finite, so the byte stream is a processing convenience more than a fundamentally different kind of data. Over the course of the project we examined three aspects of streaming data: first, techniques for handling streaming data in a distributed system organized as a collection of web services; second, the notion of the dashboard and real-time controllable analysis constructs in the context of the Fermi Tevatron Beam Position Monitor; and third, provenance collection for stream processing, such as might occur as raw observational data flows from the source and undergoes correction, cleaning, and quality control. The impact of this work is severalfold. We were among the first to advocate that streams have little value unless aggregated, a notion that is now gaining general acceptance. We were also among the first groups to grapple with the notion of provenance of stream data.
