In the foreseeable future, high-performance supercomputers will continue to evolve toward distributed, massively parallel, and highly heterogeneous machines. It is well known that good parallel programs are essential to utilize these machines. However, conventional parallel programming models were created when supercomputers were smaller and more homogeneous, and it is not clear whether these models will enable the same level of productivity on the next generation of supercomputers. Intermediate runtime systems between software applications and the underlying hardware architecture are expected to help abstract away the extreme complexity of future large-scale machines. Recently, there has been growing interest in dataflow execution models because of their flexibility in making dynamic decisions. Despite their advantages, dataflow runtime systems tend to have a low-level programming interface that is difficult to tame: the programmer must decompose the computation and write programs that construct the dependence graph explicitly, which results in programs that are difficult to build, debug, and maintain.
In this thesis, we repurpose the Hierarchically Tiled Array (HTA) programming model to improve the programmability of dataflow runtime systems. HTA facilitates parallel programming by letting the programmer express algorithms as tiled array operations that contain implicit parallelism. We propose a design that dynamically maps an HTA program to a dataflow task dependence graph, so that the programmer can write conventional HTA programs while enjoying the benefits provided by the underlying dataflow runtime system. As a proof of concept, we implemented our design for the shared-memory environment and developed a variety of benchmarks for performance evaluation.
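The abstract does not describe the runtime interface, so the following is only a minimal Python sketch, under our own assumptions, of the general idea of mapping tiled array operations onto a dataflow task graph: each tile-level piece of an array operation becomes a task, and dependences are derived from which tiles the task reads and writes (read-after-write, write-after-write, and write-after-read). `TiledArray`, `map_into`, and `TaskGraph` are hypothetical names, not the thesis's HTA library or the API of any real dataflow runtime; here ready tasks run one at a time, whereas a real runtime would hand them to worker threads.

```python
from collections import defaultdict, deque


class TaskGraph:
    """Hypothetical task graph: records tile-level tasks and their dependences."""

    def __init__(self):
        self.tasks = []                    # task id -> (name, callable)
        self.succ = defaultdict(list)      # task id -> successor task ids
        self.preds = {}                    # task id -> number of predecessors
        self.last_writer = {}              # (array id, tile) -> last writing task
        self.readers = defaultdict(list)   # (array id, tile) -> readers since last write

    def add_task(self, name, fn, reads, writes):
        tid = len(self.tasks)
        self.tasks.append((name, fn))
        deps = set()
        for key in reads:                  # read-after-write
            if key in self.last_writer:
                deps.add(self.last_writer[key])
        for key in writes:                 # write-after-write and write-after-read
            if key in self.last_writer:
                deps.add(self.last_writer[key])
            deps.update(self.readers[key])
        for d in deps:
            self.succ[d].append(tid)
        self.preds[tid] = len(deps)
        for key in reads:
            self.readers[key].append(tid)
        for key in writes:
            self.last_writer[key] = tid
            self.readers[key] = []
        return tid

    def run(self):
        """Run each task once all its predecessors have finished; return run order."""
        remaining = dict(self.preds)
        ready = deque(t for t in range(len(self.tasks)) if remaining[t] == 0)
        order = []
        while ready:
            tid = ready.popleft()
            name, fn = self.tasks[tid]
            fn()
            order.append(name)
            for s in self.succ[tid]:
                remaining[s] -= 1
                if remaining[s] == 0:      # all inputs available: task is ready
                    ready.append(s)
        return order


class TiledArray:
    """Hypothetical 1-D tiled array standing in for a (much richer) HTA."""

    _next_id = 0

    def __init__(self, graph, data, tile_size):
        self.graph = graph
        self.id = TiledArray._next_id
        TiledArray._next_id += 1
        self.tiles = [list(data[i:i + tile_size])
                      for i in range(0, len(data), tile_size)]

    def map_into(self, out, f):
        """out[i] = f(self[i]) per tile; one task per tile, no global barrier.

        Assumes out has the same tiling as self.
        """
        for i in range(len(self.tiles)):
            def body(i=i):
                out.tiles[i] = [f(x) for x in self.tiles[i]]
            self.graph.add_task(f"map{self.id}->{out.id}[{i}]", body,
                                reads=[(self.id, i)], writes=[(out.id, i)])

    def to_list(self):
        return [x for tile in self.tiles for x in tile]


g = TaskGraph()
a = TiledArray(g, [1, 2, 3, 4, 5, 6], tile_size=2)
b = TiledArray(g, [0] * 6, tile_size=2)
c = TiledArray(g, [0] * 6, tile_size=2)
a.map_into(b, lambda x: x * 10)   # b = 10 * a, tile by tile
b.map_into(c, lambda x: x + 1)    # c = b + 1
order = g.run()
print(c.to_list())  # [11, 21, 31, 41, 51, 61]
```

In this sketch the program is written as two whole-array operations, yet the graph only links tile `i` of `c` to tile `i` of `b`, so each task of the second operation becomes ready as soon as its single input tile is written, without waiting for the entire first operation as two consecutive OpenMP `parallel for` loops separated by an implicit barrier would.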
We found that, for applications with high asynchrony and sparse data dependences, our implementation results in simpler programs than those written with the dataflow runtime's programming interface and delivers better performance than OpenMP parallel for loops. We also identified scalability issues in our current design and propose solutions as possible future work.
Hierarchically Tiled Arrays as high-level programming abstractions for dataflow runtime systems