Thesis

【Abstract】

Parallelization is one of the major challenges for programmers. But parallelizing existing code is a hard task that can lead to less than optimal solutions, since sequential programs can suffer from impediments to parallelization resulting from the semantics of the languages or the data structures used rather than the nature of the problem being solved. To avoid such artifacts, programmers can analyze the algorithms to decide which dependencies are "real" and which can be ignored. But even then, conventional algorithms were developed with specific objectives in mind, such as reducing the total number of operations, which, while good for sequential performance, may not be the primary objective when considering parallel machines. We propose to focus on a specific domain and attack the parallelization issue at the source, starting from a high-level description of the equations, without any knowledge of existing algorithms to solve the problem, and automatically derive parallel solutions.

Hydra accepts an equation written in terms of operations on matrices and automatically produces highly efficient code to solve these equations. Processing of the equation starts by tiling the matrices. This transforms the equation into either a single new equation containing terms involving tiles or into multiple equations, some of which can be solved in parallel with each other.

Hydra continues transforming the equations using tiling and seeking terms that Hydra knows how to compute or equations it knows how to solve. The end result is that, by transforming the equations, Hydra can produce multiple solvers with different locality behavior and/or different parallel execution profiles. Next, Hydra applies empirical search over this space of possible solvers to identify the most efficient version. In this way, Hydra enables the automatic production of efficient solvers requiring very little or no coding at all and delivering performance approximating that of highly tuned library routines such as Intel's MKL.

With faster development time for modern architectures, the time available for hand-tuning of high performance libraries diminishes. Intel has already started offering auto-tuned library routines (from Spiral) in its IPP library, to broaden the scope of application of the collection without having to increase the man-hours required to hand-tune everything.
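
As a rough illustration of the tiling step described in the abstract, the sketch below splits a lower-triangular system L x = b into 2x2 tiles and solves the resulting smaller equations in turn. The choice of equation, the function names (`forward_substitute`, `matvec`, `tiled_solve`), and the split index `k` are all assumptions made for this example; none of it is taken from Hydra's input language or generated code.

```python
# Illustrative sketch only: the equation L x = b and every name below are
# assumptions for this example, not Hydra's actual interface or output.

def forward_substitute(L, b):
    """Solve L x = b for a lower-triangular L (list of rows)."""
    x = []
    for i, row in enumerate(L):
        partial = sum(row[j] * x[j] for j in range(i))
        x.append((b[i] - partial) / row[i])
    return x


def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def tiled_solve(L, b, k):
    """Solve L x = b after tiling L at row/column k.

    [L00  0 ] [x0]   [b0]
    [L10 L11] [x1] = [b1]

    Tiling turns one equation into two smaller ones:
        L00 x0 = b0
        L11 x1 = b1 - L10 x0
    """
    L00 = [row[:k] for row in L[:k]]
    L10 = [row[:k] for row in L[k:]]
    L11 = [row[k:] for row in L[k:]]

    x0 = forward_substitute(L00, b[:k])          # first tile equation
    update = matvec(L10, x0)                     # contribution of x0 to b1
    rhs = [bi - ui for bi, ui in zip(b[k:], update)]
    x1 = forward_substitute(L11, rhs)            # second tile equation
    return x0 + x1
```

In this particular split the second equation depends on the first, so the two must run in sequence. If the unknown were a matrix X tiled by columns instead, L X0 = B0 and L X1 = B1 would be independent of each other, which is the kind of situation in which tile equations can be solved in parallel.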

【Preview】

Attachments
Files	Size	Format	View
Automatic algorithm derivation and exploration in linear algebra for parallelism and locality	535KB	PDF	download


Automatic algorithm derivation and exploration in linear algebra for parallelism and locality
Duchateau, Alexandre
Keywords: Autotuning; Linear Algebra; Parallelism; Multicore; Tiling
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/45394/Alexandre_Duchateau.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
PDF


	Document metrics
	Downloads: 9	Views: 32
