Poojary, Vikram ; Dr. Gregory Byrd, Committee Member ; Dr. Edward Gehringer, Committee Member ; Dr. Yan Solihin, Committee Chair
Performance tuning of high-performance numerical code is an important process that is still largely performed manually. While recent research in automated performance tuning has proposed run-time application configuration and compilation, most compilers in use today do not support such run-time features. As a result, a performance tuner's role is limited to selecting the right compiler optimizations for a given application and the environment in which it runs.
Because many compiler optimizations do not improve performance in all cases, performance tuners must tediously test each optimization on their applications under a wide range of scenarios. It is therefore desirable to automate compiler optimization selection in order to avoid, or at least reduce, the tuning effort.
This thesis addresses the question of whether machine learning techniques can be used to automate compiler optimization selection. It presents a case study in which an Artificial Neural Network (ANN) and a Decision Tree (DT) are constructed, trained, and used to predict whether loop unrolling should be applied to a given loop nest in a shared-memory parallel program. Simple characteristics of the loop nests, such as the nesting level, iteration count, and body size, are collected and used as input to the ANN or DT.
The ANN and DT were trained with loop nests from some OpenMP-based NAS Parallel Benchmarks and were used to predict the benefit of loop unrolling across different benchmarks and across different numbers of parallel threads. Various training methods were tried; in the best case, the ANN correctly predicts whether loop unrolling is beneficial in 62% of the cases, whereas the DT does so in 56% of the cases.
Although the results show promise, we believe that accurately automating compiler optimization selection may require many other factors to be taken into account when characterizing each loop nest, because loop unrolling interacts in complex ways with the memory hierarchy, data layout, thread partitioning, and instruction-level parallelism.
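To make the prediction setup concrete, the following is a minimal sketch of a one-level decision tree (a decision stump) that maps the three loop-nest characteristics named in the abstract to an unroll / do-not-unroll decision. It is not the thesis's implementation: the function names, the Gini-based split criterion, and every number in the training set are assumptions made up for illustration. The toy data is deliberately easy to separate, which is unlike the real benchmarks, where the reported accuracy is 56% to 62%.

```python
from typing import List, Tuple

# Feature order follows the abstract: (nesting level, iteration count, body size).
# Every sample below is invented for illustration; none is a thesis measurement.
LoopFeatures = Tuple[int, int, int]
Sample = Tuple[LoopFeatures, int]  # label 1 = unrolling assumed beneficial

TRAINING: List[Sample] = [
    ((1, 1000, 4), 1),
    ((2, 500, 6), 1),
    ((1, 64, 8), 1),
    ((3, 16, 40), 0),
    ((2, 8, 55), 0),
    ((1, 32, 30), 0),
]


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels: List[int]) -> int:
    """Most common binary label; ties resolve to 1."""
    return 1 if 2 * sum(labels) >= len(labels) else 0


def train_stump(samples: List[Sample]) -> Tuple[int, float, int, int]:
    """Return (feature index, threshold, label if <=, label if >)
    for the single split with the lowest weighted Gini impurity."""
    best = None
    n = len(samples)
    for f in range(3):
        values = sorted({x[f] for x, _ in samples})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for x, y in samples if x[f] <= thr]
            right = [y for x, y in samples if x[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, thr, majority(left), majority(right))
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]


def predict(stump: Tuple[int, float, int, int], x: LoopFeatures) -> int:
    """Apply the learned split to one loop nest."""
    f, thr, label_le, label_gt = stump
    return label_le if x[f] <= thr else label_gt


stump = train_stump(TRAINING)
```

Calling `predict(stump, (1, 2000, 3))` then classifies a hypothetical single-level loop with many iterations and a small body. A full decision tree would apply the same split search recursively to each side, and an ANN would replace the threshold test with learned weights over the same inputs.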
Predicting Loop Unrolling Impact in OpenMP Programs Using Machine Learning