Journal Article

【Abstract】

In most parallel loops of embedded applications, every iteration executes the exact same sequence of instructions while manipulating different data. This fact motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between the single-instruction multiple-data (SIMD) and simultaneous multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than a typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. To verify the architecture, we extend a commercial 32-bit embedded core, the AE32000C, and synthesize it on a Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP on EEMBC benchmarks that are automatically parallelized by the Intel compiler.
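
The lockstep idea in the abstract can be illustrated with a small simulation. The following is a minimal sketch only, not the authors' design: the three-instruction toy ISA, the `Lane` class, and the `run_lockstep` function are all hypothetical names introduced here, and nothing below models the AE32000C pipeline. It shows one shared fetch/decode step per instruction feeding several lanes, each with its own registers, so that a parallel loop body such as `a[i] = a[i] + 10` runs once per lane while being fetched only once.

```python
from dataclasses import dataclass, field


@dataclass
class Lane:
    """Per-thread state (hypothetical): each lane owns its register file."""
    regs: dict = field(default_factory=dict)


def run_lockstep(program, lanes, memory):
    """Fetch and decode each instruction once, then execute it on every lane.

    Returns the number of shared fetch/decode steps performed.
    """
    fetches = 0
    for instr in program:              # shared fetch
        op, *args = instr              # shared decode
        fetches += 1
        for lane in lanes:             # per-lane execute, memory, write-back
            if op == "load":           # dst <- memory[regs[base]]
                dst, base = args
                lane.regs[dst] = memory[lane.regs[base]]
            elif op == "addi":         # dst <- regs[src] + imm
                dst, src, imm = args
                lane.regs[dst] = lane.regs[src] + imm
            elif op == "store":        # memory[regs[base]] <- regs[src]
                src, base = args
                memory[lane.regs[base]] = lane.regs[src]
            else:
                raise ValueError(f"unknown opcode: {op}")
    return fetches


# Toy loop body a[i] = a[i] + 10, with one iteration assigned to each lane.
memory = [1, 2, 3, 4]
lanes = [Lane(regs={"i": i}) for i in range(4)]   # assumed 4-way configuration
program = [("load", "t", "i"), ("addi", "t", "t", 10), ("store", "t", "i")]

fetches = run_lockstep(program, lanes, memory)
print(memory, fetches)
```

In this sketch the four iterations complete with three shared fetch/decode steps, where four independent threads would each fetch the same three instructions. The sketch assumes no divergent control flow between lanes, which is the case the abstract describes for most parallel loops.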

【References】

[1]H.C. Hunter and J.H. Moreno, "A New Look at Exploiting Data Parallelism in Embedded Systems," CASE, 2003, pp. 159-169.
[2]I. Karkowski and H. Corporaal, "Exploiting Fine- and Coarse-Grain Parallelism in Embedded Programs," PACT, 1998, pp. 60-67.
[3]J.E. Smith and G.S. Sohi, "The Microarchitecture of Superscalar Processors," Proc. of the IEEE, vol. 83, Dec. 1995, pp. 1609-1624.
[4]D.M. Tullsen et al., "Simultaneous Multithreading: Maximizing On-Chip Parallelism," ISCA-22, June 1995.
[5]Analog Devices, Inc. ADSP-BF561 Blackfin Embedded Symmetric Multiprocessor Rev. 0.
[6]ARM. ARM11 MPCore. http://www.arm.com/.
[7]EEMBC (EDN Embedded Microprocessor Benchmark Consortium). http://www.eembc.org.
[8]J. Oh et al., "OpenMP and Compilation Issue in Embedded Applications," LNCS, vol. 2716, June 2003, pp. 109-121.
[9]Extendable Instruction Set Computer. http://www.adc.co.kr.
[10]A. Eichenberger et al., "A Tutorial on BG/L Dual FPU Simdization," BlueGene System Software Workshop, 2005.
[11]C. Kozyrakis and D. Patterson, "Vector vs. Superscalar and VLIW Architectures for Embedded Multimedia Benchmarks," MICRO-35, 2002, pp. 283-293.
[12]D. Talla et al., "Evaluating Signal Processing and Multimedia Applications on SIMD, VLIW, and Superscalar Architectures," ICCD, 2000, pp. 163-172.
[13]OpenMP Forum, http://www.openmp.org/. OpenMP: A Proposed Industry Standard API for Shared Memory Programming, Oct. 1997.
[14]M. Sato et al., "Design of OpenMP Compiler for an SMP Cluster," EWOMP, Sept. 1999, pp. 32-39.
[15]H.G. Nguyen, S.J. Hwang, and S.W. Kim, "Compiler Construction for Lockstep Execution of Multithreaded Processors," CIT, 2007, pp. 829-834.
[16]J.L. Lo et al., "Converting Thread-Level Parallelism to Instruction-Level Parallelism via Simultaneous Multithreading," ACM Trans. Computer Systems, vol. 15, no. 3, 1997, pp. 322-354.
[17]J. Collins and D. Tullsen, "Clustered Multithreaded Architectures: Pursuing both IPC and Cycle Time," IPDPS, 2004, pp. 766-775.
[18]H. Zhong, S.A. Lieberman, and S.A. Mahlke, "Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-Thread Applications," HPCA, Feb. 2007, pp. 25-36.
[19]J.R. Nickols, "The Design of the MasPar MP-1: A Cost Effective Massively Parallel Computer," IEEE COMPCON, Spring 1990, pp. 25-28.
[20]W.W.L. Fung et al., "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," MICRO, Dec. 2007, pp. 407-420.
[21]T.R. Halfhill, "Parallel Processing With CUDA," Microprocessor Report, Jan. 2008.
[22]GeForce Family, http://www.nvidia.com/page/geforce8.html.

ETRI Journal
Exploiting Thread-Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline


Keywords: MLEP; CMP; SMT; TLP; ILP
Others : 1185660 DOI : 10.4218/etrij.08.0107.0343
