There are two avenues for many-core machines to gain higher performance: increasing the number of processors and increasing the number of vector units in each SIMD processor. A truly scalable algorithm should take advantage of both. However, most past research on scalable memory allocators, such as lock-free algorithms based on atomic operations, scales with the number of processors but scales poorly with the number of vector units per SIMD processor. As a result, these allocators are not truly scalable on many-core architectures. In this work, we introduce the solutions used in the design of XMalloc, a truly scalable, efficient lock-free memory allocator. We present (1) a method for transforming a traditional lock-free algorithm based on atomic CAS (compare-and-swap) into one that is truly scalable on many-core architectures, and (2) a hierarchical cache-like buffer that reduces the average latency of accessing non-scalable or slow resources, such as the memory system of a many-core machine. We used XMalloc as the memory allocator for an NVIDIA Tesla C1060 with 240 processing units. Our experimental results show that XMalloc scales very well as both the number of processors and the number of vector units in each SIMD processor grow. Our truly scalable lock-free solution achieves a 211-fold speedup over the common lock-free solution.
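The abstract does not spell out how a CAS-based algorithm is made to scale with SIMD width. One common way to get that effect is to merge the requests of all lanes in a SIMD group into a single atomic update. The C++ sketch below illustrates only this general idea on the host, with SIMD lanes simulated by a vector of request sizes. It is not XMalloc's actual code, and `BumpHeap`, `alloc_per_lane`, `alloc_coalesced`, and `kFail` are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical bump-style heap: `top` is the offset of the next free byte.
struct BumpHeap {
    std::atomic<std::size_t> top{0};
    std::size_t capacity;
    explicit BumpHeap(std::size_t cap) : capacity(cap) {}
};

// Sentinel returned when the heap cannot satisfy a request.
constexpr std::size_t kFail = static_cast<std::size_t>(-1);

// Baseline: each caller runs its own CAS loop on the shared `top`.
// With W lanes per SIMD processor this means W contending CAS loops.
std::size_t alloc_per_lane(BumpHeap& h, std::size_t size) {
    std::size_t old_top = h.top.load(std::memory_order_relaxed);
    do {
        // old_top never exceeds capacity, so the subtraction cannot wrap.
        if (size > h.capacity - old_top) return kFail;
        // On failure, compare_exchange_weak reloads old_top and we retry.
    } while (!h.top.compare_exchange_weak(old_top, old_top + size,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed));
    return old_top;
}

// Coalesced variant (assumed technique): the lanes of one SIMD group
// combine their sizes with an exclusive prefix sum, a single leader
// performs one CAS loop for the total, and every lane derives its own
// offset from the shared base.
std::vector<std::size_t> alloc_coalesced(BumpHeap& h,
                                         const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets(sizes.size());
    std::size_t total = 0;
    for (std::size_t lane = 0; lane < sizes.size(); ++lane) {
        offsets[lane] = total;   // exclusive prefix sum
        total += sizes[lane];
    }
    const std::size_t base = alloc_per_lane(h, total);  // one atomic update
    for (std::size_t& off : offsets) {
        off = (base == kFail) ? kFail : base + off;
    }
    return offsets;
}
```

In this sketch the number of atomic operations per SIMD group stays at one regardless of how many lanes the group has, which is the property that lets throughput grow with vector width. On a real GPU the prefix sum and leader election would be done with SIMD-group primitives instead of a host-side loop.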
Xmalloc: a scalable lock-free dynamic memory allocator for many-core machines