With the enormous growth in popularity of mobile devices over the past decade, industry has pushed chip designers and manufacturers to develop processors that are powerful yet energy efficient. Increasing the parallelism available in hardware has proven to be an effective way to maintain, and even improve, performance within a manageable power budget. Specialized hardware such as graphics processing units, multicore systems and vector units has made this goal achievable, and it provides large benefits to applications built on computationally intensive algorithms. Video stabilization, object detection and 3D gaming are excellent candidates for this hardware, and they are only a few of the many computationally intensive applications found on mobile devices today. This work examines the effects of optimizations that use some of this hardware on two platforms: an ARM-based development board and an Intel-based Ultrabook. Similar optimizations were applied to two computer vision applications at two levels. The first set was made at the thread level and included utilizing vector units and manipulating control flow to use the cache more effectively. The second set was made at the processor level and involved making use of the multiple cores on a chip with OpenMP and Threading Building Blocks. We evaluated the platforms on three metrics: throughput, energy per frame, and throughput per energy, a metric similar to the energy-delay product. After testing varying combinations of the optimizations, we found the Intel-based Ultrabook to be the better choice of platform. On the more memory-bound vision application, the best configuration on the Ultrabook achieved almost 4x the throughput of the ARM development board with 2x the energy efficiency. The results for the more compute-bound application were closer: the Ultrabook's best configuration had less than 3x the throughput of the development board and was only about 1.5x as energy efficient.
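The three metrics named above can be illustrated with a minimal sketch. The abstract does not give their exact definitions, so the formulas here are assumptions: throughput is taken as frames per second, energy per frame as joules per frame, and throughput per energy as their ratio, which under this reading is the inverse of a per-frame energy-delay product. The `Run` type, the function names and the sample numbers are hypothetical and are not measurements from this work.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run (hypothetical record, not data from the study)."""
    frames: int      # number of frames processed
    seconds: float   # wall-clock time for the run
    joules: float    # total energy consumed during the run


def throughput(run: Run) -> float:
    """Frames processed per second (assumed definition)."""
    return run.frames / run.seconds


def energy_per_frame(run: Run) -> float:
    """Joules consumed per frame (assumed definition)."""
    return run.joules / run.frames


def throughput_per_energy(run: Run) -> float:
    """Throughput divided by energy per frame (assumed definition).

    Equals 1 / (seconds per frame * joules per frame), i.e. the inverse
    of an energy-delay product computed per frame.
    """
    return throughput(run) / energy_per_frame(run)


# Illustrative numbers only, chosen to make the arithmetic easy to follow.
example = Run(frames=100, seconds=10.0, joules=50.0)

print(throughput(example))             # frames per second
print(energy_per_frame(example))       # joules per frame
print(throughput_per_energy(example))  # higher is better
```

Under these assumed definitions, a platform improves its throughput per energy either by processing frames faster or by spending less energy on each frame, which is why the metric rewards both speed and efficiency at once.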
A comparative study of the effects of parallelization on ARM and Intel based platforms