Analyzing Offloading Inefficiencies in Scalable Heterogeneous Applications


Dietrich, R.; Tschüter, R.; Juckeland, G.; Knüpfer, A.

Abstract

With the rise of accelerators in high-performance computing, programming models for developing heterogeneous applications have evolved and are continuously being improved to increase program performance and programmer productivity. Computation offloading to massively parallel compute devices has established itself as a new layer of parallelism in scientific applications, next to message passing and multi-threading. Performance analysis is crucial for optimizing the execution of such a parallel heterogeneous program for a specific platform. This work abstracts from specific offloading APIs, such as those available with CUDA, OpenCL, OpenACC, and OpenMP, and summarizes common offloading inefficiencies. Based on the definition of inefficiency patterns, the offloading concept can be included in generic analysis techniques such as critical-path and root-cause analysis. We implemented the detection and evaluation of inefficiency patterns as a post-mortem trace analysis, which highlights program activities with a high potential to reduce the total program runtime.

Keywords: performance analysis; computation offloading; heterogeneous applications; critical path

  • Contribution to proceedings
    2nd International Workshop on Performance Portable Programming Models for Accelerators (P^3MA), co-located with the ISC High Performance Conference in Frankfurt, Germany, 22.06.2017, Frankfurt/Main, Germany
    High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10524, Cham: Springer, 978-3-319-67630-2, 457-476
    DOI: 10.1007/978-3-319-67630-2_34
    Cited 1 time in Scopus

Permalink: https://www.hzdr.de/publications/Publ-25671