Improving the Performance of DGEMM with MoA and Cache-Blocking: Preprint

Stephen Thomas, Lenore Mullin, Katarzyna Swirydowicz

Research output: Contribution to conference (Paper)

Abstract

The goal of this paper is to demonstrate performance improvements to DGEMM, the high-performance dense matrix-matrix multiply kernel that vendors widely implement in their Basic Linear Algebra Subprograms (BLAS) libraries. The mathematics of arrays (MoA) paradigm due to Mullin (1988) produces contiguous memory accesses, combined with Church-Rosser complete language constructs optimized for target processor architectures [3]. Our performance studies show that the MoA implementation of DGEMM, combined with optimal cache-blocking strategies, achieves at least a 25% performance gain over the vendor-supplied Intel MKL and IBM ESSL linear algebra libraries on both Intel Xeon Skylake and IBM Power-9 processors. Results are presented for the NREL Eagle and ORNL Summit supercomputers.
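To make the cache-blocking idea concrete, the following is a minimal C sketch of a blocked DGEMM computing C ← αAB + βC for square, row-major n × n matrices. It is a generic illustration of loop tiling under those assumptions, not the MoA-derived implementation evaluated in the paper. The function name `dgemm_blocked` and the block size `BLOCK` are hypothetical; a tuned kernel would choose block sizes per cache level and per processor.

```c
#include <stddef.h>

/* Hypothetical tile size, chosen only for illustration; real kernels tune
   this (and often use separate sizes per cache level) for each processor. */
#define BLOCK 64

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Cache-blocked C = alpha*A*B + beta*C for row-major n x n matrices.
   Illustrative sketch of loop tiling, not the paper's MoA implementation. */
void dgemm_blocked(size_t n, double alpha, const double *A,
                   const double *B, double beta, double *C)
{
    /* Apply beta once. Following the BLAS convention, C is not read
       when beta == 0, so any NaN/Inf already in C is discarded. */
    for (size_t i = 0; i < n * n; ++i)
        C[i] = (beta == 0.0) ? 0.0 : beta * C[i];

    /* Outer loops walk over BLOCK x BLOCK tiles so that the working set
       of A, B, and C tiles can stay resident in cache. */
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t i_end = min_sz(ii + BLOCK, n);
                size_t k_end = min_sz(kk + BLOCK, n);
                size_t j_end = min_sz(jj + BLOCK, n);
                for (size_t i = ii; i < i_end; ++i)
                    for (size_t k = kk; k < k_end; ++k) {
                        double a_ik = alpha * A[i * n + k];
                        /* Innermost loop is unit-stride over rows of B and C. */
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a_ik * B[k * n + j];
                    }
            }
}
```

Ordering the inner loops i-k-j keeps the innermost loop at unit stride over rows of B and C, which is the kind of contiguous access pattern the abstract emphasizes, while the tile loops bound how much data each inner computation touches. The `min_sz` bounds handle matrix sizes that are not multiples of `BLOCK`.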
Original language: American English
Number of pages: 8
State: Published - 2022
Event: ARRAY '21
Duration: 20 Jun 2021 to 26 Jun 2021

Conference

Conference: ARRAY '21
Period: 20/06/21 to 26/06/21

NREL Publication Number

  • NREL/CP-2C00-80232

Keywords

  • cache-blocking
  • contiguous memory
  • DGEMM
  • mathematics of arrays
  • MoA
