Optimizing Sparse Linear Algebra Through Automatic Format Selection and Machine Learning

Sparse matrices are an integral part of scientific simulations. As hardware
evolves, new sparse matrix storage formats are proposed that aim to exploit
optimizations specific to the new hardware. In the era of heterogeneous
computing, users are often required to use multiple formats for their
applications to remain optimal across the different available hardware,
resulting in longer development times and higher maintenance overhead. A
potential solution to this problem is a lightweight auto-tuner driven by
Machine Learning (ML) that selects, on the user's behalf, an optimal format
from a pool of available formats, matching the characteristics of the sparsity
pattern, the target hardware and the operation to execute.

In this paper, we introduce Morpheus-Oracle, a library that provides a
lightweight ML auto-tuner capable of accurately predicting the optimal format
across multiple backends. It targets the major HPC architectures and aims to
eliminate any format selection input by the end-user. On more than 2000
real-life matrices, we achieve an average classification accuracy of 92.63% and
an average balanced accuracy of 80.22% across the available systems. Adopting
the auto-tuner results in an average speedup of 1.1x on CPUs and 1.5x to 8x on
NVIDIA and AMD GPUs, with maximum speedups reaching up to 7x and 1000x
respectively.
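
The abstract reports both a classification accuracy and a lower balanced accuracy. The sketch below illustrates how the two metrics can diverge when one format is optimal for most matrices. It is not taken from Morpheus-Oracle: the labels "CSR" and "DIA" and the toy predictions are hypothetical, chosen only to show the arithmetic, and the function names are my own.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of matrices whose predicted format equals the optimal one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every format counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical toy data (not results from the paper):
# 8 matrices where "CSR" is optimal, 2 where "DIA" is optimal.
y_true = ["CSR"] * 8 + ["DIA"] * 2
# The classifier gets every CSR case right but misses one DIA case.
y_pred = ["CSR"] * 8 + ["CSR", "DIA"]

print(f"accuracy          = {accuracy(y_true, y_pred):.2f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.2f}")
```

In this toy case the plain accuracy is 9/10, while the balanced accuracy is the mean of the per-class recalls (8/8 and 1/2), so mistakes on the rarer format weigh more heavily. A gap between the two reported figures is consistent with an imbalanced set of optimal formats, although the abstract itself does not state the class distribution.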
Paper link: http://arxiv.org/pdf/2303.05098v1

