Title: HW-SW Co-Design to Accelerate Dense and Sparse Deep Neural Network Workloads
Date: Thursday, May 4, 2023
Time: 1:00 PM - 2:30 PM ET
Location: Virtual
Geonhwa Jeong
Ph.D. Student
School of Computer Science
College of Computing
Georgia Institute of Technology
Committee:
Dr. Tushar Krishna (advisor), School of Electrical and Computer Engineering & School of Computer Science, Georgia Institute of Technology
Dr. Hyesoon Kim, School of Computer Science, Georgia Institute of Technology
Dr. Vivek Sarkar, School of Computer Science, Georgia Institute of Technology
Abstract
As AI-based applications become pervasive, CPU vendors have begun incorporating dense matrix engines within the datapath to boost efficiency. However, we observe that integrating such engines inside CPUs can introduce under-utilization and stalls, since the limited register storage cannot amortize the fill and drain times of the systolic array. Moreover, as DL workloads embrace sparsity to reduce the compute and memory footprint of models, it is also imperative for CPUs to add sparsity support to skip ineffectual computations and memory accesses.
First, we present RASA, an efficient register-aware systolic array that serves as a matrix engine for CPUs. We develop techniques that divide an execution stage into several sub-stages and overlap instructions across them, hiding fill and drain overheads by running instructions concurrently. Next, we present VEGETA, a sparse matrix engine that extends a dense matrix engine with flexible structured-sparsity support, and we show how VEGETA engines can exploit different sparsity granularities, such as network-wise, layer-wise, and tile-wise sparsity. Finally, we propose our ongoing work, TASD, an approximation method that decomposes an unstructured sparse tensor into a series of structured sparse tensors, and we show how TASD can accelerate the execution of both dense and sparse DNNs.
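To make the TASD idea concrete, below is a minimal sketch in Python/NumPy that greedily approximates an unstructured sparse matrix as a sum of N:M structured-sparse terms (the kind of structured sparsity a VEGETA-style engine can execute). The helper names nm_prune and tasd_decompose and the greedy residual scheme are illustrative assumptions for exposition, not the actual TASD algorithm.

import numpy as np

def nm_prune(mat, n, m):
    # Keep the n largest-magnitude values in each group of m consecutive
    # elements along every row (N:M structured sparsity, e.g. 2:4).
    out = np.zeros_like(mat)
    rows, cols = mat.shape
    for r in range(rows):
        for start in range(0, cols, m):
            group = mat[r, start:start + m]
            keep = np.argsort(np.abs(group))[-n:]  # indices of the n largest
            out[r, start + keep] = group[keep]
    return out

def tasd_decompose(mat, n, m, terms):
    # Greedily peel off N:M structured terms; their sum approximates mat.
    residual = mat.copy()
    series = []
    for _ in range(terms):
        term = nm_prune(residual, n, m)
        series.append(term)
        residual = residual - term  # error left for the next term to capture
    return series

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)) * (rng.random((4, 8)) > 0.6)  # unstructured sparse
series = tasd_decompose(a, n=2, m=4, terms=2)
print("max abs error:", np.abs(a - sum(series)).max())  # two 2:4 terms capture all 4 values per group here

In this toy setting the series is exact, since two 2:4 terms together keep four values per group of four; in general, a short series of structured terms yields an approximation whose error shrinks as more terms are added.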