Title: Energy-Efficient On-chip Deep Neural Network (DNN) Inference and Training with Emerging Non-volatile Memory Technologies

Committee:

Dr. Shimeng Yu, ECE, Chair, Advisor

Dr. Callie Hao, ECE

Dr. Yingyan Lin, CS

Dr. Tushar Krishna, ECE

Dr. Saibal Mukhopadhyay, ECE

Abstract: Emerging non-volatile memory (eNVM) technologies provide new opportunities for designing DNN accelerators with high energy efficiency. In this thesis, DNN accelerator designs are proposed that use the eNVM-based compute-in-memory (CIM) paradigm and high-density on-chip buffers. For DNN inference, a CIM accelerator with a reconfigurable interconnect is presented; it optimizes the communication pattern by using an application-specific interconnect topology. To support the multi-head self-attention (MHSA) mechanism in transformers, a heterogeneous computing platform combining CIM and a digital sparse engine handles the different types of matrix-matrix multiplications involved. A CIM-based approximate computing scheme is proposed to exploit the run-time sparsity in attention-score computation. For DNN training, to overcome the high write energy of eNVM, a hybrid weight cell design using eNVM and a capacitor is proposed for weight updates during training. To store the large volumes of intermediate data generated during training, a dual-mode buffer design based on ferroelectric materials is proposed; it optimizes both the dynamic read/write energy and the standby power by operating in either a volatile or a non-volatile mode.
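The distinction between the matmul types in MHSA motivates the heterogeneous platform above: the Q/K/V projections multiply activations by static weights that can stay resident in eNVM CIM arrays, whereas the score (QK^T) and context (AV) products multiply two run-time activations, which suits a digital engine. The sketch below, a minimal single-head illustration in NumPy, labels the two cases; the `threshold` knob is a hypothetical stand-in for run-time attention sparsity, not the thesis's exact scheme.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(x, Wq, Wk, Wv, threshold=None):
    """Single-head self-attention, annotated by matmul type.

    x: (seq_len, d_model) activations; Wq/Wk/Wv: (d_model, d_k) static weights.
    `threshold` is a hypothetical illustration of run-time sparsity:
    scores below it are masked out before the softmax.
    """
    # Static-weight matmuls: weights are fixed after training, so these
    # map naturally onto eNVM-based CIM arrays.
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    # Activation-activation matmul: both operands are produced at run time,
    # so this is assigned to the digital sparse engine instead of CIM.
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    if threshold is not None:
        # Run-time sparsity (illustrative): drop low scores before softmax.
        scores = np.where(scores >= threshold, scores, -np.inf)
    A = softmax(scores)
    # Second activation-activation matmul (attention-weighted values).
    return A @ V
```

With `threshold=None` this is standard scaled dot-product attention; lowering the threshold toward negative infinity recovers the dense result, while raising it trades accuracy for sparsity in the score matrix.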