Wai-Chi Fang (National Yang Ming Chiao Tung University, Taiwan)
Over the past decade, the significance and application of deep learning have surged, with models surpassing domain experts in prediction accuracy within short timeframes. However, this accuracy comes at the cost of considerable algorithmic complexity, demanding substantial computational resources. To address this, researchers have developed hardware platforms that accelerate these algorithms. This paper explores several approaches to improving computational efficiency, including parallel computing, data-flow optimization, loop tiling, and data quantization. Additionally, learnable adaptive quantization and neural-network sparsification analysis are proposed to further reduce computational demands. The paper adopts a strategy combining optimal parallel computing, data-flow optimization, and tiling techniques to design the processing elements of an AI accelerator. The efficacy of these techniques is validated through an implementation of Long-term Recurrent Convolutional Networks (LRCN) for electroencephalography-based affective computing.