DeepBurning-MixQ
This is part of the DeepBurning project developed for agile neural network accelerator design at the Institute of Computing Technology, Chinese Academy of Sciences. It focuses on software/hardware co-optimization of FPGA-based accelerators for low bit-width mixed-precision neural network models. On the hardware side, we mainly explore packing methods for various low bit-width convolution operators, so that each DSP primitive in the FPGA can accommodate as many low bit-width operations as possible, thereby improving DSP utilization. On the model side, we mainly use a differentiable NAS (Neural Architecture Search) technique to perform mixed-precision quantization of a given model while also accounting for the hardware implementation efficiency of the quantized model, so that the target convolutional neural network can be deployed efficiently onto the FPGA under given resource constraints.
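To illustrate the DSP packing idea, here is a minimal Python sketch (not the project's actual HLS kernel) of a well-known trick: two signed 4-bit weights are packed into one wide multiplier operand, so a single wide multiply, such as one issued on an FPGA DSP slice, yields two partial products at once. The function names and the 12-bit shift are illustrative assumptions.

```python
# Sketch: packing two signed 4-bit weights into one multiplier operand so a
# single wide multiply produces both products a*w0 and a*w1. Illustrative
# only; the real accelerator performs this packing in hardware.

def pack_weights(w0: int, w1: int, shift: int = 12) -> int:
    """Pack two signed 4-bit weights into one operand: (w1 << shift) + w0."""
    assert -8 <= w0 <= 7 and -8 <= w1 <= 7
    return (w1 << shift) + w0

def mac_packed(a: int, packed: int, shift: int = 12) -> tuple[int, int]:
    """One wide multiply yields both a*w0 (low half) and a*w1 (high half)."""
    assert 0 <= a <= 255                 # unsigned 8-bit activation
    p = a * packed                       # the single DSP-style multiply
    lo = p % (1 << shift)                # extract the low partial product
    if lo >= 1 << (shift - 1):           # sign-correct the low half
        lo -= 1 << shift
    hi = (p - lo) >> shift               # high partial product, borrow fixed
    return lo, hi
```

The 12-bit shift leaves room for the low partial product (at most 8 unsigned bits times 4 signed bits), and the sign correction compensates for the borrow that a negative low product induces in the high half.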
This work will appear in ICCAD'23; please refer to the paper for more details.
Erjing Luo#, Haitong Huang#, Cheng Liu*, Guoyu Li, Bing Yang, Ying Wang, Huawei Li, Xiaowei Li, “DeepBurning-MixQ: An Open Source Mixed-Precision Neural Network Accelerator Design Framework for FPGAs”, ICCAD, 2023. (# equal contribution)
Status
This project mainly explores automatic HW/SW co-optimization of FPGA-based neural network accelerators for mixed-precision neural network models. Currently, the mixed-precision neural network models are fully pipelined across the FPGA, so the framework mainly targets smaller neural network models with a limited number of layers. A hybrid multi-core neural network accelerator that can accommodate generic mixed-precision neural network models will come soon.
Classification Model
Usage
DAC-SDC Object Detection Model
The DAC System Design Contest focused on low-power object detection on an embedded FPGA system: https://www.dac.com/Conference/System-Design-Contest.
The target of this contest is to optimize designs for both accuracy and power on an Ultra96-V2 FPGA board. The contest was held 5 times, from 2018 to 2022, and the performance of the best designs over those years increased from about 30 fps to thousands of fps.
Base models for AnyPacking bit-width search:
Dataset: See https://byuccl.github.io/dac_sdc_2022/info/.
Usage: First cd dacsdc/, then follow the steps below.
1) Hardware-aware Mixed-Precision NAS for bit width
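As background for this step, the following is a minimal NumPy sketch of the kind of differentiable bit-width selection used in DNAS-style mixed-precision search: each layer keeps one architecture parameter per candidate bit width, and the forward pass mixes the fake-quantized candidates with softmax weights. All names here are illustrative assumptions, not the project's actual API; in the real search the architecture parameters are trained by gradient descent (together with a hardware-cost term), and the argmax bit width is selected afterwards.

```python
# Sketch of differentiable bit-width selection (DNAS-style), illustrative only.
import numpy as np

def fake_quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric fake-quantization of a tensor to `bits` bits."""
    if bits >= 32:
        return x
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def mixed_precision_weight(x: np.ndarray,
                           alphas: np.ndarray,
                           candidate_bits=(2, 4, 8)) -> np.ndarray:
    """Softmax-weighted mix of quantized candidates; `alphas` are the
    per-layer architecture parameters that would receive gradients."""
    probs = np.exp(alphas - np.max(alphas))
    probs /= probs.sum()
    return sum(p * fake_quantize(x, b) for p, b in zip(probs, candidate_bits))
```

After training, the bit width with the largest alpha is kept for each layer, which is what makes the search hardware-aware once a DSP-packing cost for each candidate bit width is added to the loss.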
2) Main Train
For UltraNet:
For UltraNet_Bypass/SkyNet/SkyNet-k5:
3) Test model
4) HLS export
5) Model-Level Hardware Simulation
Reference