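Netrans Overview

Netrans is the AI compiler for the Pnna NPU. It provides a command-line tool, netrans_cli, and a Python API, netrans_py. Both convert model weights into nbg (network binary graph) files with the .nb suffix, which run on the Pnna NPU.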

Project Structure

The Netrans directory layout is as follows:

netrans-ai-compiler/
├── bin/                  # Compiler executables
├── netrans_cli/          # Command-line tool
├── netrans_py/           # Python API
├── examples/             # Example code
└── setup.sh              # Setup script

Installation Guide

System Requirements

CPU: Intel® Core™ i5-6500 CPU @ 3.2 GHz x4, with support for Intel® Advanced Vector Extensions (AVX)
RAM: at least 8 GB
Disk: 160 GB
OS: Ubuntu 20.04 LTS 64-bit with Python 3.8; other versions are not recommended
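Before installing, you may want to check a machine against these requirements. The following is a minimal sketch using only the Python standard library. It assumes Linux (it reads /proc/cpuinfo and uses os.sysconf), it checks only the Python version, AVX support and RAM, and the function names are illustrative, not part of Netrans.

```python
import os
import platform
import sys


def has_avx(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line of /proc/cpuinfo text lists 'avx'."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            if "avx" in flags:
                return True
    return False


def total_ram_gb() -> float:
    """Physical memory in GB, via POSIX sysconf (Linux only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def check_environment() -> list:
    """Collect warnings against the requirements listed above."""
    warnings = []
    if sys.version_info[:2] != (3, 8):
        warnings.append(f"Python {platform.python_version()} found, 3.8 recommended")
    if platform.system() != "Linux":
        warnings.append(f"{platform.system()} found, Ubuntu 20.04 recommended")
        return warnings  # the remaining checks rely on Linux-only interfaces
    with open("/proc/cpuinfo") as f:
        if not has_avx(f.read()):
            warnings.append("CPU does not report AVX support")
    ram = total_ram_gb()
    if ram < 8:
        warnings.append(f"only {ram:.1f} GB RAM, at least 8 GB recommended")
    return warnings


if __name__ == "__main__":
    for warning in check_environment():
        print("WARNING:", warning)
```

An empty list means none of these checks failed; disk space and the exact Ubuntu release are not checked.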

Installation Steps

Install dependencies

sudo apt update
sudo apt install build-essential

Create a Python 3.8 environment

wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
mkdir -p ~/app
INSTALL_PATH="${HOME}/app/miniforge3"
bash "Miniforge3-$(uname)-$(uname -m).sh" -b -p "${INSTALL_PATH}"
echo "source \"${INSTALL_PATH}/etc/profile.d/conda.sh\"" >> "${HOME}/.bashrc"
echo "source \"${INSTALL_PATH}/etc/profile.d/mamba.sh\"" >> "${HOME}/.bashrc"
source "${HOME}/.bashrc"
mamba create -n netrans python=3.8 -y
mamba activate netrans

Download Netrans

cd ~/app
git clone https://gitlink.org.cn/nudt_dsp/netrans.git

Run the setup script

cd ~/app/netrans
./setup.sh

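Using Netrans

Netrans provides model conversion examples for TensorFlow, Caffe, Darknet, ONNX and PyTorch; see the examples directory.

Command-Line Tool

Netrans CLI provides a simple command-line interface for compiling and optimizing models. Basic usage: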

load.sh model_path  # Import the model
config.sh model_path  # Configure parameters
quantize.sh model_path quantize_type # Quantize the model
export.sh model_path quantize_type # Export the model

For details, see the netrans_cli usage documentation.
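The four CLI steps always run in the same order. Below is a minimal Python sketch that chains them. It uses only the argument forms listed above and assumes load.sh, config.sh, quantize.sh and export.sh are on PATH after running setup.sh. The helper names are hypothetical, and by default the sketch only prints the commands (dry run).

```python
import shlex
import subprocess


def build_pipeline(model_path: str, quantize_type: str) -> list:
    """Return the four CLI steps, in order, as argv lists."""
    return [
        ["load.sh", model_path],
        ["config.sh", model_path],
        ["quantize.sh", model_path, quantize_type],
        ["export.sh", model_path, quantize_type],
    ]


def run_pipeline(model_path: str, quantize_type: str, dry_run: bool = True) -> None:
    """Print each step; when dry_run is False, also execute it."""
    for argv in build_pipeline(model_path, quantize_type):
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failing step
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline("path/to/model", "uint8")  # hypothetical model path, dry run
```

In practice you will usually stop after config.sh to edit the generated inputmeta file before quantizing, so running all four steps unattended only makes sense once that file is in place.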

Python API

The Netrans Python API makes it easy to call the compiler from a Python script. Example:

from netrans import Netrans

model_path = 'examples/darknet/yolov4_tiny'
netrans_path = "netrans/bin"  # Not needed if this path has already been declared with export

# Initialize netrans
net = Netrans(model_path, netrans=netrans_path)
# Load the model
net.load()
# Configure the preprocessing normalization parameters
net.config(scale=1, mean=0)
# Quantize the model
net.quantize("uint8")
# Export the model
net.export()

# Quantize directly to int16 and export, reusing the inputmeta just configured
net.model2nbg(quantize_type="int16", inputmeta=True)

For details, see the netrans_py usage documentation.

Supported Models

Netrans supports the mainstream frameworks listed in the table below.

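| Input format | Description |
| :--- | --- |
| Caffe | All Caffe models are supported |
| TensorFlow | Versions 1.4.x, 2.0.x, 2.3.x, 2.6.x, 2.8.x, 2.10.x and 2.12.x, for models saved with tf.io.write_graph() |
| ONNX | ONNX up to 1.14.0, opset up to 19 |
| PyTorch | PyTorch up to 1.5.1 |
| Darknet | The Darknet models listed on the official website |

Note: because PyTorch uses dynamic graphs, it is recommended to export PyTorch models to ONNX first and then convert the ONNX model with Netrans.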
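To catch unsupported source versions early, the limits in the table can be encoded as a small check. This is a sketch that only mirrors the table above (the TensorFlow release series, the ONNX and PyTorch upper bounds and the ONNX opset limit); the function names are hypothetical and it does not inspect the model itself.

```python
from typing import Optional, Tuple

# Limits copied from the table above
SUPPORTED_TF_SERIES = {(1, 4), (2, 0), (2, 3), (2, 6), (2, 8), (2, 10), (2, 12)}
MAX_ONNX_VERSION = (1, 14, 0)
MAX_ONNX_OPSET = 19
MAX_PYTORCH_VERSION = (1, 5, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """'2.10.1' -> (2, 10, 1); drops a local suffix such as '+cpu'."""
    core = text.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])


def check_source(framework: str, version: str, opset: Optional[int] = None) -> bool:
    """Return True if the framework version falls inside the table's limits."""
    v = parse_version(version)
    fw = framework.lower()
    if fw == "tensorflow":
        return v[:2] in SUPPORTED_TF_SERIES
    if fw == "onnx":
        return v <= MAX_ONNX_VERSION and (opset is None or opset <= MAX_ONNX_OPSET)
    if fw == "pytorch":
        return v <= MAX_PYTORCH_VERSION
    # Caffe and Darknet have no version limit in the table
    raise ValueError(f"no version rule for {framework!r}")


if __name__ == "__main__":
    print(check_source("onnx", "1.14.0", opset=13))
    print(check_source("pytorch", "1.13.1"))
```

Versions are compared as integer tuples rather than strings, so 1.13.1 is correctly treated as newer than 1.5.1.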

Supported Operators

Supported Caffe Operators

absval   |   innerproduct   |   reorg
axpy   |   lrn   |   roipooling
batchnorm/bn   |   l2normalizescale   |   relu
convolution   |   leakyrelu   |   reshape
concat   |   lstm   |   reverse
convolutiondepthwise   |   normalize   |   swish
dropout   |   poolwithargmax   |   slice
depthwiseconvolution   |   permute   |   scale
deconvolution   |   prelu   |   shufflechannel
elu   |   pooling   |   softmax
eltwise   |   priorbox   |   sigmoid
flatten   |   proposal   |   tanh

Supported TensorFlow Operators

tf.abs   |   tf.nn.rnn_cell_GRUCell   |   tf.negative
tf.add   |   tf.nn.dynamic_rnn   |   tf.pad
tf.nn.bias_add   |   tf.nn.rnn_cell_GRUCell   |   tf.transpose
tf.add_n   |   tf.greater   |   tf.nn.avg_pool
tf.argmin   |   tf.greater_equal   |   tf.nn.max_pool
tf.argmax   |   tf.image.resize_bilinear   |   tf.reduce_mean
tf.batch_to_space_nd   |   tf.image.resize_nearest_neighbor   |   tf.nn.max_pool_with_argmax
tf.nn.batch_normalization   |   tf.contrib.layers.instance_norm   |   tf.pow
tf.nn.fused_batchnorm   |   tf.nn.fused_batch_norm   |   tf.reduce_mean
tf.cast   |   tf.stack   |   tf.reduce_sum
tf.clip_by_value   |   tf.nn.sigmoid   |   tf.reverse
tf.concat   |   tf.signal.frame   |   tf.reverse_sequence
tf.nn.conv1d   |   tf.slice   |   tf.nn.relu
tf.nn.conv2d   |   tf.nn.softmax   |   tf.nn.relu6
tf.nn.depthwise_conv2d   |   tf.space_to_batch_nd   |   tf.rsqrt
tf.nn.conv1d   |   tf.space_to_depth   |   tf.realdiv
tf.nn.conv3d   |   tf.nn.local_response_normalization   |   tf.reshape
tf.image.crop_and_resize   |   tf.nn.l2_normalize   |   tf.expand_dims
tf.nn.conv2d_transpose   |   tf.nn.rnn_cell_LSTMCell, tf.nn.dynamic_rnn   |   tf.squeeze
tf.depth_to_space   |   tf.rnn_cell.LSTMCell   |   tf.strided_slice
tf.equal   |   tf.less   |   tf.sqrt
tf.exp   |   tf.less_equal   |   tf.square
tf.nn.elu   |   tf.logical_or   |   tf.subtract
tf.nn.embedding_lookup   |   tf.logical_and   |   tf.scatter_nd
tf.maximum   |   tf.nn.leaky_relu   |   tf.split
tf.floor   |   tf.multiply   |   tf.nn.swish
tf.matmul   |   tf.nn.moments   |   tf.tile
tf.floordiv   |   tf.minimum   |   tf.nn.tanh
tf.gather_nd   |   tf.matmul   |   tf.unstack
tf.gather   |   tf.batch_matmul   |   tf.where
tf.nn.embedding_lookup   |   tf.not_equal   |   tf.select

Supported ONNX Operators

ArgMin   |   LeakyRelu   |   ReverseSequence
ArgMax   |   Less   |   ReduceMax
Add   |   LSTM   |   ReduceMin
Abs   |   MatMul   |   ReduceL1
And   |   Max   |   ReduceL2
BatchNormalization   |   Min   |   ReduceLogSum
Clip   |   MaxPool   |   ReduceLogSumExp
Cast   |   AveragePool   |   ReduceSumSquare
Concat   |   GlobalAveragePool   |   Reciprocal
ConvTranspose   |      |   Resize
Conv   |   GlobalMaxPool   |   Sum
Div   |   MaxPool   |   SpaceToDepth
Dropout   |   AveragePool   |   Sqrt
DepthToSpace   |   Mul   |   Split
DequantizeLinear   |   Neg   |   Slice
Equal   |   Or   |   Squeeze
Exp   |   PRelu   |   Softmax
Elu   |   Pad   |   Sub
Expand   |   Pow   |   Sigmoid
Floor   |   QuantizeLinear   |   Softsign
InstanceNormalization   |   QLinearMatMul   |   Softplus
Gemm   |   QLinearConv   |   Sin
Gather   |   Relu   |   Tile
Greater   |   Reshape   |   Transpose
GatherND   |   Squeeze   |   Tanh
GRU   |   Unsqueeze   |   Upsample
LogSoftmax   |   Flatten   |   Where
LRN   |   ReduceSum   |   Xor
Log   |   ReduceMean   |      |

Supported Darknet Operators

avgpool   |   maxpool   |   softmax
batch_normalize   |   mish   |   shortcut
connected   |   region   |   scale_channels
convolutional   |   reorg   |   swish
depthwise_convolutional   |   relu   |   upsample
leaky   |   route   |   yolo
logistic
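The Darknet list mixes layer types (section headers such as [convolutional] in a .cfg file) with activation names (values of activation=), so a quick pre-check can scan a .cfg file for both. This is a sketch assuming the standard Darknet .cfg syntax. It treats [net] and [network] as global settings and skips linear, Darknet's identity activation; skipping linear is an assumption, not something the list states. The function name is hypothetical.

```python
import re
from typing import Set

# Names from the "Supported Darknet Operators" list above
SUPPORTED_DARKNET = {
    "avgpool", "maxpool", "softmax", "batch_normalize", "mish", "shortcut",
    "connected", "region", "scale_channels", "convolutional", "reorg", "swish",
    "depthwise_convolutional", "relu", "upsample", "leaky", "route", "yolo",
    "logistic",
}

SECTION_RE = re.compile(r"^\s*\[([^\]]+)\]")
ACTIVATION_RE = re.compile(r"^\s*activation\s*=\s*(\w+)")


def unsupported_darknet_layers(cfg_text: str) -> Set[str]:
    """Return section names and activation values that are not in the list."""
    found = set()
    for line in cfg_text.splitlines():
        section = SECTION_RE.match(line)
        if section:
            name = section.group(1).strip().lower()
            if name not in ("net", "network"):  # global settings, not a layer
                found.add(name)
            continue
        activation = ACTIVATION_RE.match(line)
        if activation:
            act = activation.group(1).lower()
            if act != "linear":  # assumption: identity, no operator needed
                found.add(act)
    return found - SUPPORTED_DARKNET
```

An empty result means every section and activation in the file appears in the list above; it does not guarantee that every layer parameter is supported.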

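Configuration File

Inputmeta.yml is the configuration template generated by config. It configures the input-layer dataset for the Netrans intermediate model, and the quantization, inference, export and image-to-dat operations in Netrans all depend on it. Its contents are as follows: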

%YAML 1.2
---
# !!!This file disallow TABs!!!
# "category" allowed values: "image, undefined"
# "database" allowed types: "H5FS, SQLITE, TEXT, LMDB, NPY, GENERATOR"
# "tensor_name" only support in H5FS database
# "preproc_type" allowed types:"IMAGE_RGB, IMAGE_RGB888_PLANAR, IMAGE_RGB888_PLANAR_SEP, IMAGE_I420,
# IMAGE_NV12, IMAGE_YUV444, IMAGE_GRAY, IMAGE_BGRA, TENSOR"
input_meta:
  databases:
  - path: dataset.txt
    type: TEXT
    ports:
    - lid: data_0
      category: image
      dtype: float32
      sparse: false
      tensor_name:
      layout: nhwc
      shape:
      - 50
      - 224
      - 224
      - 3
      preprocess:
        reverse_channel: false
        mean:
        - 103.94
        - 116.78
        - 123.67
        scale: 0.017
        preproc_node_params:
          preproc_type: IMAGE_RGB
          add_preproc_node: false
          preproc_perm:
          - 0
          - 1
          - 2
          - 3
    - lid: label_0
      redirect_to_output: true
      category: undefined
      tensor_name:
      dtype: float32
      shape:
      - 1
      - 1

Parameter descriptions:

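| Parameter | Description |
| :--- | --- |
| input_meta | Declares the preprocessing configuration. |
| databases | Data configuration, including path, type and ports. |
| path | Path to the dataset file, either relative (to the working directory) or absolute. Defaults to dataset.txt; changing it is not recommended. |
| type | Format of the dataset file; always TEXT. |
| ports | Points to the network's inputs or redirected inputs. Only one input is currently supported; if your network has multiple inputs, contact @ccyh. |
| lid | The lid of the input layer. |
| category | Input category. Set to one of: image (image input) or undefined (any other kind of input). |
| dtype | Data type of the input tensor used to feed data to the input port of the pnna network. Supported types are float32 and quantized. |
| sparse | Whether the network tensor is stored in sparse format. Set to true (sparse format) or false (dense format). |
| tensor_name | Leave this parameter empty. |
| layout | Layout of the input tensor: use nchw for Caffe, Darknet, ONNX and PyTorch models, and nhwc for TensorFlow, TensorFlow Lite and Keras models. |
| shape | Shape of the tensor. The first dimension, shape[0], is the number of inputs per batch, which allows several inputs to be sent to the network before a single inference run. If the batch dimension is 0, --batch-size must be given on the command line. If it is greater than 1, the batch size in inputmeta.yml is used and --batch-size on the command line is ignored. |
| fitting | Reserved field. |
| preprocess | Preprocessing steps and their order. Preprocessing supports the four parameters below, and the order in which they appear is the order in which they are applied. |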
| reverse_channel | Whether to reverse the channel order (for example, RGB to BGR). Set to true (reverse the channel order) or false (keep the channel order). Use true for TensorFlow and TensorFlow Lite models. |
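| mean | Mean value for each channel. |
| scale | Scale factor for the tensor. The mean and scale values normalize the input tensor as (inputTensor - mean) × scale. |
| preproc_node_params | Preprocessing-node parameters; they enable the preprocessing task in the OVxlib C project case. |
| add_preproc_node | Controls insertion of a preprocessing node in the OVxlib C project case. A boolean in [true, false]; true adds a preprocessing layer, configured by the parameters below, to the exported application. The parameters below take effect only when add_preproc_node is set to true. |
| preproc_type | Input type of the preprocessing node. A string from [IMAGE_RGB, IMAGE_RGB888_PLANAR, IMAGE_YUV420, IMAGE_GRAY, IMAGE_BGRA, TENSOR]. |
| preproc_perm | Permutation parameters for the preprocessing node's input. |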
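| redirect_to_output | A special attribute that redirects a database tensor to a graph output. If it is set for a port, the network builder automatically generates an output layer for that port so that the post-processing file can handle the tensor from the database directly. For a classification network, the lid label_0 in the example above is the lid of the labels in the input dataset. Note that redirect_to_output must be set to true for the post-processing file to handle the database tensor directly, and the label lid must match the lid of labels_tensor defined in the post-processing file. A boolean in [true, false] that specifies whether the data of the input port represented by the tensor is sent directly to the network output: true (sent directly to the network output) or false (not sent directly). |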
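A worked example of the normalization formula from the scale row, using the mean and scale values from the example inputmeta.yml above. This sketch applies (inputTensor - mean) × scale to a single pixel and ignores reverse_channel; the function name is illustrative.

```python
from typing import List, Sequence


def normalize_pixel(pixel: Sequence[float], mean: Sequence[float], scale: float) -> List[float]:
    """Apply (inputTensor - mean) * scale to one pixel, channel by channel."""
    if len(pixel) != len(mean):
        raise ValueError("mean must have one value per channel")
    return [(p - m) * scale for p, m in zip(pixel, mean)]


# mean and scale taken from the example inputmeta.yml
MEAN = [103.94, 116.78, 123.67]
SCALE = 0.017

if __name__ == "__main__":
    # A white pixel maps to roughly [2.568, 2.350, 2.233]
    print(normalize_pixel([255, 255, 255], MEAN, SCALE))
```

With mean 0 and scale 1, as in the net.config(scale=1, mean=0) call of the Python API example, the input passes through unchanged.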

Modify the generated inputmeta file to match the parameters of your specific model.
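The dataset file referenced by path (dataset.txt, type TEXT) has to be prepared by hand. The sketch below writes one image per line, optionally followed by an integer label for a redirected label port such as label_0. This line format is an assumption that this document does not confirm; check it against the examples shipped with Netrans. The function name and suffix list are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def write_dataset_txt(image_dir: str, out_file: str = "dataset.txt",
                      labels: Optional[Dict[str, int]] = None) -> int:
    """Write '<absolute image path>[ <label>]' per line; return the line count.

    Assumed format; labels maps a file name to its integer label.
    """
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    lines = []
    for p in images:
        line = str(p.resolve())  # absolute, so the working directory does not matter
        if labels is not None:
            line += f" {labels[p.name]}"
        lines.append(line)
    Path(out_file).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

For example, write_dataset_txt("calib_images") would list every image in a (hypothetical) calib_images folder in sorted order; passing labels raises KeyError if an image has no entry, which surfaces missing labels early.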