Anakin is a cross-platform, high-performance inference engine. It was originally developed by Baidu engineers and is deployed at large scale in industrial products.
Anakin supports a wide range of neural network architectures and hardware platforms, and it is easy to run on GPU, x86, and ARM.
Anakin also integrates with NVIDIA TensorRT and open-sources this integration API; developers can call the API directly or modify it as needed, which gives more flexibility for development.
High performance
To make full use of the hardware, we optimize forward prediction at several levels.
- Automatic graph fusion. For a given algorithm, the goal of every performance
  optimization is to keep the ALU as busy as possible. Operator fusion
  effectively reduces memory accesses and keeps the ALU busy.
- Memory reuse. Forward prediction is a one-way computation, so we reuse
  memory between the inputs and outputs of different operators, reducing
  the overall memory overhead.
- Assembly-level optimization. Saber, the underlying DNN library of Anakin,
  is deeply optimized at the assembly level.
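The memory-reuse idea above can be sketched as a small liveness-based buffer planner. This is an illustrative toy, not Anakin's actual implementation; the function name and graph representation are hypothetical. For a one-way (feed-forward) chain, an operator's input buffer can be recycled as soon as no later operator reads that tensor.

```python
def plan_buffers(ops):
    """ops: list of (op_name, input_id, output_id) describing a linear
    feed-forward chain.  Returns (tensor id -> buffer index, buffer count),
    reusing buffers of tensors that are no longer live."""
    # index of the last operator that reads each tensor
    last_use = {}
    for i, (_, inp, _) in enumerate(ops):
        last_use[inp] = i

    free = []          # buffer indices available for reuse
    assignment = {}    # tensor id -> physical buffer index
    next_buf = 0       # number of distinct buffers allocated so far
    for i, (_, inp, out) in enumerate(ops):
        # allocate the output buffer first (no in-place aliasing here)
        if free:
            buf = free.pop()
        else:
            buf = next_buf
            next_buf += 1
        assignment[out] = buf
        # once the input's last reader has run, its buffer can be recycled
        if last_use.get(inp) == i and inp in assignment:
            free.append(assignment[inp])
    return assignment, next_buf
```

For a four-operator chain conv1 -> relu1 -> conv2 -> relu2, the planner needs only two physical buffers for four intermediate tensors, which is the kind of saving the memory-reuse optimization targets.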
NV GPU Benchmark
Machine And Environment
CPU: Intel(R) Xeon(R) CPU 5117 @ 2.0GHz
GPU: Tesla P4
CUDA: CUDA 8
cuDNN: v7
Timing: 10 warm-up runs, then 1000 runs averaged
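The timing protocol above (10 warm-up runs, then 1000 timed runs averaged) can be sketched as follows. This is a generic harness, not Anakin's benchmark code; `infer` is a placeholder for one forward pass of whatever engine is being measured.

```python
import time

def average_latency_ms(infer, warmup=10, runs=1000):
    """Benchmark a callable: discard `warmup` runs, then average the
    wall-clock time of `runs` forward passes, in milliseconds."""
    for _ in range(warmup):          # warm up caches, clocks, allocators
        infer()
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0   # mean latency in ms
```

Warm-up runs matter because the first few inferences typically pay one-off costs (memory allocation, kernel selection) that would skew the average.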
Latency (ms) and memory (MB) at different batch sizes
The counterpart of Anakin is NVIDIA TensorRT 5, an acknowledged high-performance inference engine. For models that TensorRT 5 does not support, we use custom plugins.
VGG16

Batch_Size | RT Latency FP32 (ms) | Anakin2 Latency FP32 (ms) | RT Memory (MB) | Anakin2 Memory (MB)
1          | 8.52532              | 8.2387                    | 1090.89        | 702
2          | 14.1209              | 13.8772                   | 1056.02        | 768.76
4          | 24.4529              | 24.3391                   | 1002.17        | 840.54
8          | 46.7956              | 46.3309                   | 1098.98        | 935.61
Resnet50

Batch_Size | RT Latency FP32 (ms) | Anakin2 Latency FP32 (ms) | RT Latency INT8 (ms) | Anakin2 Latency INT8 (ms) | RT Memory FP32 (MB) | Anakin2 Memory FP32 (MB)
1          | 4.6447               | 3.0863                    | 1.78892              | 1.61537                   | 1134.88             | 311.25
2          | 6.69187              | 5.13995                   | 2.71136              | 2.70022                   | 1108.86             | 382
4          | 11.1943              | 9.20513                   | 4.16771              | 4.77145                   | 885.96              | 406.86
8          | 19.8769              | 17.1976                   | 6.2798               | 8.68197                   | 813.84              | 532.61
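The relative speedup of Anakin over TensorRT in the ResNet-50 rows above is just the ratio of the two latencies; a quick check with the FP32 numbers taken from the table:

```python
# (batch, TensorRT FP32 latency ms, Anakin2 FP32 latency ms),
# copied from the ResNet-50 table above
resnet50_fp32 = [
    (1, 4.6447, 3.0863),
    (2, 6.69187, 5.13995),
    (4, 11.1943, 9.20513),
    (8, 19.8769, 17.1976),
]

# speedup = TensorRT latency / Anakin latency (values > 1 favor Anakin)
speedups = {batch: rt / anakin for batch, rt, anakin in resnet50_fp32}
```

On these numbers Anakin is about 1.5x faster at batch 1, with the gap narrowing to roughly 1.16x at batch 8.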
Resnet101

Batch_Size | RT Latency (ms) | Anakin2 Latency (ms) | RT Latency INT8 (ms) | Anakin2 Latency INT8 (ms) | RT Memory (MB) | Anakin2 Memory (MB)
1          | 9.98695         | 5.44947              | 2.81031              | 2.74399                   | 1159.16        | 500.5
2          | 17.3489         | 8.85699              | 4.8641               | 4.69473                   | 1158.73        | 492
4          | 20.6198         | 16.8214              | 7.11608              | 8.45324                   | 1021.68        | 541.08
8          | 31.9653         | 33.5015              | 11.2403              | 15.4336                   | 914.49         | 611.54
X86 CPU Benchmark
Machine And Environment
CPU (FP32 tests): Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz with HT
CPU (INT8 tests): Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz with HT
System: CentOS 6.3 with GCC 4.8.2, for the benchmark between Anakin and Intel Caffe
All tests run with 8 threads in parallel
Timing: 10 warm-up runs, then 200 runs averaged
The counterpart of Anakin is Intel Caffe (1.1.6) with MKLML.
We also provide English and Chinese tutorial documentation.
User guide
Here you can learn the working principles of the project, read the C++ interface description, and find code examples. You can also learn about the model converter here.
Anakin2.0
Welcome to the Anakin GitHub.
Please refer to our release announcement to track the latest feature of Anakin.
ARM CPU Benchmark
Machine And Environment
ARMv8 TEST: latency (ms) of one batch
ARMv7 TEST: latency (ms) of one batch
Documentation
All you need is in Doc Index
Developer guide
You might want to know more details of Anakin and make it better. Please refer to how to add custom devices and how to add custom device operators.
How to Contribute
We appreciate your contributions!
Ask Questions
You are welcome to submit questions and bug reports as GitHub Issues.
Copyright and License
Anakin is provided under the Apache-2.0 license.
Acknowledgement
Anakin refers to the following projects: