Introduction to jetson-inference
jetson-inference is NVIDIA's inference and real-time DNN vision library for the Jetson Nano/TX1/TX2/Xavier NX/AGX Xavier. It uses NVIDIA TensorRT to deploy neural networks efficiently onto the embedded Jetson platform, improving performance and power efficiency through graph optimization, kernel fusion, and FP16/INT8 precision. The vision inference primitives, such as imageNet for image recognition, detectNet for object detection, and segNet for semantic segmentation, all inherit from a shared tensorNet object. Examples are provided for streaming from live camera feeds and for processing still images.
GitHub repository: https://github.com/dusty-nv/jetson-inference
Downloading and Building
First, install the git and cmake tools:
$ sudo apt-get install git cmake
Next, clone jetson-inference from GitHub and pull in its submodules:
$ git clone https://github.com/dusty-nv/jetson-inference
$ cd jetson-inference
$ git submodule update --init
Configure with cmake. This step automatically downloads many pretrained models (depending on your region, a proxy may be needed to reach the model download servers):
$ mkdir build    # create the build directory
$ cd build       # enter the build directory
$ cmake ../      # run cmake
Build and install:
$ cd jetson-inference/build
$ make
$ sudo make install
After a successful build, the folder structure looks like this:
|-build
   \aarch64
      \bin
      \include
      \lib
   \armhf
      \bin
      \include
      \lib
Running a Test
Enter the binary directory and run imagenet-console:
$ cd jetson-inference/build/aarch64/bin
$ ./imagenet-console orange_0.jpg output_0.jpg
The output looks like this:
imagenet-console
args (3): 0 [./imagenet-console] 1 [orange_0.jpg] 2 [output_0.jpg]
imageNet -- loading classification network model from:
-- prototxt networks/googlenet.prototxt
-- model networks/bvlc_googlenet.caffemodel
-- class_labels networks/ilsvrc12_synset_words.txt
-- input_blob 'data'
-- output_blob 'prob'
-- batch_size 2
[TRT] TensorRT version 5.0.6
[TRT] detected model format - caffe (extension '.caffemodel')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] device GPU, loading networks/googlenet.prototxt networks/bvlc_googlenet.caffemodel
[TRT] retrieved Output tensor "prob": 1000x1x1
[TRT] retrieved Input tensor "data": 3x224x224
[TRT] device GPU, configuring CUDA engine
[TRT] device GPU, building FP16: ON
[TRT] device GPU, building INT8: OFF
[TRT] device GPU, building CUDA engine (this may take a few minutes the first time a network is loaded)
[TRT] device GPU, completed building CUDA engine
[TRT] network profiling complete, writing engine cache to networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT] device GPU, completed writing engine cache to networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT] device GPU, networks/bvlc_googlenet.caffemodel loaded
[TRT] device GPU, CUDA engine context initialized with 2 bindings
[TRT] binding -- index 0
-- name 'data'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3 (CHANNEL)
-- dim #1 224 (SPATIAL)
-- dim #2 224 (SPATIAL)
[TRT] binding -- index 1
-- name 'prob'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 1000 (CHANNEL)
-- dim #1 1 (SPATIAL)
-- dim #2 1 (SPATIAL)
[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=2 c=3 h=224 w=224) size=1204224
[cuda] cudaAllocMapped 1204224 bytes, CPU 0x100e30000 GPU 0x100e30000
[TRT] binding to output 0 prob binding index: 1
[TRT] binding to output 0 prob dims (b=2 c=1000 h=1 w=1) size=8000
[cuda] cudaAllocMapped 8000 bytes, CPU 0x100f60000 GPU 0x100f60000
device GPU, networks/bvlc_googlenet.caffemodel initialized.
[TRT] networks/bvlc_googlenet.caffemodel loaded
imageNet -- loaded 1000 class info entries
networks/bvlc_googlenet.caffemodel initialized.
loaded image orange_0.jpg (1920 x 1920) 58982400 bytes
[cuda] cudaAllocMapped 58982400 bytes, CPU 0x101060000 GPU 0x101060000
[TRT] layer conv1/7x7_s2 + conv1/relu_7x7 input reformatter 0 - 1.971458 ms
[TRT] layer conv1/7x7_s2 + conv1/relu_7x7 - 12.103073 ms
[TRT] layer pool1/3x3_s2 - 2.291198 ms
[TRT] layer pool1/norm1 - 0.626354 ms
[TRT] layer conv2/3x3_reduce + conv2/relu_3x3_reduce - 1.402032 ms
[TRT] layer conv2/3x3 + conv2/relu_3x3 - 22.193958 ms
[TRT] layer conv2/norm2 - 1.714010 ms
[TRT] layer pool2/3x3_s2 - 2.223438 ms
[TRT] layer inception_3a/1x1 + inception_3a/relu_1x1 || inception_3a/3x3_reduce + inception_3a/relu_3x3_reduce || inception_3a/5x5_reduce + inception_3a/relu_5x5_reduce - 3.953125 ms
[TRT] layer inception_3a/3x3 + inception_3a/relu_3x3 - 6.141875 ms
[TRT] layer inception_3a/5x5 + inception_3a/relu_5x5 - 1.052396 ms
[TRT] layer inception_3a/pool - 0.945416 ms
[TRT] layer inception_3a/pool_proj + inception_3a/relu_pool_proj - 0.752500 ms
[TRT] layer inception_3a/1x1 copy - 0.136875 ms
[TRT] layer inception_3b/1x1 + inception_3b/relu_1x1 || inception_3b/3x3_reduce + inception_3b/relu_3x3_reduce || inception_3b/5x5_reduce + inception_3b/relu_5x5_reduce - 4.915782 ms
[TRT] layer inception_3b/3x3 + inception_3b/relu_3x3 - 13.518437 ms
[TRT] layer inception_3b/5x5 + inception_3b/relu_5x5 - 4.846615 ms
[TRT] layer inception_3b/pool - 1.560260 ms
[TRT] layer inception_3b/pool_proj + inception_3b/relu_pool_proj - 1.945625 ms
[TRT] layer inception_3b/1x1 copy - 0.240729 ms
[TRT] layer pool3/3x3_s2 - 1.658229 ms
[TRT] layer inception_4a/1x1 + inception_4a/relu_1x1 || inception_4a/3x3_reduce + inception_4a/relu_3x3_reduce || inception_4a/5x5_reduce + inception_4a/relu_5x5_reduce - 3.698021 ms
[TRT] layer inception_4a/3x3 + inception_4a/relu_3x3 - 3.677865 ms
[TRT] layer inception_4a/5x5 + inception_4a/relu_5x5 - 0.617812 ms
[TRT] layer inception_4a/pool - 0.589011 ms
[TRT] layer inception_4a/pool_proj + inception_4a/relu_pool_proj - 0.638646 ms
[TRT] layer inception_4a/1x1 copy - 0.093385 ms
[TRT] layer inception_4b/1x1 + inception_4b/relu_1x1 || inception_4b/3x3_reduce + inception_4b/relu_3x3_reduce || inception_4b/5x5_reduce + inception_4b/relu_5x5_reduce - 3.953750 ms
[TRT] layer inception_4b/3x3 + inception_4b/relu_3x3 - 4.252917 ms
[TRT] layer inception_4b/5x5 + inception_4b/relu_5x5 - 0.875104 ms
[TRT] layer inception_4b/pool - 0.792448 ms
[TRT] layer inception_4b/pool_proj + inception_4b/relu_pool_proj - 0.751510 ms
[TRT] layer inception_4b/1x1 copy - 0.097969 ms
[TRT] layer inception_4c/1x1 + inception_4c/relu_1x1 || inception_4c/3x3_reduce + inception_4c/relu_3x3_reduce || inception_4c/5x5_reduce + inception_4c/relu_5x5_reduce - 2.025000 ms
[TRT] layer inception_4c/3x3 + inception_4c/relu_3x3 - 1.415729 ms
[TRT] layer inception_4c/5x5 + inception_4c/relu_5x5 - 0.267083 ms
[TRT] layer inception_4c/pool - 0.187605 ms
[TRT] layer inception_4c/pool_proj + inception_4c/relu_pool_proj - 0.331302 ms
[TRT] layer inception_4c/1x1 copy - 0.039270 ms
[TRT] layer inception_4d/1x1 + inception_4d/relu_1x1 || inception_4d/3x3_reduce + inception_4d/relu_3x3_reduce || inception_4d/5x5_reduce + inception_4d/relu_5x5_reduce - 0.823386 ms
[TRT] layer inception_4d/3x3 + inception_4d/relu_3x3 - 0.725208 ms
[TRT] layer inception_4d/5x5 + inception_4d/relu_5x5 - 0.217604 ms
[TRT] layer inception_4d/pool - 0.160052 ms
[TRT] layer inception_4d/pool_proj + inception_4d/relu_pool_proj - 0.150625 ms
[TRT] layer inception_4d/1x1 copy - 0.016928 ms
[TRT] layer inception_4e/1x1 + inception_4e/relu_1x1 || inception_4e/3x3_reduce + inception_4e/relu_3x3_reduce || inception_4e/5x5_reduce + inception_4e/relu_5x5_reduce - 1.174166 ms
[TRT] layer inception_4e/3x3 + inception_4e/relu_3x3 - 1.516354 ms
[TRT] layer inception_4e/5x5 + inception_4e/relu_5x5 - 0.346927 ms
[TRT] layer inception_4e/pool - 0.164740 ms
[TRT] layer inception_4e/pool_proj + inception_4e/relu_pool_proj - 0.236719 ms
[TRT] layer inception_4e/1x1 copy - 0.026406 ms
[TRT] layer pool4/3x3_s2 - 0.154375 ms
[TRT] layer inception_5a/1x1 + inception_5a/relu_1x1 || inception_5a/3x3_reduce + inception_5a/relu_3x3_reduce || inception_5a/5x5_reduce + inception_5a/relu_5x5_reduce - 0.528906 ms
[TRT] layer inception_5a/3x3 + inception_5a/relu_3x3 - 0.425313 ms
[TRT] layer inception_5a/5x5 + inception_5a/relu_5x5 - 0.203437 ms
[TRT] layer inception_5a/pool - 0.085677 ms
[TRT] layer inception_5a/pool_proj + inception_5a/relu_pool_proj - 0.205105 ms
[TRT] layer inception_5a/1x1 copy - 0.013020 ms
[TRT] layer inception_5b/1x1 + inception_5b/relu_1x1 || inception_5b/3x3_reduce + inception_5b/relu_3x3_reduce || inception_5b/5x5_reduce + inception_5b/relu_5x5_reduce - 0.864740 ms
[TRT] layer inception_5b/3x3 + inception_5b/relu_3x3 - 1.049219 ms
[TRT] layer inception_5b/5x5 + inception_5b/relu_5x5 - 0.268385 ms
[TRT] layer inception_5b/pool - 0.070677 ms
[TRT] layer inception_5b/pool_proj + inception_5b/relu_pool_proj - 0.204063 ms
[TRT] layer inception_5b/1x1 copy - 0.013646 ms
[TRT] layer pool5/7x7_s1 - 0.056666 ms
[TRT] layer loss3/classifier input reformatter 0 - 0.008594 ms
[TRT] layer loss3/classifier - 0.285677 ms
[TRT] layer prob input reformatter 0 - 0.012031 ms
[TRT] layer prob - 0.023334 ms
[TRT] layer network time - 120.529793 ms
class 0950 - 0.978909 (orange)
class 0951 - 0.020962 (lemon)
imagenet-console: 'orange_0.jpg' -> 97.89090% class #950 (orange)
loaded image fontmapA.png (256 x 512) 2097152 bytes
[cuda] cudaAllocMapped 2097152 bytes, CPU 0x1048a0000 GPU 0x1048a0000
[cuda] cudaAllocMapped 8192 bytes, CPU 0x100f62000 GPU 0x100f62000
imagenet-console: attempting to save output image to 'output_0.jpg'
imagenet-console: completed saving 'output_0.jpg'
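The buffer sizes that `cudaAllocMapped` reports in the log above can be reproduced with a little arithmetic: the TensorRT bindings are FP32 (4 bytes per element) at the API boundary even though the engine computes internally in FP16, and the 1920 x 1920 input image appears to be stored as 4-channel float RGBA (16 bytes per pixel, inferred here from the reported size, not stated in the log). A minimal sanity-check sketch:

```python
def binding_size_bytes(batch, c, h, w, bytes_per_elem=4):
    """Size in bytes of a fully packed NCHW buffer (FP32 by default)."""
    return batch * c * h * w * bytes_per_elem

# input binding 'data': b=2, c=3, h=224, w=224, FP32
print(binding_size_bytes(2, 3, 224, 224))        # 1204224, matches the log

# output binding 'prob': b=2, c=1000, h=1, w=1, FP32
print(binding_size_bytes(2, 1000, 1, 1))         # 8000, matches the log

# loaded image: 1920 x 1920 pixels at 16 bytes/pixel (float4 RGBA, assumed)
print(binding_size_bytes(1, 1, 1920, 1920, 16))  # 58982400, matches the log

# the final result line is just the top-1 confidence as a percentage
print(f"{0.978909 * 100:.5f}%")                  # 97.89090%
```

Note that the sizes include the batch dimension: the engine was built with `batch_size 2`, so each binding holds two images' worth of data even though only one image is classified here.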