Using DeepStream

Introduction to DeepStream

DeepStream is a streaming-analytics toolkit from NVIDIA that makes it easy to decode video and run inference on it, efficiently handling tasks such as image classification, object detection, recognition, and tracking.
DeepStream is built on top of GStreamer, so using it requires some familiarity with C and with GStreamer. DeepStream can be thought of as a shell that integrates tools such as GStreamer and TensorRT; with it we can decode and transport video, run neural-network inference, and render the results.

Installation Steps

$ sudo apt install \
    libssl1.0.0 \
    libgstreamer1.0-0 \
    gstreamer1.0-tools \
    gstreamer1.0-plugins-good \
    gstreamer1.0-plugins-bad \
    gstreamer1.0-plugins-ugly \
    gstreamer1.0-libav \
    libgstrtspserver-1.0-0 \
    libjansson4=2.11-1
$ sudo apt-get install librdkafka1=0.11.3-1build1
$ tar -xpvf deepstream_sdk_v4.0.2_jetson.tbz2
$ cd deepstream_sdk_v4.0.2_jetson
$ sudo tar -xvpf binaries.tbz2 -C /
$ sudo ./install.sh
$ sudo ldconfig

Plugin Configuration

Refer to the configuration files under deepstream_sdk_v4.0.2_jetson/samples/configs/deepstream-app/:

  • source30_1080p_resnet_dec_infer_tiled_display_int8.txt: Demonstrates 30-stream decode with primary inference. (dGPU and Jetson AGX Xavier platforms only.)
  • source4_1080p_resnet_dec_infer_tiled_display_int8.txt: Demonstrates four-stream decode with primary inference, object tracking, and three different secondary classifiers. (dGPU and Jetson AGX Xavier platforms only.)
  • source4_1080p_resnet_dec_infer_tracker_sgie_tiled_display_int8_gpu1.txt: Demonstrates four-stream decode with primary inference, object tracking, and three different secondary classifiers on GPU 1 (for systems with multiple GPU cards). dGPU platforms only.
  • config_infer_primary.txt: Configures an nvinfer element as the primary detector.
  • config_infer_secondary_carcolor.txt, config_infer_secondary_carmake.txt, config_infer_secondary_vehicletypes.txt: Configure nvinfer elements as secondary classifiers.
  • iou_config.txt: Configures a low-level IOU (Intersection-over-Union) tracker.
  • source1_usb_dec_infer_resnet_int8.txt: Demonstrates one USB camera as input.
  • source1_csi_dec_infer_resnet_int8.txt: Demonstrates one CSI camera as input; Jetson only.
  • source2_csi_usb_dec_infer_resnet_int8.txt: Demonstrates one CSI camera and one USB camera as input; Jetson only.
  • source6_csi_dec_infer_resnet_int8.txt: Demonstrates six CSI cameras as input; Jetson only.
  • source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt: Demonstrates 8-stream decode + inference + tracker; Jetson Nano only.
  • source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_tx1.txt: Demonstrates 8-stream decode + inference + tracker; Jetson TX1 only.
  • source12_1080p_dec_infer-resnet_tracker_tiled_display_fp16_tx2.txt: Demonstrates 12-stream decode + inference + tracker; Jetson TX2 only.

Video Input

camera

  • USB camera
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=1
camera-width=1280
camera-height=720
camera-fps-n=30
camera-fps-d=1
camera-v4l2-dev-node=0
  • CSI camera
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI
type=5
camera-width=1280
camera-height=720
camera-fps-n=30
camera-fps-d=1
camera-csi-sensor-id=0

videofile

Four copies of the same file, using MultiURI:

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file://../../streams/sample_1080p_h264.mp4
num-sources=4
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0
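Note that the uri= key expects a URI, not a bare path; an absolute file URI avoids any ambiguity about how the relative path above gets resolved. It can be built from a local path with Python's standard library (the sample path below mirrors the one used above):

```python
from pathlib import Path

# Build an absolute file:// URI from a local path, so the config does not
# depend on the directory deepstream-app is launched from.
uri = Path("streams/sample_1080p_h264.mp4").resolve().as_uri()
print(uri)  # e.g. file:///home/user/streams/sample_1080p_h264.mp4
```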

media stream

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=4
uri=rtsp://admin:admin123@192.168.1.106:554/cam/realmonitor?channel=1&subtype=0
num-sources=1
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

Multiple USB cameras

For multiple USB cameras, add one [sourceN] section per camera (type=1), each pointing at a different V4L2 device node:

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=1
camera-width=1280
camera-height=720
camera-fps-n=30
camera-fps-d=1
camera-v4l2-dev-node=0

[source1]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=1
camera-width=1280
camera-height=720
camera-fps-n=30
camera-fps-d=1
camera-v4l2-dev-node=1

Multiple CSI cameras

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI
type=5
camera-csi-sensor-id=0
camera-width=1280
camera-height=720
camera-fps-n=30
camera-fps-d=1

[source1]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI
type=5
camera-csi-sensor-id=1
camera-width=1280
camera-height=720
camera-fps-n=30
camera-fps-d=1

[source2]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI
type=5
camera-csi-sensor-id=2
camera-width=1280
camera-height=720
camera-fps-n=30
camera-fps-d=1

[source3]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI
type=5
camera-csi-sensor-id=3
camera-width=1280
camera-height=720
camera-fps-n=30
camera-fps-d=1
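The four CSI source sections above differ only in their index and sensor id, so they can also be generated rather than written by hand. A minimal Python sketch (the helper function is our own, not part of DeepStream):

```python
def csi_source_section(index, width=1280, height=720, fps=30):
    """Emit one [sourceN] block for CSI sensor `index` (type=5)."""
    return (
        f"[source{index}]\n"
        "enable=1\n"
        "#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP 5=CSI\n"
        "type=5\n"
        f"camera-csi-sensor-id={index}\n"
        f"camera-width={width}\n"
        f"camera-height={height}\n"
        f"camera-fps-n={fps}\n"
        "camera-fps-d=1\n"
    )

# Four CSI cameras, sensor ids 0-3, matching the sections above
config = "\n".join(csi_source_section(i) for i in range(4))
print(config)
```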

Video Processing

Object Detection

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
model-engine-file=../../models/Primary_Detector/resnet10.caffemodel_b30_int8.engine
#Required to display the PGIE labels, should be added even when using config-file
#property
batch-size=4
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
#Required by the app for SGIE, when used along with config-file property
gie-unique-id=1
config-file=config_infer_primary.txt
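The bbox-border-colorN keys set the border color drawn for class id N, given as four semicolon-separated R;G;B;A components in the range 0 to 1 (so 1;0;0;1 is opaque red). A minimal parser sketch (the function name is ours):

```python
def parse_color(value):
    """Parse a deepstream-app color string such as '1;0;0;1' into (R, G, B, A)."""
    r, g, b, a = (float(part) for part in value.split(";"))
    return (r, g, b, a)

# bbox-border-color0=1;0;0;1  ->  opaque red for class id 0
print(parse_color("1;0;0;1"))   # (1.0, 0.0, 0.0, 1.0)
```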

Object Tracking

[tracker]
enable=1
tracker-width=640
tracker-height=368
#tracker-width=480
#tracker-height=272

#ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_iou.so
#ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so
ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_klt.so
#ll-config-file required for DCF/IOU only
#ll-config-file=tracker_config.yml
#ll-config-file=iou_config.txt
gpu-id=0
#enable-batch-process applicable to DCF only
enable-batch-process=1

Secondary Classification after Detection

[secondary-gie0]
enable=1
model-engine-file=../../models/Secondary_VehicleTypes/resnet18.caffemodel_b16_int8.engine
gpu-id=0
batch-size=16
gie-unique-id=4
operate-on-gie-id=1
operate-on-class-ids=0;
config-file=config_infer_secondary_vehicletypes.txt

[secondary-gie1]
enable=1
model-engine-file=../../models/Secondary_CarColor/resnet18.caffemodel_b16_int8.engine
batch-size=16
gpu-id=0
gie-unique-id=5
operate-on-gie-id=1
operate-on-class-ids=0;
config-file=config_infer_secondary_carcolor.txt

[secondary-gie2]
enable=1
model-engine-file=../../models/Secondary_CarMake/resnet18.caffemodel_b16_int8.engine
batch-size=16
gpu-id=0
gie-unique-id=6
operate-on-gie-id=1
operate-on-class-ids=0;
config-file=config_infer_secondary_carmake.txt
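The cascade is wired together by ids: each secondary's operate-on-gie-id must equal the primary's gie-unique-id (1 here), and every gie-unique-id must be distinct. A quick consistency check over the sections above (the data structure is our own, not a DeepStream API):

```python
# (section, gie-unique-id, operate-on-gie-id; None marks the primary)
gies = [
    ("primary-gie",    1, None),
    ("secondary-gie0", 4, 1),
    ("secondary-gie1", 5, 1),
    ("secondary-gie2", 6, 1),
]

unique_ids = [uid for _, uid, _ in gies]
assert len(unique_ids) == len(set(unique_ids)), "gie-unique-id values must be distinct"

primary_ids = {uid for _, uid, parent in gies if parent is None}
for name, _, parent in gies:
    if parent is not None:
        assert parent in primary_ids, f"{name} operates on an unknown gie id"
print("gie wiring OK")
```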

Video Output

Tiled display

Single stream

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720

Multiple streams

[tiled-display]
enable=1
rows=4
columns=2
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0
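rows × columns must provide at least as many tiles as there are enabled sources; if fewer sources are enabled than tiles, the remaining tiles stay blank. A near-square layout can be computed automatically (the helper name is ours):

```python
import math

def tile_grid(num_sources):
    """Return (rows, columns) of a near-square grid with >= num_sources tiles."""
    columns = math.ceil(math.sqrt(num_sources))
    rows = math.ceil(num_sources / columns)
    return rows, columns

print(tile_grid(1))   # (1, 1)
print(tile_grid(4))   # (2, 2)
print(tile_grid(8))   # (3, 3)
```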

screen

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming 5=Overlay
type=5
sync=0
display-id=0
offset-x=0
offset-y=0
width=0
height=0
overlay-id=1
source-id=0

videofile

[sink1]
enable=1
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265 3=mpeg4
codec=1
sync=0
bitrate=2000000
output-file=out.mp4
source-id=0

media stream

[sink2]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming 5=Overlay
type=4
#1=h264 2=h265
codec=1
sync=0
bitrate=4000000
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400

Open the network stream in VLC: rtsp://192.168.0.118:8554/ds-test

osd

[osd]
enable=1
border-width=2
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0

streammux

[streammux]
##Boolean property to inform muxer that sources are live
live-source=1
## Set according to the number of input streams
batch-size=4
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=720
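batched-push-timeout is usually set to roughly one frame interval, so the muxer pushes a partial batch instead of stalling when a source drops frames. One frame interval in microseconds for a given frame rate (our own helper, not a DeepStream API):

```python
def push_timeout_usec(fps):
    """One frame interval in microseconds at the given frame rate."""
    return round(1_000_000 / fps)

print(push_timeout_usec(30))   # 33333
print(push_timeout_usec(25))   # 40000 -- the value used above
```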

Sample Applications

  • DeepStream Sample App /sources/apps/sample_apps/deepstream-app

Description: an end-to-end example that demonstrates multi-camera streams fed through a four-stage cascaded neural network (one primary detector and three secondary classifiers), with tiled display output.

  • DeepStream Test 1 /sources/apps/sample_apps/deepstream-t
  • DeepStream Test 2 /sources/apps/sample_apps/deepstream-test2

Description: a simple application, built on top of test1, that shows additional attributes such as tracking and secondary classification.

  • DeepStream Test 3 /sources/apps/sample_apps/deepstream-test3

Description: a simple application, built on top of test1, that shows multiple input sources batched with nvstreammux.

  • DeepStream Test 4 /sources/apps/sample_apps/deepstream-test4

Description: builds on the Test1 example to demonstrate the "nvmsgconv" and "nvmsgbroker" plugins in an IoT-connected pipeline. For test4, the user must modify the Kafka broker connection string in order to connect successfully. The analytics-server Docker container must be installed before running test4; the DeepStream analytics documentation has more information on setting up the analytics server.

  • FasterRCNN Object Detector /sources/objectDetector_FasterRCNN

Description: a FasterRCNN object-detector example.

  • SSD Object Detector /sources/objectDetector_SSD

Description: an SSD object-detector example.

Running jetson-inference

Introduction to jetson-inference

jetson-inference is NVIDIA's inference and real-time DNN vision library for the Jetson Nano/TX1/TX2/Xavier NX/AGX Xavier. It uses NVIDIA TensorRT to deploy neural networks efficiently onto the embedded Jetson platform, improving performance and power efficiency through graph optimization, kernel fusion, and FP16/INT8 precision. The vision primitives, such as imageNet for image recognition, detectNet for object detection, and segNet for semantic segmentation, all inherit from a shared tensorNet object. Examples are provided for streaming from live camera feeds and for processing images.

GitHub repository: https://github.com/dusty-nv/jetson-inference

Download and Build

First, install the git and cmake tools:

$ sudo apt-get install git cmake

Next, clone jetson-inference from GitHub:

$ git clone https://github.com/dusty-nv/jetson-inference
$ cd jetson-inference
$ git submodule update --init

Configure with cmake; this step automatically downloads a number of models (a proxy may be needed to fetch them):

$ mkdir build    # create the build folder
$ cd build       # enter build
$ cmake ../      # run cmake

Build and install:

$ cd jetson-inference/build			
$ make
$ sudo make install

After a successful build, the directory structure looks like this:

|-build   
    \aarch64
        \bin
        \include
        \lib
    \armhf
        \bin
        \include
        \lib	

Run a Test

Enter the binary folder and run imagenet:

$ cd jetson-inference/build/aarch64/bin
$ ./imagenet-console orange_0.jpg output_0.jpg

The output looks like the following:

imagenet-console
  args (3):  0 [./imagenet-console]  1 [orange_0.jpg]  2 [output_0.jpg]  
 
 
imageNet -- loading classification network model from:
         -- prototxt     networks/googlenet.prototxt
         -- model        networks/bvlc_googlenet.caffemodel
         -- class_labels networks/ilsvrc12_synset_words.txt
         -- input_blob   'data'
         -- output_blob  'prob'
         -- batch_size   2
 
[TRT]  TensorRT version 5.0.6
[TRT]  detected model format - caffe  (extension '.caffemodel')
[TRT]  desired precision specified for GPU: FASTEST
[TRT]  requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]  native precisions detected for GPU:  FP32, FP16
[TRT]  selecting fastest native precision for GPU:  FP16
[TRT]  attempting to open engine cache file networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT]  cache file not found, profiling network model on device GPU
[TRT]  device GPU, loading networks/googlenet.prototxt networks/bvlc_googlenet.caffemodel
[TRT]  retrieved Output tensor "prob":  1000x1x1
[TRT]  retrieved Input tensor "data":  3x224x224
[TRT]  device GPU, configuring CUDA engine
[TRT]  device GPU, building FP16:  ON
[TRT]  device GPU, building INT8:  OFF
[TRT]  device GPU, building CUDA engine (this may take a few minutes the first time a network is loaded)
[TRT]  device GPU, completed building CUDA engine
[TRT]  network profiling complete, writing engine cache to networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT]  device GPU, completed writing engine cache to networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT]  device GPU, networks/bvlc_googlenet.caffemodel loaded
[TRT]  device GPU, CUDA engine context initialized with 2 bindings
[TRT]  binding -- index   0
               -- name    'data'
               -- type    FP32
               -- in/out  INPUT
               -- # dims  3
               -- dim #0  3 (CHANNEL)
               -- dim #1  224 (SPATIAL)
               -- dim #2  224 (SPATIAL)
[TRT]  binding -- index   1
               -- name    'prob'
               -- type    FP32
               -- in/out  OUTPUT
               -- # dims  3
               -- dim #0  1000 (CHANNEL)
               -- dim #1  1 (SPATIAL)
               -- dim #2  1 (SPATIAL)
[TRT]  binding to input 0 data  binding index:  0
[TRT]  binding to input 0 data  dims (b=2 c=3 h=224 w=224) size=1204224
[cuda]  cudaAllocMapped 1204224 bytes, CPU 0x100e30000 GPU 0x100e30000
[TRT]  binding to output 0 prob  binding index:  1
[TRT]  binding to output 0 prob  dims (b=2 c=1000 h=1 w=1) size=8000
[cuda]  cudaAllocMapped 8000 bytes, CPU 0x100f60000 GPU 0x100f60000
device GPU, networks/bvlc_googlenet.caffemodel initialized.
[TRT]  networks/bvlc_googlenet.caffemodel loaded
imageNet -- loaded 1000 class info entries
networks/bvlc_googlenet.caffemodel initialized.
loaded image  orange_0.jpg  (1920 x 1920)  58982400 bytes
[cuda]  cudaAllocMapped 58982400 bytes, CPU 0x101060000 GPU 0x101060000
[TRT]  layer conv1/7x7_s2 + conv1/relu_7x7 input reformatter 0 - 1.971458 ms
[TRT]  layer conv1/7x7_s2 + conv1/relu_7x7 - 12.103073 ms
[TRT]  layer pool1/3x3_s2 - 2.291198 ms
[TRT]  layer pool1/norm1 - 0.626354 ms
[TRT]  layer conv2/3x3_reduce + conv2/relu_3x3_reduce - 1.402032 ms
[TRT]  layer conv2/3x3 + conv2/relu_3x3 - 22.193958 ms
[TRT]  layer conv2/norm2 - 1.714010 ms
[TRT]  layer pool2/3x3_s2 - 2.223438 ms
[TRT]  layer inception_3a/1x1 + inception_3a/relu_1x1 || inception_3a/3x3_reduce + inception_3a/relu_3x3_reduce || inception_3a/5x5_reduce + inception_3a/relu_5x5_reduce - 3.953125 ms
[TRT]  layer inception_3a/3x3 + inception_3a/relu_3x3 - 6.141875 ms
[TRT]  layer inception_3a/5x5 + inception_3a/relu_5x5 - 1.052396 ms
[TRT]  layer inception_3a/pool - 0.945416 ms
[TRT]  layer inception_3a/pool_proj + inception_3a/relu_pool_proj - 0.752500 ms
[TRT]  layer inception_3a/1x1 copy - 0.136875 ms
[TRT]  layer inception_3b/1x1 + inception_3b/relu_1x1 || inception_3b/3x3_reduce + inception_3b/relu_3x3_reduce || inception_3b/5x5_reduce + inception_3b/relu_5x5_reduce - 4.915782 ms
[TRT]  layer inception_3b/3x3 + inception_3b/relu_3x3 - 13.518437 ms
[TRT]  layer inception_3b/5x5 + inception_3b/relu_5x5 - 4.846615 ms
[TRT]  layer inception_3b/pool - 1.560260 ms
[TRT]  layer inception_3b/pool_proj + inception_3b/relu_pool_proj - 1.945625 ms
[TRT]  layer inception_3b/1x1 copy - 0.240729 ms
[TRT]  layer pool3/3x3_s2 - 1.658229 ms
[TRT]  layer inception_4a/1x1 + inception_4a/relu_1x1 || inception_4a/3x3_reduce + inception_4a/relu_3x3_reduce || inception_4a/5x5_reduce + inception_4a/relu_5x5_reduce - 3.698021 ms
[TRT]  layer inception_4a/3x3 + inception_4a/relu_3x3 - 3.677865 ms
[TRT]  layer inception_4a/5x5 + inception_4a/relu_5x5 - 0.617812 ms
[TRT]  layer inception_4a/pool - 0.589011 ms
[TRT]  layer inception_4a/pool_proj + inception_4a/relu_pool_proj - 0.638646 ms
[TRT]  layer inception_4a/1x1 copy - 0.093385 ms
[TRT]  layer inception_4b/1x1 + inception_4b/relu_1x1 || inception_4b/3x3_reduce + inception_4b/relu_3x3_reduce || inception_4b/5x5_reduce + inception_4b/relu_5x5_reduce - 3.953750 ms
[TRT]  layer inception_4b/3x3 + inception_4b/relu_3x3 - 4.252917 ms
[TRT]  layer inception_4b/5x5 + inception_4b/relu_5x5 - 0.875104 ms
[TRT]  layer inception_4b/pool - 0.792448 ms
[TRT]  layer inception_4b/pool_proj + inception_4b/relu_pool_proj - 0.751510 ms
[TRT]  layer inception_4b/1x1 copy - 0.097969 ms
[TRT]  layer inception_4c/1x1 + inception_4c/relu_1x1 || inception_4c/3x3_reduce + inception_4c/relu_3x3_reduce || inception_4c/5x5_reduce + inception_4c/relu_5x5_reduce - 2.025000 ms
[TRT]  layer inception_4c/3x3 + inception_4c/relu_3x3 - 1.415729 ms
[TRT]  layer inception_4c/5x5 + inception_4c/relu_5x5 - 0.267083 ms
[TRT]  layer inception_4c/pool - 0.187605 ms
[TRT]  layer inception_4c/pool_proj + inception_4c/relu_pool_proj - 0.331302 ms
[TRT]  layer inception_4c/1x1 copy - 0.039270 ms
[TRT]  layer inception_4d/1x1 + inception_4d/relu_1x1 || inception_4d/3x3_reduce + inception_4d/relu_3x3_reduce || inception_4d/5x5_reduce + inception_4d/relu_5x5_reduce - 0.823386 ms
[TRT]  layer inception_4d/3x3 + inception_4d/relu_3x3 - 0.725208 ms
[TRT]  layer inception_4d/5x5 + inception_4d/relu_5x5 - 0.217604 ms
[TRT]  layer inception_4d/pool - 0.160052 ms
[TRT]  layer inception_4d/pool_proj + inception_4d/relu_pool_proj - 0.150625 ms
[TRT]  layer inception_4d/1x1 copy - 0.016928 ms
[TRT]  layer inception_4e/1x1 + inception_4e/relu_1x1 || inception_4e/3x3_reduce + inception_4e/relu_3x3_reduce || inception_4e/5x5_reduce + inception_4e/relu_5x5_reduce - 1.174166 ms
[TRT]  layer inception_4e/3x3 + inception_4e/relu_3x3 - 1.516354 ms
[TRT]  layer inception_4e/5x5 + inception_4e/relu_5x5 - 0.346927 ms
[TRT]  layer inception_4e/pool - 0.164740 ms
[TRT]  layer inception_4e/pool_proj + inception_4e/relu_pool_proj - 0.236719 ms
[TRT]  layer inception_4e/1x1 copy - 0.026406 ms
[TRT]  layer pool4/3x3_s2 - 0.154375 ms
[TRT]  layer inception_5a/1x1 + inception_5a/relu_1x1 || inception_5a/3x3_reduce + inception_5a/relu_3x3_reduce || inception_5a/5x5_reduce + inception_5a/relu_5x5_reduce - 0.528906 ms
[TRT]  layer inception_5a/3x3 + inception_5a/relu_3x3 - 0.425313 ms
[TRT]  layer inception_5a/5x5 + inception_5a/relu_5x5 - 0.203437 ms
[TRT]  layer inception_5a/pool - 0.085677 ms
[TRT]  layer inception_5a/pool_proj + inception_5a/relu_pool_proj - 0.205105 ms
[TRT]  layer inception_5a/1x1 copy - 0.013020 ms
[TRT]  layer inception_5b/1x1 + inception_5b/relu_1x1 || inception_5b/3x3_reduce + inception_5b/relu_3x3_reduce || inception_5b/5x5_reduce + inception_5b/relu_5x5_reduce - 0.864740 ms
[TRT]  layer inception_5b/3x3 + inception_5b/relu_3x3 - 1.049219 ms
[TRT]  layer inception_5b/5x5 + inception_5b/relu_5x5 - 0.268385 ms
[TRT]  layer inception_5b/pool - 0.070677 ms
[TRT]  layer inception_5b/pool_proj + inception_5b/relu_pool_proj - 0.204063 ms
[TRT]  layer inception_5b/1x1 copy - 0.013646 ms
[TRT]  layer pool5/7x7_s1 - 0.056666 ms
[TRT]  layer loss3/classifier input reformatter 0 - 0.008594 ms
[TRT]  layer loss3/classifier - 0.285677 ms
[TRT]  layer prob input reformatter 0 - 0.012031 ms
[TRT]  layer prob - 0.023334 ms
[TRT]  layer network time - 120.529793 ms
class 0950 - 0.978909  (orange)
class 0951 - 0.020962  (lemon)
imagenet-console:  'orange_0.jpg' -> 97.89090% class #950 (orange)
loaded image  fontmapA.png  (256 x 512)  2097152 bytes
[cuda]  cudaAllocMapped 2097152 bytes, CPU 0x1048a0000 GPU 0x1048a0000
[cuda]  cudaAllocMapped 8192 bytes, CPU 0x100f62000 GPU 0x100f62000
imagenet-console:  attempting to save output image to 'output_0.jpg'
imagenet-console:  completed saving 'output_0.jpg'