Is there a relationship between inference time and GPU usage for a deep learning model?

I am trying to establish how much GPU memory a model needs to run efficiently. I started with one of the models from the TensorFlow detection model zoo, specifically this one: http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz

Just to test a few things, the model is run once on a single image with resolution 960x620, on a machine with a dedicated GPU (Nvidia GeForce GTX 1080 Ti, 12 GB).

I am using TensorFlow 1.8 with CUDA 9.0 and cuDNN 7.0.

Using tf.RunMetadata(), I logged all the execution details and visualized them in TensorBoard; the screenshots are below.
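For reference, this is roughly how the trace was collected (a minimal sketch of the mechanics only: the toy op, the `logs` directory, and the tag name are placeholders, not the actual detection graph):

```python
import numpy as np
import tensorflow as tf

# Stand-in op for the Faster R-CNN graph, just to show the tracing flow.
x = tf.placeholder(tf.float32, shape=[None, 620, 960, 3], name='image')
y = tf.reduce_mean(x)  # stand-in for the model's output tensors

with tf.Session() as sess:
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()

    sess.run(y,
             feed_dict={x: np.zeros([1, 620, 960, 3], np.float32)},
             options=run_options,
             run_metadata=run_metadata)

    # Attach the trace to a FileWriter so TensorBoard's Graph tab can
    # color nodes by device placement and by compute time.
    writer = tf.summary.FileWriter('logs', sess.graph)
    writer.add_run_metadata(run_metadata, 'inference_step_1')
    writer.close()
```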

  1. The first two images are for the above model with no modifications; the first shows the device placements and the second shows the compute time.
  2. For the next two images, I explicitly specified the device placement for certain nodes in the graph (the node list is given further below).
  3. For the next two images, I set per_process_gpu_memory_fraction = 0.35 in the session config (a config sketch is shown after this list).

The last image shows the entire graph of the model.
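The memory cap from item 3 is set like this (a sketch; only the fraction value is taken from the experiment above):

```python
import tensorflow as tf

# Limit the GPU memory TensorFlow pre-allocates to 35% of the card,
# as used for Images 5 and 6.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.35)
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    pass  # run inference as before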

I had a few doubts about the logged details, such as the memory consumption, compute time, and device placements, and I managed to clear up most of them. What still confuses me is why the compute time (compare Image 6 with Image 2) went down when the GPU memory fraction was reduced. By my understanding, the time should have gone up, since fewer resources were available. But that does not seem to be the case. Why?

These are the nodes I placed on the CPU (instead of the GPU, which is the default):

'BatchMultiClassNonMaxSuppression', 'FirstStageFeatureExtractor/Assert/Assert', 'map/TensorArray_1', 'Preprocessor/map/TensorArray_2', 'GridAnchorGenerator/assert_equal/Assert/Assert'
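For reference, one way to pin these nodes in a frozen graph is to rewrite the device field of the GraphDef before importing it (a sketch, assuming the frozen_inference_graph.pb shipped in the model zoo tarball; tf.device scopes would be the alternative when building the graph yourself):

```python
import tensorflow as tf

# Nodes (and their scopes) to move to the CPU.
CPU_NODES = ['BatchMultiClassNonMaxSuppression',
             'FirstStageFeatureExtractor/Assert/Assert',
             'map/TensorArray_1',
             'Preprocessor/map/TensorArray_2',
             'GridAnchorGenerator/assert_equal/Assert/Assert']

# Load the frozen graph from the extracted tarball.
graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Overwrite the device field of the listed nodes so the runtime places
# them on the CPU; every other node keeps its default (GPU) placement.
for node in graph_def.node:
    if any(node.name.startswith(prefix) for prefix in CPU_NODES):
        node.device = '/device:CPU:0'

tf.import_graph_def(graph_def, name='')
```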

Image 1 (device placement)

Image 2 (compute time)

Image 3 (device placement with certain nodes explicitly placed)

Image 4 (compute time for the scenario in the previous image)

Image 5 (device placement with the GPU memory fraction reduced)

Image 6 (compute time for the scenario in the previous image)

Image 7 (entire model graph)
