Two Ways TensorRT Optimizes a Neural Network Computation Graph

GIE (GPU Inference Engine) is now TensorRT: https://devblogs.nvidia.com/deploying-deep-learning-nvidia-tensorrt/

GIE performs several important transformations and optimizations on the neural network graph. First, layers with unused outputs are eliminated to avoid unnecessary computation. Next, where possible, convolution, bias, and ReLU layers are fused to form a single layer. Figure 4 shows the result of this vertical layer fusion on the original network from Figure 3 (fused layers are labeled CBR in Figure 4). Layer fusion improves the efficiency of running GIE-optimized networks on the GPU.

Figure 3. An example convolutional neural network with multiple convolutional and activation layers.
Figure 4. An example of vertical layer fusion on a convolutional neural network. Here, convolutional layers are combined with subsequent bias and activation (ReLU) layers.
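To make the vertical pass concrete, here is a small, hypothetical graph-rewrite sketch in Python. It is not TensorRT's builder code; the `Layer` record, the layer kinds, and the helper functions are illustrative stand-ins for a real graph IR. The sketch drops layers whose outputs are never consumed and then collapses Conv → Bias → ReLU chains into single CBR nodes, which is the shape of the transformation Figures 3 and 4 describe.

```python
# Hypothetical graph-rewrite sketch (not TensorRT internals).
# Assumes `layers` is given in topological order.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    kind: str            # e.g. "conv", "bias", "relu", "cbr", "pool"
    inputs: list         # names of the layers this layer reads from

def eliminate_dead_layers(layers, graph_outputs):
    # Keep only layers whose output is reachable from the graph outputs.
    used = set(graph_outputs)
    for layer in reversed(layers):
        if layer.name in used:
            used.update(layer.inputs)
    return [l for l in layers if l.name in used]

def fuse_vertical(layers):
    # Merge conv -> bias -> relu chains where each link has exactly one consumer.
    by_name = {l.name: l for l in layers}
    consumers = {}
    for l in layers:
        for src in l.inputs:
            consumers.setdefault(src, []).append(l.name)

    fused, skip = [], set()
    for l in layers:
        if l.name in skip:
            continue
        if l.kind == "conv" and len(consumers.get(l.name, [])) == 1:
            bias = by_name[consumers[l.name][0]]
            if (bias.kind == "bias" and bias.inputs == [l.name]
                    and len(consumers.get(bias.name, [])) == 1):
                relu = by_name[consumers[bias.name][0]]
                if relu.kind == "relu" and relu.inputs == [bias.name]:
                    # Replace the three nodes with one CBR node. It keeps the
                    # ReLU's name so downstream references stay valid.
                    fused.append(Layer(relu.name, "cbr", l.inputs))
                    skip.update({bias.name, relu.name})
                    continue
        fused.append(l)
    return fused

# Toy network: a conv/bias/relu chain feeding a pool, plus a conv nobody reads.
net = [
    Layer("conv1", "conv", ["input"]),
    Layer("bias1", "bias", ["conv1"]),
    Layer("relu1", "relu", ["bias1"]),
    Layer("pool1", "pool", ["relu1"]),
    Layer("debug", "conv", ["input"]),   # unused output -> eliminated
]
optimized = fuse_vertical(eliminate_dead_layers(net, ["pool1"]))
print([(l.name, l.kind) for l in optimized])
# [('relu1', 'cbr'), ('pool1', 'pool')]
```

On this toy graph the unused layer disappears and the chain collapses into a single CBR node feeding the pool, mirroring the CBR blocks shown in Figure 4.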

Another transformation is horizontal layer fusion, or layer aggregation, along with the required division of the aggregated layer's output back to its respective consumers, as Figure 5 shows. Horizontal layer fusion improves performance by combining layers that take the same source tensor and apply the same operations with similar parameters, resulting in a single larger layer for higher computational efficiency. The example in Figure 5 shows the combination of three 1×1 CBR layers from Figure 4 that take the same input into a single larger 1×1 CBR layer. Note that the output of this layer must be disaggregated to feed the different subsequent layers of the original graph.

Figure 5. An example of horizontal layer fusion on a convolutional neural network. Here, multiple 1×1 CBR layers from Figure 4 are fused “horizontally”, or across similar layers in the graph that share the same input.
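The sketch below illustrates why this rewrite preserves the results, using plain NumPy rather than TensorRT internals: the weights and biases of three sibling 1×1 CBR layers are stacked along the output-channel axis, the combined layer runs once over the shared input, and the result is sliced back apart so each downstream consumer still receives the tensor it expects. The `cbr_1x1` helper and the channel counts are illustrative assumptions.

```python
# NumPy sketch of horizontal fusion for three 1x1 CBR layers sharing one input.
import numpy as np

def cbr_1x1(x, w, b):
    # Hypothetical fused 1x1 conv + bias + ReLU over an NCHW tensor.
    y = np.einsum('oc,nchw->nohw', w, x) + b[None, :, None, None]
    return np.maximum(y, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8, 16, 16)).astype(np.float32)

# Three sibling 1x1 CBR layers that all read x, with 4, 6, and 2 output channels.
out_channels = [4, 6, 2]
weights = [rng.standard_normal((c, 8)).astype(np.float32) for c in out_channels]
biases = [rng.standard_normal(c).astype(np.float32) for c in out_channels]

# Unfused: three separate passes over the same input tensor.
separate = [cbr_1x1(x, w, b) for w, b in zip(weights, biases)]

# Horizontally fused: one wider layer, then a split to recover each output.
w_big = np.concatenate(weights, axis=0)   # (12, 8)
b_big = np.concatenate(biases, axis=0)    # (12,)
fused = cbr_1x1(x, w_big, b_big)
splits = np.split(fused, np.cumsum(out_channels)[:-1], axis=1)

for lhs, rhs in zip(separate, splits):
    assert np.allclose(lhs, rhs, atol=1e-5)
```

The concatenation is what produces the "single larger 1×1 CBR layer" of Figure 5, and the final split is the disaggregation step mentioned above.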
