NVIDIA RTX 2080 Ti vs 2080 vs 1080 Ti vs Titan V, TensorFlow Performance with CUDA 10.0



Are the NVIDIA RTX 2080 and 2080Ti good for machine learning?

Yes, they are great! The RTX 2080 Ti rivals the Titan V for performance with TensorFlow. The RTX 2080 seems to perform as well as the GTX 1080 Ti (although the RTX 2080 only has 8GB of memory).

Probably the most impressive new feature of the NVIDIA RTX cards is their astounding Ray-Tracing performance. However, they are also excellent cards for GPU accelerated computing. They are very well suited to machine learning workloads, and having “Tensorcores” is a nice bonus.

I have just finished some quick testing using TensorFlow 1.10 built against CUDA 10.0 running on Ubuntu 18.04 with the NVIDIA 410.48 driver. These are preliminary results after spending only a few hours with the new RTX 2080 Ti and RTX 2080. I’ll be doing more testing in the coming weeks.

I’m not going to go over the details of the new RTX cards; there are already plenty of posts on-line that cover that. I will mention that the Turing architecture does include Tensorcores (FP16), similar to what you would find in the Volta architecture. From a GPU computing perspective the RTX Turing cards offer an affordable alternative to the Volta based Titan V, Quadro GV100 or server oriented Tesla V100. The main drawback of the Turing based RTX cards is that they lack the outstanding double precision (FP64) performance of Volta. However, for most machine learning workloads that is not an issue. In fact, the inclusion of FP16 Tensorcores is a big plus in the ML/AI domain.
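If you want to get a feel for what the Tensorcores buy you on your own card, a rough timing of a large matrix multiply at FP32 and then FP16 will show the difference. This is just a quick sketch using the TensorFlow 1.x API, not a rigorous benchmark (the matrix size and iteration count are arbitrary choices of mine):

import time
import tensorflow as tf

def matmul_tflops(dtype, n=8192, iters=20):
    # Build a large matmul at the requested precision and time it.
    a = tf.cast(tf.random_normal([n, n]), dtype)
    b = tf.cast(tf.random_normal([n, n]), dtype)
    # Reduce to a scalar so the timing isn't dominated by copying the result back.
    c = tf.reduce_sum(tf.cast(tf.matmul(a, b), tf.float32))
    with tf.Session() as sess:
        sess.run(c)                          # warm-up
        start = time.time()
        for _ in range(iters):
            sess.run(c)
        elapsed = (time.time() - start) / iters
    return 2 * n**3 / elapsed / 1e12         # approximate TFLOPS

for dtype in (tf.float32, tf.float16):
    print(dtype.name, round(matmul_tflops(dtype), 1), 'TFLOPS')

On a Volta or Turing card the FP16 number should come out well ahead of FP32.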


Test system

Hardware

  • Puget Systems Peak Single
  • Intel Xeon-W 2175 14-core
  • 128GB Memory
  • 1TB Samsung NVMe M.2
  • GPUs tested:
      • GTX 1080 Ti
      • RTX 2080
      • RTX 2080 Ti
      • Titan V

Software

The workstation is my personal system, along with the extra GPUs that are being tested.

The TensorFlow build that I used for this testing is the latest build on NGC. It is TensorFlow 1.10 linked with CUDA 10.0. The convolutional neural network code used for the ResNet-50 model is from “nvidia-examples” in the container instance, as is the “billion word LSTM” network code (“big_lstm”).
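If you want to double check exactly what is in the container you started, a quick sanity check with the TensorFlow 1.x API will confirm the version and that it is a CUDA build (the expected version string is an assumption based on the 18.09-py3 tag described below):

import tensorflow as tf

print(tf.__version__)                # expect 1.10.x in the 18.09-py3 NGC image
print(tf.test.is_built_with_cuda())  # True for the NGC build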

For details on how I have Docker/NVIDIA-Docker configured on my workstation have a look at the following post along with the links it contains to the rest of that series of posts.

How-To Setup NVIDIA Docker and NGC Registry on your Workstation – Part 5 Docker Performance and Resource Tuning

Note: For my own development work I mostly use Anaconda Python installed on my local workstation along with framework packages from Anaconda Cloud. However, I am a big fan of docker, nvidia-docker and NGC! I use NGC on my workstation not “the cloud”.

Do I need to have CUDA 10 to use TensorFlow on NVIDIA RTX 20xx series GPUs?

No. The RTX 20xx GPUs are “CUDA compute 7.5” devices, but they will run code built for lower compute levels. I did some testing using TensorFlow 1.4 linked with CUDA 9.0 and it worked with the 2080 and 2080 Ti GPUs. What IS required is to have NVIDIA display driver version 410 or later installed on your system. You need the new 410 or later driver even if you are using docker/nvidia-docker, since the CUDA “run-time” libraries are included with the driver. Driver version 410 or later is required for the RTX 20xx cards and also for CUDA 10 linked programs.
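To confirm that TensorFlow actually sees the RTX card, and to check the compute capability it reports, a minimal check with the TensorFlow 1.x device_lib API looks like this (just a sketch; run it inside whatever environment you are using):

from tensorflow.python.client import device_lib

# The GPU entries include the compute capability in their description
# string ("compute capability: 7.5" for the RTX 20xx cards).
for dev in device_lib.list_local_devices():
    print(dev.name, dev.physical_device_desc)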

As of this writing, the easiest way to install the new NVIDIA driver on Ubuntu 18.04 is to do a CUDA 10 install, which includes the driver. See my recent post:

How To Install CUDA 10 (together with 9.2) on Ubuntu 18.04 with support for NVIDIA 20XX Turing GPUs.


TensorFlow benchmark results – GTX 1080Ti vs RTX 2080 vs RTX 2080Ti vs Titan V

The benchmark for GPU ML/AI performance that I’ve been using most recently is a CNN (convolutional neural network) Python code contained in the NGC TensorFlow docker image. NVIDIA has been maintaining it with frequent updates, and it’s easy to use with synthetic image data for quick benchmarking.

For reference, here is an example of the command lines used:

kinghorn@i9:~$ docker run --runtime=nvidia --rm -it -v $HOME/projects:/projects nvcr.io/nvidia/tensorflow:18.09-py3
root@90752be3917b:/workspace# cd nvidia-examples/cnn
root@90752be3917b:/workspace/nvidia-examples/cnn# export CUDA_VISIBLE_DEVICES=0
root@90752be3917b:/workspace/nvidia-examples/cnn# python resnet.py --layers 50 -b64 --precision fp16

That starts the NGC TensorFlow docker image tagged 18.09-py3, which contains TensorFlow 1.10 linked with CUDA 10.0. The job run is the ResNet-50 CNN model with a batch size of 64 at FP16 (half) precision. The environment variable CUDA_VISIBLE_DEVICES is used to select the GPU (or GPUs) being used (device 0, in my case, is the Titan V). Note that --precision fp16 means “use Tensorcores”.
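For context, the --precision fp16 path in the NGC example follows the mixed-precision recipe NVIDIA describes for TensorFlow 1.x: master weights are stored in FP32 and cast to FP16 for the compute so the Tensorcores get used (along with loss scaling). The sketch below is my own minimal illustration of that storage/cast idea, not the actual resnet.py code:

import tensorflow as tf

def fp32_storage_getter(getter, name, shape=None, dtype=None, *args, **kwargs):
    # Store the variable in FP32 but hand an FP16 cast back to the graph.
    var = getter(name, shape, tf.float32, *args, **kwargs)
    return tf.cast(var, tf.float16) if dtype == tf.float16 else var

with tf.variable_scope('model', custom_getter=fp32_storage_getter):
    x = tf.cast(tf.random_normal([64, 224, 224, 3]), tf.float16)    # synthetic FP16 batch
    w = tf.get_variable('conv1_kernel', [7, 7, 3, 64], tf.float16)  # stored FP32, used as FP16
    y = tf.nn.conv2d(x, w, strides=[1, 2, 2, 1], padding='SAME')    # FP16 convolution

Keeping the master copy of the weights in FP32 avoids the accumulation error you would get from updating pure FP16 weights, while the convolutions themselves still run at FP16.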

ResNet-50 – GTX 1080Ti vs RTX 2080 vs RTX 2080Ti vs Titan V – TensorFlow – Training performance (Images/second)

GPU            FP32 (Images/sec)    FP16 Tensorcores (Images/sec)
GTX 1080 Ti    207                  N/A
RTX 2080       207                  332
RTX 2080 Ti    280                  437
Titan V        299                  547

[Chart: ResNet-50 with RTX GPUs]


I also ran the LSTM example on the “Billion Words” data set. The results are a little inconsistent, but actually I like that! It’s a reminder that benchmark results are subject to change and don’t always “go your way”.

For reference, here is an example of the command lines used (continuing in the container image used above):

root@90752be3917b:/workspace/nvidia-examples# cd big_lstm
root@90752be3917b:/workspace/nvidia-examples/big_lstm# ./download_1b_words_data.sh
root@90752be3917b:/workspace/nvidia-examples/big_lstm# export CUDA_VISIBLE_DEVICES=0
root@90752be3917b:/workspace/nvidia-examples/big_lstm# python single_lm_train.py --mode=train --logdir=./logs --num_gpus=1 --datadir=./data/1-billion-word-language-modeling-benchmark-r13output/ --hpconfig run_profiler=False,max_time=90,num_steps=20,num_shards=8,num_layers=2,learning_rate=0.2,max_grad_norm=1,keep_prob=0.9,emb_size=1024,projected_size=1024,state_size=8192,num_sampled=8192,batch_size=512

“Big LSTM” – GTX 1080Ti vs RTX 2080 vs RTX 2080Ti vs Titan V – TensorFlow – Training performance (words/second)

GPU                  FP32 (words/sec)
GTX 1080 Ti          6460
RTX 2080 (Note 1)    5071
RTX 2080 Ti          8945
Titan V (Note 2)     7066
Titan V (Note 3)     8373

[Chart: “Big LSTM” with RTX GPUs]

  • Note 1: With only 8GB memory on the RTX 2080 I had to drop the batch size down to 256 to keep from getting “out of memory” errors. That typically has a big (downward) influence on performance.
  • Note 2: For whatever reason this result for the Titan V is worse than expected. This is TensorFlow 1.10 linked with CUDA 10 running NVIDIA’s code for the LSTM model. The RTX 2080Ti performance was very good!
  • Note 3: I re-ran the “big-LSTM” job on the Titan V using TensorFlow 1.4 linked with CUDA 9.0 and got results consistent with what I have seen in the past. I have no explanation for the slowdown with the newer version of “big-LSTM”.

Should you get an RTX 2080 or RTX 2080Ti for machine learning work?

OK, I think that is an obvious yes! For the kind of GPU compute workloads you will find in ML/AI work, the new NVIDIA Turing RTX cards give excellent performance.

The RTX 2080 is a good value with performance similar to the beloved GTX 1080 Ti, and it has the added benefit of Tensorcores, not to mention the Ray-Tracing capabilities! Its main downside is the limit imposed by its 8GB of memory.

The RTX 2080Ti is priced about the same as the older Titan Xp but offers performance for ML/AI workloads that rivals the Titan V (which costs over twice as much). I think the RTX 2080Ti is the obvious and worthy successor to the GTX 1080Ti as the practical workstation GPU for ML/AI development work.

What’s my favorite GPU?

I do like the RTX 2080Ti, but I just love the Titan V! The Titan V is a great card, and even though it seems expensive from a “consumer” point of view, I consider it an incredible bargain. I am doing experimental work where I really need to have double precision, i.e. FP64. The Titan V offers the same stellar FP64 performance as the server oriented Tesla V100. For a development workstation for someone doing a lot of experimenting or more general scientific computing, it is an easy recommendation. That said, if you really don’t need FP64 then the RTX 2080Ti is going to give the best performance for the cost.

Happy computing! –dbk