TensorFlow and the GTX 970

I recently encountered a problem where Python scripts using TensorFlow, which I knew worked on lesser GPUs, were failing on my GTX 970 with out-of-memory errors.

The problem is that the GTX 970 uses a split memory architecture: 3.5 GB of full-speed RAM plus a 0.5 GB pool of slower RAM. When TensorFlow's allocator spills into the 0.5 GB pool, it crashes with an out-of-memory error.

The fix is to restrict TensorFlow so that it never touches the 0.5 GB pool. To do so, put this at the top of your script:

import tensorflow as tf

# Cap TensorFlow at 75% of the card's memory so it stays out of the slow pool.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.75
session = tf.Session(config=config)

In the snippet above I'm restricting TensorFlow to 75% of the card's 4 GB, which is 3 GB rather than the full 3.5 GB fast pool, because you also have to account for the memory already used by the OS and display.
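If you want to derive that fraction for a different card or a different overhead budget, the arithmetic is simple. A minimal sketch (note that the 0.5 GB OS/display allowance here is my assumption, not a measured figure):

```python
# Compute the per_process_gpu_memory_fraction to pass to TensorFlow.
TOTAL_GB = 4.0       # total VRAM advertised on the GTX 970
FAST_POOL_GB = 3.5   # the full-speed portion of that VRAM
OS_RESERVE_GB = 0.5  # assumed headroom for the OS / display compositor

usable_gb = FAST_POOL_GB - OS_RESERVE_GB  # 3.0 GB we let TensorFlow claim
fraction = usable_gb / TOTAL_GB           # fraction of total VRAM

print(round(fraction, 2))  # 0.75
```

The fraction is always expressed relative to the card's total memory, not the fast pool, which is why 3 GB on a 4 GB card comes out to 0.75.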