I’m a long-time user of openSUSE, and I particularly like Tumbleweed for its constantly up-to-date kernel and packages. When I started looking into learning deep learning with Keras and TensorFlow, I found that most of the existing setup instructions focused on Ubuntu and made it sound like the package versions were so specific that using another distribution was impossible.
As with most things openSUSE, it turns out setting it up was easier than I expected. This guide is designed to get you up and running with a GPU-accelerated, Keras-driven, TensorFlow-backed example project in under an hour.
Prerequisites
You’ll need to have openSUSE installed (https://software.opensuse.org/distributions/tumbleweed) and an Nvidia graphics card. It will also help if you have an Nvidia developer account and if you know which version of Python you’ll want to use.
Also, you want to have an up-to-date system to work with, so do a:
#as root
zypper dup
and reboot the system.
Default Python Version
For some reason unknown to me, openSUSE’s python command defaults to using v2.7, but pip, the Python package manager, defaults to v3.6. You can explicitly choose the version by appending the version number (e.g., python3.6) when you execute the command, but for my own sanity, I changed the symlink to python to point to the 3.6 version:
#as root
rm /usr/bin/python
ln -s /usr/bin/python3.6 /usr/bin/python
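If you’d rather not rely on the symlink, a script can also guard itself at runtime. This is a minimal, standard-library-only sketch that refuses to run under an old interpreter:

```python
import sys

# Fail fast if the script is accidentally run under Python 2
# (e.g., if the python symlink still points at v2.7).
if sys.version_info < (3, 6):
    raise RuntimeError("Python 3.6+ required, got %d.%d" % sys.version_info[:2])

ver = "Running under Python %d.%d" % sys.version_info[:2]
print(ver)
```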
Demo Programs
It’s helpful to know where we’re trying to go. To that end, we’ll use two demo programs: one that prints out the device information (so that we can verify that we’re using the GPU), and another that trains a model on the MNIST handwritten digits data set.
The first program is device_info.py:
from tensorflow.python.client import device_lib

#list_local_devices() only returns the list; print it so the script produces output
print(device_lib.list_local_devices())
If you run it now, you’ll get a ModuleNotFoundError because TensorFlow is not installed.
The second program is mnist_cnn.py, which can be obtained from https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py. If you run it, you’ll get a ModuleNotFoundError because Keras is not installed.
Install Nvidia Drivers
You’ll need the proprietary Nvidia drivers installed in order to enable TensorFlow to use the GPU. To do so, open YaST and under Software, go to Software Repositories. Go to Add, then Community Repositories, and select nVidia Graphics Drivers. Then do a:
#as root
zypper in nvidia-gfxG04-kmp-default
to install the driver. Reboot to start using the driver.
Install CUDA
Go to https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=OpenSUSE&target_version=Leap422&target_type=rpmlocal and download both the Base Installer and any patches.
The CUDA version does need to match the TensorFlow version. As of right now, TensorFlow is at 1.10 and it requires CUDA 9.0, which is older than what Nvidia is offering by default, so make sure you grab this exact version.
If you accidentally install the wrong version, the quick way to remove it is to remove the cuda-license package, since every other package depends on it. There will also be one or more cuda-repo packages that also must be removed.
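In shell terms, the cleanup looks roughly like this; the last line is a placeholder for whichever cuda-repo packages the rpm query turns up on your system:

```shell
#as root
#removing cuda-license drags every dependent CUDA package out with it
zypper rm cuda-license
#the repo packages are independent; list what's installed, then remove them
rpm -qa 'cuda-repo*'
zypper rm <each_cuda_repo_package>
```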
Now, install CUDA using:
#as root
zypper in <cuda_download.rpm> <each_cuda_patch.rpm>
zypper in cuda
CUDA provides its own version of the proprietary Nvidia drivers, but it’s important that they don’t get used. To make sure they aren’t, open YaST, and under Software, go to Software Management and verify that the installed version of nvidia-gfxG04-kmp-default matches your kernel (currently, the kernel is at 4.18, but CUDA installs a 4.4 version of the drivers).
You also need to create/update LD_LIBRARY_PATH to include the CUDA libraries. Edit ~/.profile and insert at the end:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
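To check that the new path is actually picked up, you can run the same export in your current shell and inspect the variable; a quick sketch:

```shell
#same export line as in ~/.profile
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
#confirm the CUDA library directory is now on the search path
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep /usr/local/cuda/lib64
```

New login shells will pick up the change from ~/.profile automatically.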
Install cuDNN
Go to https://developer.nvidia.com/rdp/cudnn-download. This page requires you to have an Nvidia developer account (it’s free, but does require registration). Download the version of cuDNN that matches the CUDA version you installed in the previous step. The link you want is the Library for Linux. Unpack and install cuDNN:
tar xvzf <cudnn_download.tgz>
#as root
cp cuda/include/cudnn.h /usr/local/cuda/include
cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Install TensorFlow and Keras
I prefer to use zypper for package management, so I want to use pip for as little as possible. The only requirements that zypper can’t satisfy from the default openSUSE repositories are TensorFlow and Keras themselves, so we’ll install those with pip and handle the rest with zypper.
#as root
pip install --no-deps tensorflow-gpu keras
If you don’t mind using pip, you could remove the --no-deps flag and that would complete this guide. Of course, the more things you have installed with pip, the more important it becomes to re-run pip regularly to ensure you have the latest versions.
If you are running on a system without an Nvidia GPU, you can install the CPU version of TensorFlow with “pip install --no-deps tensorflow keras”, but keep in mind that it is much slower than the GPU version (20x in my testing, but that will depend heavily on your CPU and GPU).
If you need to install a particular version of a library, you can do so like this: “pip install --no-deps tensorflow-gpu==1.10.1 keras”. This is helpful for satisfying the version requirement between TensorFlow and CUDA.
Install Dependencies
Now, we need to install the dependencies of TensorFlow and Keras using zypper. Using zypper ensures that these dependencies will stay up-to-date. I got the list of dependencies by using “pip check”. Install the dependencies with:
#as root
zypper in python3-grpcio python3-h5py python3-numpy python3-PyYAML python3-protobuf python3-scipy python3-termcolor python3-wheel
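A quick way to confirm the dependencies are importable is a short Python check. Note that a few import names differ from their package names (python3-PyYAML provides yaml, python3-protobuf provides google.protobuf, python3-grpcio provides grpc):

```python
import importlib

# Import names corresponding to the zypper packages installed above
modules = ["grpc", "h5py", "numpy", "yaml", "google.protobuf",
           "scipy", "termcolor", "wheel"]

missing = []
for mod in modules:
    try:
        importlib.import_module(mod)
    except ImportError:
        missing.append(mod)

print("missing:", missing or "none")
```

Anything listed as missing hasn’t been installed correctly yet.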
There are some additional requirements that can’t be satisfied using the default repositories, so those we’ll install using pip:
#as root
pip install tensorflow-gpu keras
This is the same command as before, only without the --no-deps flag.
One Final Reboot
Do a final reboot to make sure that all of the changes take effect.
Test it out
At this point, you should be good to go. Try running the device_info.py script with “python device_info.py”. In the output, you should be able to see the model of your GPU.
Now try running mnist_cnn.py. This will take some time to complete, but it should be less than ten minutes or so. While it’s running, run “nvidia-smi”. Under GPU-Util will be a percentage: this is how much load your GPU is under. Assuming everything is working as it should, this should be at least 80%. The exact number depends on your system configuration; if your GPU is bottlenecked by your CPU, you won’t get to 100%. But if the number is something like 5%, then the training isn’t happening on the GPU and you should check the configuration.
Good description and summary!
I went a different way: I installed Keras first and ran the device-info script, and the dynamic loader’s error messages told me which CUDA and cuDNN libraries were demanded (it happened to be CUDA 10.0 on current Tumbleweed), so I downloaded the right versions and installed them.
But your description put me on the right track; otherwise I would have chickened out and gone to Ubuntu.