# Run PyTorch and TensorFlow Containers with NVIDIA Enroot

Enroot is a simple yet powerful tool that turns container images into unprivileged sandboxes. Enroot targets HPC environments and integrates with the Slurm scheduler, but it can also be used as a standalone tool to run containers as an unprivileged user. Enroot is similar to Singularity, with the added benefits that users can read/write inside the container and appear as the root user within the container environment. Please read Enroot's [GitHub page](https://github.com/NVIDIA/enroot) for more information.

## Install Enroot

Enroot is already installed on Berzelius. You can skip this step if you plan to use it on Berzelius.

- For Debian-based distributions

```
arch=$(dpkg --print-architecture)
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.4.0/enroot_3.4.0-1_${arch}.deb
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.4.0/enroot+caps_3.4.0-1_${arch}.deb # optional
sudo apt install -y ./*.deb
```

- For other distributions, see the [installation guide](https://github.com/NVIDIA/enroot/blob/master/doc/installation.md).

## GPU support using libnvidia-container

Enroot provides GPU support through libnvidia-container, which is already set up on Berzelius. You can skip this step if you plan to use Enroot on Berzelius. To install libnvidia-container on your local machine, see the [instructions](https://nvidia.github.io/libnvidia-container/).

- For Debian-based distributions

```
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
   && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
      sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
```

## Set up NVIDIA credentials

This step is necessary for importing container images from [NVIDIA NGC](https://catalog.ngc.nvidia.com/containers).

- Complete steps [4.1](https://docs.nvidia.com/ngc/ngc-overview/index.html#account-signup) and [4.3](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key), and save the API key.
- Add the API key to the credentials file at ```~/.config/enroot/.credentials```

```
machine nvcr.io login $oauthtoken password your_api_key
machine authn.nvidia.com login $oauthtoken password your_api_key
```

- Set the config path by adding this line to ```~/.bashrc``` (replace the path with your own home directory)

```
export ENROOT_CONFIG_PATH=/home/xuagu37/.config/enroot
```

- To make the change take effect

```
source ~/.bashrc
```

## Set path to user container storage

By default, your Enroot containers are saved in your ```home``` directory. On Berzelius, you have 20 GB of disk space in ```home```, so it is better practice to put Enroot containers in your project directory. Add these lines to your ```~/.bashrc``` (replace the paths with your own project directory)

```
export ENROOT_CACHE_PATH=/proj/nsc_testing/xuan/enroot/cache
export ENROOT_DATA_PATH=/proj/nsc_testing/xuan/enroot/data
```
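After adding these exports, it may help to create the directories and confirm that the variables are visible in a new shell. This is a minimal sketch, assuming the same example project path as above (replace it with your own project directory):

```
# Create the cache and data directories (the project path is an example; use your own)
mkdir -p /proj/nsc_testing/xuan/enroot/cache /proj/nsc_testing/xuan/enroot/data

# Reload the shell configuration and verify that Enroot will pick up the new paths
source ~/.bashrc
echo "ENROOT_CACHE_PATH = $ENROOT_CACHE_PATH"
echo "ENROOT_DATA_PATH  = $ENROOT_DATA_PATH"
```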
## Import container images

You can import a container image either from NVIDIA NGC or from the official PyTorch/TensorFlow Docker Hub repositories.

- From NVIDIA NGC

```
enroot import 'docker://nvcr.io#nvidia/pytorch:22.09-py3'
enroot import 'docker://nvcr.io#nvidia/tensorflow:22.11-tf2-py3'
```

For other versions, please see the release notes for [PyTorch](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html) and [TensorFlow](https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/index.html).

- From the official PyTorch/TensorFlow Docker Hub repositories

```
enroot import 'docker://pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel'
enroot import 'docker://tensorflow/tensorflow:2.11.0-gpu'
```

For other versions, please see the Docker tags for [PyTorch](https://hub.docker.com/r/pytorch/pytorch/tags) and [TensorFlow](https://hub.docker.com/r/tensorflow/tensorflow/tags).

## Create a container

I will take PyTorch from NVIDIA NGC as an example. The import step above downloads a squashfs image, which ```enroot create``` unpacks into a container root filesystem:

```
enroot create --name nvidia_pytorch_22.09 nvidia+pytorch+22.09-py3.sqsh
```

## Start a container

- As the root user

```
enroot start --root --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09
```

- As a non-root user

```
enroot start --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09
```

The ```--mount``` flag mounts a local directory into the container.

- You can also start a container and run a command in it at the same time.

```
enroot start --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09 sh -c 'python path_to_your_script.py'
```

## Access a GUI within Enroot

```
enroot start --rw --env DISPLAY --mount /tmp/.X11-unix:/tmp/.X11-unix --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09
```

Please note that you need to use the ```-X``` flag (X11 forwarding) when connecting to Berzelius with ssh.
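As a quick sanity check of the whole workflow, you can start the container and confirm that PyTorch sees the GPUs. This is a minimal sketch, assuming the ```nvidia_pytorch_22.09``` container created above and a node with at least one GPU allocated:

```
# Start the container and check GPU visibility from inside it
# (the container name follows the "enroot create" example above)
enroot start --rw nvidia_pytorch_22.09 sh -c \
  'python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"'
```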