# Run PyTorch and TensorFlow Containers with NVIDIA Enroot

Enroot is a simple yet powerful tool that turns container images into unprivileged sandboxes. It targets HPC environments and integrates with the Slurm scheduler, but it can also be used as a standalone tool to run containers as an unprivileged user. Enroot is similar to Singularity, with the added benefits that users can read and write inside the container and can appear as the root user within the container environment.

Please read Enroot's [GitHub page](https://github.com/NVIDIA/enroot) for more information.

## Install enroot

Enroot is already installed on Berzelius. You can skip this step if you plan to use it on Berzelius.

- For Debian-based distributions
```
arch=$(dpkg --print-architecture)
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.4.0/enroot_3.4.0-1_${arch}.deb
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.4.0/enroot+caps_3.4.0-1_${arch}.deb # optional
sudo apt install -y ./*.deb
```

- For others, see [here](https://github.com/NVIDIA/enroot/blob/master/doc/installation.md)  

## GPU support using libnvidia-container

Enroot provides GPU support through libnvidia-container, which is already installed on Berzelius. You can skip this step if you plan to use it on Berzelius.

To install libnvidia-container on your local machine, see the [instructions](https://nvidia.github.io/libnvidia-container/).  

- For Debian-based distributions
```
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
         && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
         && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
               sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
               sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
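sudo apt-get update
# Install the tools package, which provides the nvidia-container-cli binary
sudo apt-get install -y libnvidia-container-tools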
```
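
Before moving on, it can help to confirm that the required binaries are actually on your `PATH`. The following is a minimal sketch, not part of enroot itself: `require_tool` is a hypothetical helper name, and the check only looks for the executables (`enroot`, and `nvidia-container-cli` from libnvidia-container); it does not verify that GPUs are usable.

```sh
# Hypothetical helper: report whether a command is available on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH" >&2
        return 1
    fi
}
```

For example, `require_tool enroot && enroot version` prints the installed enroot version, and `require_tool nvidia-container-cli` checks for the libnvidia-container command-line tool.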

## Set up Nvidia credentials

This step is necessary for importing container images from [Nvidia NGC](https://catalog.ngc.nvidia.com/containers).

- Complete steps [4.1](https://docs.nvidia.com/ngc/ngc-overview/index.html#account-signup) and [4.3](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) of the NGC overview guide, then save the API key.  

- Add the API key by putting these lines in the credentials file at ```~/.config/enroot/.credentials```
```
machine nvcr.io login $oauthtoken password your_api_key
machine authn.nvidia.com login $oauthtoken password your_api_key
```
Replace ```your_api_key``` with your actual API key. Keep ```$oauthtoken``` exactly as written: it is a literal login name, not a shell variable.

- Set the config path by adding this line to ```~/.bashrc```
```
export ENROOT_CONFIG_PATH=$HOME/.config/enroot
```

- Reload ```~/.bashrc``` so the setting takes effect in the current shell
```
source ~/.bashrc
```
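
The credentials step above can also be scripted. This is a minimal sketch under a few assumptions: `write_ngc_credentials` is a hypothetical helper name, the API key is passed as the first argument, and the file is written to `$ENROOT_CONFIG_PATH` (falling back to `~/.config/enroot`). Note that it overwrites any existing `.credentials` file, so do not use it as-is if you already store other registry logins there.

```sh
# Hypothetical helper: write NGC credentials for enroot.
# Usage: write_ngc_credentials YOUR_API_KEY
write_ngc_credentials() (
    api_key=$1
    config_dir=${ENROOT_CONFIG_PATH:-$HOME/.config/enroot}
    mkdir -p "$config_dir" || exit 1
    # Create the file readable by the owner only, since it holds a secret.
    umask 077
    # Single quotes keep $oauthtoken literal; it must not be expanded by the shell.
    {
        printf 'machine nvcr.io login $oauthtoken password %s\n' "$api_key"
        printf 'machine authn.nvidia.com login $oauthtoken password %s\n' "$api_key"
    } > "$config_dir/.credentials"
)
```

Running the function in a subshell (the parentheses around the body) keeps the `umask` change from leaking into your interactive session.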

## Set path to user container storage

By default, enroot stores your containers in your ```home``` directory. On Berzelius, ```home``` is limited to 20 GB of disk space, so it is better practice to keep enroot containers in your project directory.  

Add the following lines to your ```~/.bashrc```

```
export ENROOT_CACHE_PATH=/your/proj/path/enroot/cache
export ENROOT_DATA_PATH=/your/proj/path/enroot/data
```
Reload ```~/.bashrc``` to apply the change
```
source ~/.bashrc
```
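
Creating the two directories up front surfaces permission or quota problems before the first import. The sketch below assumes a hypothetical helper named `prepare_enroot_dirs` that takes a base directory (for example `/your/proj/path/enroot`), creates `cache` and `data` beneath it, and prints the matching `export` lines.

```sh
# Hypothetical helper: create enroot storage directories under BASE
# and print the export lines that point enroot at them.
# Usage: prepare_enroot_dirs BASE
prepare_enroot_dirs() (
    base=$1
    mkdir -p "$base/cache" "$base/data" || exit 1
    printf 'export ENROOT_CACHE_PATH="%s"\n' "$base/cache"
    printf 'export ENROOT_DATA_PATH="%s"\n' "$base/data"
)
```

For instance, `prepare_enroot_dirs /your/proj/path/enroot >> ~/.bashrc` appends the two lines to your ```~/.bashrc```; review the file afterwards to avoid duplicate entries.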

## Import container images

You can import a container image either from NVIDIA NGC or from the official PyTorch/TensorFlow Docker Hub repositories.

- From Nvidia NGC 
```
enroot import 'docker://nvcr.io#nvidia/pytorch:22.09-py3'
enroot import 'docker://nvcr.io#nvidia/tensorflow:22.11-tf2-py3'
```
For other versions, please see the release notes for [PyTorch](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html) and [TensorFlow](https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/index.html).

- From the official PyTorch/TensorFlow Docker Hub repositories
```
enroot import 'docker://pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel'
enroot import 'docker://tensorflow/tensorflow:2.11.0-gpu'
```
For other versions, please see the Docker tags for [PyTorch](https://hub.docker.com/r/pytorch/pytorch/tags) and [TensorFlow](https://hub.docker.com/r/tensorflow/tensorflow/tags).

## Create a container

```enroot create``` unpacks an imported ```.sqsh``` image into a named container root filesystem, stored under ```ENROOT_DATA_PATH```.

```
enroot create --name nvidia_pytorch_22.09 nvidia+pytorch+22.09-py3.sqsh
```
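
The ```.sqsh``` file passed to ```enroot create``` is the one written by ```enroot import```. Its default name appears to be derived from the image reference by dropping the registry, replacing each `/` with `+`, and joining the image and tag with `+`, as in the example above. The sketch below reproduces that rule so you can predict the file name; `sqsh_name` is a hypothetical helper, and the rule (including the `latest` fallback for a missing tag) is an assumption inferred from the examples in this guide rather than a documented guarantee.

```sh
# Hypothetical helper: predict the default .sqsh file name for an image URI.
# Usage: sqsh_name 'docker://nvcr.io#nvidia/pytorch:22.09-py3'
sqsh_name() (
    ref=${1#docker://}                          # strip the scheme
    case $ref in
        *'#'*) ref=${ref#*'#'} ;;               # strip "registry#" if present
    esac
    case $ref in
        *:*) image=${ref%%:*}; tag=${ref#*:} ;; # split image and tag
        *)   image=$ref; tag=latest ;;          # assumed default tag
    esac
    printf '%s+%s.sqsh\n' "$(printf '%s' "$image" | tr '/' '+')" "$tag"
)
```

If you prefer not to rely on the default, ```enroot import``` accepts ```-o``` to choose the output file name explicitly.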

## Start a container

- As the root user
```
enroot start --root --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09  
```

- As a non-root user
```
enroot start --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09  
```

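The ```--mount``` flag mounts a host directory into the container, in the form ```host_path:container_path```. The ```--rw``` flag makes the container's root filesystem writable, and ```--root``` remaps your user to root inside the container. Replace ```/proj/nsc_testing/xuan``` with your own project directory.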

- You can also start a container and execute your command at the same time.
```
enroot start --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09 sh -c 'python my_script.py' 
```
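
Since the commands above always mount the project directory at the same path inside the container, the pattern can be wrapped in a small function. This is only a sketch: `enroot_run` and the `DRY_RUN` variable are hypothetical names introduced here, not enroot features. With `DRY_RUN` set, the function prints the command instead of running it, which is handy for checking what would be executed.

```sh
# Hypothetical wrapper: start CONTAINER with PROJECT_DIR mounted at the
# same path inside the container, optionally running a command.
# Usage: enroot_run CONTAINER PROJECT_DIR [COMMAND [ARG...]]
enroot_run() (
    container=$1
    project_dir=$2
    shift 2
    # Build the full command line as the positional parameters.
    set -- enroot start --rw --mount "$project_dir:$project_dir" "$container" "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"   # display only; quoting of arguments is not preserved
    else
        "$@"
    fi
)
```

For example, `enroot_run nvidia_pytorch_22.09 /proj/nsc_testing/xuan python my_script.py` starts the container and runs the script, while prefixing the call with `DRY_RUN=1` only prints the resulting `enroot start` command.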

## Access to GUI within Enroot

```
enroot start --rw --env DISPLAY --mount /tmp/.X11-unix:/tmp/.X11-unix --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09  
```

Note that GUI applications can only display if you connect to Berzelius with X11 forwarding enabled, that is, with the ```-X``` flag of ```ssh```.

## Cheat sheet

|Task                                       |Command                                                                                   |
|:------------------------------------------|:-----------------------------------------------------------------------------------------|
|Import a container image                   |enroot import 'docker://nvcr.io#nvidia/pytorch:22.09-py3'                                 |
|Create a container from an image           |enroot create --name pytorch nvidia+pytorch+22.09-py3.sqsh                                |
|Remove a container                         |enroot remove pytorch                                                                     |
|Start a container                          |enroot start --rw --mount /local_path:/path_on_enroot pytorch                             |
|Start a container as root                  |enroot start --root --rw --mount /local_path:/path_on_enroot pytorch                      |
|Start a container and execute your command |enroot start --rw --mount /local_path:/path_on_enroot pytorch sh -c 'python my_script.py' |