
# Run PyTorch and TensorFlow Containers with NVIDIA Enroot

Enroot is a simple yet powerful tool for turning traditional container/OS images into unprivileged sandboxes. It targets HPC environments and integrates with the Slurm scheduler, but it can also be used as a standalone tool to run containers as an unprivileged user. Enroot is similar to Singularity, with the added benefit that users can read and write inside the container and can appear as the root user within the container environment.

Please read enroot's [GitHub page](https://github.com/NVIDIA/enroot) for more information.

## Install enroot

- For Debian-based distributions
```
arch=$(dpkg --print-architecture)
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.4.0/enroot_3.4.0-1_${arch}.deb
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.4.0/enroot+caps_3.4.0-1_${arch}.deb # optional
sudo apt install -y ./*.deb
```

- For other distributions, see the [installation guide](https://github.com/NVIDIA/enroot/blob/master/doc/installation.md)

Please note that enroot is already installed on Berzelius. You can skip this installation step if you plan to use it on Berzelius.
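
To find out whether the installation step can be skipped on a given machine, you can check for the `enroot` binary on your `PATH`. The following is a minimal sketch; `check_enroot` is a hypothetical helper name, while `enroot version` is an existing enroot subcommand.

```shell
# Hypothetical helper: report whether enroot is already available.
check_enroot() {
    if command -v enroot >/dev/null 2>&1; then
        echo "enroot found: $(enroot version)"
    else
        echo "enroot not found; follow the installation steps above"
        return 1
    fi
}
```

Run `check_enroot` after sourcing the function; a non-zero return code means enroot still needs to be installed.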

## Set up NVIDIA credentials
This step is needed to import container images from NVIDIA NGC.

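- Complete steps [4.1](https://docs.nvidia.com/ngc/ngc-overview/index.html#account-signup) (account sign-up) and [4.3](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) (generating an API key) of the NGC overview, then save the API key.

- Add the API key to the config file at ```~/.config/enroot/.credentials```, replacing ```your_api_key``` with your own key. Keep ```$oauthtoken``` exactly as written; it is the literal login name.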
```
machine nvcr.io login $oauthtoken password your_api_key
machine authn.nvidia.com login $oauthtoken password your_api_key
```

- Set the config path by adding this line to ```~/.bashrc``` (replace the path with your own config directory)
```
export ENROOT_CONFIG_PATH=/home/xuagu37/.config/enroot
```

- Reload ```~/.bashrc``` so that the change takes effect
```
source ~/.bashrc
```
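
The credentials file from the steps above can also be written by a small script. This is a sketch only: `write_enroot_credentials` is a hypothetical helper, and restricting the file to the owner is a precaution added here because the file contains a secret.

```shell
# Hypothetical helper: write the credentials file under the given config directory.
# Arguments: $1 = config directory, $2 = NGC API key.
write_enroot_credentials() (
    config_dir=$1
    api_key=$2
    umask 077    # new files and directories are accessible by the owner only
    mkdir -p "$config_dir"
    # Single quotes keep $oauthtoken literal; it is a login name, not a shell variable.
    # printf reuses the format for each (machine, key) pair, giving one line per machine.
    printf 'machine %s login $oauthtoken password %s\n' \
        nvcr.io "$api_key" \
        authn.nvidia.com "$api_key" > "$config_dir/.credentials"
    chmod 600 "$config_dir/.credentials"    # also tighten a file that already existed
)
```

For example, `write_enroot_credentials "$HOME/.config/enroot" your_api_key` produces the two `machine` lines shown above.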

## Set path to user container storage

By default, enroot saves your containers in your ```home``` directory. On Berzelius, ```home``` is limited to 20 GB, so it is better to keep enroot containers in your project directory.

Add these lines to your ```~/.bashrc```, replacing the paths with your own project directory

```
export ENROOT_CACHE_PATH=/proj/nsc_testing/xuan/enroot/cache
export ENROOT_DATA_PATH=/proj/nsc_testing/xuan/enroot/data
```
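
The same setup can be sketched as a shell function, assuming you keep both directories under one base directory as in the example above. `use_enroot_storage` is a hypothetical name, and creating the directories up front is a precaution rather than a documented requirement.

```shell
# Hypothetical helper: point enroot at a project directory and create the folders.
# Argument: $1 = base directory, e.g. /proj/nsc_testing/xuan/enroot
use_enroot_storage() {
    export ENROOT_CACHE_PATH="$1/cache"
    export ENROOT_DATA_PATH="$1/data"
    mkdir -p "$ENROOT_CACHE_PATH" "$ENROOT_DATA_PATH"
}
```

Calling `use_enroot_storage /proj/nsc_testing/xuan/enroot` in the current shell sets both variables for that session.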


## Import container images

You can import a container image either from NVIDIA NGC or from the official PyTorch/TensorFlow Docker Hub repositories.

- From NVIDIA NGC
```
enroot import 'docker://nvcr.io#nvidia/pytorch:22.09-py3'
enroot import 'docker://nvcr.io#nvidia/tensorflow:22.11-tf2-py3'
```
For other versions, please see the release notes for [PyTorch](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html) and [TensorFlow](https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/index.html).

- From the official PyTorch/TensorFlow Docker Hub repositories
```
enroot import 'docker://pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel'
enroot import 'docker://tensorflow/tensorflow:2.11.0-gpu'
```
For other versions, please see the Docker tags for [PyTorch](https://hub.docker.com/r/pytorch/pytorch/tags) and [TensorFlow](https://hub.docker.com/r/tensorflow/tensorflow/tags).
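
Each import writes a `.sqsh` image file into the current directory. The sketch below derives the file name for a tagged image, assuming enroot's default naming, which is consistent with the `nvidia+pytorch+22.09-py3.sqsh` file used when creating a container. `sqsh_name` is a hypothetical helper, not part of enroot, and it expects the URI to carry an explicit tag.

```shell
# Hypothetical helper: print the default .sqsh file name for a docker:// URI.
sqsh_name() (
    ref=${1#docker://}    # drop the URI scheme
    ref=${ref#*\#}        # drop the registry part ("registry#image"), if any
    # Replace the "/" and ":" separators with "+"
    printf '%s.sqsh\n' "$ref" | tr '/:' '++'
)

sqsh_name 'docker://nvcr.io#nvidia/pytorch:22.09-py3'
```

If you prefer to choose the file name yourself, `enroot import` also accepts an `--output` option.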

## Create a container

This example uses the PyTorch image imported from NVIDIA NGC.
```
enroot create --name nvidia_pytorch_22.09 nvidia+pytorch+22.09-py3.sqsh
```
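
When the create step is part of a setup script that may run more than once, it can help to skip containers that already exist. The following is a minimal sketch built on `enroot list`, which prints the existing container names; `create_if_missing` is a hypothetical helper.

```shell
# Hypothetical helper: create a container only if no container with that name exists.
# Arguments: $1 = container name, $2 = path to the .sqsh image.
create_if_missing() (
    name=$1
    image=$2
    # -F: fixed string, -x: match the whole line, -q: quiet
    if enroot list | grep -Fxq -- "$name"; then
        echo "container $name already exists"
    else
        enroot create --name "$name" "$image"
    fi
)
```

For example: `create_if_missing nvidia_pytorch_22.09 nvidia+pytorch+22.09-py3.sqsh`.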

## Start a container

- As the root user
```
enroot start --root --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09  
```

- As a non-root user
```
enroot start --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09  
```

The ```--mount``` flag mounts a host directory into the container, written as ```source:destination```. The ```--rw``` flag makes the container's root filesystem writable.

- You can also start a container and run a command in one step.
```
enroot start --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09 sh -c 'python path_to_your_script.py' 
```
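
If you start containers often, the repeated flags can be wrapped in a small function. This is a sketch under the assumption that the directory should appear at the same path inside the container, as in the examples above; `enroot_run` is a hypothetical name. It passes the command as separate arguments after the container name, which `enroot start` accepts in place of the `sh -c` wrapper.

```shell
# Hypothetical wrapper: mount one directory at the same path inside the
# container and run the given command there.
# Arguments: $1 = container name, $2 = directory to mount, rest = command.
enroot_run() (
    name=$1
    dir=$2
    shift 2
    enroot start --rw --mount "$dir:$dir" "$name" "$@"
)
```

For example: `enroot_run nvidia_pytorch_22.09 /proj/nsc_testing/xuan python path_to_your_script.py`.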

## Use GUI applications within Enroot

```
enroot start --rw --env DISPLAY --mount /tmp/.X11-unix:/tmp/.X11-unix --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09  
```

Please note that you need to enable X11 forwarding with the ```-X``` flag when connecting to Berzelius over SSH (for example, ```ssh -X```).
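
Because the start command above passes `DISPLAY` into the container, it is worth confirming that the variable is set on the host before starting. A minimal sketch, with `require_display` as a hypothetical helper name:

```shell
# Hypothetical helper: fail early when no X display is configured on the host.
require_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; reconnect with X11 forwarding enabled" >&2
        return 1
    fi
    echo "DISPLAY=$DISPLAY"
}
```

You can chain it in front of the start command, for example `require_display && enroot start ...`, so the container only starts when a display is available.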