# Run PyTorch and TensorFlow Containers with Nvidia Enroot

Enroot is a simple yet powerful tool that turns traditional container/OS images into unprivileged sandboxes. It targets HPC environments and integrates with the Slurm scheduler, but it can also be used as a standalone tool to run containers as an unprivileged user. Enroot is similar to Singularity, with the added benefit that users can read and write inside the container and can appear as the root user within the container environment.

Please read Enroot's [GitHub page](https://github.com/NVIDIA/enroot) for more information.

## Install enroot

- For Debian-based distributions
```
arch=$(dpkg --print-architecture)
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.4.0/enroot_3.4.0-1_${arch}.deb
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.4.0/enroot+caps_3.4.0-1_${arch}.deb # optional
sudo apt install -y ./*.deb
```

- For other distributions, see the [installation guide](https://github.com/NVIDIA/enroot/blob/master/doc/installation.md).

Please note that enroot is already installed on Berzelius. You can skip this installation step if you plan to use it on Berzelius.
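
To decide whether the installation step can be skipped on a given machine, a quick check along these lines may help. It is only a sketch: `enroot_installed` is a hypothetical helper name, and the check simply looks for an `enroot` executable on your `PATH`.

```shell
# Hypothetical helper: succeeds if an enroot executable is on PATH.
enroot_installed() {
    command -v enroot >/dev/null 2>&1
}

if enroot_installed; then
    # Print the installed version.
    enroot version
else
    echo "enroot not found; follow the installation steps above"
fi
```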

## Set up Nvidia credentials
This step is necessary for importing container images from Nvidia NGC.

- Complete steps [4.1](https://docs.nvidia.com/ngc/ngc-overview/index.html#account-signup) and [4.3](https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key) of the NGC overview to sign up for an account and generate an API key. Save the API key.

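- Add the API key to the config file at ```~/.config/enroot/.credentials```. Replace ```your_api_key``` with your key, and keep ```$oauthtoken``` exactly as written: it is the literal NGC login name, not a shell variable.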
```
machine nvcr.io login $oauthtoken password your_api_key
machine authn.nvidia.com login $oauthtoken password your_api_key
```
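
If you prefer to create the credentials file from the command line, the following sketch writes the two lines shown above. `write_enroot_credentials` is a hypothetical helper name, and restricting the file to its owner with `chmod 600` is my own suggestion rather than something enroot is known to require.

```shell
# Hypothetical helper: write NGC credentials for enroot into a config directory.
write_enroot_credentials() {
    config_dir="$1"   # e.g. "$HOME/.config/enroot"
    api_key="$2"      # the API key generated on NGC
    mkdir -p "$config_dir"
    # Single quotes keep $oauthtoken literal; only %s is filled in by printf.
    printf 'machine %s login $oauthtoken password %s\n' nvcr.io "$api_key" > "$config_dir/.credentials"
    printf 'machine %s login $oauthtoken password %s\n' authn.nvidia.com "$api_key" >> "$config_dir/.credentials"
    # The file holds a secret, so make it readable by the owner only.
    chmod 600 "$config_dir/.credentials"
}

# Usage (replace your_api_key with the key you saved):
# write_enroot_credentials "$HOME/.config/enroot" your_api_key
```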

- Set the config path by adding this line to ```~/.bashrc```, replacing the path with your own home directory:
```
export ENROOT_CONFIG_PATH=/home/xuagu37/.config/enroot
```

- Reload ```~/.bashrc``` so that the new path takes effect in the current shell:
```
source ~/.bashrc
```

## Set path to user container storage

By default, your enroot containers are saved in your ```home``` directory. On Berzelius, ```home``` is limited to 20 GB of disk space, which large container images can quickly fill, so it is better practice to put enroot containers in your project directory.

Add these lines to your ```~/.bashrc```, replacing the paths with your own project directory:

```
export ENROOT_CACHE_PATH=/proj/nsc_testing/xuan/enroot/cache
export ENROOT_DATA_PATH=/proj/nsc_testing/xuan/enroot/data
```
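
The sketch below sets both variables from a single base directory and creates the folders up front. `use_enroot_storage` is a hypothetical helper name, and the example base path in the usage line is a placeholder. As I understand the enroot documentation, ```ENROOT_CACHE_PATH``` holds the image cache and ```ENROOT_DATA_PATH``` holds the created containers.

```shell
# Hypothetical helper: point enroot at a project directory and create the folders.
use_enroot_storage() {
    base="$1"   # e.g. a directory inside your project storage
    export ENROOT_CACHE_PATH="$base/cache"
    export ENROOT_DATA_PATH="$base/data"
    # Creating the directories in advance is harmless if they already exist.
    mkdir -p "$ENROOT_CACHE_PATH" "$ENROOT_DATA_PATH"
}

# Usage (placeholder path, replace with your own):
# use_enroot_storage /proj/your_project/your_username/enroot
```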


## Import container images

You can import a container image either from Nvidia NGC or from the official PyTorch/TensorFlow Docker Hub repositories.

- From Nvidia NGC 
```
enroot import 'docker://nvcr.io#nvidia/pytorch:22.09-py3'
enroot import 'docker://nvcr.io#nvidia/tensorflow:22.11-tf2-py3'
```
For other versions, please see the release notes for [PyTorch](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html) and [TensorFlow](https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/index.html).

- From the official PyTorch/TensorFlow Docker Hub repositories
```
enroot import 'docker://pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel'
enroot import 'docker://tensorflow/tensorflow:2.11.0-gpu'
```
For other versions, please see the Docker tags for [PyTorch](https://hub.docker.com/r/pytorch/pytorch/tags) and [TensorFlow](https://hub.docker.com/r/tensorflow/tensorflow/tags).
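
Each import writes a ```.sqsh``` file into the current directory. The sketch below predicts that file name from the image URI. The rule it applies (drop the ```docker://``` scheme and the registry before ```#```, then replace ```/``` and ```:``` with ```+```) is inferred from the NGC PyTorch example in this document and is an assumption for other images. `sqsh_name` is a hypothetical helper name.

```shell
# Hypothetical helper: guess the .sqsh file name that an import will produce.
sqsh_name() {
    uri="$1"
    image="${uri#docker://}"   # drop the scheme
    image="${image#*\#}"       # drop the registry part before '#', if present
    # Replace '/' and ':' with '+', then append the extension.
    printf '%s.sqsh\n' "$(printf '%s' "$image" | tr '/:' '++')"
}

sqsh_name 'docker://nvcr.io#nvidia/pytorch:22.09-py3'
```

For the URI above, this prints the file name that is passed to ```enroot create``` later in this document.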

## Create a container

The following example uses the PyTorch image from Nvidia NGC. The ```.sqsh``` file is the image written by ```enroot import``` in the previous step.
```
enroot create --name nvidia_pytorch_22.09 nvidia+pytorch+22.09-py3.sqsh
```

## Start a container

- As the root user
```
enroot start --root --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09  
```

- As a non-root user
```
enroot start --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09  
```

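The flag ```--mount``` mounts a local directory into your container, in the form ```local_path:container_path```. The flag ```--rw``` makes the container root filesystem writable, and ```--root``` makes you appear as the root user inside the container.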

- You can also start a container and run your command at the same time.
```
enroot start --rw --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09 sh -c 'python path_to_your_script.py' 
```
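
If you run the same container with the same mount repeatedly, a small wrapper can save typing. This is only a sketch: `run_in_container` and the `DRY_RUN` variable are hypothetical names, and the wrapper assumes you want the directory mounted at the same path inside the container, as in the examples above.

```shell
# Hypothetical wrapper around the start command shown above.
run_in_container() {
    container="$1"   # name given to `enroot create --name`
    host_dir="$2"    # directory mounted at the same path inside the container
    shift 2          # the remaining arguments form the command to run
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the command instead of running it.
        echo enroot start --rw --mount "$host_dir:$host_dir" "$container" "$@"
    else
        enroot start --rw --mount "$host_dir:$host_dir" "$container" "$@"
    fi
}

# Usage:
# run_in_container nvidia_pytorch_22.09 /proj/nsc_testing/xuan sh -c 'python path_to_your_script.py'
```

Setting ```DRY_RUN=1``` prints the command without starting anything, which is handy for checking the mount path first.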

## Access to GUI within Enroot

```
enroot start --rw --env DISPLAY --mount /tmp/.X11-unix:/tmp/.X11-unix --mount /proj/nsc_testing/xuan:/proj/nsc_testing/xuan nvidia_pytorch_22.09  
```
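
The command above passes ```DISPLAY``` into the container, so it only helps if that variable is set in your current shell. A minimal pre-flight check could look like the sketch below. `x11_ready` is a hypothetical helper name, and it checks only that the variable is non-empty.

```shell
# Hypothetical pre-flight check before starting a GUI container.
x11_ready() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; reconnect with X11 forwarding enabled" >&2
        return 1
    fi
    echo "DISPLAY is $DISPLAY"
}

# Usage:
# x11_ready && enroot start --rw --env DISPLAY --mount /tmp/.X11-unix:/tmp/.X11-unix nvidia_pytorch_22.09
```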

Please note that you need to enable X11 forwarding with the ```ssh``` flag ```-X``` when connecting to Berzelius.