Running a PyTorch Example on Turing
Gompei is an on-campus student at WPI who wants to run a PyTorch example on the Turing High-Performance Computing (HPC) cluster. They use a Windows laptop and have little experience with Linux or HPC systems. This guide walks through the steps they take to run a PyTorch script on Turing, keeping it simple and straightforward, with links to related documentation for deeper understanding.
1. Request Access to Turing
Before starting, Gompei needs an account on Turing.
- Action: Complete the Turing Account Request Form; wait for an email confirming the account is ready.
- More Information: See the Getting Started Guide.
2. Connect to Turing from Windows
Gompei connects to Turing using the built-in SSH client on their Windows laptop.
- Open Command Prompt: Press the Windows key, type Command Prompt, and press Enter.
- Connect via SSH: In the Command Prompt window, type ssh gompei@turing.wpi.edu and press Enter.
- Handle Security Prompt: When prompted about the server's authenticity, type yes and press Enter.
- Authenticate: Enter the WPI password and press Enter.
- More Information: For a deeper understanding of Linux basics, refer to Linux on Turing.
3. Set Up the Python Environment
Now connected to Turing, Gompei sets up the Python environment to run PyTorch.
- Load the Python Module:
module load python/3.11.10
- Learn more about modules in the Modules Documentation.
- Create a Project Directory:
mkdir pytorch_example
cd pytorch_example
- Set Up a Virtual Environment:
python3 -m venv pytorch_example_env
source pytorch_example_env/bin/activate
- More about Python virtual environments: Python on Turing.
- Install Necessary Packages:
pip3 install numpy
pip3 install torch
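If pip3 appears to install packages somewhere unexpected, a quick check is whether the virtual environment is actually active. The following is a minimal sketch, assuming a bash shell. The function name in_venv is hypothetical; VIRTUAL_ENV is the variable that the venv activate script sets.

```shell
# Hypothetical helper: succeeds only when a Python virtual environment is active.
# The venv "activate" script exports VIRTUAL_ENV; "deactivate" unsets it.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "Active environment: $VIRTUAL_ENV"
else
    echo "No virtual environment is active; run: source pytorch_example_env/bin/activate"
fi
```

When the environment is active, the printed path should end in pytorch_example_env.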
4. Prepare the PyTorch Script
Gompei uses an example from the PyTorch tutorials.
- Get the Example Code: Visit the PyTorch Tensors Tutorial, copy the example code, and uncomment the 5th line to use the GPUs available on Turing.
- Create the Python Script:
nano pytorch_example.py
Paste the copied code into the editor.
- Save and Exit: Press Ctrl + o, then Enter to save; press Ctrl + x to exit the editor.
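To see what selecting the GPU amounts to, a much smaller script than the tutorial's can be used. The sketch below is not the tutorial code, and the file name check_gpu.py is hypothetical. It writes a short Python script that uses the GPU when PyTorch can see one and falls back to the CPU otherwise.

```shell
# Write a small, hypothetical check script (this is not the tutorial code).
cat > check_gpu.py <<'EOF'
import torch

# Use the GPU if PyTorch can see one; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Create a small tensor on that device and do a trivial computation.
x = torch.ones(3, 3, device=device)
print(x.sum().item())
EOF
```

The script can be run with python3 check_gpu.py once torch is installed in the active environment. On a node without a GPU it should report cpu; inside a job that requested a GPU it should report cuda.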
5. Create the SLURM Submission Script
To run the job on Turing, Gompei creates a SLURM submission script.
- Create the Script:
nano run_pytorch.sh
- Add the Following Content:
#!/bin/bash
#SBATCH -N 1 # allocate 1 compute node
#SBATCH -n 1 # total number of tasks
#SBATCH --mem=1g # allocate 1 GB of memory
#SBATCH -J "pytorch example" # name of the job
#SBATCH -o pytorch_example_%j.out # name of the output file
#SBATCH -e pytorch_example_%j.err # name of the error file
#SBATCH -p short # partition to submit to
#SBATCH -t 01:00:00 # time limit of 1 hour
#SBATCH --gres=gpu:1 # request 1 GPU
module load python/3.11.10 # these versions were chosen for compatibility with PyTorch
module load cuda/12.4.0/3mdaov5 # load CUDA (adjust if necessary)
python3 -m venv pytorch_example_env # create virtual environment
source pytorch_example_env/bin/activate # activate virtual environment
pip3 install numpy # install NumPy
pip3 install torch # install PyTorch
python3 pytorch_example.py # run Python script
- Save and Exit: Press Ctrl + o, then Enter to save; press Ctrl + x to exit the editor.
- More Information: Refer to the SLURM Submission Guide.
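SLURM replaces %j in the -o and -e file names with the job ID, so the names of the output and error files follow from the message that sbatch prints on submission. The following is a minimal sketch, assuming bash and an example message of the form "Submitted batch job 123456"; the variable names are illustrative only.

```shell
# Example message in the format sbatch prints after a successful submission.
submit_msg="Submitted batch job 123456"

# The job ID is the last word of the message.
job_id="${submit_msg##* }"

# %j in the "#SBATCH -o" and "#SBATCH -e" lines is replaced by the job ID.
out_file="pytorch_example_${job_id}.out"
err_file="pytorch_example_${job_id}.err"

echo "$out_file"
echo "$err_file"
```

On Turing, the message could be captured directly with submit_msg=$(sbatch run_pytorch.sh) instead of being typed in by hand.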
6. Submit and Monitor the Job
Gompei submits the job and monitors its progress.
- Submit the Job:
sbatch run_pytorch.sh
- After submitting, a message like Submitted batch job 123456 appears, indicating the job ID. In the commands below, replace 123456 with the actual job ID.
- Check Job Status:
squeue --me
This shows whether the job is running or queued.
- Monitor Output in Real-Time:
tail -f pytorch_example_123456.out
Press Ctrl + c to stop monitoring.
- Ensure the Job Has Finished: When the job no longer appears in squeue --me, it has finished.
- View the Output File:
cat pytorch_example_123456.out
- Check for Errors:
cat pytorch_example_123456.err
An empty error file, or one containing only warnings, indicates the job ran smoothly.
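Instead of rerunning squeue by hand, the check can be repeated in a loop. The following is a minimal sketch, assuming bash; the function name wait_for_job and the 30-second default interval are hypothetical choices, not part of Turing's tooling. It relies on squeue printing nothing for a job that has left the queue when the header is suppressed.

```shell
# Hypothetical helper: block until a job no longer appears in squeue.
# Usage: wait_for_job JOB_ID [POLL_SECONDS]
wait_for_job() {
    job_id="$1"
    interval="${2:-30}"   # seconds between checks (the default is an arbitrary choice)

    # -h suppresses the header line and -j restricts the listing to one job,
    # so empty output means the job is no longer in the queue.
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue"
}

# Example (on Turing, after sbatch prints the job ID):
# wait_for_job 123456 && cat pytorch_example_123456.out
```

A job that has left the queue may have succeeded or failed, so the .err file is still worth checking afterwards.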
7. Transfer Files Between Turing and the Local Computer
Gompei wants to copy the output files to their Windows laptop.
- Open Command Prompt: Press the Windows key, type Command Prompt, and press Enter.
- Transfer Files Using scp: Run the following command:
scp gompei@turing.wpi.edu:/home/gompei/pytorch_example/pytorch_example_123456.out C:\Users\gompei\Downloads\
- Replace C:\Users\gompei\Downloads\ with the desired local directory.
- Enter the password when prompted.