
Running a PyTorch Example on Turing

Gompei is an on-campus student at WPI who wants to run a PyTorch example on the Turing High-Performance Computing (HPC) cluster. They use a Windows laptop and have little experience with Linux or HPC systems. This guide walks through the steps they take to run a PyTorch script on Turing, keeping it simple and straightforward, with links to related documentation for deeper understanding.


1. Request Access to Turing

Before starting, Gompei needs an account on Turing.


2. Connect to Turing from Windows

Gompei connects to Turing using the built-in SSH client on their Windows laptop.

  • Open Command Prompt: Press the Windows key, type Command Prompt, and press Enter.
  • Connect via SSH: In the Command Prompt window, type ssh gompei@turing.wpi.edu and press Enter.
  • Handle Security Prompt: When prompted about the server's authenticity, type yes and press Enter.
  • Authenticate: Enter the WPI password and press Enter.
  • More Information: For a deeper understanding of Linux basics, refer to Linux on Turing.

3. Set Up the Python Environment

Now connected to Turing, Gompei sets up the Python environment to run PyTorch.

  • Load the Python Module: module load python/3.11.10
  • Learn more about modules in the Modules Documentation.
  • Create a Project Directory: mkdir pytorch_example and cd pytorch_example
  • Set Up a Virtual Environment: python3 -m venv pytorch_example_env and source pytorch_example_env/bin/activate
  • More about Python virtual environments: Python on Turing.
  • Install Necessary Packages:
  • pip3 install numpy
  • pip3 install torch
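
As an optional sanity check before moving on, Gompei could confirm that the virtual environment is active and that both packages are importable. The following is a minimal sketch using only the Python standard library; the file name check_env.py and the helper names in_virtualenv and missing_packages are hypothetical and not part of Turing's tooling.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def missing_packages(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Inside a virtual environment:", in_virtualenv())
    missing = missing_packages(["numpy", "torch"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Saved as check_env.py in the project directory, this could be run with python3 check_env.py while the environment is activated. If either package is listed as missing, the corresponding pip3 install step probably did not finish.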

4. Prepare the PyTorch Script

Gompei uses an example from the PyTorch tutorials.

  • Get the Example Code: Visit the PyTorch Tensors Tutorial, copy the example code, and uncomment the 5th line to use the GPUs available on Turing.
  • Create the Python Script: Run nano pytorch_example.py, then paste the copied code into the editor.
  • Save and Exit: Press Ctrl + o, then Enter to save; press Ctrl + x to exit the editor.
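
The tutorial script selects its device with a hard-coded line. As a hedged alternative, the sketch below falls back to the CPU when no GPU is visible, which can help when testing outside a GPU job. The helper pick_device_name is hypothetical; torch.cuda.is_available(), torch.device, and torch.ones are standard PyTorch calls. This is an illustration, not the tutorial's own code.

```python
def pick_device_name(cuda_available: bool) -> str:
    """Return the device string to use: the first GPU if available, else the CPU."""
    return "cuda:0" if cuda_available else "cpu"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        # torch is only importable once the virtual environment is activated.
        print("torch is not installed in this environment")
    else:
        device = torch.device(pick_device_name(torch.cuda.is_available()))
        # Allocate a small tensor on the chosen device to confirm it works.
        x = torch.ones(3, device=device)
        print("Using device:", device)
        print("Sum of test tensor:", x.sum().item())
```

When run in a session with no GPU allocated, this would report cpu; inside a job that requested a GPU, it should report cuda:0.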

5. Create the SLURM Submission Script

To run the job on Turing, Gompei creates a SLURM submission script.

  • Create the Script: nano run_pytorch.sh
  • Add the Following Content:
#!/bin/bash
#SBATCH -N 1                      # allocate 1 compute node
#SBATCH -n 1                      # total number of tasks
#SBATCH --mem=1g                  # allocate 1 GB of memory
#SBATCH -J "pytorch example"      # name of the job
#SBATCH -o pytorch_example_%j.out # name of the output file
#SBATCH -e pytorch_example_%j.err # name of the error file
#SBATCH -p short                  # partition to submit to
#SBATCH -t 01:00:00               # time limit of 1 hour
#SBATCH --gres=gpu:1              # request 1 GPU

module load python/3.11.10               # these versions were chosen for compatibility with PyTorch
module load cuda/12.4.0/3mdaov5          # load CUDA (adjust if necessary)

python3 -m venv pytorch_example_env       # create virtual environment
source pytorch_example_env/bin/activate   # activate virtual environment
pip3 install numpy                        # install NumPy
pip3 install torch                        # install PyTorch
python3 pytorch_example.py                # run Python script
  • Save and Exit: Press Ctrl + o, then Enter to save; press Ctrl + x to exit the editor.
  • More Information: Refer to the SLURM Submission Guide.

6. Submit and Monitor the Job

Gompei submits the job and monitors its progress.

  • Submit the Job: sbatch run_pytorch.sh
  • After submitting, a message like Submitted batch job 123456 appears, indicating the job ID.
  • Check Job Status: squeue --me; this shows whether the job is running or still queued.
  • Monitor Output in Real-Time: tail -f pytorch_example_123456.out; press Ctrl + c to stop monitoring.
  • Ensure the Job Has Finished: When the job no longer appears in the squeue --me output, it is no longer queued or running; the output and error files show whether it succeeded.
  • View the Output File: cat pytorch_example_123456.out
  • Check for Errors: cat pytorch_example_123456.err; an empty file, or one containing only non-critical warnings, indicates the job ran without errors.
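
The file names above depend on the job ID, because %j in the -o and -e options of the submission script is replaced by that ID. The sketch below shows, under that assumption, how the ID could be extracted from the sbatch message and turned into the matching file names. The helpers parse_job_id and log_file_names are hypothetical, and 123456 is the placeholder ID used throughout this guide.

```python
import re


def parse_job_id(sbatch_output: str) -> str:
    """Extract the numeric job ID from a 'Submitted batch job N' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"no job ID found in: {sbatch_output!r}")
    return match.group(1)


def log_file_names(job_id: str, prefix: str = "pytorch_example"):
    """Return the (output, error) file names that the %j pattern produces."""
    return f"{prefix}_{job_id}.out", f"{prefix}_{job_id}.err"


if __name__ == "__main__":
    job_id = parse_job_id("Submitted batch job 123456")
    out_file, err_file = log_file_names(job_id)
    print("Output file:", out_file)
    print("Error file:", err_file)
```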

7. Transfer Files Between Turing and the Local Computer

Gompei wants to copy the output files to their Windows laptop.

  • Open Command Prompt: Press the Windows key, type Command Prompt, and press Enter.
  • Transfer Files Using scp: Run the following command:
scp gompei@turing.wpi.edu:/home/gompei/pytorch_example/pytorch_example_123456.out C:\Users\gompei\Downloads\
  • Replace C:\Users\gompei\Downloads\ with the desired local directory.
  • Enter the password when prompted.
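
For repeated transfers, the scp command can be assembled from the username, job ID, and local directory rather than retyped. The following is a small sketch that only builds the command as a list of arguments and does not run it. The helper build_scp_command is hypothetical, and the remote path layout (/home/<user>/pytorch_example) is assumed from the example command above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def build_scp_command(user, job_id, local_dir,
                      host="turing.wpi.edu", project="pytorch_example"):
    """Build the scp argument list for copying one job output file."""
    # Remote side: a POSIX path under the user's home directory on Turing.
    remote = PurePosixPath("/home") / user / project / f"{project}_{job_id}.out"
    # Local side: a Windows path, normalized to backslash separators.
    local = PureWindowsPath(local_dir)
    return ["scp", f"{user}@{host}:{remote}", str(local)]


if __name__ == "__main__":
    command = build_scp_command("gompei", "123456", "C:\\Users\\gompei\\Downloads")
    print(" ".join(command))
```

The printed line can be pasted into Command Prompt; scp still asks for the password as described above.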