ADVANCED TOPICS - Batch File Environment Variables

Slurm environment variables are important for users of HPC clusters to understand, as they provide useful information about job execution and allow customization of Slurm behavior. They fall into three broad categories, described below.

Output Environment Variables:

These are set by Slurm for each job and include details about the job's run-time environment. Some common output environment variables include:

Slurm                    Notes
SLURM_JOB_ID             The unique identifier for the current job allocation.
SLURM_JOB_NAME           The name of the job.
SLURM_JOB_NODELIST       The list of nodes allocated to the job.
SLURM_JOB_PARTITION      The partition the job is running in.
SLURM_JOB_NUM_NODES      The number of nodes allocated to the job.
SLURM_PROCID             The ID of the task (process) within the job.
SLURM_NTASKS             The total number of tasks (processes) in the job.

To see the complete list of output environment variables, you can consult the Slurm documentation or run man sbatch, man salloc, or man srun and look under the "OUTPUT ENVIRONMENT VARIABLES" section.
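
As a quick illustration, here is a minimal sketch of a batch script that simply prints some of these variables at run time; the job name, node count, task count, and time limit are placeholders to adjust for your cluster:

#!/bin/bash
#SBATCH --job-name=env-demo
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --time=00:05:00

# These variables are set by Slurm once the job starts
echo "Job ID:      $SLURM_JOB_ID"
echo "Job name:    $SLURM_JOB_NAME"
echo "Partition:   $SLURM_JOB_PARTITION"
echo "Node list:   $SLURM_JOB_NODELIST"
echo "Node count:  $SLURM_JOB_NUM_NODES"
echo "Total tasks: $SLURM_NTASKS"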

Input Environment Variables:

These can be set by users to specify default Slurm options for their jobs; they effectively act as default settings for job scripts and command-line options. Examples include:

Slurm                    Notes
SBATCH_ACCOUNT           Specifies the account to charge for job execution.
SBATCH_PARTITION         Specifies the default partition for the job.
SBATCH_TIMELIMIT         Sets the default wall clock limit for jobs (same as --time).
SBATCH_QOS               Specifies the Quality of Service for the job.

Again, you can find the complete list by looking at the Slurm man pages as mentioned above.
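
For example, to make sbatch submissions from the current shell default to a particular account and a one-hour time limit (myproject and job.sh are placeholder names):

export SBATCH_ACCOUNT=myproject
export SBATCH_TIMELIMIT=01:00:00
sbatch job.sh

Note that command-line options override these environment variables, which in turn override #SBATCH directives inside the script.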

Command Customization Variables:

Slurm allows users to customize the behavior of commands and their outputs by setting certain environment variables. For example:

Slurm                    Notes
SQUEUE_FORMAT            Defines a custom output format for the squeue command. It must be set in the environment from which squeue is invoked.
SBATCH_EXPORT            Controls which environment variables are exported to the job's environment (same as the --export option).

For example, in bash, to have squeue display job ID, partition, name, user, state, elapsed time, time limit, and node count, you might set:

export SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.2t %.9M %.9l %.6D"
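
Similarly, SBATCH_EXPORT can limit what is passed to a job. As a sketch (job.sh is a placeholder script and MYVAR a hypothetical variable name), you could submit a job without propagating your current shell environment, or propagate only selected variables:

SBATCH_EXPORT=NONE sbatch job.sh
SBATCH_EXPORT=PATH,HOME,MYVAR sbatch job.sh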

 

To make these variables take effect, export them in your shell configuration file (such as ~/.bashrc or ~/.bash_profile) so they apply to every session, use the export command for the current session only, or prefix them to an individual Slurm command.
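
For instance (the partition name compute is a placeholder, and the format strings are shortened for brevity):

# In ~/.bashrc, so it applies to every new shell:
export SBATCH_PARTITION=compute

# For the current shell session only:
export SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.2t"

# For a single command only:
SQUEUE_FORMAT="%.18i %.9P %.8j" squeue -u $USER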

 

COMMONLY USED ENVIRONMENT VARIABLES

Info                          Slurm                                   Notes
Job name                      $SLURM_JOB_NAME
Job ID                        $SLURM_JOB_ID
Submit directory              $SLURM_SUBMIT_DIR                       Slurm jobs start in the submit directory by default.
Submit host                   $SLURM_SUBMIT_HOST
Node list                     $SLURM_JOB_NODELIST                     Has a different format from the PBS equivalent; see the note below.
Job array index               $SLURM_ARRAY_TASK_ID
Queue name                    $SLURM_JOB_PARTITION
Number of nodes allocated     $SLURM_JOB_NUM_NODES or $SLURM_NNODES
Number of processes           $SLURM_NTASKS
Number of processes per node  $SLURM_TASKS_PER_NODE
Requested tasks per node      $SLURM_NTASKS_PER_NODE
Requested CPUs per task       $SLURM_CPUS_PER_TASK
Scheduling priority           $SLURM_PRIO_PROCESS
Job user                      $SLURM_JOB_USER
Hostname                      $HOSTNAME                               Equal to $SLURM_SUBMIT_HOST: unless a shell is invoked on an allocated node, HOSTNAME is propagated (copied) from the submit machine, so it is the same on all allocated nodes.

To get a plain list of node names from $SLURM_JOB_NODELIST, use:

scontrol show hostnames $SLURM_JOB_NODELIST
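
As an illustration, a job script fragment might use these variables like this (a sketch only; the array range and input file names are hypothetical):

#!/bin/bash
#SBATCH --array=1-4

cd "$SLURM_SUBMIT_DIR"    # jobs already start here by default; shown for clarity

# Expand the compact node list into one hostname per line, then into a bash array
nodes=($(scontrol show hostnames "$SLURM_JOB_NODELIST"))
echo "Running on ${#nodes[@]} node(s); first node: ${nodes[0]}"

# Pick an input file based on the array index (input_1.dat etc. are hypothetical)
input="input_${SLURM_ARRAY_TASK_ID}.dat"
echo "Array task $SLURM_ARRAY_TASK_ID would process $input"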

GPUs

Jobs can request GPUs with the job submission option --partition=gpu plus one of the count options from the table below. All counts can be given as gputype:number or just a number (the default GPU type will be used). Available GPU types can be found with the command sinfo -O gres -p <partition>. GPUs can be requested in both Batch and Interactive jobs.

Description                Slurm directive (#SBATCH or srun option)     Example
GPUs per node              --gpus-per-node=<gputype:number>             --gpus-per-node=2 or --gpus-per-node=v100:2
GPUs per job               --gpus=<gputype:number>                      --gpus=2 or --gpus=v100:2
GPUs per socket            --gpus-per-socket=<gputype:number>           --gpus-per-socket=2 or --gpus-per-socket=v100:2
GPUs per task              --gpus-per-task=<gputype:number>             --gpus-per-task=2 or --gpus-per-task=v100:2
CPUs required per GPU      --cpus-per-gpu=<number>                      --cpus-per-gpu=4
Memory per GPU             --mem-per-gpu=<number>                       --mem-per-gpu=1000m
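
For example, a minimal sketch of a batch script requesting two GPUs per node; the GPU type v100, the memory and CPU values, and my_gpu_program are assumptions that will differ between clusters:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gpus-per-node=v100:2
#SBATCH --cpus-per-gpu=4
#SBATCH --mem-per-gpu=8000m
#SBATCH --time=01:00:00

# Slurm typically exposes the allocated GPUs to the job via CUDA_VISIBLE_DEVICES
echo "Allocated GPUs: $CUDA_VISIBLE_DEVICES"

srun ./my_gpu_program    # my_gpu_program is a placeholder executable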