Slurm environment variables are important for HPC cluster users to understand: they expose useful information about job execution and allow Slurm's behavior to be customized. Here's a brief explanation of each category:
Output Environment Variables:
These are set by Slurm for each job and include details about the job's run-time environment. Some common output environment variables include:
| Slurm | Notes |
| --- | --- |
| SLURM_JOB_ID | The unique identifier for the current job allocation. |
| SLURM_JOB_NAME | The name of the job. |
| SLURM_JOB_NODELIST | List of nodes allocated to the job. |
| SLURM_JOB_PARTITION | The partition the job is running on. |
| SLURM_JOB_NUM_NODES | The number of nodes allocated to the job. |
| SLURM_PROCID | The ID of the task (process) within the job. |
| SLURM_NTASKS | The total number of tasks (processes) in the job. |
To see the complete list of output environment variables, consult the Slurm documentation or run man sbatch, man salloc, or man srun and look under the "OUTPUT ENVIRONMENT VARIABLES" section.
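As a quick illustration, here is a minimal batch script sketch that prints several of these variables; the job name and resource requests are arbitrary examples, not recommendations:

```bash
#!/bin/bash
#SBATCH --job-name=env-demo
#SBATCH --nodes=2
#SBATCH --ntasks=4

# These variables are set by Slurm when the job starts.
echo "Job ID:      $SLURM_JOB_ID"
echo "Job name:    $SLURM_JOB_NAME"
echo "Partition:   $SLURM_JOB_PARTITION"
echo "Node list:   $SLURM_JOB_NODELIST"
echo "Nodes:       $SLURM_JOB_NUM_NODES"
echo "Total tasks: $SLURM_NTASKS"

# SLURM_PROCID is set per task, so inspect it from within srun.
srun bash -c 'echo "Task $SLURM_PROCID of $SLURM_NTASKS on $(hostname)"'
```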
Input Environment Variables:
These can be set by users to supply default Slurm options for their jobs. For sbatch, these environment variables override #SBATCH directives inside the script, while command-line options override both. Examples include:
| Slurm | Notes |
| --- | --- |
| SBATCH_ACCOUNT | Specifies the account to charge for job execution. |
| SBATCH_PARTITION | Default partition for the job. |
| SBATCH_TIMELIMIT | Default wall-clock time limit for the job (equivalent to --time). |
| SBATCH_QOS | Specifies the Quality of Service for the job. |
Again, you can find the complete list by looking at the Slurm man pages as mentioned above.
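For example, to make every sbatch submission default to a particular account, partition, time limit, and QOS, you could export these variables in your shell. The account, partition, and QOS names below are placeholders; substitute your site's values:

```bash
# Placeholder names; check with your cluster administrators for valid values.
export SBATCH_ACCOUNT=myproject
export SBATCH_PARTITION=compute
export SBATCH_TIMELIMIT=01:00:00   # equivalent to --time=01:00:00
export SBATCH_QOS=normal

# Later submissions pick these up as defaults. They override any
# #SBATCH directives in job.sh, but explicit command-line options
# (e.g. sbatch --time=02:00:00 job.sh) override both.
sbatch job.sh
```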
Command Customization Variables:
Slurm allows users to customize the behavior of commands and their outputs by setting certain environment variables. For example:
| Slurm | Notes |
| --- | --- |
| SQUEUE_FORMAT | Defines a custom format for the output of the squeue command. |
| SBATCH_EXPORT | Controls which environment variables are exported to the job’s environment. |
Remember to export these variables in your shell configuration file (like ~/.bashrc or ~/.bash_profile), or set them inline before your Slurm command, if you want them to take effect. In bash, the export command sets a variable for your current session; placing the export line in a shell configuration file applies it to every session.
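For instance, both styles work for SQUEUE_FORMAT. The format string below is only an example (it mirrors squeue's default columns but widens the job-name field); adjust it to taste:

```bash
# One-off: set the variable only for this single command invocation.
SQUEUE_FORMAT="%.10i %.9P %.20j %.8u %.2t %.10M %.6D %R" squeue

# Persistent: export it in ~/.bashrc so every new session uses it.
export SQUEUE_FORMAT="%.10i %.9P %.20j %.8u %.2t %.10M %.6D %R"
```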
COMMONLY USED ENVIRONMENT VARIABLES
| Info | Slurm | Notes |
| --- | --- | --- |
| Job name | $SLURM_JOB_NAME | |
| Job ID | $SLURM_JOB_ID | |
| Submit directory | $SLURM_SUBMIT_DIR | Slurm jobs start from the submit directory by default. |
| Submit host | $SLURM_SUBMIT_HOST | |
| Node list | $SLURM_JOB_NODELIST | The Slurm variable has a different format to the PBS one. To get a list of nodes use: scontrol show hostnames $SLURM_JOB_NODELIST |
| Job array index | $SLURM_ARRAY_TASK_ID | |
| Queue name | $SLURM_JOB_PARTITION | |
| Number of nodes allocated | $SLURM_JOB_NUM_NODES | |
| Number of processes | $SLURM_NTASKS | |
| Number of processes per node | $SLURM_TASKS_PER_NODE | |
| Requested tasks per node | $SLURM_NTASKS_PER_NODE | |
| Requested CPUs per task | $SLURM_CPUS_PER_TASK | |
| Scheduling priority | $SLURM_PRIO_PROCESS | |
| Job user | $SLURM_JOB_USER | |
| Hostname | $HOSTNAME == $SLURM_SUBMIT_HOST | Unless a shell is invoked on an allocated resource, the HOSTNAME variable is propagated (copied) from the submit machine, so it will be the same on all allocated nodes. |
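The sketch below shows how several of these variables are typically used inside a job script. The scontrol show hostnames command is the standard way to expand Slurm's compact nodelist; the hostfile and input-file naming schemes here are hypothetical:

```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

# Slurm starts the job in the submit directory by default,
# but being explicit does no harm.
cd "$SLURM_SUBMIT_DIR"

# Expand the compact nodelist (e.g. node[01-02]) into one hostname per line.
scontrol show hostnames "$SLURM_JOB_NODELIST" > hosts.$SLURM_JOB_ID

# In an array job, the index is typically used to pick per-task input;
# the input_<index>.dat naming scheme is made up for this example.
if [ -n "$SLURM_ARRAY_TASK_ID" ]; then
    INPUT="input_${SLURM_ARRAY_TASK_ID}.dat"
    echo "Array task $SLURM_ARRAY_TASK_ID will read $INPUT"
fi

echo "User $SLURM_JOB_USER: $SLURM_NTASKS tasks across $SLURM_JOB_NUM_NODES nodes"
```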
GPUs
Jobs can request GPUs with the --partition=gpu submission option plus a count option from the table below. All counts can be specified as gputype:number or just a number (the default type will be used). Available GPU types can be found with the command sinfo -O gres -p <partition>. GPUs can be requested in both batch and interactive jobs.
| Description | Slurm directive (#SBATCH or srun option) | Example |
| --- | --- | --- |
| GPUs per node | --gpus-per-node=<gputype:number> | --gpus-per-node=2 or --gpus-per-node=v100:2 |
| GPUs per job | --gpus=<gputype:number> | --gpus=2 or --gpus=v100:2 |
| GPUs per socket | --gpus-per-socket=<gputype:number> | --gpus-per-socket=2 or --gpus-per-socket=v100:2 |
| GPUs per task | --gpus-per-task=<gputype:number> | --gpus-per-task=2 or --gpus-per-task=v100:2 |
| CPUs required per GPU | --cpus-per-gpu=<number> | --cpus-per-gpu=4 |
| Memory per GPU | --mem-per-gpu=<size[units]> | --mem-per-gpu=1000m |
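Putting these together, here is a sketch of a GPU batch script. The v100 GPU type comes from the examples above, and ./my_gpu_app is a placeholder for your actual application; check sinfo -O gres for the types available on your cluster:

```bash
#!/bin/bash
#SBATCH --partition=gpu          # GPU partition, as described above
#SBATCH --gpus-per-node=v100:2   # 2 GPUs of type v100 on each node
#SBATCH --cpus-per-gpu=4
#SBATCH --mem-per-gpu=8G

# Confirm which GPUs Slurm has bound to the job (NVIDIA clusters).
nvidia-smi

srun ./my_gpu_app   # placeholder application binary
```

The same options work for interactive jobs, for example: srun --partition=gpu --gpus=1 --pty bash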