Slurm environment variables are a critical aspect for users of HPC clusters to understand, as they provide useful information about job execution and allow customization of Slurm behavior. They fall into three main categories:
1. Output Environment Variables:
These are set by Slurm for each job and include details about the job's run-time environment. Some common output environment variables include:
- `SLURM_JOB_ID`: The unique identifier for the current job allocation.
- `SLURM_JOB_NAME`: The name of the job.
- `SLURM_JOB_NODELIST`: List of nodes allocated to the job.
- `SLURM_JOB_PARTITION`: The partition the job is running on.
- `SLURM_JOB_NUM_NODES`: The number of nodes allocated to the job.
- `SLURM_PROCID`: The ID of the task (process) within the job.
- `SLURM_NTASKS`: The total number of tasks (processes) in the job.
To see the complete list of output environment variables, you can consult the Slurm documentation or run `man sbatch`, `man salloc`, or `man srun` and look under the "OUTPUT ENVIRONMENT VARIABLES" section.
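For instance, here is a minimal sketch of a batch script (the job name, task count, and printed labels are illustrative) that reports several of these variables:
```bash
#!/bin/bash
#SBATCH --job-name=env-demo
#SBATCH --ntasks=4

# Variables set once for the whole allocation
echo "Job ID:      ${SLURM_JOB_ID}"
echo "Job name:    ${SLURM_JOB_NAME}"
echo "Partition:   ${SLURM_JOB_PARTITION}"
echo "Node list:   ${SLURM_JOB_NODELIST}"
echo "Nodes:       ${SLURM_JOB_NUM_NODES}"
echo "Total tasks: ${SLURM_NTASKS}"

# SLURM_PROCID differs per task, so print it from within srun
srun bash -c 'echo "Task ${SLURM_PROCID} running on $(hostname)"'
```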
2. Input Environment Variables:
These can be set by users to specify default Slurm options for their jobs. They override options set inside a batch script but are themselves overridden by command-line options. Examples include:
- `SBATCH_ACCOUNT`: Specifies the account to charge for job execution.
- `SBATCH_PARTITION`: Default partition for the job.
- `SBATCH_TIMELIMIT`: Sets the wall-clock time limit (same as `--time`).
- `SBATCH_QOS`: Specifies the Quality of Service for the job.
Again, you can find the complete list by looking at the Slurm man pages as mentioned above.
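As a sketch (the account, partition, and QOS names below are placeholders; substitute values valid on your cluster), defaults could be set for subsequent submissions like this:
```bash
# Defaults applied to any sbatch submission made from this shell session;
# the account, partition, and QOS names are placeholders.
export SBATCH_ACCOUNT=myproject
export SBATCH_PARTITION=compute
export SBATCH_TIMELIMIT=01:00:00
export SBATCH_QOS=normal

# This job inherits the defaults unless the script or command line overrides them
sbatch myjob.sh
```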
3. Command Customization Variables:
Slurm allows users to customize the behavior of commands and their outputs by setting certain environment variables. For example:
- `SQUEUE_FORMAT`: This environment variable can be set to define a custom format for `squeue` command output. It must be set in the environment from which `squeue` is invoked. For example, to display job ID, partition, name, user, state, time, and nodes, you might set it like this:
```bash
export SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.2t %.9M %.9l %.6D"
```
- `SLURM_CONF`: Points to the location of the Slurm configuration file.
- `SBATCH_EXPORT`: Controls which environment variables are exported to the job’s environment.
Remember to export these variables in your shell configuration file (such as `.bashrc` or `.bash_profile`) if you want them to persist across sessions, or set them inline before a Slurm command to apply them to a single invocation, as in the sketch below.
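A minimal sketch of both approaches, reusing the `squeue` format string from above:
```bash
# Persist across sessions by appending to your shell configuration file
echo 'export SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.2t %.9M %.9l %.6D"' >> ~/.bashrc

# Or set the variable for a single command invocation only
SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.2t %.9M %.9l %.6D" squeue -u "$USER"
```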
COMMONLY USED ENVIRONMENT VARIABLES
| Info | Slurm | Notes |
|---|---|---|
| Job name | `$SLURM_JOB_NAME` | |
| Job ID | `$SLURM_JOB_ID` | |
| Submit directory | `$SLURM_SUBMIT_DIR` | Slurm jobs start from the submit directory by default. |
| Submit host | `$SLURM_SUBMIT_HOST` | |
| Node list | `$SLURM_JOB_NODELIST` | The Slurm variable has a different format to the PBS one. To get a list of nodes use: `scontrol show hostnames $SLURM_JOB_NODELIST` |
| Job array index | `$SLURM_ARRAY_TASK_ID` | |
| Queue name | `$SLURM_JOB_PARTITION` | |
| Number of nodes allocated | `$SLURM_JOB_NUM_NODES` | |
| Number of processes | `$SLURM_NTASKS` | |
| Number of processes per node | `$SLURM_TASKS_PER_NODE` | |
| Requested tasks per node | `$SLURM_NTASKS_PER_NODE` | |
| Requested CPUs per task | `$SLURM_CPUS_PER_TASK` | |
| Scheduling priority | `$SLURM_PRIO_PROCESS` | |
| Job user | `$SLURM_JOB_USER` | |
| Hostname | `$HOSTNAME` == `$SLURM_SUBMIT_HOST` | Unless a shell is invoked on an allocated resource, the `HOSTNAME` variable is propagated (copied) from the submit machine, so it will be the same on all allocated nodes. |
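As a brief sketch of how a couple of these variables might be used inside a job script (the file name `hostfile.txt` is arbitrary):
```bash
# Expand the compact node list (e.g. node[01-04]) into one hostname per line
scontrol show hostnames "$SLURM_JOB_NODELIST" > hostfile.txt

# Jobs already start in the submit directory, but it can be referenced explicitly
cd "$SLURM_SUBMIT_DIR"
```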
GPUs
Jobs can request GPUs with the job submission option `--partition=gpu` and a count option from the table below. All counts can be given as `gputype:number` or just a number (the default GPU type will be used). Available GPU types can be found with the command `sinfo -O gres -p <partition>`. GPUs can be requested in both batch and interactive jobs.
| Description | Slurm directive (`#SBATCH` or `srun` option) | Example |
|---|---|---|
| GPUs per node | `--gpus-per-node=<gputype:number>` | `--gpus-per-node=2` or `--gpus-per-node=v100:2` |
| GPUs per job | `--gpus=<gputype:number>` | `--gpus=2` or `--gpus=v100:2` |
| GPUs per socket | `--gpus-per-socket=<gputype:number>` | `--gpus-per-socket=2` or `--gpus-per-socket=v100:2` |
| GPUs per task | `--gpus-per-task=<gputype:number>` | `--gpus-per-task=2` or `--gpus-per-task=v100:2` |
| CPUs required per GPU | `--cpus-per-gpu=<number>` | `--cpus-per-gpu=4` |
| Memory per GPU | `--mem-per-gpu=<number>` | `--mem-per-gpu=1000m` |
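Putting these together, a minimal GPU batch script might look like the following sketch (the partition name, resource counts, and application path are examples only; check `sinfo -O gres -p <partition>` for the GPU types available on your cluster):
```bash
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gpus-per-node=2
#SBATCH --cpus-per-gpu=4
#SBATCH --mem-per-gpu=1000m
#SBATCH --time=01:00:00

# ./my_gpu_application is a placeholder for your own executable
srun ./my_gpu_application
```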