Common Job Submission Options

Description                     Slurm directive (#SBATCH option)        Great Lakes Usage
Job name                        --job-name=<name>                       --job-name=gljob1
Account                         --account=<account>                     --account=test
Partition                       --partition=<partition_name>            --partition=standard
Wall time limit                 --time=<dd-hh:mm:ss>                    --time=01-02:00:00
Node count                      --nodes=<count>                         --nodes=2
Process count per node          --ntasks-per-node=<count>               --ntasks-per-node=1
Minimum memory per processor    --mem-per-cpu=<memory>                  --mem-per-cpu=1000m
Request software license(s)     --licenses=<application>@slurmdb:<N>    --licenses=stata@slurmdb:1 (requests one license for Stata)
Request event notification      --mail-type=<events>                    --mail-type=BEGIN,END,FAIL

Available partitions: standard (default), gpu (GPU jobs only), spgpu (single-precision optimized), largemem (large memory jobs only), debug, standard-oc (on-campus software only)

Note: multiple mail-type requests may be specified in a comma-separated list, e.g. --mail-type=BEGIN,END,NONE,FAIL,REQUEUE

Please note that if your job requests more than one node, your code must be MPI-enabled in order to run across those nodes. More advanced job submission options can be found in the Slurm User Guide for Great Lakes.
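
As a quick reference, these directives can be combined into a simple batch script such as the sketch below. The account, resource values, and command are illustrative placeholders drawn from the table above; adjust them for your own work and submit the script with sbatch <script_name>.

#!/bin/bash
# Example batch script (a sketch; all values are illustrative placeholders)
#SBATCH --job-name=gljob1
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --time=01-02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1000m
#SBATCH --mail-type=BEGIN,END,FAIL

# Commands placed here run on the allocated compute node when the job starts.
hostname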

Interactive Jobs

An interactive job is a job that returns a command line prompt (instead of running a script) when the job runs. Interactive jobs are useful when debugging or interacting with an application. The salloc command is used to submit an interactive job to Slurm. When the job starts, a command line prompt will appear on one of the compute nodes assigned to the job. From here commands can be executed using the resources allocated on the local node.

[user@gl-login1 ~]$ salloc --account=test
salloc: job 28652756 queued and waiting for resources
salloc: job 28652756 has been allocated resources
salloc: Granted job allocation 28652756
salloc: Waiting for resource configuration
salloc: Nodes gl3057 are ready for job
[user@gl3057 ~]$ hostname
gl3057.arc-ts.umich.edu
[user@gl3057 ~]$

Jobs submitted with salloc and no additional resource specifications are assigned the cluster defaults of 1 CPU and 768 MB of memory. The account must be specified; the job will not run otherwise. If additional resources are required, they can be requested as options to the salloc command. The following example would be appropriate for an MPI job that needs two nodes with four MPI processes per node, each process using one CPU and 1 GB of memory. MPI programs run from jobs should be started with srun or another command capable of launching MPI programs. Note the --cpu-bind=none option, which is recommended unless you know an efficient processor geometry for your job.

[user@gl-login1 ~]$ salloc --nodes=2 --account=test --ntasks-per-node=4 --mem-per-cpu=1GB
salloc: Pending job allocation 28652831
salloc: job 28652831 queued and waiting for resources
salloc: job 28652831 has been allocated resources
salloc: Granted job allocation 28652831
salloc: Waiting for resource configuration
salloc: Nodes gl[3017-3018] are ready for job
[user@gl3160 ~]$ srun --cpu-bind=none hostname
gl3017.arc-ts.umich.edu
gl3017.arc-ts.umich.edu
gl3017.arc-ts.umich.edu
gl3017.arc-ts.umich.edu
gl3018.arc-ts.umich.edu
gl3018.arc-ts.umich.edu
gl3018.arc-ts.umich.edu
gl3018.arc-ts.umich.edu

In the above example srun is used within the job from the first compute node to run a command once for every task in the job on the assigned resources. srun can be used to run on a subset of the resources assigned to the job, though that is fairly uncommon. See the srun man page for more details.
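
For example, in the two-node allocation above, a command such as the following (a sketch; the task count is arbitrary) would launch only two copies of hostname rather than one per task:

srun --ntasks=2 hostname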

GPU and Large Memory Jobs

Jobs can request GPUs with the job submission options --partition=gpu or --partition=spgpu and a count option from the table below. All counts can be specified as gputype:number or just a number (the partition's default GPU type will be used). Available GPU types can be found with the command sinfo -O gres -p <partition>. GPUs can be requested in both batch and interactive jobs.

Additionally, a user can select the compute mode of the GPUs for each job as either exclusive (ARC's default setting) or shared. Exclusive mode limits each GPU to running only one process at a time, while shared mode allows multiple processes to run simultaneously on a single GPU. See the CUDA Programming Guide for more details. You can query the compute mode from any GPU node with the command nvidia-smi -q | grep "Compute Mode"; a result of Default refers to the NVIDIA default of shared mode, as opposed to the ARC default of exclusive mode. For example:

$ nvidia-smi -q |grep "Compute Mode"
Compute Mode : Default

The gpu partition uses NVIDIA Tesla V100 GPUs (gputype v100) and the spgpu partition uses NVIDIA A40 GPUs (gputype a40).  For more information on these GPUs, please see the Great Lakes configuration page.

Description              Slurm directive (#SBATCH or srun option)    Example
GPUs per node            --gpus-per-node=<gputype:number>            --gpus-per-node=2 or --gpus-per-node=v100:2
GPUs per job             --gpus=<gputype:number>                     --gpus=2 or --gpus=a40:2
GPUs per socket          --gpus-per-socket=<gputype:number>          --gpus-per-socket=2 or --gpus-per-socket=v100:2
GPUs per task            --gpus-per-task=<gputype:number>            --gpus-per-task=2 or --gpus-per-task=a40:2
Compute Mode             --gpu_cmode=<shared|exclusive>              --gpu_cmode=shared
CPUs required per GPU    --cpus-per-gpu=<number>                     --cpus-per-gpu=4
Memory per GPU           --mem-per-gpu=<number>                      --mem-per-gpu=1000m
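
For example, the GPU-related portion of a batch script for the gpu partition might look like the following sketch; the counts and values are illustrative placeholders drawn from the table above, and the usual directives (account, time, and so on) are still required:

#SBATCH --partition=gpu
#SBATCH --gpus-per-node=v100:1
#SBATCH --cpus-per-gpu=4
#SBATCH --mem-per-gpu=1000m
#SBATCH --gpu_cmode=shared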

Jobs can request nodes with large amounts of RAM with --partition=largemem.
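
For example (a sketch; the memory value is a placeholder and should be sized to your job):

#SBATCH --partition=largemem
#SBATCH --mem-per-cpu=16000m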

Submitting a Job in One Line

If you wish to submit a job without needing a separate script, you can use sbatch --wrap=<command string>.  This will wrap the specified command in a simple “sh” shell script, which is then submitted to the Slurm controller.
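
For example, the following submits a one-line job that runs hostname under the default resource limits (the account is a placeholder):

sbatch --account=test --wrap="hostname"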

Using Local Disk During a Job

During your job, you may write to and read from two temporary locations on the node:

  • /tmp: Two 7200 RPM SATA drives in RAID 0, 3.5 TB per node
  • /tmpssd: Faster solid state drive, 426 GB per node (on standard compute nodes only)

These folders are local, meaning they are only available to the processes running on that specific node and are not shared across the cluster.  If you need shared space, your /scratch folder may be a better temporary work space.

Keep in mind that these are temporary folders and may be used by others during or after your job. Please try not to completely fill the space so that others can use it, and move or delete your /tmp and /tmpssd files after your work is finished.
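
A common pattern is to stage input onto the local disk, work against it there, and copy results back to shared storage before the job ends. The sketch below uses /tmp and assumes hypothetical file, folder, and program names; substitute /tmpssd to use the solid state drive instead:

# Copy input from shared storage (e.g. your /scratch folder) to node-local disk.
cp /scratch/<your_scratch_folder>/input.dat /tmp/
cd /tmp
# Run a (hypothetical) program against the local copy.
my_program input.dat > results.out
# Copy results back to shared storage and clean up the local files.
cp results.out /scratch/<your_scratch_folder>/
rm input.dat results.out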

Job Status

Most of a job’s specifications can be seen by invoking scontrol show job <jobID>. The batch script for a job can be written to a file with scontrol write batch_script <jobID> output.txt. If no output file is specified, the script will be written to slurm<jobID>.sh.

A job’s record remains in Slurm’s memory for 30 minutes after it completes.  scontrol show job will return “Invalid job id specified” for a job that completed more than 30 minutes ago.  At that point, one must invoke the sacct command to retrieve the job’s record from the Slurm database.
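
For example, using the job ID from the interactive job shown earlier (the sacct format fields are just one possible selection):

scontrol show job 28652756
sacct -j 28652756 --format=JobID,JobName,Partition,State,Elapsed,ExitCode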