JOB OUTPUT - An Overview of Interpreting Job Outcomes

JOB STATISTICS AND ACCOUNTING

Available Commands

my_account_usage

my_job_statistics

Job Completion Emails

When your job completes, you will receive an email summarizing the job. The email contains statistics for the job and suggestions to apply should you run something like it in the future.
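If you want to control when Slurm sends notifications, the standard sbatch directives are --mail-user and --mail-type. A minimal sketch of a job script requesting mail (the address and job name are placeholders):

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --mail-user=user@example.edu  # placeholder address to notify
#SBATCH --mail-type=END,FAIL          # send mail when the job ends or fails

srun hostname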

Slurm merges the job’s standard error and standard output by default and saves the result to an output file whose name includes the job ID: slurm-<job_ID>.out for normal jobs and slurm-<job_ID>_<array_index>.out for array jobs. You can specify your own output and error files with the sbatch options -o /file/to/output and -e /file/to/error, respectively. If both standard output and standard error should go to the same file, specify only -o /file/to/output. Slurm appends the job’s output to the specified file(s); if you want the output to overwrite any existing files instead, add the --open-mode=truncate option.

The files are written as soon as output is created; output does not spool on the compute node and then get copied to its final location after the job ends. If no files are specified in the job submission, standard output and error are combined and written into a file in the working directory from which the job was submitted.
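To set these options in a job script, you can use Slurm's filename patterns (%x expands to the job name, %j to the job ID). A minimal sketch, assuming a hypothetical logs directory:

#!/bin/bash
#SBATCH --job-name=example
#SBATCH -o /home/user/logs/%x-%j.out   # standard output (hypothetical path)
#SBATCH -e /home/user/logs/%x-%j.err   # standard error
#SBATCH --open-mode=truncate           # overwrite existing files instead of appending

srun hostname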

For example, if I submit job 93 from my home directory, the job's output and error will be written to my home directory in a file called slurm-93.out. The file appears while the job is still running.

[user@gl-login1 ~]$ sbatch test.sh
Submitted batch job 93

[user@gl-login1 ~]$ ll slurm-93.out
-rw-r--r-- 1 user hpcstaff 122 Jun 7 15:28 slurm-93.out

[user@gl-login1 ~]$ squeue
JOBID PARTITION     NAME   USER ST   TIME NODES NODELIST(REASON)
   93  standard  example   user  R   0:04     1 gl3160
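Because the output file is written as the job runs, you can follow it in real time:

[user@gl-login1 ~]$ tail -f slurm-93.out

Press Ctrl-C to stop following the file.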

If you submit from a working directory that is not on a shared filesystem, your output will only be available locally on the compute node and will need to be copied to another location after the job completes. /home, /scratch, and /nfs are all networked filesystems available on the login nodes and all compute nodes.
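One way to avoid stranding output on a compute node's local disk is to copy it to a networked location as the last step of the job script. A minimal sketch, assuming the job was submitted from a shared directory such as /home, where my_program and results.dat are placeholders:

#!/bin/bash
#SBATCH --job-name=example

cd /tmp
./my_program > results.dat             # placeholder program writing to local /tmp
cp results.dat "$SLURM_SUBMIT_DIR"/    # SLURM_SUBMIT_DIR is the directory sbatch was run from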

Submitting from /tmp illustrates the problem: if I submit a job from /tmp on the login node, the output will be in /tmp on the compute node.

[user@gl-login1 tmp]$ pwd
/tmp

[user@gl-login1 tmp]$ sbatch /home/user/test.sh
Submitted batch job 98

[user@gl-login1 tmp]$ squeue
JOBID PARTITION     NAME   USER ST   TIME NODES NODELIST(REASON)
   98  standard  example   user  R   0:03     1 gl3160

[user@gl-login1 tmp]$ ssh gl3160
[user@gl3160 ~]$ ll /tmp/slurm-98.out
-rw-r--r-- 1 user hpcstaff 78 Jun 7 15:46 /tmp/slurm-98.out
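If the job has already finished, you may still be able to retrieve the file from the compute node with scp (assuming the node is reachable and its /tmp has not yet been cleaned), for example:

[user@gl-login1 ~]$ scp gl3160:/tmp/slurm-98.out ~/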