Slurm HPC Reference
About Slurm HPC Reference
The Slurm HPC Reference is a searchable cheat sheet covering the Slurm workload manager used on high-performance computing clusters. It includes job submission commands (sbatch, srun, salloc), SBATCH directives for job name, time limits, memory allocation, and partition selection, plus node and GPU resource requests with gres specifications.
The reference covers queue management with squeue output formatting, scancel for job cancellation, scontrol for holding/releasing/updating jobs, and array job syntax with SLURM_ARRAY_TASK_ID environment variables and concurrent job throttling. Monitoring tools include sacct for completed job history, seff for efficiency reports, sstat for running job metrics, and mail notification configuration.
Designed for HPC system administrators, computational scientists, and researchers who submit batch jobs to Slurm-managed clusters. Covers partition and QoS configuration queries, node detail inspection, and all essential SBATCH directives needed for production job scheduling on multi-node GPU and CPU clusters.
Key Features
- Job submission commands: sbatch for batch jobs, srun for interactive execution, salloc for resource allocation
- SBATCH directives reference: --job-name, --time, --mem, --partition, --nodes, --ntasks, --cpus-per-task, --gres
- Partition and QoS management with sinfo, scontrol show partition, and node status queries
- GPU resource scheduling with --gres=gpu:N and specific GPU type requests (e.g., gpu:a100:2)
- Array job syntax with --array ranges, concurrent limits (%N), and SLURM_ARRAY_TASK_ID usage
- Queue management: squeue formatting, scancel, scontrol hold/release/update job attributes
- Job monitoring: sacct history, seff efficiency reports, sstat runtime metrics, email notifications
- Searchable by category with dark mode support across desktop, tablet, and mobile
Frequently Asked Questions
What is the difference between sbatch, srun, and salloc?
sbatch submits a batch script for non-interactive execution, srun runs a command on allocated resources (it can also be used inside batch scripts to launch MPI tasks), and salloc allocates resources and starts an interactive shell (by default on the submission node) from which you can launch srun commands on the allocation.
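As a quick illustration of the three entry points (these commands require a Slurm-managed cluster, and `job.sh` is a hypothetical batch script):

```shell
# sbatch: submit a batch script; Slurm runs it when resources free up.
sbatch job.sh

# srun: run a command on allocated resources, blocking until it finishes.
srun --ntasks=4 --time=00:10:00 hostname

# salloc: grab an allocation, then launch work inside it with srun.
salloc --nodes=1 --time=01:00:00
srun hostname        # runs on the allocated node(s)
exit                 # release the allocation
```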
How do I request GPU resources in a Slurm job?
Use the --gres=gpu:N directive to request N GPUs per node. You can specify GPU type with --gres=gpu:a100:2 for two A100 GPUs. Combine with --partition=gpu to target GPU-equipped partitions. The reference includes examples for both SBATCH directives and command-line usage.
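A minimal GPU job script might look like the sketch below; the partition name, GPU type, and resource sizes are placeholders that depend on your cluster:

```shell
#!/bin/bash
# Hypothetical GPU job script; adjust partition and gres to your site.
#SBATCH --job-name=train
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:2     # two A100 GPUs per node
#SBATCH --time=04:00:00
#SBATCH --mem=64G

# Slurm restricts the job to its allocated GPUs via CUDA_VISIBLE_DEVICES.
echo "Allocated GPUs: $CUDA_VISIBLE_DEVICES"
```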
How do array jobs work in Slurm?
Array jobs submit multiple similar tasks with #SBATCH --array=0-99. Each task gets a unique $SLURM_ARRAY_TASK_ID for parameterization. Use --array=0-999%50 to limit concurrent tasks to 50. The parent job ID is available as $SLURM_ARRAY_JOB_ID.
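The parameterization pattern can be sketched as follows; the input naming scheme is hypothetical, and outside a real array job we default the task ID so the snippet runs standalone:

```shell
#!/bin/bash
# Sketch of an array task body, e.g. submitted with:
#   sbatch --array=0-99%10 this_script.sh
# Slurm sets SLURM_ARRAY_TASK_ID per task; default it here for a local demo.
: "${SLURM_ARRAY_TASK_ID:=7}"

# Each task derives its own input from the task ID.
INPUT="data/sample_${SLURM_ARRAY_TASK_ID}.csv"
echo "Task ${SLURM_ARRAY_TASK_ID} processes ${INPUT}"
```

Running this outside Slurm prints `Task 7 processes data/sample_7.csv`; under `--array=0-99`, each of the 100 tasks sees its own ID.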
How do I check the status and efficiency of my Slurm jobs?
Use squeue -u $USER to see running/pending jobs, sacct -j JOBID to view completed job history with elapsed time and memory usage, and seff JOBID for a quick efficiency report showing CPU and memory utilization percentages.
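Put together, a typical monitoring session looks like this (cluster-only commands; `JOBID` is a placeholder):

```shell
squeue -u $USER                        # your running and pending jobs
squeue -u $USER -o "%i %j %T %M %R"    # custom columns: id, name, state, elapsed, reason
sacct -j JOBID --format=JobID,Elapsed,MaxRSS,State   # accounting for a finished job
seff JOBID                             # CPU and memory efficiency summary
```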
What SBATCH directives should I set for a typical batch job?
At minimum, set --job-name for identification, --time for the maximum wall time, --mem or --mem-per-cpu for memory, --partition for the queue, and --output/--error for log files. For parallel jobs, add --nodes, --ntasks, and --cpus-per-task as needed.
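A template combining those directives might look like the following; the partition name, resource sizes, and program path are placeholders for your site:

```shell
#!/bin/bash
# Hypothetical batch job template; adjust values for your cluster.
#SBATCH --job-name=myjob
#SBATCH --time=02:00:00          # wall-time limit (HH:MM:SS)
#SBATCH --mem=8G                 # memory per node
#SBATCH --partition=compute
#SBATCH --output=myjob_%j.out    # %j expands to the job ID
#SBATCH --error=myjob_%j.err
#SBATCH --ntasks=4               # parallel tasks
#SBATCH --cpus-per-task=2        # threads per task

srun ./my_program
```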
How do I cancel or modify a running Slurm job?
Use scancel JOBID to cancel a specific job, scancel -u $USER to cancel all your jobs, scontrol hold JOBID to prevent a pending job from starting, scontrol release JOBID to make it eligible for scheduling again, and scontrol update JobId=JOBID TimeLimit=48:00:00 to change attributes.
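In command form (cluster-only; 12345 is a placeholder job ID):

```shell
scancel 12345                      # cancel one job
scancel -u $USER                   # cancel all of your jobs
scontrol hold 12345                # keep a pending job from starting
scontrol release 12345             # make it eligible for scheduling again
scontrol update JobId=12345 TimeLimit=48:00:00   # change the wall-time limit
```

Note that increasing a job's time limit typically requires operator or administrator privileges; regular users can usually only reduce it.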
How do I view partition and node information?
Use sinfo to see all partitions and their states, sinfo -p PARTITION for specific partition details, sinfo -N -l for per-node information, scontrol show partition for detailed configuration, and scontrol show node for CPU count, memory, and GPU resources per node.
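For example (cluster-only; the partition and node names are placeholders):

```shell
sinfo                              # all partitions and their node states
sinfo -p gpu                       # one partition
sinfo -N -l                        # long, per-node listing
scontrol show partition gpu        # limits, defaults, and allowed QoS
scontrol show node node001         # CPUs, memory, and Gres (GPUs) on one node
```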
Is this Slurm reference free to use?
Yes, this Slurm HPC Reference is completely free with no account required. It runs entirely in your browser with no server processing. It is part of liminfo.com's collection of free online developer and infrastructure tools.