3 What is Slurm
Most of this section was extracted from the slurmR R package's vignette "Working with Slurm."
Nowadays, high-performance-computing (HPC) clusters are commonly available tools, both in and outside of cloud settings. The Slurm Workload Manager (formerly the Simple Linux Utility for Resource Management) is a program written in C that is used to efficiently manage resources in HPC clusters. The slurmR R package, which we will be using in this book, provides tools for using R in HPC settings that work with Slurm. It provides wrappers and functions that allow the user to seamlessly integrate their analysis pipeline with HPC clusters, emphasizing a family of functions similar to those provided by the parallel R package.
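To make the comparison with parallel concrete, below is a minimal sketch of what a slurmR-based workflow looks like. It uses slurmR's Slurm_lapply(); treat the specific arguments shown (njobs, mc.cores, plan) as an illustration based on the package's documentation rather than a definitive recipe, since the interface may differ across versions.

# Single-node parallelism with the parallel package
library(parallel)
ans <- mclapply(1:1000, sqrt, mc.cores = 4)

# Roughly analogous call with slurmR: the work is submitted to Slurm
# (as a job array of 2 jobs, each using 4 cores) and results are collected.
library(slurmR)
ans <- Slurm_lapply(1:1000, sqrt, njobs = 2, mc.cores = 4, plan = "collect")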
3.1 Definitions
First, some important discussion points within the context of Slurm+R that users will generally find useful. Most of the points have to do with options available for Slurm, and in particular with the sbatch command, which is used to submit batch jobs to Slurm. Users who have used Slurm in the past may wish to skip this and continue with the following section.
Node: A single computer in the HPC cluster. A lot of the time, jobs will be submitted to a single node. The simplest way of using R+Slurm is submitting a single job and requesting multiple CPUs to use with, for example, parallel::parLapply or parallel::mclapply. Usually, users do not need to request a specific number of nodes, as Slurm will allocate the resources as needed. A common mistake of R users is to specify the number of nodes and expect that their script will be parallelized; this won't happen unless the user explicitly writes a parallel computing script. The relevant flag for sbatch is --nodes.
Partition: A group of nodes in the HPC cluster. Large clusters generally have multiple partitions, meaning that nodes may be grouped in various ways. For example, nodes belonging to a single group of users may be in one partition, and nodes dedicated to working with large data may be in another. Usually, partitions are associated with account privileges, so users may need to specify which account they are using when telling Slurm which partition they plan to use. The relevant flag for sbatch is --partition.
Account: Accounts may be associated with partitions. Accounts can have privileges to use a partition or set of nodes. Often, users need to specify the account when submitting jobs to a particular partition. The relevant flag for sbatch is --account.
Task: A step within a job. A particular job can have multiple tasks. Tasks may span multiple nodes, so if the user wants to submit a multicore (shared-memory) job, this option may not be the right one. The relevant flag for sbatch is --ntasks.
CPU: Generally, this refers to a core or a thread (which may differ in systems supporting multithreaded cores). Users may want to specify how many CPUs they want to use for a task. This is the relevant option when using things like OpenMP or functions that create cluster objects in R (e.g. makePSOCKcluster, makeForkCluster). The relevant option in sbatch is --cpus-per-task. More information regarding CPUs in Slurm, and how Slurm counts CPUs/cores/threads, can be found in the Slurm documentation.

Job Array: Slurm supports job arrays. A job array is, in simple terms, a job that is replicated multiple times by Slurm, as requested by the user. In the case of R, when using this option, a single R script is spawned as multiple jobs, so the user can take advantage of this to parallelize work across multiple nodes. Besides the fact that the jobs within a job array may be spread across multiple nodes, each job in the array has a unique ID that is available to the user via environment variables, in particular SLURM_ARRAY_TASK_ID. Within R, and hence within the R script submitted to Slurm, users can access this environment variable with Sys.getenv("SLURM_ARRAY_TASK_ID") (see the sketch after this list). Some of the functionalities of slurmR rely on job arrays. More information on job arrays can be found in the Slurm documentation. The relevant option for this in sbatch is --array.
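To make the Job Array entry concrete, here is a sketch of how an R script can use these environment variables when it runs as part of a job array. The variables SLURM_ARRAY_TASK_ID and SLURM_CPUS_PER_TASK are set by Slurm; the way the work is split below is purely illustrative.

# Sketch: an R script submitted with, e.g., sbatch --array=1-4 --cpus-per-task=2
# Each replicate of the script sees a different SLURM_ARRAY_TASK_ID.
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))
ncores  <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))

# Split 1,000 simulations across the 4 array tasks; this task does its share
sims <- split(1:1000, rep(1:4, length.out = 1000))[[task_id]]

# Within the task, use the allocated CPUs via the parallel package
ans <- parallel::mclapply(sims, function(i) mean(rnorm(1e4)), mc.cores = ncores)
saveRDS(ans, sprintf("sims-%02i.rds", task_id))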
More information about Slurm can be found on its official website. A tutorial about how to use Slurm with R can be found here.
4 A brief intro to Slurm
For a quick-n-dirty intro to Slurm (Yoo, Jette, and Grondona 2003), we will start with a simple “Hello world” using Slurm + R. For this, we need to go through the following steps:
Copy a Slurm script to the HPC,
Log in to the HPC, and
Submit the job using sbatch.
4.1 Step 1: Copy the Slurm script to HPC
We need to copy the following Slurm script to HPC (00-hello-world.slurm):
#!/bin/sh
#SBATCH --output=00-hello-world.out
module load R/4.2.2
Rscript -e "paste('Hello from node', Sys.getenv('SLURMD_NODENAME'))"
The script has four lines:

#!/bin/sh: The shebang (she-what?), which indicates the shell that will run the script.
#SBATCH --output=00-hello-world.out: An option to be passed to sbatch; in this case, the name of the output file to which stdout and stderr will go.
module load R/4.2.2: Uses Lmod to load the R module.
Rscript ...: A call to R to evaluate the expression paste(...). This gets the environment variable SLURMD_NODENAME (which Slurm sets on the compute node) and prints it in a message.
To do so, we will use the secure copy protocol (scp), which allows us to copy data to and from computers. In this case, we should do something like the following:
scp 00-hello-world.slurm [userid]@notchpeak.chpc.utah.edu:/path/to/a/place/you/can/access
In words: "Using the username [userid], connect to notchpeak.chpc.utah.edu, take the file 00-hello-world.slurm, and copy it to /path/to/a/place/you/can/access." With the file now available in the cluster, we can submit this job using Slurm.
4.2 Step 2: Logging in to the HPC
Log in using ssh. Windows users can download the PuTTY client.
To log in, you will need to use your organization ID. Usually, if your email is something like myemailuser@school.edu, your ID is myemailuser. Then:

ssh myemailuser@notchpeak.chpc.utah.edu
4.3 Step 3: Submitting the job
Overall, there are two ways to use the compute nodes: interactively (salloc
) and in batch mode (sbatch
). In this case, since we have a Slurm script, we will use the latter.
To submit the job, we can type the following:
sbatch 00-hello-world.slurm
And that's it! That said, it is often required to specify the account and partition to which the user will be submitting the job. For example, if you have the account my-account and the partition my-partition associated with your user, you can incorporate that information as follows (note that sbatch options must come before the script name; anything after the script name is passed to the script itself):

sbatch --account=my-account --partition=my-partition 00-hello-world.slurm
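After submitting, it is useful to verify that the job ran and to inspect its output. A brief sketch using standard Slurm and shell commands follows (replace myemailuser with your user ID; the exact output will depend on your system):

# List your jobs currently queued or running
squeue -u myemailuser

# Once the job finishes, the file requested via --output contains the R
# output, which should look something like: [1] "Hello from node <nodename>"
cat 00-hello-world.out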
In the case of interactive sessions, you can start one using the salloc command. For example, if you wanted to run R with 8 cores and 16 GB of memory in total, you would do the following:
salloc -n1 --cpus-per-task=8 --mem-per-cpu=2G --time=01:00:00
Once your request is submitted, you will get access to a compute node. Within it, you can load the required modules and start R:
module load R/4.2.2
R
Interactive sessions are not recommended for long jobs. Instead, use this resource if you need to inspect some large dataset, debug your code, etc.
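Once inside R on the compute node, a quick way to confirm the resources you were allocated is to read the environment variables Slurm sets for the session. This is a sketch; these variables are only defined when the corresponding salloc flags were used, and the values will mirror your request.

# Number of CPUs allocated to the task (8 in the salloc example above)
Sys.getenv("SLURM_CPUS_PER_TASK")

# Memory per CPU, set when --mem-per-cpu was requested
Sys.getenv("SLURM_MEM_PER_CPU")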