Advanced Usage
If your task comprises a complicated pipeline of interconnected tasks, there are several options for splitting it into dependent tasks or parallelising independent portions across many cluster nodes. Information on these techniques and other advanced options is in this section.
How to request a GPU for your job
Whilst GPU tasks can simply be submitted to the short.gq or long.gq queues, fsl_sub also provides helper options which can automatically select a GPU queue and the appropriate CUDA toolkit for you (an example combining these options follows the list below).
- -c|--coprocessor <coprocessor name>: This selects the coprocessor with the given name (see fsl_sub --help for details of available coprocessors)
- --coprocessor_multi <number>: This allows you to request multiple GPUs. On the FMRIB cluster you can request no more than two GPUs; you will automatically be given a two-slot OpenMP parallel environment
- --coprocessor_class <class>: This would allow you to select which GPU hardware model you require, see fsl_sub --help for details
- --coprocessor_toolkit <toolkit version>: This allows you to select the API toolkit your software needs. This will automatically make available the requested CUDA libraries where these haven't been compiled into the software
- cuda selects GPUs capable of high-performance double-precision workloads and would normally be used for queued tasks such as Eddy and BedpostX.
- cuda_all selects all GPUs.
- cuda_ml selects GPUs more suited to machine learning tasks. These typically have very poor double-precision performance, instead being optimised for single-, half- and quarter-precision workloads; use them for tasks involving ML inference and development. Training may still perform better on the general-purpose GPUs depending on the task, so ask the developer of the software for advice on this. In the case of the FMRIB SLURM cluster there is no difference in double-precision capability between our GPUs; this partition is only included to allow straightforward porting of your scripts to BMRC's cluster.
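Putting these options together, a submission requesting a single general-purpose GPU might look like the following (the memory, time and toolkit values are purely illustrative - check fsl_sub --help for what is available on your cluster):
fsl_sub -R 16 -T 240 --coprocessor cuda --coprocessor_toolkit 10.2 ./my_gpu_script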
INTERACTIVE JOBS (INCLUDING GPU/MACHINE LEARNING TASKS)
Where your program requires interaction you can select a GPU when requesting a VDI, graphical MATLAB, Jupyter or RStudio session.
Alternatively, within a VDI session, you can request a text only interactive session using:
salloc -p gpu_short --gres=gpu:1 --cpus-per-gpu=2 --mem-per-cpu=8G
(...wait for job allocation...)
srun --pty /bin/bash -l
There may be a delay during the salloc command whilst the system finds a suitable host. Adapt the options as required; the example above requests:
- -p gpu_short - gpu_short partition (1.25 days)
- --gres=gpu:1 - requests a single GPU; for a specific type use `gpu:k40:1`, and change the number to 2 to request two GPUs
- --cpus-per-gpu=2 - requests two CPU cores for each GPU allocated.
- --mem-per-cpu=8G - allocates 8GB of memory per CPU core (16GB in total for the two cores requested above).
The `srun` command then launches a terminal into this interactive job.
When you have finished, use the command `exit` twice to return to your original terminal.
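Once the interactive shell has started you can confirm what you have been allocated, for example (assuming the NVIDIA driver utilities are installed on the node):
nvidia-smi
echo $CUDA_VISIBLE_DEVICES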
How to submit commands that rely on a personal Conda environment
When a job starts up on an OOD compute node it does not inherit the environment that you had when it was submitted. Modules are ordinarily re-loaded, so if the software is configured using a module then it should still work when submitted (you could write your own modules if you wish - see https://lmod.readthedocs.io/en/latest/015_writing_modules.html).
If your command is installed into a Conda environment not configured by a module then the cluster node will not know where to find it. You can either specify the full path to the script (typically <path of environment>/bin/<scriptname>) or you can create a wrapper script and submit this script to fsl_sub. A basic (generic) wrapper follows:
#!/bin/bash
# Enable Conda
eval "$(conda shell.bash hook)"
# Activate your environment
conda activate <name or path to environment>
"$@"
Make this executable (chmod +x <name of script>) and then you can use it to run commands as follows.
If you wish to run 'mypython_command option1 option2' then use:
fsl_sub -R 1 -T 1 ./conda_wrapper.sh mypython_command option1 option2
How to request a multi-threaded slot and how to ensure your software only uses the CPU cores it has been allocated
Running multi-threaded programs can cause significant problems with cluster scheduling if the scheduler is not made aware of the multiple threads (your job is allocated one slot but actually consumes many more, often ALL the CPUs, overloading the machine).
We support the running of shared memory multi-threaded software only (e.g. OpenMP, multi-threaded MKL, OpenBLAS etc).
To submit an OpenMP job, use the -s (or --parallelenv) option to fsl_sub. For example:
fsl_sub -s 2 <command or script>
2 being the number of threads you wish to allocate to your job.
The task running on the queue will be able to determine how many slots it has by querying the environment variable pointed to by FSLSUB_NSLOTS. For example in BASH the number of slots is equal to ${!FSLSUB_NSLOTS}.
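For example, a minimal wrapper for an OpenMP program might pass this value on via OMP_NUM_THREADS (the standard OpenMP thread-count variable; ./my_openmp_program is a placeholder and your software may use a different mechanism):
#!/bin/bash
export OMP_NUM_THREADS=${!FSLSUB_NSLOTS}
./my_openmp_program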
In Python you would be able to get this figure with the following code:
import os
slots = os.environ[os.environ['FSLSUB_NSLOTS']]
Within MATLAB you can limit the number of computational threads to the allocated slots with:
nslot_var = getenv('FSLSUB_NSLOTS');
n = str2double(getenv(nslot_var));
LASTN = maxNumCompThreads(n);
To be able to provide these threads the cluster software needs to reserve slots on compute nodes; this may lead to significant wait times whilst sufficient slots become available on a single device.
How to submit non-interactive MATLAB scripts to the queues
See the MATLAB page for details on selecting MATLAB versions, compilation and using compilation runtimes.
For non-interactive MATLAB, it is more efficient to compile your code (see the MATLAB page).
When using MATLAB 2019a onwards, there is a command line option `-batch` specifically for running MATLAB in the most efficient manner on compute clusters. Unfortunately, when run in this mode, MATLAB only accepts simple commands, so if your script is in a file called 'mytask.m' then you would call it with:
fsl_sub -R 16 -T 100 matlab -batch "run('mytask.m')"
For older MATLAB versions then the equivalent would be:
fsl_sub -R 16 -T 100 matlab -nodisplay -nosplash \< mytask.m
NB The "\" is very important since MATLAB won't read your script otherwise.
Warning: MATLAB tasks will often attempt to carry out some operations using multiple threads. Our cluster is configured to run only single-threaded programs unless you request multiple threads. SLURM will enforce these limits, preventing MATLAB from overloading the system, but it may be advisable to limit the number of threads MATLAB uses to ensure optimum performance.
Once the task is running you can look at the file "matlab.o<jobid>" for any output.
If you wish to take advantage of the multi-threaded facilities in MATLAB, request multiple cores with the -s option to fsl_sub.
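For example, to run a batch-mode MATLAB job with four threads (the memory and time values here are illustrative):
fsl_sub -s 4 -R 16 -T 100 matlab -batch "run('mytask.m')"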
Where you must interact with the process see the section on the MATLAB gui within the VDI.
Environment variables that can be set to control fsl_sub submitted tasks
Available Environment Variables
fsl_sub sets, or can be controlled by, the following shell variables. These can be set either for the duration of a single fsl_sub run by prepending the call with the variable assignment:
ENVVAR=VALUE fsl_sub ...
or by exporting the value in your shell so that all subsequent calls will also have this variable set:
export ENVVAR=VALUE
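For example, to advise fsl_sub that a single run is expected to need 32GB of memory (FSLSUB_MEMORY_REQUIRED is described in the table below):
FSLSUB_MEMORY_REQUIRED=32G fsl_sub ./myscript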
Environment variable | Who sets | Purpose | Example values |
---|---|---|---|
FSLSUB_JOBID_VAR | fsl_sub | Variable name of Grid job id | JOB_ID |
FSLSUB_ARRAYTASKID_VAR | fsl_sub | Variable name of Grid task id | SGE_TASK_ID |
FSLSUB_ARRAYSTARTID_VAR | fsl_sub | Variable name of Grid first task id | SGE_TASK_FIRST |
FSLSUB_ARRAYENDID_VAR | fsl_sub | Variable name of Grid last task id | SGE_TASK_LAST |
FSLSUB_ARRAYSTEPSIZE_VAR | fsl_sub | Variable name of Grid step between task ids | SGE_TASK_STEPSIZE |
FSLSUB_ARRAYCOUNT_VAR | fsl_sub | Variable name of Grid number of tasks in array | Not supported in Grid Engine |
FSLSUB_MEMORY_REQUIRED | You | Advise fsl_sub of expected memory required | 32G |
FSLSUB_PROJECT | You | Name of Grid project to run jobs under | MyProject |
FSLSUB_PARALLEL | You/fsl_sub | Control array task parallelism when running without a cluster engine (e.g. when a queued task itself submits an array task) | 4 (for four threads), 0 to let fsl_sub's shell plugin use all available cores |
FSLSUB_CONF | You | Provides the path to the configuration file | /usr/local/etc/fslsub_conf.yml |
FSLSUB_NSLOTS | fsl_sub | Variable name of Grid allocated slots | NSLOTS |
FSLSUB_DEBUG | You/fsl_sub | Enable debugging in child fsl_sub | 1 |
FSLSUB_PLUGINPATH | You | Where to find installed plugins (do not change this variable) | /path/to/folder |
FSLSUB_NOTIMELIMIT | You | Disable notification of job time to the cluster | 1 |
Where an FSLSUB_* variable is a reference to another variable you need to read the content of the referred-to variable. This can be achieved as follows:
BASH: the value is given by ${!FSLSUB_VARIABLE}
Python:
import os
value = os.environ[os.environ['FSLSUB_VARIABLE']]
MATLAB:
NSLOT_VAR = getenv('FSLSUB_VARIABLE')
N = getenv(NSLOT_VAR)
Other potentially useful submission options or techniques
Capturing job submission information
fsl_sub can store the commands used to submit the job if you provide the option --keep_jobscript. When this is provided, after submission you will find in the current folder (assuming you have write permission there) a script called wrapper-<jobid>.sh. This exact submission may be repeated by using:
fsl_sub -F wrapper-<jobid>.sh
The script contents are described below:
Script line | Purpose |
---|---|
#!/bin/bash | Run the script in BASH |
#SBATCH OPTION | SLURM options |
#SBATCH OPTION | |
module load <module name> | Load a Shell Module |
# Built by fsl_sub v.2.3.0 and fsl_sub_plugin_sge v.1.3.0 | Version of fsl_sub and plugin that submitted the job |
# Command line: <command line> | Command line that invoked fsl_sub |
# Submission time (H:M:S DD/MM/YYYY) <date/time> | Date and time that the job was submitted |
<command> | The submitted command(s) |
<command> | |
PASSING ENVIRONMENT VARIABLES TO QUEUED JOBS
It is not possible to inherit all the environment variables from the shell that submits a job, so fsl_sub allows you to specify environment variables that should be transferred to the job. This can also be useful if you are scheduling many similar tasks and need to specify a different value for an environment variable for each run, for example SUBJECTS_DIR which FreeSurfer uses to specify where your data sets reside. The --export option is used for this purpose.
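For example, to pass a per-job SUBJECTS_DIR through to a FreeSurfer script (a sketch - the path and script name are placeholders, and you should check fsl_sub --help for the exact --export syntax on your installation):
fsl_sub --export SUBJECTS_DIR=/path/to/subjects ./my_freesurfer_script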
SKIPPING COMMAND VALIDATION
By default fsl_sub will check the command given (or the commands in the lines in an array task file) can be found and are executable. If this causes issues, often because a particular program is only available on the compute nodes, not on the submission host, then you can disable this check with -n (--novalidation).
Requesting a specific resource
Some resources may have a limited quantity available for use, e.g. software licenses or RAM. fsl_sub has the ability to request these resources from the cluster (the --coprocessor options do this automatically to request the appropriate number of GPUs). The option -r (--resource) allows you to pass a resource string directly through to the Grid Engine software. If you need to do this, the computing help team or the software documentation will advise you of the exact string to pass.
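For example, with a purely hypothetical resource string:
fsl_sub -r "mysoftware_licenses=1" ./my_licensed_program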
How to submit pipeline stages such that they wait for their predecessor to complete
If you have a multi-stage task to run, you can submit the jobs all at once, specifying that later stages must wait for the previous task to complete. This is achieved by providing the '-j' (or --jobhold) option with the job id of the task to wait for. For example:
jid=$(fsl_sub -R 3 -T 16 ./my_first_stage)
fsl_sub -R 1 -T 8 -j $jid ./my_second_stage
Note the $() surrounding the first fsl_sub command; this captures the output of the command and stores the text in the variable 'jid'. This is then passed as the job id to wait for before running 'my_second_stage'.
It is also possible to submit array holds with the --array_hold option, which takes the job id of the predecessor array task. This can only be used when the first and subsequent jobs are both array tasks of the same size (same number of sub-tasks) and each sub-task in the second array depends only on the equivalent sub-task in the first array.
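For example, assuming stage_one_tasks and stage_two_tasks are (hypothetical) text files containing the same number of command lines:
jid=$(fsl_sub -t ./stage_one_tasks)
fsl_sub --array_hold $jid -t ./stage_two_tasks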
How to submit independent 'clone' tasks for running in parallel
An array task is a set of closely related tasks that do not rely on the output of any other members of the set. An example might be where you need to process each slice of a brain volume but there is no need to know about or affect the content of any other slice (the sub-tasks cannot communicate with each other to advise of changes to data). Array tasks allow you to submit large numbers of discrete jobs and manage them under one job id, with each sub-task being allocated a unique task id and potentially able to run in parallel given enough compute slot availability.
You can submit an array task with the -t/--array_task option or with the --array_native option:
TEXT FILE ARRAY TASKS
The -t (or --array_task) option needs the name of a text file that contains the array task commands, one per line. Sub-tasks will be generated from these lines, with the task ID being equivalent to the line number in the file (starting from 1), e.g.:
fsl_sub -R 12 -T 8 -t ./myparalleljobs
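Here ./myparalleljobs is a plain text file with one command per line, for example (the commands are hypothetical):
./process_subject sub-01
./process_subject sub-02
./process_subject sub-03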
The array task has a parent job id which can be used to control/delete all of the sub-tasks; the sub-tasks may be specified as job id:sub-task id, e.g. '12345:10' for sub-task 10 of job 12345.
NATIVE ARRAY TASKS
The --array_native option requires an argument n[-m[:s]] which specifies the array (see the example after this list):
- n provided alone will run the command n-times in parallel
- n-m will run the command once for each number in the range with task ids equal to the position in this range
- n-m:s similarly, but with s specifying the increment in task id.
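For example, to run ./myscript 36 times with task ids 1 to 36:
fsl_sub --array_native 1-36 ./myscript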
The cluster software will set environment variables that the script/binary can use to determine which task it needs to carry out. For example, this might be used to represent the brain volume slice to process. As these environment variables differ between cluster software, fsl_sub sets several environment variables to the name of the environment variable the script can use to obtain its task id from the cluster software:
Environment variable | ...points to variable containing |
---|---|
FSLSUB_JOBID_VAR | job id |
FSLSUB_ARRAYTASKID_VAR | task id |
FSLSUB_ARRAYSTARTID_VAR | first task id |
FSLSUB_ARRAYENDID_VAR | last task id |
FSLSUB_ARRAYSTEPSIZE_VAR | step between task ids |
FSLSUB_ARRAYCOUNT_VAR | number of tasks in array (not supported in Grid Engine) |
To use these you need to look up the variable name and then read the value from the variable, for example in BASH use ${!FSLSUB_ARRAYTASKID_VAR} to get the value of the task id.
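For example, a sketch of a sub-task script (submitted with --array_native) that uses its task id to choose which slice to process - process_slice here is a hypothetical program:
#!/bin/bash
SLICE_ID=${!FSLSUB_ARRAYTASKID_VAR}
./process_slice "$SLICE_ID"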
Important: The tasks must be truly independent - i.e. they must not write to the same file(s) or rely on calculations in other array jobs in this set, otherwise you may get unpredictable results (or sub-tasks may crash).
LIMITING CONCURRENT ARRAY TASKS
Sometimes it may be necessary to limit the number of array sub-tasks running at any one time. You can do this by providing the -x (or --array_limit) option which takes an integer, e.g.:
fsl_sub -T 10 -x 10 -t ./myparalleljobs
This will limit the number of sub-tasks running at any one time to ten.
ARRAY TASKS WITH THE SHELL RUNNER
If running without a cluster backend or when fsl_sub is called from within an already scheduled task, the shell backend is capable of running array tasks in parallel. If running as a cluster job, the shell plugin will run no more than the number of threads selected in your parallel environment (if one is specified, default is one task at a time).
If you are not running on a cluster then by default fsl_sub will use all of the CPUs on your system. You can control this either using the -x|--array_limit option or by setting the environment variable FSLSUB_PARALLEL to the maximum number of array tasks to run at once. It is also possible to configure this in your own personal fsl_sub configuration file (see below).
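For example, to limit a locally run text-file array task to four concurrent sub-tasks:
FSLSUB_PARALLEL=4 fsl_sub -t ./myparalleljobs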