Job submission - genotoul-bioinfo-UPDATE

SLURM Cluster

Training slides

Until the training slides are available, please use these documents:

Which scheduler is used?

SLURM workload Manager : https://slurm.schedmd.com

Which commands can I use to submit my job?

BATCH

sbatch: submit a batch job to Slurm (default workq partition).
sarray: submit a batch job array to Slurm.

INTERACTIVE

srun --pty bash : start an interactive session on a compute node (default workq partition).

INTERACTIVE with X11 forwarding

The first time, create an SSH key pair and authorize it (on the genologin server):

$ ssh-keygen -t rsa   (press "Enter" for every question)

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

srun --x11 --pty bash : start an interactive session with X11 forwarding (default workq partition).
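Running the cat command above a second time appends the same key again. A minimal sketch of an idempotent variant, assuming the default key locations (~/.ssh/id_rsa.pub and ~/.ssh/authorized_keys); authorize_key is a hypothetical helper name, not a site tool:

```bash
#!/bin/bash
# Append a public key to authorized_keys only if it is not already present.
# authorize_key is a hypothetical helper; defaults are the usual ~/.ssh paths.
authorize_key() {
    local pubkey=${1:-$HOME/.ssh/id_rsa.pub}
    local authfile=${2:-$HOME/.ssh/authorized_keys}
    local sshdir

    [ -r "$pubkey" ] || { echo "public key not found: $pubkey" >&2; return 1; }

    sshdir=$(dirname "$authfile")
    mkdir -p "$sshdir" && chmod 700 "$sshdir"
    # sshd may ignore keys stored in a group- or world-writable file
    touch "$authfile" && chmod 600 "$authfile"

    # -q quiet, -x whole-line match, -F fixed string: is this exact key already listed?
    if ! grep -qxF -f "$pubkey" "$authfile"; then
        cat "$pubkey" >> "$authfile"
    fi
}
```

Run it on genologin without arguments (authorize_key) after ssh-keygen; calling it again leaves authorized_keys unchanged.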

INTERACTIVE with graphical mode

runVisuSession.sh: start a TurboVNC / VirtualGL session on the graphical node (interq partition). For graphical jobs only.

INTERACTIVE inside a batch job

srun --pty --jobid jobid bash: convenient to follow a batch job (opens a shell on the node where the batch job is running).

Basic parameters for the srun command

srun
-J jobname -> set the job name
-p partition -> partition (~ queue) to use
--time=HH:MM:SS -> maximum run time of the job

-o (--output) output_filename : redirect stdout to this file. If -e (--error) is not specified, both stdout and stderr are written to this file.
-e (--error) error_filename : if specified, stderr is written to this separate file instead of the stdout file.

Default job resources

Without any parameters, on any partition, each job is limited to 1 CPU and 2 GB of RAM (cpus-per-task=1, mem=2G).

How can I submit a simple job on the cluster ?

1 - First, write a script (e.g. myscript.sh) containing the following lines:

#!/bin/bash
#SBATCH -J test
#SBATCH -o output.out
#SBATCH -e error.out
#SBATCH -t 01:00:00
#SBATCH --mem=8G
#SBATCH --mail-type=BEGIN,END,FAIL
# (notifications are sent automatically to the email address of your LDAP account)

# Purge any previously loaded modules
module purge
# Load the application
module load bioinfo/ncbi-blast-2.2.29+
# Command lines to run on the cluster
blastall ...

2 - To submit the job, use the sbatch command as follows:

sbatch myscript.sh
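In a script, it is often useful to keep the job ID returned by sbatch, for example to pass it later to squeue, sacct or seff. A minimal sketch using the --parsable option of sbatch, which prints only the job ID (followed by ";cluster_name" when a cluster name is present); submit_job is a hypothetical helper name:

```bash
#!/bin/bash
# Submit a batch script and print only its job ID.
# sbatch --parsable outputs "jobid" or "jobid;cluster_name".
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    echo "${out%%;*}"    # drop the optional ";cluster_name" suffix
}
```

For example: jobid=$(submit_job myscript.sh) && squeue -j "$jobid"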

How can I book more memory than the default (2G)?

To change the memory reservation, add this option to the submission command (sbatch, srun, sarray):

--mem=XG (default value is 2G)

How can I book more than 1 CPU?

With default parameters, each job is limited to 1 CPU.
To book more, use the following options:

# Book ncpus CPUs on the same node (up to 64)
-c ncpus (--cpus-per-task=ncpus)

# Book CPUs spread over one or more nodes, e.g. for MPI jobs
-N nnodes (--nodes=nnodes)

-n ntasks (--ntasks=ntasks)

--ntasks-per-node=ntasks
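Booking CPUs with -c only reserves them: the program itself must still be told how many threads to use. When -c/--cpus-per-task is given, Slurm exports its value inside the job as SLURM_CPUS_PER_TASK (the variable is not set otherwise). A minimal sketch with a fallback to the default single CPU; job_threads is a hypothetical helper name:

```bash
#!/bin/bash
# Number of threads to give to a multithreaded program inside a Slurm job.
# SLURM_CPUS_PER_TASK is only set when -c/--cpus-per-task was requested,
# so fall back to the default of 1 CPU otherwise.
job_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Example use in a job script submitted with "sbatch -c 8":
#   STAR --runThreadN "$(job_threads)" ...
```

This keeps the thread count in sync with the reservation: changing -c at submission time is enough.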

What are the available queues/partitions?

Each job is submitted to a specific partition (the default one is workq).
Each partition has a different priority, related to the maximum execution time it allows (e.g. workq: priority 100, 4 days; unlimitq: priority 1, 180 days).

The partition is selected with the -p option.

Queue	Access	Priority	Max time	Max slots
workq	everyone	100	4 days (96h)	3072
unlimitq	everyone	1	180 days	500
interq (runVisuSession.sh)	on demand		2 days (48h)	32
smpq	on demand		180 days	96
wflowq	specific software		180 days	3072
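As a rough illustration of the table, a sketch that chooses between the two partitions open to everyone from a requested run time in whole hours. The 96 h and 180-day limits come from the table; pick_partition is a hypothetical helper name, and the on-demand partitions (interq, smpq, wflowq) are deliberately left out:

```bash
#!/bin/bash
# Suggest a partition open to everyone for a requested run time (whole hours).
# Limits from the partition table: workq up to 96 h, unlimitq up to 180 days.
pick_partition() {
    local hours=$1
    if [ "$hours" -le 96 ]; then
        echo "workq"        # higher priority (100)
    elif [ "$hours" -le $((180 * 24)) ]; then
        echo "unlimitq"     # lowest priority (1), but long runs allowed
    else
        echo "no partition open to everyone allows ${hours} h" >&2
        return 1
    fi
}
```

For example: sbatch -p "$(pick_partition 120)" -t 5-00:00:00 myscript.sh (120 h is above the 96 h workq limit, so this selects unlimitq).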

How can I submit an array of jobs?

To submit an array of jobs, use the sarray command (it accepts the same options as sbatch):

1. Create a file with one command per line (prefixed with module load if needed).

e.g. the file star_cmd.txt contains:

module load bioinfo/STAR-2.6.0c; STAR --genomeDir referenceModel --readFilesIn ech1.R1.fastq ech1.R2.fastq ...
module load bioinfo/STAR-2.6.0c; STAR --genomeDir referenceModel --readFilesIn ech2.R1.fastq ech2.R2.fastq ...
module load bioinfo/STAR-2.6.0c; STAR --genomeDir referenceModel --readFilesIn ech3.R1.fastq ech3.R2.fastq ...

2. Launch sarray with sbatch options:

sarray -J jobName -o %j.out -e %j.err -t 01:00:00 --mem=8G --mail-type=BEGIN,END,FAIL star_cmd.txt
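For reference, a similar result can be obtained with plain Slurm job arrays (sbatch --array), where each array task runs the line of the command file matching its SLURM_ARRAY_TASK_ID. This is only a sketch of the general technique, not a description of how sarray is implemented; the script name run_line.sh and the function run_array_task are hypothetical:

```bash
#!/bin/bash
# Run line number $SLURM_ARRAY_TASK_ID (1-based) of a command file.
# In a job script (hypothetically named run_line.sh), call: run_array_task "$1"
# and submit it, for example, with:
#   sbatch --array=1-$(wc -l < star_cmd.txt) -t 01:00:00 --mem=8G run_line.sh star_cmd.txt
run_array_task() {
    local cmdfile=$1
    local task=${SLURM_ARRAY_TASK_ID:-}
    local cmd

    [ -n "$task" ] || { echo "SLURM_ARRAY_TASK_ID is not set (not a job array?)" >&2; return 1; }
    cmd=$(sed -n "${task}p" "$cmdfile")
    [ -n "$cmd" ] || { echo "no line ${task} in ${cmdfile}" >&2; return 1; }

    echo "task ${task}: ${cmd}" >&2
    eval "$cmd"    # a line may chain several commands with ';'
}
```

With sbatch --array, the %A (array job ID) and %a (array task index) patterns can be used in -o/-e so that each task gets its own log file, e.g. -o %A_%a.out.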

For more information about how to create the command file, see the FAQ Bioinfo tips ("How to generate an sarray command file with ....")
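A minimal sketch of such a generator, assuming paired-end files named <sample>.R1.fastq / <sample>.R2.fastq in the current directory as in the example above; make_star_cmds is a hypothetical helper name. It adds a per-sample --outFileNamePrefix (a standard STAR option) so that runs sharing a directory do not overwrite each other, and any further STAR options are passed as extra arguments:

```bash
#!/bin/bash
# Print one STAR command per sample, to be used as an sarray command file.
# Assumes paired files named <sample>.R1.fastq / <sample>.R2.fastq in the
# current directory; extra STAR options can be passed after the genome dir.
make_star_cmds() {
    local genome_dir=$1; shift
    local extra="" r1 r2 sample
    if [ $# -gt 0 ]; then extra=" $*"; fi

    for r1 in *.R1.fastq; do
        [ -e "$r1" ] || continue            # no match: the glob stays literal
        sample=${r1%.R1.fastq}
        r2=${sample}.R2.fastq
        if [ ! -e "$r2" ]; then
            echo "skipping ${sample}: ${r2} not found" >&2
            continue
        fi
        # Per-sample output prefix: STAR otherwise writes fixed file names
        # (Aligned.out.sam, Log.out, ...) into the working directory
        echo "module load bioinfo/STAR-2.6.0c; STAR --genomeDir ${genome_dir} --readFilesIn ${r1} ${r2} --outFileNamePrefix ${sample}_${extra}"
    done
}
```

For example: make_star_cmds referenceModel > star_cmd.txt, then submit star_cmd.txt with sarray as shown above.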

What is my CPU time quota?

To check your quota, use the command:

squota_cpu

Academic account quota: 100,000 h per calendar year.
Beyond these 100,000 hours, you will need to submit a scientific project (through the resources request form) so that your actual needs in the bioinformatics environment can be evaluated.

Depending on the results of this evaluation, but also on their geographical and institutional origin, users may then continue their processing, be invited to contribute financially to the infrastructure, or be redirected to regional or national computing mesocentres.

Non-academic account quota: 500 h per calendar year, for testing the infrastructure.
Computing time beyond this quota will be charged (price on request).
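The quota is expressed in CPU hours. Assuming it is counted like the CPUTime field of sacct (number of allocated CPUs multiplied by elapsed time), the consumption of a job can be estimated as in the sketch below. Elapsed times use the [D-]HH:MM:SS format printed by sacct; cpu_hours is a hypothetical helper name:

```bash
#!/bin/bash
# Estimate the CPU hours used by a job: allocated CPUs x elapsed time.
# elapsed: [D-]HH:MM:SS as printed by sacct (e.g. 1-02:30:00).
cpu_hours() {
    local ncpus=$1 elapsed=$2
    local days=0 h m s
    if [[ $elapsed == *-* ]]; then
        days=${elapsed%%-*}
        elapsed=${elapsed#*-}
    fi
    IFS=: read -r h m s <<< "$elapsed"
    # 10# forces base 10 so that "08" or "09" are not read as octal
    local seconds=$(( (10#$days * 24 + 10#$h) * 3600 + 10#$m * 60 + 10#$s ))
    # integer arithmetic: hours with two decimals (truncated)
    local centi=$(( ncpus * seconds * 100 / 3600 ))
    printf '%d.%02d\n' $((centi / 100)) $((centi % 100))
}
```

For example, cpu_hours 8 1-02:30:00 prints 212.00 (8 CPUs for 26.5 hours); at that size, the 100,000 h academic quota corresponds to roughly 470 such jobs.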

How can I check my quota usage on the /work directory?

Use the following command line (on genologin server):

mmlsquota -u username --block-size G

How can I submit an MPI job?

Example of a full bash script:
#!/bin/bash
#SBATCH -J mpi_job
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=6
#SBATCH --time=00:10:00

# Run from the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"
module purge
module load compiler/intel-2018.0.128 mpi/openmpi-1.8.8-intel2018.0.128
# 2 nodes x 6 tasks per node = 12 MPI processes, 6 on each node
mpirun -n $SLURM_NTASKS --map-by ppr:$SLURM_NTASKS_PER_NODE:node ./hello_world

How can I monitor a running job ?

Use the squeue command; here are some useful options:

squeue -u username : list only the specified user's jobs.
squeue -j job_id : display information about the specified job.

(see squeue --help or man squeue for more options)

For more details:

scontrol show job job_id

A graphical user interface providing the same information is also available.
It is started with the sview command.
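To get a quick summary instead of the full listing, squeue output can be restricted with -h (no header) and -o (output format, where %T is the job state) and then counted. A minimal sketch; count_states is a hypothetical helper name:

```bash
#!/bin/bash
# Count jobs per state from a list of states (one per line), e.g. the output of:
#   squeue -u username -h -o "%T"
count_states() {
    sort | uniq -c | awk '{ printf "%s %d\n", $2, $1 }'
}
```

For example, squeue -u username -h -o "%T" | count_states prints one "STATE count" line per job state (PENDING, RUNNING, ...).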

How can I use srun to check running jobs?

The srun command can be used to check on a running job in the cluster.

srun --pty --jobid=job_id bash : starts a shell, in which you can run any command, on the first node allocated to the specified job.

To check processor and memory usage quickly, you can run top directly:

srun --pty --jobid=job_id top -u login

On Slurm 20.11 and later, you may need to add --overlap to these srun commands so that they can share the resources already used by the running job.

How can I retrieve information about a finished job?

Use the sacct command as follows:

sacct -j job_id

(see sacct --help or man sacct for more options)
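For scripting, the sacct options -P (--parsable2, fields separated by "|"), -n (--noheader) and -o (--format) give output that is easy to parse. A minimal sketch that prints the state of the job itself, ignoring its steps (job_id.batch, job_id.0, ...); job_state is a hypothetical helper name:

```bash
#!/bin/bash
# Print the State of a job from sacct output read on stdin, e.g.:
#   sacct -j job_id -P -n -o JobID,State,Elapsed,MaxRSS | job_state job_id
# Step lines are skipped; fields are compared as strings so that "42.0" != "42".
job_state() {
    local id=$1
    awk -F'|' -v id="$id" '(($1 "") == (id "")) { print $2 }'
}
```

For example, [ "$(sacct -j "$jobid" -P -n -o JobID,State | job_state "$jobid")" = COMPLETED ] && seff "$jobid" only runs seff once the job has completed successfully.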

How can I kill my job?

Use the scancel command; here are some useful options:

# Kill the specified job
scancel job_id

# Kill all jobs launched by the specified user
scancel -u username

How can I specify a processor architecture?

To target the ivy or broadwell processor architecture, add one of the following options:

-C ivy (--constraint=ivy): old compute nodes 001 to 068

-C broadwell (--constraint=broadwell): new compute nodes 101 to 148

How can I use the visualization node?

Prerequisites

-> Request access to the visualization node here.
-> The TurboVNC software must be installed on your workstation. You do not need the entire package, only the client part (vncviewer). You can download it here.

How to connect?

Connect to the front-end server:

ssh username@genologin.toulouse.inra.fr

Then, on the front-end server:

$ runVisuSession.sh

Desktop 'TurboVNC' started on display genoview:1

Starting applications specified in /tools/bin/node/xstartup.mate
Log file is /home//.vnc/genoview:1.log

=================================================================
+ VNC Session name: TurboVNC
+ Your TurboVNC session is available on genoview.toulouse.inra.fr
+
+ CAUTION: If you close this interactive SLURM job,
+ your VNC session will be destroyed !
+
+ Please use your LDAP login and password to authenticate
+
+ Connection summary
+ ------------------
+ VNC_URL : genoview.toulouse.inra.fr:1
+ HTTP_URL : http://genoview.toulouse.inra.fr:5801
+ Authentication : LDAP
=================================================================

Access the visualization node

You can then access the visualization node using:

- your web browser with the HTTP_URL: http://genoview.toulouse.inra.fr:5801
- a vnc client with VNC_URL : genoview.toulouse.inra.fr:1

(on Windows, run vncviewer-java.bat)

Ask for more resources (CPUs, memory)

On the front-end server, replace the command "runVisuSession.sh" with the following command (example for 4 CPUs and 16 GB of memory):

$ srun --partition=interq -c 4 --mem=16G --job-name=TurboVNC /tools/bin/node/runVisuSession_node.sh 1024x768 mate

How can I set my CPU and memory pre-allocation?

On a test job in COMPLETED state, check the output of the seff command:

seff jobid

and adjust the time with -t, the memory with --mem and the CPUs with the --cpus-per-task option.
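To turn the seff report into a --mem value, one can read its "Memory Utilized" line and add a safety margin. A minimal sketch, assuming the usual seff line format "Memory Utilized: <value> <unit>" with KB/MB/GB/TB units and an arbitrary 20 % margin; suggest_mem is a hypothetical helper name:

```bash
#!/bin/bash
# Suggest a --mem value (whole GB) from seff output read on stdin:
#   seff jobid | suggest_mem
# Adds a 20% safety margin (arbitrary choice) and never goes below 1G.
suggest_mem() {
    awk '/^Memory Utilized:/ {
            v = $3; u = $4
            if (u == "KB") gb = v / 1024 / 1024
            else if (u == "MB") gb = v / 1024
            else if (u == "GB") gb = v
            else if (u == "TB") gb = v * 1024
            else next
            gb = gb * 1.2                      # safety margin
            n = int(gb); if (n < gb) n++       # round up to a whole GB
            if (n < 1) n = 1
            printf "--mem=%dG\n", n
            found = 1
         }
         END { if (!found) exit 1 }'
}
```

The same idea applies to the time limit: compare "Job Wall-clock time" with the -t value you requested.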

How can I get my jobs processed faster on the cluster?

The smaller a job's requests (time, memory and CPUs), the sooner it can be scheduled. Set your pre-allocations as close as possible to your actual needs.

On a test job in COMPLETED state, check the output of the seff command:

seff jobid

and adjust the time with -t, the memory with --mem and the CPUs with the --cpus-per-task option.

SGE to SLURM

Useful commands

SGE	SLURM	Comments
qsub script.sh	sbatch script.sh	sbatch only accepts scripts (use --wrap for a command line)
qsub -l mem=XG -l h_vmem=YG -b y	srun --mem=YG	No h_vmem parameters with Slurm.
qsub -m bea	sbatch/srun --mail-type=BEGIN,END,FAIL	Notify user by email when certain event types occur.
qsub -b y "command"	sbatch --wrap="command"	submit command line
qsub -sync y "command"	srun "command"	run a job and wait for it to finish
qsub -pe parallel_smp 8	sbatch/srun -c 8 (--cpus-per-task=8)	by default, jobs run on one node (-N 1, i.e. --nodes=1)
qsub -pe parallel_fill n or qsub -pe parallel_rr n	sbatch/srun -N nnodes (--nodes=nnodes) -n ntasks (--ntasks=ntasks) -c ncpus (--cpus-per-task=ncpus)	No parallel environments in Slurm
qstat -u login	squeue -u login	See all your submitted jobs
qstat -j job_id	scontrol show job job_id	Running job details
qacct -j job_id	sacct --unit=G --format JobID,jobname,NTasks,nodelist,CPUTime,ReqMem,MaxVMSize,Elapsed -j job_id	Finished job details.
qquota_cpu login	squota_cpu	See your CPU time quota
qdel job_id	scancel job_id	Kill a job
qrsh	srun --pty bash	Interactive job
qlogin	srun --x11 --pty bash	Interactive graphical job with X11 forwarding