How to use the queueing system on the Alphas
NOTE: this documentation is obsolete. DQS (aka
GridEngine) is no longer available on DEC/Alpha/Tru64 and Sun/Solaris computers at CLASSE. Those hardware architectures are no longer available as computing facilities. As of 2013, information about the use of
GridEngine for running jobs on the CLASSE Intel-architecture Linux compute farm is available at
https://wiki.classe.cornell.edu/Computing/GridEngine
The Distributed Queueing System
The queueing system for the CLEO Analysis Facility (CAF) is arranged following a resource-allocation model, with batch queue groups controlling access to the various resources. A job should be submitted to the queue group named after the resource that the job requires, where the required resource may be a ROAR data set or a tape drive.
Jobs that require tape drives should always be submitted to the appropriate tape queue, no matter which disk data set is being used. Tape jobs must use the "utb" facility to ask the batch operator to mount the appropriate tape(s). Examples of how to use the mount request commands mntdlt and mnt4mt are shown below.
Changes in the current version of DQS
This is a major change from version 2, which we had been using for the past few years. Most of the new features are largely invisible to DQS users. One past limitation was that environment variables could not be passed to the batch job; this has changed. The second change is that the group concept we used to classify machines by resource type has been replaced by a resource list, which gives us finer control over what each machine is for. The third major change is the implementation of clusters, or cells, as described in the man pages. Each cell consists of its own queueing master and the nodes it manages, with its own separate accounting and scheduling policy.
You can now pass all your current environment variables with the -V switch, either on the command line or in your script. If only a few variables are needed, use -v followed by the list of variable names. The downside is that if you use neither -V nor -v, you now have to define some CLEO-related variables yourself, e.g. by sourcing cleodefs in your script.
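A minimal script header showing the named-variable style is sketched below; the resource name 4s7 and the variable names are only examples, not a prescription:

```shell
#! /bin/ksh
#$ -l 4s7
#$ -v CLEO_LEVEL,USER    # forward only these named variables to the job
## or forward the entire environment instead:
## #$ -V
echo "running with CLEO_LEVEL=$CLEO_LEVEL"
```

The "#$" lines are comments to the shell, so the same file runs unchanged whether it is submitted with qsub or executed directly.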
The group assignment that we used for selecting nodes has been replaced by a resource list; e.g. the ROAR data set group 4sd-g is broken down into the individual resources 4sD, 4sE, 4sF and 4sG. All entries in the directory listing of /cdat/roar/list are considered available resources. A job needing a DLT drive should use the resource specifier "-l dlt", as in the example below. To access files from the tape staging facility, which currently consists of lnsmc1 and lnsmc2, you can use the shorthand resource name mc1 or mc2 and check the file listing in /cdat/roar/list/mc{1,2}, which Ken Mclean maintains.
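A quick way to check what is available before submitting might look like this (a sketch only; the actual listings depend on the installation, and myscript is a hypothetical script name):

```shell
ls /cdat/roar/list            # every resource name the scheduler knows about
cat /cdat/roar/list/mc1       # files currently staged on lnsmc1
qsub -l 4sD myscript          # ask for a node holding the 4sD data set
```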
In addition to CAF and DAF, a third cell, SPF (Service Provider Facility / Special Project Farm), has been created for people running service tasks, so that those tasks do not penalize their normal analysis jobs. You can send a job from CAF to SPF with "-cell spf", either on the command line or inside the script.
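Both forms might look like this (myservice_script is a hypothetical script name):

```shell
## on the command line:
qsub -cell spf myservice_script
## or as an embedded flag inside the script itself:
## #$ -cell spf
```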
qstat shows only the running and pending jobs; if you want to see the full listing of the available queues, use the -f switch. Some people have had jobs disappear without any output when using the -cwd switch; if you have had a similar problem, avoid using it.
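For example (the output depends on what is queued at the time):

```shell
qstat        # running and pending jobs only
qstat -f     # full listing of all available queues
```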
A Sample Batch ROAR Job
A script for a simple ROAR job might look like this:
#! /bin/ksh
#$ -l 4s7
/cleo/clib/bin/rchk << EoD
datad in /cdat/roar/datset/4s7_1/p2_54032-54042_4s7_1_rh.rp
datad in /cdat/roar/datset/4s7_1/p2_54045-54055_4s7_1_rh.rp
go
exit
EoD
If you were to call this file "myscript", then the command

% qsub myscript

would send the script to a node that has the 4s7 data set resource.
Notice that the second line begins with the characters "#$": this indicates that the line contains embedded flags for the queueing system. If your script is written for a shell other than /bin/sh, you will need to add the corresponding -S flag, e.g. "-S /bin/csh" on your command line or "#$ -S /bin/csh" in your script.
The CPU time limit for most of the queues is set to the equivalent of 12 hours on the slowest node, on the assumption that this is enough for most people to run through an entire data set or tape. The resource list of a given node can be shown with qconf -sc rl_nodename.
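For example, to see the resources assigned to a node (the node name lnx101 here is hypothetical):

```shell
qconf -sc rl_lnx101
```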
A Sample Batch Job Using DLT (or 4mm) Tapes
There are two simple commands to deal with tape allocation and mounting which can save you (and the batch operator) from some common pitfalls. The following is an example script that requests the DLT tape labelled DLT123:
#$ -S /bin/sh
#$ -l dlt
. cleodefs
#
# send mount request (mntdlt or mnt4mt), exit if it fails.
# Then get the tape device name via utbdrv.
#
mntdlt DLT123 || exit 1 # exit 1 is executed if mount fails
tape_dev=`utbdrv -d` # -d = get drive name
my_roar_job <<-EOD
datatape input $tape_dev
go
exit
EOD
mt -f $tape_dev offline
exit
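The "mntdlt DLT123 || exit 1" line above relies on the shell's short-circuit OR: the command on the right runs only if the command on the left fails. A quick illustration, with true and false standing in for the mount command:

```shell
false || echo "mount failed"   # false fails, so the message is printed
true  || echo "mount failed"   # true succeeds, so nothing is printed
```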
If you require a 4mm tape drive, replace "-l dlt" with "-l 4mt", and change mntdlt to mnt4mt.
A Sample Batch Job Using One DLT and One 4mm Tape
This slightly more complicated example uses two different tape drives, one DLT and one 4mm. The job reads fzx files from the DLT tape and writes selected rp files to the 4mm tape.
#$ -S /bin/ksh
#$ -l dlt,4mt
#
#
# An example batch job that requires both dlt and 4mt tape
# ( how to use the commands mntdlt, mnt4mt and utbdrv )
#
TAPINP=DLT123 # input dlt tape
TAPOUT=CRN246 # output 4mt tape
RPFILE=/cdat/stm/${USER}/${TAPINP}.rp # intermediate rp file
mntdlt $TAPINP || exit 1 # mount a dlt tape , exit if failed
export TAPE=`utbdrv -d` # get drv name and set variable TAPE
/cleo/clib/bin/clever <<-! # a typical clever job
stream/def input data $TAPE bit
act input
stream/def output data $RPFILE oc
act output
anal 0
exit
!
mt offline # unload tape w/ $TAPE defined
mnt4mt -w $TAPOUT || exit 1 # mount a writable(-w) 4mt
export TAPE=`utbdrv -d` # reset TAPE env variable
dd if=$RPFILE of=$TAPE bs=32400 # a typical spool job
mt offline # unload the 2nd tape
rm $RPFILE
exit
See the man pages for qsub, qdel, qstat, qconf and utb for details on those commands.