How to use the queueing system on the Alphas

NOTE: this documentation is obsolete. DQS (aka GridEngine) is no longer available on DEC/Alpha/Tru64 and Sun/Solaris computers at CLASSE. Those hardware architectures are no longer available as computing facilities. As of 2013, information about the use of GridEngine for running jobs on the CLASSE Intel-architecture Linux compute farm is available at https://wiki.classe.cornell.edu/Computing/GridEngine

The Distributed Queueing System

The queueing system for the CLEO Analysis Facility (CAF) is arranged following a resource-allocation model, with batch queue groups controlling access to the various resources. A job should be submitted to the queue group named after the resource that the job requires, where the required resource may be a ROAR data set or a tape drive.

Jobs that require tape drives should always be submitted to the appropriate tape queue, no matter what disk data set is being used. Tape jobs must use the "utb" facility to request that the batch operator mount the appropriate tape(s). Examples of how to use the mount request commands mntdlt and mnt4mt are shown below.

Changes in the current version of DQS

The current version is a major change from version 2, which we had used for the past few years. Most of the new features are largely invisible to DQS users, but three changes matter. First, environment variables can now be passed to the batch job, which was not possible before. Second, the group concept that we used to classify machines by resource type has been replaced by resource lists, which give us finer control over what each machine is used for. Third, DQS now implements clusters, or cells, as described in its man pages. Each cell consists of its own queueing master and the nodes that it manages, with its own separate accounting and scheduling policy.

You can now pass all of your current environment variables with the -V switch, either on the command line or in your script. If only a few variables are needed, use -v followed by the list of variable names. The downside is that if you use neither -V nor -v, you now have to define the CLEO-related variables in your script yourself, for example by using cleodef.
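The two switches can be compared with a small dry-run helper that only prints the qsub command line instead of submitting anything. This is a sketch, not part of DQS: qsub_env_cmd is a hypothetical name, the variable names in the usage lines are placeholders, and the comma-separated form of the -v list is an assumption that should be checked against the qsub man page.

```shell
#!/bin/sh
# Dry-run sketch: print the qsub command line rather than submitting.
# qsub_env_cmd is a hypothetical helper; the comma-separated -v list
# is an assumption, not confirmed DQS syntax.
qsub_env_cmd() {
    script=$1
    shift
    if [ "$#" -eq 0 ]; then
        # No variable names given: pass the whole current environment.
        echo "qsub -V $script"
    else
        # Join the named variables with commas for the -v switch.
        vars=$(IFS=,; echo "$*")
        echo "qsub -v $vars $script"
    fi
}

qsub_env_cmd myscript                  # all variables
qsub_env_cmd myscript MY_VAR1 MY_VAR2  # only the named (placeholder) variables
```

Once the printed command looks right, it can be run by hand on a node where qsub is available.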

The group assignment that we used to select nodes has been replaced by the resource list. For example, the ROAR data set group 4sd-g is broken down into the individual resources 4sD, 4sE, 4sF and 4sG. All entries in the directory listing of /cdat/roar/list are considered available resources. A job that needs a DLT drive should use the resource specifier "-l dlt", as in the tape examples below. To access files from the tape staging facility, which currently consists of lnsmc1 and lnsmc2, use the shorthand resource name mc1 or mc2 and check the file listing in /cdat/roar/list/mc{1,2}, which Ken Mclean maintains.
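Because the available resources are simply the entries of /cdat/roar/list, a submit wrapper can check a resource name before using it. The following is a minimal sketch under that assumption; resource_flag is a hypothetical helper, not a DQS command, and the final lines only print the qsub command rather than running it.

```shell
#!/bin/sh
# resource_flag is a hypothetical helper, not part of DQS.
# It prints "-l NAME" if NAME appears as an entry in the resource
# directory (default /cdat/roar/list), and fails otherwise.
resource_flag() {
    name=$1
    list_dir=${2:-/cdat/roar/list}
    if [ -e "$list_dir/$name" ]; then
        echo "-l $name"
    else
        echo "no such resource: $name" >&2
        return 1
    fi
}

# Dry run: print the submit command only if the resource is listed.
if flag=$(resource_flag 4sD); then
    echo "qsub $flag myscript"
fi
```

The optional second argument lets the check be pointed at a different directory, which is convenient for trying it out away from the farm.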

In addition to CAF and DAF, a third cell, SPF (Service Provider Facility/Special Project Farm), has been created for people running service tasks, so that those tasks do not penalize their normal analysis jobs. You can send a job from CAF to SPF with "-cell spf", either on the command line or inside the script.

qstat shows only the running and pending jobs; to see the full listing of the available queues, use the -f switch. Some people have had jobs disappear without any output when using the -cwd switch; if you encounter this, avoid using it.

A Sample Batch ROAR Job

A script for a simple ROAR job might look like this:

#! /bin/ksh
#$ -l 4s7
/cleo/clib/bin/rchk << EoD
datad in /cdat/roar/datset/4s7_1/p2_54032-54042_4s7_1_rh.rp
datad in /cdat/roar/datset/4s7_1/p2_54045-54055_4s7_1_rh.rp
go exit
EoD

If you were to call this file "myscript", then the command
        % qsub myscript

would send the script to a node that has the 4s7 data set as a resource.

Notice that the second line begins with the characters "#$"; this indicates that the line contains embedded flags for the queueing system. If your script is written for a shell other than /bin/sh, you will need to name that shell with the -S flag, for example by adding "-S /bin/csh" to your command line or "#$ -S /bin/csh" to your script.
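To see which flags a job script embeds before submitting it, the "#$" lines can be listed with sed. This is a sketch only: embedded_flags is a hypothetical helper, and the throwaway job script exists purely for the demonstration.

```shell
#!/bin/sh
# embedded_flags is a hypothetical helper: it prints the flags that a
# job script embeds on lines starting with "#$", one directive per line.
embedded_flags() {
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}

# Build a throwaway job script to demonstrate.
job=$(mktemp)
cat > "$job" <<'EOF'
#! /bin/ksh
#$ -S /bin/ksh
#$ -l 4s7
echo "running on $(hostname)"
EOF

embedded_flags "$job"
rm -f "$job"
```

The "#!" line is not reported because only lines that begin with "#$" match the pattern.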

The CPU time limit for most of the queues is set to the equivalent of 12 hours on the slowest node, on the assumption that this is enough for most people to run through an entire data set or tape. Most of the resource lists can be displayed with qconf, for example: qconf -sc rl_nodename

A Sample Batch Job Using DLT (or 4mm) Tapes

There are two simple commands to deal with tape allocation and mounting which can save you (and the batch operator) from some common pitfalls. The following is an example script that requests the DLT tape labelled DLT123:
#$ -S /bin/sh
#$ -l dlt
. cleodefs
# send mount request (mntdlt or mnt4mt), exit if it fails.
# Then get the tape device name via utbdrv.
mntdlt DLT123  || exit 1          # exit 1 is executed if mount fails
tape_dev=`utbdrv -d`              # -d = get drive name

my_roar_job <<-EOD
datatape input $tape_dev
EOD

mt -f $tape_dev offline

If you require a 4mm tape drive, replace "-l dlt" with "-l 4mt", and change mntdlt to mnt4mt.
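One pitfall of the linear script is that the tape stays loaded if the analysis step fails part-way. A possible guard is sketched below, assuming that mntdlt, utbdrv -d and mt behave as in the script above; run_tape_job is a hypothetical wrapper, not part of the utb facility.

```shell
#!/bin/sh
# run_tape_job is a hypothetical wrapper around the mount/unload steps.
# Usage: run_tape_job TAPE_LABEL COMMAND [ARGS...]
# COMMAND is run with the drive name appended as its last argument.
run_tape_job() {
    label=$1
    shift
    mntdlt "$label" || return 1         # give up if the mount fails
    tape_dev=$(utbdrv -d)               # drive assigned to this job
    "$@" "$tape_dev"                    # run the payload on that drive
    status=$?
    mt -f "$tape_dev" offline           # always unload, even on failure
    return "$status"
}
```

Inside a batch script this could be called as, for example, run_tape_job DLT123 my_tape_step, where my_tape_step is a placeholder for a script or function that takes the drive name as its last argument. For a 4mm tape, the mntdlt call would be changed to mnt4mt.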

A Sample Batch Job Using One DLT and One 4mm Tape

This slightly more complicated example uses two different tape drives, one DLT and one 4mm. This example reads fzx files from the DLT tape and writes selected rp files to the 4mm tape.

#$ -S /bin/ksh
#$ -l dlt,4mt
#  An example batch job that requires both dlt and 4mt tape
#  ( how to use the commands mntdlt, mnt4mt and utbdrv )

TAPINP=DLT123                           # input  dlt tape
TAPOUT=CRN246                           # output 4mt tape
RPFILE=/cdat/stm/${USER}/${TAPINP}.rp   # intermediate rp file

mntdlt $TAPINP || exit 1                # mount a dlt tape , exit if failed

export TAPE=`utbdrv -d`                 # get drv name and set variable TAPE

/cleo/clib/bin/clever <<-!              # a typical clever job
 stream/def input  data $TAPE  bit
 act        input
 stream/def output data $RPFILE oc
 act        output

 anal  0
!
mt offline                              # unload tape w/ $TAPE defined

mnt4mt -w $TAPOUT || exit 1             # mount a writable(-w) 4mt
export TAPE=`utbdrv -d`                 # reset TAPE env variable

dd if=$RPFILE of=$TAPE bs=32400         # a typical spool job
mt offline                              # unload the 2nd tape


See the man pages for qsub, qdel, qstat, qconf and utb for details on those commands.
Topic revision: r2 - 18 Jul 2013, SeldenBallJr