CSL: Sun Grid Engine at Duke Computer Science

The Sun Grid Engine (SGE) system manages the department batch queue. Grid Engine runs jobs on the departmental and research compute nodes.

The CS Grid Engine setup organizes compute resources into two queues.

Two additional queues exist to hold computers owned by specific research groups.

Jobs queued in compsci and research are low-priority jobs in SGE parlance. Low-priority jobs have the advantage that they can also run on the nodes owned by research groups, such as the architecture group or the Donald Lab, so they have the largest pool of potential machines to run on. However, if a high-priority job is submitted when all resources are in use, a low-priority job will be slowed down by 95% so that the high-priority job receives 95% of the CPU.
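To get a feel for what that slowdown means, here is a rough worst-case estimate. It assumes the low-priority job is held to a 5% CPU share for its entire run, which will not usually be the case; the 60 minute figure is only an example value.

```shell
#!/bin/sh
# Worst-case wall-clock estimate for a low-priority job that shares
# its node with a high-priority job for the whole run.
cpu_minutes=60     # CPU time the job needs at full speed (example value)
share_percent=5    # CPU share left to the low-priority job (100% - 95%)

wall_minutes=$(( cpu_minutes * 100 / share_percent ))
echo "worst case: about $wall_minutes minutes of wall-clock time"
```

In other words, a job that normally finishes in an hour could take most of a day if it is preempted throughout.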

For the basics of Grid Engine operation, please see the following links

Job scripts

All jobs submitted to Grid Engine must be shell scripts. Grid Engine will scan the script text for qsub option flags. The same flags can be given on the qsub command line or embedded in the script. Lines in the script beginning with #$ will be interpreted as containing qsub flags.

The following job runs the program hostname. The script passes Grid Engine the -cwd flag so that the job runs in the directory that was the current working directory when qsub was executed. This is equivalent to running: qsub -cwd job.sh.

#!/bin/sh
#$ -cwd 

hostname
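
A script can carry more than one embedded flag. The following is a minimal sketch, not a site-recommended template: it writes a job script that also sets a job name (-N), an output file (-o) and merges stderr into stdout (-j y), then lists the lines Grid Engine would read as flags. The name hostname_test and the output file name are placeholders chosen for illustration.

```shell
#!/bin/sh
# Write a job script with several embedded qsub options.
# The job name and output file name are placeholders.
cat > job.sh <<'EOF'
#!/bin/sh
#$ -cwd
#$ -N hostname_test
#$ -o hostname_test.out
#$ -j y

hostname
EOF

# Show the lines Grid Engine would read as qsub options.
embedded_flags=$(grep '^#\$' job.sh)
echo "$embedded_flags"

# Submit only where qsub is actually installed.
if command -v qsub >/dev/null 2>&1; then
    qsub job.sh
fi
```

Submitting this script should behave like running qsub -cwd -N hostname_test -o hostname_test.out -j y job.sh with no embedded flags at all.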

Examples

List running jobs
qstat
List jobs belonging to a user
qstat -u user
List running jobs and MPI slaves
qstat -g t
List compute nodes
qhost
Show a node's SGE resource attributes
qhost -F -h linux1
Submit a job
qsub job.sh
Direct a job to a queue
qsub -q compsci job.sh
Direct a job to a node
qsub -q compsci@linux1 job.sh
Delete a job
qdel [job number from qstat]
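
The commands above can be combined in a small wrapper. This is only a sketch: it assumes qsub reports success with its usual message of the form Your job 12345 ("job.sh") has been submitted, and the helper name extract_job_id is made up for this example. Array jobs print a different message and are not handled.

```shell
#!/bin/sh
# Extract the numeric job ID from the message qsub prints on success,
# which normally looks like: Your job 12345 ("job.sh") has been submitted
extract_job_id() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Only talk to Grid Engine where qsub is actually installed.
if command -v qsub >/dev/null 2>&1; then
    job_id=$(qsub job.sh | extract_job_id)
    echo "submitted job $job_id"
    qstat -j "$job_id"      # detailed status of this one job
    # qdel "$job_id"        # uncomment to delete the job again
fi
```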

Here is a sample mpich2 job for Grid Engine. This script will run in the grisman_mpich2 parallel environment with 2 processes.

#!/bin/sh
# ---------------------------
# job name
#$ -N MPI_Job
#
# pe request
#$ -pe grisman_mpich2 2
#
# Operate in current working directory
#$ -cwd
#
# ---------------------------

# Have mpiexec start remote processes through rsh
export MPIEXEC_RSH=/usr/bin/rsh

# Start one process per granted slot on the hosts in the machine file
mpiexec -rsh -nopm -n $NSLOTS -machinefile $TMPDIR/machines my_mpiprogram
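
When a parallel job misbehaves, it can help to submit a script that only reports what Grid Engine granted. The sketch below assumes, as in the sample above, that the parallel environment writes its host list to $TMPDIR/machines. The fallback values are assumptions added so that the script can also be dry-run outside Grid Engine.

```shell
#!/bin/sh
#$ -cwd
#$ -pe grisman_mpich2 2

# Outside Grid Engine NSLOTS is unset, so fall back to 1 for a dry run.
slots=${NSLOTS:-1}
machinefile="${TMPDIR:-/tmp}/machines"

echo "slots granted: $slots"
if [ -r "$machinefile" ]; then
    # One line per slot; count how many slots landed on each host.
    echo "hosts in $machinefile:"
    sort "$machinefile" | uniq -c
else
    echo "no machine file at $machinefile"
fi
```

Flags given on the qsub command line take precedence over embedded ones, so the same script can be tried with a different slot count, for example qsub -pe grisman_mpich2 4 followed by the script name.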