
Compute Farms @ CLASSE

  • CLASSE has 20+ years of experience with batch queuing systems.
  • Initially used for high-energy physics:
    • CLASSE was host laboratory for the CLEO Collaboration:
      • 1979-2012, 200+ individuals / 20+ institutions (peak).
    • Early 1990s: 200+ node Solaris farm for simulations and data analysis.
      • Decommissioned in 2013-14.
  • Currently, a 60-node Linux compute farm with approximately 400 cores.
    • Single-threaded jobs
    • MPI parallel jobs
    • Multi-process or multi-node jobs
    • Interactive graphical jobs
    • GPU jobs (CUDA)
    • Used for:
      • Electron cloud, photocathode, SRF simulations
      • Theorists: parallel Mathematica jobs, etc.

Batch Queuing Basics

  • Cluster of high-performance compute nodes:
    • In general, farm nodes are faster and have more memory than contemporaneous desktops.
    • All nodes run 64-bit Scientific Linux 7 (SL7).
  • Job scheduling software (Son of Grid Engine) provides equitable access.
    • Avoids resource contention.
    • Ensures jobs are executed on nodes with adequate resources.
  • Compute nodes are logically identical to all other CLASSE SL7 desktops/servers:
    • Same operating system and software stack.
    • Same access to all centralized resources (file systems, users/groups, environments, etc.).
    • Code developed on any CLASSE SL7 system can run on all other CLASSE SL7 systems.
  • Documentation: https://wiki.classe.cornell.edu/Computing/GridEngine.
  • A powerful tool, especially when coupled with 500+ TB of central disk storage.
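
For example, from lnx201 the standard Grid Engine query commands give a quick view of the queues, the compute hosts, and your own jobs (a minimal sketch; the exact queue and host details will vary):

# List all queue instances with their slots and current load
qstat -f

# Summarize the compute hosts: architecture, CPU count, load, and memory
qhost

# Show only your own pending and running jobs
qstat -u $USER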

Configuration

  • Queues, projects, and limits created and tuned as necessary.
  • Current settings:
    • Maximum of 60 simultaneously running jobs per user
      • Unlimited number of queued jobs
    • 48-hour wall clock time limit
    • Maximum of 24GB memory per batch job
    • Maximum of 64GB memory per interactive job
  • Numerous options for job submission (see the example after this list), such as:
    • Memory requirements
    • Output locations
    • Email notifications
    • Nodes to use
    • Etc.
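
As an illustration, several of these options can be combined on one qsub command line. The memory resource name, log directory, email address, and script name below are placeholders; see the GridEngine wiki page for the resource names actually configured at CLASSE:

# Request roughly 8GB of memory, write the .o/.e logs to an existing directory,
# send mail on completion, and target a specific node's queue instance
qsub -l mem_free=8G \
     -o $HOME/sge_logs -e $HOME/sge_logs \
     -m e -M netid@cornell.edu \
     -q all.q@lnx326 \
     my_job.sh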

Current Hardware

  • Most recent deployments in compute farm:
    • Four IBM x3550 M4s with two 6-core 2.30GHz Xeon E5-2630s and 128GB DDR3 (left).
    • The IBM Flex System Enterprise (right):
      • Very flexible node configuration: up to four processors per node, flexible memory configurations, GPUs, 40Gb upgrades, etc.
[Photos: IBM x3550 M4 (left); IBM Flex System Enterprise (right)]

Grid Engine Demonstration

See GridEngine.
  • How to submit standard shell scripts (qsub).
  • How to create custom Grid Engine scripts that specify memory and CPU requirements, output directories, etc.
  • How to submit parallel jobs, and what parallel jobs are.
  • Submitting a simple batch job, watching it in the queue, and then receiving the results by email.
  • Submitting an interactive job (qrsh), for example a MATLAB benchmark (see the sketch after this list).
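
A minimal sketch of the parallel and interactive submissions mentioned above; the parallel environment name is an assumption, so check the GridEngine wiki page for the names configured on this cluster:

# Parallel job: request 16 slots through a parallel environment
# ("openmpi" is an illustrative PE name, not necessarily the one defined here)
qsub -pe openmpi 16 run_mpi_job.sh

# Interactive session on whichever farm node the scheduler selects
qrsh

# Interactive session on a specific node (e.g. to check CPU/memory usage)
qrsh -q all.q@lnx326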

Sample qsub Script

# Set script linux shell - "bash" is recommended
#$ -S /bin/bash

# Name of queued job and output files
#$ -N regression_tests_demo

# Send successful job completion message to this email address
#$ -m e -M defalco@cornell.edu

# To make sure that the .e and .o file arrive in the working directory
#$ -cwd

# Put farm node name and start timestamp in log file 
echo -e "\nOn $HOSTNAME, Starting at: " `date` "\n"

# Initialize your runtime environment
. /nfs/acc/libs/cesr/cesr_online.bashrc

# Move into directory to run the executable, if necessary
cd /nfs/acc/user/amd275/sge_demo/regression_tests

# Executable to run
./scripts/run_tests.py 

# Put farm node name and end timestamp in log file 
echo "On $HOSTNAME, Done at: " `date`
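
To run it, save the script under any name (the file name below is arbitrary) and hand it to qsub from lnx201:

qsub regression_demo.sh

# Because of the -N and -cwd options above, the job's output and error streams
# appear in the submit directory as regression_tests_demo.o<jobid> and
# regression_tests_demo.e<jobid>.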

Job Submission Tips & Guidelines

  • General-purpose login node: lnx201.classe.cornell.edu
    • Log in with your CLASSE credentials and submit jobs.
  • By default, output (.o) and error (.e) logs are written to your home directory.
    • The output includes the name of the node where the job ran.
  • Do not SSH into farm nodes directly.
    • Diverts resources from legitimately queued jobs.
    • Instead, launch interactive session through queuing system (qrsh).
    • For a specific node (e.g. to check CPU/memory usage): qrsh -q all.q@lnx326
  • For I/O-bound processes (see the sketch after this list):
    • Write temporary files to /tmp (local to each compute node) to avoid network latency.
    • At end of job, copy or rsync files to central storage.
    • Files in /tmp are automatically cleaned up periodically.
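
A sketch of that pattern inside a job script; the scratch directory template, the executable, and the destination path are all illustrative:

# Create a node-local scratch directory ($JOB_ID is set by Grid Engine inside a job)
SCRATCH=$(mktemp -d /tmp/${USER}_job${JOB_ID}_XXXXXX)

# Run the I/O-heavy work against local scratch (placeholder executable)
./my_simulation --workdir "$SCRATCH"

# Copy results back to central storage before the job finishes, then clean up
rsync -a "$SCRATCH/" /nfs/my/central/storage/   # destination path is a placeholder
rm -rf "$SCRATCH"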

Other Recent Improvements (2018)

  • Updated to latest Son of Grid Engine scheduler.
    • Improved checkpointing capabilities
    • Improved intelligence in job scheduling (CPU speed, etc.) and prioritization
  • Enabling scheduling of GPUs.
  • Enabling full-desktop interactive jobs (X2Go).
  • New compute nodes.
  • Upgrading trailer (farm subnet) connection to 40Gb.
  • Upgrading to 10Gb low-latency interconnects.
  • Always something new!
