CHESS Data Storage Management
Please see CHESSUsersDataTransfer for instructions on remotely accessing and copying your data.
The CLASSE IT group maintains a redundant 10Gb High-Availability Linux Cluster to serve and secure centralized storage for CHESS. This document outlines procedures for storing, managing, and archiving this data. Please see DataStorageManagement for a higher-level overview of data storage and management at CLASSE.
Introduction
Mailing List
There is a Cornell Lyris mailing list (chess-daqadmin-l@cornell.edu) for CHESS DAQ announcements.
Cluster Overview
The CHESS DAQ Cluster serves several filesystems which are available using NFS and Samba from anywhere in CLASSE. Each CHESS station and detector has a dedicated 10Gb connection to this central storage.
In addition, the new CHESS Cluster can run high-availability services that automatically migrate between cluster members in the event of a failure. For example, the Maia detector requires a binary logger "blogd" daemon that receives data from the detector and writes to disk (and in turn streams that data to clients). If the clustered service generates and writes data itself, in general there will be an underlying file system bound to that service. For example, the "blogd" daemon writes to a local file system which is available remotely at /nfs/chess/maia.
File Systems
Overview
We have five tiers or classes of CHESS Data, spread out through the following filesystems.
During each rotation, the following are created automatically:
- Per-cycle directories (above) whose directory structure parallels that of CHESS_DAQ.
- /nfs/chess/aux/cycles, which contains softlinks to the raw/metadata/reduced/scratch directories for each BTR.
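As a rough illustration of the softlink layout described above, the following sketch builds one directory of links per BTR under a cycles directory. The tier root locations, the station name, the BTR name, and the exact nesting under aux/cycles are all hypothetical. The sketch runs entirely inside a temporary directory and is not the actual rotation script.

```shell
#!/bin/sh
# Sketch only: every path lives under a throwaway temporary directory.
set -eu

# link_btr ROOT CYCLE STATION BTR
# Creates ROOT/aux/cycles/CYCLE/STATION/BTR/{raw,metadata,reduced,scratch}
# as softlinks to the matching directory in each (hypothetical) tier root.
link_btr() {
    root=$1
    cycle=$2
    station=$3
    btr=$4
    linkdir="$root/aux/cycles/$cycle/$station/$btr"
    mkdir -p "$linkdir"
    for tier in raw metadata reduced scratch; do
        target="$root/$tier/$cycle/$station/$btr"
        mkdir -p "$target"
        # -f replaces an existing link; -n stops ln from following an
        # existing link and creating the new link inside its target.
        ln -sfn "$target" "$linkdir/$tier"
    done
}

# Demo with made-up names (cycle 2014-1, station a1, BTR smith-1234).
demo_root=$(mktemp -d)
link_btr "$demo_root" 2014-1 a1 smith-1234
ls -l "$demo_root/aux/cycles/2014-1/a1/smith-1234"
```

Keeping the links in one place gives users a single directory per BTR from which all four data tiers can be reached, regardless of which file system each tier actually lives on.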
Current sizes and allocations.
File systems can be expanded or created as needed and as funding allows (approximately $23K per 280 TB).
The currently available CHESS file systems include:
Please note that unless "classe.cornell.edu" is in your search path, you will need to use chesssamba.classe.cornell.edu or chessdaq.classe.cornell.edu for access from Windows.
Backend file system organization
- CHESS_DAQ = file system for most recent raw data, organized as CHESS_DAQ/year-cycle#/station/PI-proposal/
- CHESS_DAQ/current is a link to the appropriate CHESS_DAQ/year-cycle#
- PREVIOUSDAQ = read-only file system for data from the previous cycle, organized as PREVIOUSDAQ/year-cycle#/station/PI-proposal
- CHESS_RAW = read-only file system and export providing a persistent directory structure for clients, and a restore point for archived data.
- CHESS_RAW/current is a link to CHESS_DAQ/current
- Current CHESS_RAW/year-cycle# is a link to corresponding CHESS_DAQ/year-cycle#
- Previous CHESS_RAW/year-cycle# is a link to corresponding PREVIOUSDAQ/year-cycle#
- All other CHESS_RAW/year-cycle# directories are restore points for archived data
- CHESS_AUX = auxiliary project data that needs regular backups
- CHESS_AUX/cycles/year-cycle# = auxiliary data that's related to corresponding raw data
- CHESS_USER = user and project data that needs nightly backups
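The link relationships in the list above can be sketched in a scratch directory as follows. This is an illustration under stated assumptions, not the production setup: the cycle names, station, PI, and proposal number are made up, and reading "PI-proposal" as PI name, hyphen, proposal number is an assumption. The helper `daq_path` is hypothetical.

```shell
#!/bin/sh
# Sketch only: reproduces the documented link layout in a temp directory.
set -eu

# daq_path YEAR CYCLE STATION PI PROPOSAL
# Prints the relative directory year-cycle#/station/PI-proposal.
daq_path() {
    printf '%s-%s/%s/%s-%s\n' "$1" "$2" "$3" "$4" "$5"
}

demo_root=$(mktemp -d)
daq="$demo_root/CHESS_DAQ"
prev="$demo_root/PREVIOUSDAQ"
raw="$demo_root/CHESS_RAW"

# Hypothetical current cycle (2014-1) and previous cycle (2013-3).
mkdir -p "$daq/$(daq_path 2014 1 a1 smith 1234)" \
         "$prev/$(daq_path 2013 3 a1 smith 1200)" \
         "$raw"

# CHESS_DAQ/current -> the current CHESS_DAQ/year-cycle#
ln -sfn "$daq/2014-1" "$daq/current"
# CHESS_RAW/current -> CHESS_DAQ/current
ln -sfn "$daq/current" "$raw/current"
# Current CHESS_RAW/year-cycle# -> corresponding CHESS_DAQ/year-cycle#
ln -sfn "$daq/2014-1" "$raw/2014-1"
# Previous CHESS_RAW/year-cycle# -> corresponding PREVIOUSDAQ/year-cycle#
ln -sfn "$prev/2013-3" "$raw/2013-3"
# All older cycles are plain directories: restore points for archived data
mkdir -p "$raw/2013-2"

ls -l "$raw"
```

Because clients only ever see CHESS_RAW, the paths they use stay the same while the data behind each year-cycle# entry moves from CHESS_DAQ to PREVIOUSDAQ and finally to tape.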
Client Configuration
- Detectors and stations that write data mount CHESS_DAQ (or the appropriate CHESS_DAQ/current/station).
- All other clients mount and access data from CHESS_RAW.
For more, please see DAQClientConfiguration.
Data Archival and Rotation
Throughout each run, we take nightly incremental backups of the DAQ and MAIA filesystems. These incrementals are stored indefinitely.
At the end of each run:
- DAQ and MAIA backups will be scheduled and announced.
- The filesystems will be made read-only.
- One full backup of CHESS_DAQ and CHESS_MAIA will be written to tape and moved offsite. The nightly incremental backups remain in the tape robot and together effectively serve as the first full backup.
Before each run:
- CHESS Data Archival and Rotation will be scheduled and announced
- The announcement will state which directories will be removed from disk. Unless we hear otherwise, we will remove all of the data (but not the top-level directories and directory listings) in previousdaq, previousmaia, raw, and rawmaia. Use, for example,
tree -L 2 -d /mnt/raw
to identify directories to be removed.
- Everything in PREVIOUSDAQ and PREVIOUSMAIA will be deleted from disk.
- A directory listing of everything removed from disk will be created and stored. For example, CHESS_RAW/2013-1/directorylisting-2014-01-01.txt will list everything that was in the 2013-1 cycle before it was deleted on 2014-01-01.
- Everything in CHESS_DAQ will move to PREVIOUSDAQ
- Everything in CHESS_MAIA will move to PREVIOUSMAIA
- The root directory structure will be created in CHESS_DAQ
- Links in CHESS_RAW will be updated to point to any moved files.
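The DAQ half of the rotation steps above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the script actually used at CHESS: the function name `rotate`, its arguments, and the demo cycle names and file are hypothetical, the MAIA file systems (which would be handled analogously) are omitted, and everything runs in a temporary directory. The listing is written before the data is deleted, since it has to be generated from the data.

```shell
#!/bin/sh
# Sketch only: rotates a toy CHESS_DAQ/PREVIOUSDAQ/CHESS_RAW tree.
set -eu

# rotate ROOT OLD_PREV OLD_CUR NEW_CUR DATE
rotate() {
    root=$1
    old_prev=$2
    old_cur=$3
    new_cur=$4
    stamp=$5
    daq="$root/CHESS_DAQ"
    prev="$root/PREVIOUSDAQ"
    raw="$root/CHESS_RAW"

    # 1. Turn CHESS_RAW/OLD_PREV from a link into a restore point, store a
    #    listing of what is about to leave disk there, then delete the data.
    if [ -L "$raw/$old_prev" ]; then
        rm "$raw/$old_prev"
    fi
    mkdir -p "$raw/$old_prev"
    (cd "$prev" && find "$old_prev" | sort) \
        > "$raw/$old_prev/directorylisting-$stamp.txt"
    rm -rf "${prev:?}/$old_prev"

    # 2. Move the finished cycle from CHESS_DAQ to PREVIOUSDAQ.
    mv "$daq/$old_cur" "$prev/$old_cur"

    # 3. Create the root directory for the new cycle in CHESS_DAQ.
    mkdir -p "$daq/$new_cur"
    ln -sfn "$daq/$new_cur" "$daq/current"

    # 4. Update the CHESS_RAW links to point at the moved data.
    ln -sfn "$prev/$old_cur" "$raw/$old_cur"
    ln -sfn "$daq/$new_cur" "$raw/$new_cur"
    ln -sfn "$daq/current" "$raw/current"
}

# Demo with made-up cycles: 2013-3 is deleted, 2014-1 becomes previous,
# and 2014-2 becomes current.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/CHESS_DAQ/2014-1/a1" \
         "$demo_root/PREVIOUSDAQ/2013-3/a1" \
         "$demo_root/CHESS_RAW"
touch "$demo_root/PREVIOUSDAQ/2013-3/a1/scan_001.dat"
ln -s "$demo_root/CHESS_DAQ/2014-1" "$demo_root/CHESS_DAQ/current"
ln -s "$demo_root/CHESS_DAQ/2014-1" "$demo_root/CHESS_RAW/2014-1"
ln -s "$demo_root/PREVIOUSDAQ/2013-3" "$demo_root/CHESS_RAW/2013-3"

rotate "$demo_root" 2013-3 2014-1 2014-2 2014-06-01
cat "$demo_root/CHESS_RAW/2013-3/directorylisting-2014-06-01.txt"
```

The `${prev:?}` guard makes the shell abort rather than run `rm -rf` against a path built from an empty variable, which is a cheap safeguard for any script that deletes data.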