CHESS Data Storage Management
Please see CHESSUsersDataTransfer for instructions on remotely accessing and copying your data.
The CLASSE IT group maintains a redundant 10Gb High-Availability Linux Cluster to serve and secure centralized storage for CHESS. This document outlines procedures for managing and archiving this data. Please see DataStorageManagement for a higher-level overview of data storage and management at CLASSE.
There is a Cornell Lyris mailing list (firstname.lastname@example.org) for CHESS DAQ announcements. For now, this is a closed list, so please submit a ServiceRequest if you want to join.
All CHESS stations have been connected and (we believe) are saving data to the DAQ.
The new CHESS Cluster serves several filesystems which are available using NFS and Samba from anywhere in CLASSE. Each CHESS station and detector has a 10Gb connection to this central storage.
In addition, the new CHESS Cluster can run high-availability services that automatically migrate between cluster members in the event of a failure. For example, the Maia detector requires a binary logger "blogd" daemon that receives data from the detector and writes to disk (and in turn streams that data to clients). If the clustered service generates and writes data itself, in general there will be an underlying file system bound to that service. For example, the "blogd" daemon writes to a local file system which is available remotely at /nfs/chess/maia.
The proposed file system sizes use existing in-house infrastructure and storage. File systems can be expanded or created as needed and as funding allows (approximately $20K per 65TB).
All raw CHESS data will be archived twice per cycle. For details, please see #Raw_Data_Archival_and_Rotation.
Most raw CHESS data that is written over NFS or Samba will use a single "CURRENT CYCLE" directory / link. As described above, services that run in the cluster may use a separate file system dedicated to that service.
The currently available and proposed CHESS file systems include:
Please note that unless "classe.cornell.edu" is in your search path, you will need to use chesssamba.classe.cornell.edu or chessdaq.classe.cornell.edu for access from Windows.
Backend file system organization
- CHESS_DAQ = file system for most recent raw data, organized as CHESS_DAQ/year-cycle#/station/PI-proposal/
- CHESS_DAQ/current is a link to the appropriate CHESS_DAQ/year-cycle#
- PREVIOUSDAQ = read-only file system for data from previous cycle, organized as PREVIOUSDAQ/year-cycle#/station/PI-proposal
- CHESS_RAW = Read-only file system and export providing persistent directory structure for clients, and restore point for archived data.
- CHESS_RAW/current is a link to CHESS_DAQ/current
- Current CHESS_RAW/year-cycle# is a link to corresponding CHESS_DAQ/year-cycle#
- Previous CHESS_RAW/year-cycle# is a link to corresponding PREVIOUSDAQ/year-cycle#
- All other CHESS_RAW/year-cycle# directories are restore points for archived data
- CHESS_AUX = auxiliary project data that needs regular backups
- CHESS_AUX/cycles/year-cycle# = auxiliary data that's related to corresponding raw data
- CHESS_USER = user and project data that needs nightly backups
- Detectors and stations that write data mount CHESS_DAQ (or appropriate CHESS_DAQ/current/station).
- All other clients mount and access data from CHESS_RAW.
For more, please see DAQClientConfiguration
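The link structure described above can be illustrated with a small, self-contained sketch. It builds a toy copy of the layout in a temporary directory, so it does not touch any real file system. The cycle names ("2013-1", "2013-2"), the station name "a1", and the PI-proposal name "smith-1234" are hypothetical placeholders, and the helper functions are illustrative only, not part of any CLASSE tooling.

```python
import tempfile
from pathlib import Path


def build_layout(root: Path, current: str, previous: str) -> None:
    """Create a toy version of the CHESS_DAQ / PREVIOUSDAQ / CHESS_RAW links."""
    daq = root / "CHESS_DAQ"
    prev = root / "PREVIOUSDAQ"
    raw = root / "CHESS_RAW"

    # Real directories that hold data for the current and previous cycles.
    (daq / current).mkdir(parents=True)
    (prev / previous).mkdir(parents=True)
    raw.mkdir()

    # CHESS_DAQ/current -> CHESS_DAQ/<current cycle> (relative link).
    (daq / "current").symlink_to(current, target_is_directory=True)

    # CHESS_RAW/current -> CHESS_DAQ/current
    (raw / "current").symlink_to(daq / "current", target_is_directory=True)
    # CHESS_RAW/<current cycle> -> CHESS_DAQ/<current cycle>
    (raw / current).symlink_to(daq / current, target_is_directory=True)
    # CHESS_RAW/<previous cycle> -> PREVIOUSDAQ/<previous cycle>
    (raw / previous).symlink_to(prev / previous, target_is_directory=True)


def data_dir(base: Path, cycle: str, station: str, pi_proposal: str) -> Path:
    """Return base/year-cycle#/station/PI-proposal."""
    return base / cycle / station / pi_proposal


# Hypothetical names throughout; the temporary directory stands in for the
# real exports.
root = Path(tempfile.mkdtemp())
build_layout(root, current="2013-2", previous="2013-1")

# A writer (station or detector) goes through CHESS_DAQ/current ...
written = data_dir(root / "CHESS_DAQ", "current", "a1", "smith-1234")
written.mkdir(parents=True)
(written / "scan_001.dat").write_text("raw data")

# ... and any other client sees the same file through CHESS_RAW.
seen = data_dir(root / "CHESS_RAW", "2013-2", "a1", "smith-1234") / "scan_001.dat"
print(seen.read_text())
```

The point of the indirection is that clients always use CHESS_RAW paths, which stay stable when the underlying data moves between file systems.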
Data Archival and Rotation
Throughout each run, we take nightly incremental backups of the DAQ and MAIA filesystems. These incrementals are stored indefinitely.
At the end of each run:
- DAQ and MAIA backups will be scheduled and announced.
- The filesystems will be made read-only.
- One full backup of CHESS_DAQ and CHESS_MAIA will be written to tape and moved offsite. The nightly incremental backups remain in the robot and effectively serve as the first full backup.
Before each run:
- CHESS Data Archival and Rotation will be scheduled and announced
- The announcement will include which directories will be removed from disk. Unless we hear otherwise, we will remove all of the data (but not the top-level directories and directory listings) in previousdaq, previousmaia, raw, and rawmaia. Use, for example, tree -L 2 -d /mnt/raw to identify directories to be removed.
- Everything in PREVIOUSDAQ and PREVIOUSMAIA will be deleted from disk.
- A directory listing of everything removed from disk will be created and stored. For example, CHESS_RAW/2013-1/directorylisting-2014-01-01.txt will list everything that was in the 2013-1 cycle before it was deleted on 2014-01-01.
- Everything in CHESS_DAQ will move to PREVIOUSDAQ
- Everything in CHESS_MAIA will move to PREVIOUSMAIA
- The root directory structure will be created in CHESS_DAQ
- Links in CHESS_RAW will be updated to point to any moved files.
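The rotation steps above can be sketched as follows for the DAQ file systems (the MAIA file systems would follow the same pattern). This is a minimal illustration that runs against a toy layout in a temporary directory, not the procedure actually used by CLASSE IT. The function names, the station name "a1", and the cycle names are hypothetical, and replacing the old CHESS_RAW link with a plain directory that holds the listing is an assumption about how the restore point is created.

```python
import shutil
import tempfile
from pathlib import Path


def write_listing(cycle_dir: Path, listing: Path) -> None:
    """Record every path under cycle_dir, one relative path per line."""
    lines = sorted(str(p.relative_to(cycle_dir)) for p in cycle_dir.rglob("*"))
    listing.write_text("\n".join(lines) + "\n")


def replace_link(link: Path, target) -> None:
    """Point link at target, removing any existing symlink first."""
    if link.is_symlink():
        link.unlink()
    link.symlink_to(target, target_is_directory=True)


def rotate(root: Path, new_cycle: str, stations: list[str], date: str) -> None:
    daq, prev, raw = root / "CHESS_DAQ", root / "PREVIOUSDAQ", root / "CHESS_RAW"

    # Save a listing of everything in PREVIOUSDAQ, then delete it from disk.
    for old in [p for p in prev.iterdir() if p.is_dir()]:
        restore_point = raw / old.name
        if restore_point.is_symlink():
            restore_point.unlink()  # assumption: becomes a plain directory
        restore_point.mkdir(exist_ok=True)
        write_listing(old, restore_point / f"directorylisting-{date}.txt")
        shutil.rmtree(old)

    # Move each cycle directory from CHESS_DAQ to PREVIOUSDAQ and re-point
    # the matching CHESS_RAW link at the moved data.
    cycles = [p for p in daq.iterdir() if p.is_dir() and not p.is_symlink()]
    for cur in cycles:
        shutil.move(str(cur), str(prev / cur.name))
        replace_link(raw / cur.name, prev / cur.name)

    # Recreate the root directory structure for the new cycle.
    for station in stations:
        (daq / new_cycle / station).mkdir(parents=True)
    replace_link(daq / "current", new_cycle)
    replace_link(raw / new_cycle, daq / new_cycle)
    if not (raw / "current").is_symlink():
        replace_link(raw / "current", daq / "current")


# Toy starting state with hypothetical names.
root = Path(tempfile.mkdtemp())
daq, prev, raw = (root / n for n in ("CHESS_DAQ", "PREVIOUSDAQ", "CHESS_RAW"))
(prev / "2013-1" / "a1").mkdir(parents=True)
(prev / "2013-1" / "a1" / "old.dat").write_text("old")
(daq / "2013-2" / "a1").mkdir(parents=True)
(daq / "2013-2" / "a1" / "new.dat").write_text("new")
raw.mkdir()
(raw / "2013-1").symlink_to(prev / "2013-1", target_is_directory=True)
(raw / "2013-2").symlink_to(daq / "2013-2", target_is_directory=True)
(daq / "current").symlink_to("2013-2", target_is_directory=True)
(raw / "current").symlink_to(daq / "current", target_is_directory=True)

rotate(root, new_cycle="2014-1", stations=["a1"], date="2014-01-01")
```

Writing the listing before the delete matters: once PREVIOUSDAQ is emptied, the listing is the only on-disk record of which files would need to be restored from the archive.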