
CHESS Data Storage Management

Please see CHESSUsersDataTransfer for instructions on remotely accessing and copying your data.

The CLASSE IT group maintains a redundant 10Gb High-Availability Linux Cluster to serve and secure centralized storage for CHESS. This document outlines procedures for managing and archiving this data. Please see DataStorageManagement for a higher-level overview of data storage and management at CLASSE.

Introduction

Mailing List

There is a Cornell Lyris mailing list (chess-daqadmin-l@cornell.edu) for CHESS DAQ announcements. For now, this is a closed list, so please submit a ServiceRequest if you want to join.

Status

All CHESS stations have been connected and (we believe) are saving data to the DAQ.

Cluster Overview

The new CHESS Cluster serves several filesystems which are available using NFS and Samba from anywhere in CLASSE. Each CHESS station and detector has a 10Gb connection to this central storage.

In addition, the new CHESS Cluster can run high-availability services that automatically migrate between cluster members in the event of a failure. For example, the Maia detector requires a binary logger "blogd" daemon that receives data from the detector and writes to disk (and in turn streams that data to clients). If the clustered service generates and writes data itself, in general there will be an underlying file system bound to that service. For example, the "blogd" daemon writes to a local file system which is available remotely at /nfs/chess/maia.

File Systems

The proposed file system sizes use existing in-house infrastructure and storage. File systems can be expanded or created as needs and funding allow (roughly $20K per 65TB).
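As a rough planning aid, the quoted rate can be turned into a budget estimate. This is a minimal sketch that assumes storage is bought in whole 65TB increments at about $20K each. Both the increment and the price are taken from the approximate figure above, so treat the result as a back-of-the-envelope number, not a quote.

```python
import math

# Assumed pricing, from the approximate rate quoted above (~$20K per 65TB).
# Real quotes will differ; this is only a back-of-the-envelope estimate.
COST_PER_UNIT_USD = 20_000
TB_PER_UNIT = 65


def estimate_expansion_cost(requested_tb: float) -> tuple[int, int]:
    """Return (units, cost_usd) for a requested expansion, rounding up to whole units."""
    if requested_tb <= 0:
        return 0, 0
    units = math.ceil(requested_tb / TB_PER_UNIT)
    return units, units * COST_PER_UNIT_USD


if __name__ == "__main__":
    # Hypothetical request: 130TB of additional space.
    units, cost = estimate_expansion_cost(130)
    print(f"{units} unit(s), approximately ${cost:,}")  # 2 unit(s), approximately $40,000
```

Rounding up reflects the assumption that partial increments cannot be purchased; a 100TB request therefore costs the same as a 130TB one under this model.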

All raw CHESS data will be archived twice per cycle. For details, please see #Raw_Data_Archival_and_Rotation.

Most raw CHESS data that is written over NFS or Samba will use a single "CURRENT CYCLE" directory / link. As described above, services that run in the cluster may use a separate file system dedicated to that service.

The currently available and proposed CHESS file systems include:

| Name | Size | Linux Path | Windows Path | Writable by | Backup | Notes |
| CHESS_OPT | 500GB | /nfs/chess/opt | | chess | nightly | CHESS-maintained software |
| CHESS_APS | 25TB | /nfs/chess/scratch/aps | | chess | | Data processed by the APS group |
| CHESS_DAQ | 230TB | /nfs/chess/daq | \\chessdaq\daq | chess | nightly incrementals, with an additional full archive created and stored at the end of each run | |
| CHESS_ID1A3 | 100TB | /nfs/chess/id1a3 | \\chessid1a3\id1a3 | chess | nightly incrementals, with an additional full archive created and stored at the end of each run | |
| CHESS_ID3A | 100TB | /nfs/chess/id3a | \\chessid3a\id3a | chess | nightly incrementals, with an additional full archive created and stored at the end of each run | |
| CHESS_ID4B | 100TB | /nfs/chess/id4b | \\chessid4b\id4b | chess | nightly incrementals, with an additional full archive created and stored at the end of each run | |
| CHESS_AUX | 15TB | /nfs/chess/auxiliary | \\chesssamba\auxiliary | chess | nightly | Auxiliary metadata and processed raw data |
| CHESS_MAIA | 20TB | /nfs/chess/maia | \\chesssamba\maia | maiagroup on blogd | see #Raw_Data_Archival_and_Rotation | Underlying file system for the blogd service. Previously known as, and still available as, \\samba\chess_maia |
| PREVIOUSMAIA | 20TB | /nfs/chess/previousmaia | \\chesssamba\previousmaia | none | none | |
| PREVIOUSDAQ | 230TB | /nfs/chess/previousdaq | \\chesssamba\raw | none | none | Read-only file system to store data from the previous cycle |
| | 100TB | /nfs/chess/previousid3a | \\chesssamba\raw | none | none | Read-only file system to store data from the previous cycle |
| | 100TB | /nfs/chess/previousid4b | \\chesssamba\raw | none | none | Read-only file system to store data from the previous cycle |
| | 100TB | /nfs/chess/previousid1a3 | \\chesssamba\raw | none | none | Read-only file system to store data from the previous cycle |
| CHESS_RAW | 200TB | /nfs/chess/raw | \\chesssamba\raw | none | none | Read-only mount point for all clients. Provides a persistent directory structure for accessing data. |
| RAWMAIA | 10TB | /nfs/chess/raw/maia | \\chesssamba\raw\maia | none | none | Read-only mount point for all clients. Provides a persistent directory structure for accessing data. |
| CHESS_SCRATCH | 10TB | /nfs/chess/scratch | \\chesssamba\scratch | chess | none | Scratch/temporary data that doesn't need to be backed up |
| CHESS_USER | 26TB | /nfs/chess/user | \\chesssamba\user | chess | nightly | User and project files and data |
| CHESS_ADMIN | 500GB | /nfs/chess/admin | \\samba\chess_admin | chessadmin | nightly | For sharing files among CHESS Admin staff |
| CHESS_WWW | 50GB | /nfs/chess/www | \\samba\chess_www | classewww | nightly | CHESS websites |

Please note that unless "classe.cornell.edu" is in your search path, you will need to use chesssamba.classe.cornell.edu or chessdaq.classe.cornell.edu for access from Windows.
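If short names do not resolve on a Windows machine, a UNC path from the table can be rewritten to use the fully qualified host name. This is a minimal sketch, assuming every short server name in the table (for example chesssamba, chessdaq, samba) lives in the classe.cornell.edu domain; the helper name qualify_unc is hypothetical.

```python
# Assumption: all short server names in the table belong to this domain.
DOMAIN = "classe.cornell.edu"


def qualify_unc(path: str, domain: str = DOMAIN) -> str:
    """Hypothetical helper: fully qualify the server part of a UNC path like \\\\chesssamba\\raw."""
    if not path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    server, sep, rest = path[2:].partition("\\")
    if "." not in server:  # only qualify bare short names; leave FQDNs alone
        server = f"{server}.{domain}"
    return "\\\\" + server + sep + rest


if __name__ == "__main__":
    print(qualify_unc(r"\\chesssamba\raw\maia"))  # \\chesssamba.classe.cornell.edu\raw\maia
```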

Backend file system organization

  • CHESS_DAQ = file system for most recent raw data, organized as CHESS_DAQ/year-cycle#/station/PI-proposal/
    • CHESS_DAQ/current is a link to appropriate CHESS_DAQ/year-cycle#
  • PREVIOUSDAQ = read-only file system for data from previous cycle, organized as PREVIOUSDAQ/year-cycle#/station/PI-proposal
  • CHESS_RAW = Read-only file system and export providing persistent directory structure for clients, and restore point for archived data.
    • CHESS_RAW/current is a link to CHESS_DAQ/current
    • Current CHESS_RAW/year-cycle# is a link to corresponding CHESS_DAQ/year-cycle#
    • Previous CHESS_RAW/year-cycle# is a link to corresponding PREVIOUSDAQ/year-cycle#
    • All other CHESS_RAW/year-cycle# directories are restore points for archived data
  • CHESS_AUX = auxiliary project data that needs regular backups
    • CHESS_AUX/cycles/year-cycle# = auxiliary data related to the corresponding raw data
  • CHESS_USER = user and project data that needs nightly backups
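The layout and link rules above can be modelled in a few lines to show where a given cycle's data lives. This is a minimal sketch, assuming the Linux mount points from the table (/nfs/chess/daq, /nfs/chess/previousdaq, /nfs/chess/raw) and cycle names of the form year-cycle# (for example 2020-2). The function names and the station and proposal values are hypothetical; the real links are maintained on the cluster, not computed by clients.

```python
from pathlib import PurePosixPath

# Linux mount points taken from the file system table above.
CHESS_DAQ = PurePosixPath("/nfs/chess/daq")
PREVIOUSDAQ = PurePosixPath("/nfs/chess/previousdaq")
CHESS_RAW = PurePosixPath("/nfs/chess/raw")


def daq_path(cycle: str, station: str, pi_proposal: str) -> PurePosixPath:
    """Where raw data is written: CHESS_DAQ/year-cycle#/station/PI-proposal/."""
    return CHESS_DAQ / cycle / station / pi_proposal


def raw_target(cycle: str, current: str, previous: str) -> PurePosixPath:
    """What CHESS_RAW/<cycle> refers to, following the link rules above."""
    if cycle == current:
        return CHESS_DAQ / cycle  # link to the live DAQ file system
    if cycle == previous:
        return PREVIOUSDAQ / cycle  # link to the previous cycle's read-only copy
    return CHESS_RAW / cycle  # older cycles: restore point for archived data


if __name__ == "__main__":
    # Hypothetical cycle, station and proposal names, for illustration only.
    print(daq_path("2020-3", "id3a", "smith-1234-a"))
    print(raw_target("2020-2", current="2020-3", previous="2020-2"))
```

Because clients only ever open paths under CHESS_RAW, the same path keeps working as a cycle moves from CHESS_DAQ to PREVIOUSDAQ and finally to an archive restore point.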

Client Configuration

  • Detectors and stations that write data mount CHESS_DAQ (or appropriate CHESS_DAQ/current/station).
  • All other clients mount and access data from CHESS_RAW.

For more, please see DAQClientConfiguration .

Data Archival and Rotation

Throughout each run, we take nightly incremental backups of the DAQ and MAIA file systems. These incrementals are stored indefinitely.

At the end of each run:
  • DAQ and MAIA backups will be scheduled and announced.
  • The file systems will be made read-only.
  • One full backup of CHESS_DAQ and CHESS_MAIA will be written to tape and moved offsite. The nightly incremental backups remain in the tape robot and together effectively provide the first full backup.

Before each run:
  • CHESS Data Archival and Rotation will be scheduled and announced
    • This will include which directories will be removed from disk. Unless we hear otherwise, we will remove all of the data (but not the top-level directories or the directory listings) in previousdaq, previousmaia, raw, and rawmaia. To identify the directories to be removed, use, for example, tree -L 2 -d /mnt/raw.
  • Everything in PREVIOUSDAQ and PREVIOUSMAIA will be deleted from disk.
    • A directory listing of everything removed from disk will be created and stored. For example, CHESS_RAW/2013-1/directorylisting-2014-01-01.txt will list everything that was in the 2013-1 cycle before it was deleted on 2014-01-01.
  • Everything in CHESS_DAQ will move to PREVIOUSDAQ
  • Everything in CHESS_MAIA will move to PREVIOUSMAIA
  • The root directory structure will be created in CHESS_DAQ
  • Links in CHESS_RAW will be updated to point to any moved files.
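The directory-listing step above can be sketched as follows. This is a minimal illustration, assuming the listing is a plain text file with one relative path per line, named directorylisting-YYYY-MM-DD.txt as in the 2013-1 example; it only writes the listing and deletes nothing. The function name and the listing format are hypothetical, not the actual CLASSE IT procedure.

```python
import datetime
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_directory_listing(cycle_dir: Path, listing_dir: Path,
                            date: Optional[datetime.date] = None) -> Path:
    """Record everything under cycle_dir before it is deleted from disk.

    Writes listing_dir/directorylisting-YYYY-MM-DD.txt containing one path per line,
    relative to cycle_dir, in sorted order. Nothing is deleted here.
    """
    date = date or datetime.date.today()
    entries = []
    for root, dirs, files in os.walk(cycle_dir):
        for name in dirs + files:
            entries.append((Path(root) / name).relative_to(cycle_dir).as_posix())
    listing_dir.mkdir(parents=True, exist_ok=True)
    listing = listing_dir / f"directorylisting-{date.isoformat()}.txt"
    listing.write_text("\n".join(sorted(entries)) + "\n")
    return listing


if __name__ == "__main__":
    # Demo on a throwaway tree laid out as year-cycle#/station/PI-proposal/.
    with tempfile.TemporaryDirectory() as tmp:
        cycle = Path(tmp) / "previousdaq" / "2013-1"
        (cycle / "station" / "pi-proposal").mkdir(parents=True)
        (cycle / "station" / "pi-proposal" / "scan_0001.dat").write_text("data")
        out = write_directory_listing(cycle, Path(tmp) / "raw" / "2013-1",
                                      datetime.date(2014, 1, 1))
        print(out.name)  # directorylisting-2014-01-01.txt
        print(out.read_text(), end="")
```

Writing the listing before removal keeps a permanent record of what each restore point held, which is what makes later restores from tape practical.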
Topic revision: r56 - 14 Sep 2020, DevinBougie