CHESS Data Storage Management

Please see CHESSUsersDataTransfer for instructions on remotely accessing and copying your data.

The CLASSE IT group maintains a redundant 10Gb High-Availability Linux Cluster to serve and secure centralized storage for CHESS. This document outlines procedures for storing, managing, and archiving this data. Please see DataStorageManagement for a higher-level overview of data storage and management at CLASSE.

Introduction

Mailing List

There is a Cornell Lyris mailing list (chess-daqadmin-l@cornell.edu) for CHESS DAQ announcements.

Cluster Overview

The CHESS DAQ Cluster serves several filesystems which are available using NFS and Samba from anywhere in CLASSE. Each CHESS station and detector has a dedicated 10Gb connection to this central storage.

In addition, the new CHESS Cluster can run high-availability services that automatically migrate between cluster members in the event of a failure. For example, the Maia detector requires a binary logger "blogd" daemon that receives data from the detector and writes to disk (and in turn streams that data to clients). If the clustered service generates and writes data itself, in general there will be an underlying file system bound to that service. For example, the "blogd" daemon writes to a local file system which is available remotely at /nfs/chess/maia.

File Systems

Overview

We have five tiers or classes of CHESS Data, spread out through the following filesystems.

| Data Tier | Filesystem | Description | Auto-created | Nightly Incremental Backup | Full Backup | Cleanup / Rotation | Path |
|---|---|---|---|---|---|---|---|
| DAQ / RAW | CHESS_DAQ | Raw data | Yes | Yes | Per-cycle | Per-cycle | /nfs/chess/daq ; \\chessdaq\daq |
| METADATA | CHESS_AUX/metadata | Metadata | Yes | Yes | Per-cycle | No | /nfs/chess/aux/metadata ; \\chesssamba\aux\metadata |
| REDUCED DATA | CHESS_AUX/reduced_data/user | Reduced data NOT associated w/ BTR | No | No | Per-cycle | No | /nfs/chess/aux/reduced_data/user ; \\chesssamba\aux\reduced_data\user |
| REDUCED DATA | CHESS_AUX/reduced_data/cycles | Reduced data associated w/ BTR | Yes | No | Per-cycle | No | /nfs/chess/aux/reduced_data/cycles ; \\chesssamba\aux\reduced_data\cycles |
| USER | CHESS_USER | Daily working area | No | Yes | Monthly | No | /nfs/chess/user ; \\chesssamba\user |
| SCRATCH | CHESS_SCRATCH/user | Temporary files NOT associated w/ BTR | No | No | No | No | /nfs/chess/scratch/user ; \\chesssamba\scratch\user |
| SCRATCH | CHESS_SCRATCH/cycles | Temporary files associated w/ BTR | Yes | No | No | No | /nfs/chess/scratch/cycles ; \\chesssamba\scratch\cycles |

During each rotation, the following are created automatically:
  • Per-cycle directories (above) with the same directory structure as CHESS_DAQ.
  • /nfs/chess/aux/cycles, which contains softlinks to the raw, metadata, reduced, and scratch directories for each BTR.
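The softlink arrangement for a single BTR can be sketched as follows. This is only an illustration: the tier paths come from the table above, but the link names, the layout underneath /nfs/chess/aux/cycles, and the example cycle, station, and BTR values are assumptions, and the function only computes paths without touching disk.

```python
from pathlib import PurePosixPath

# Tier roots are taken from the table above. The link names (the dict keys)
# and the layout under aux/cycles are assumptions made for illustration.
TIER_ROOTS = {
    "raw": "daq",
    "metadata": "aux/metadata",
    "reduced_data": "aux/reduced_data/cycles",
    "scratch": "scratch/cycles",
}


def plan_btr_links(cycle, station, btr, root="/nfs/chess"):
    """Return {link_path: target_path} for one BTR, without touching disk."""
    base = PurePosixPath(root)
    link_dir = base / "aux" / "cycles" / cycle / station / btr
    return {
        str(link_dir / name): str(base / tier / cycle / station / btr)
        for name, tier in TIER_ROOTS.items()
    }


if __name__ == "__main__":
    # Hypothetical cycle, station, and BTR names.
    for link, target in plan_btr_links("2021-3", "id3a", "smith-1234").items():
        print(f"{link} -> {target}")
```

Computing the plan separately from creating the links makes it easy to review what a rotation would produce before anything is changed.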

Current sizes and allocations.

File systems can be expanded or created as needed and as funding allows (approximately $23K per 280TB).

The currently available CHESS file systems include:

| Name | Size | Linux Path | Windows Path | Writable by | Backup | Notes |
|---|---|---|---|---|---|---|
| CHESS_DAQ | 230TB | /nfs/chess/daq | \\chessdaq\daq | chess | nightly incrementals, with additional full archive created and stored at end of run | |
| PREVIOUSDAQ | 230TB | /nfs/chess/previousdaq | \\chesssamba\raw | none | none | Read-only file system to store data from previous cycle. |
| CHESS_ID3A | 100TB | /nfs/chess/id3a | \\chessid3a\id3a | chess | nightly incrementals, with additional full archive created and stored at end of run | |
| | 100TB | /nfs/chess/previousid3a | \\chesssamba\raw | none | none | Read-only file system to store data from previous cycle. |
| CHESS_ID4B | 100TB | /nfs/chess/id4b | \\chessid4b\id4b | chess | nightly incrementals, with additional full archive created and stored at end of run | |
| | 100TB | /nfs/chess/previousid4b | \\chesssamba\raw | none | none | Read-only file system to store data from previous cycle. |
| CHESS_ID1A3 | 100TB | /nfs/chess/id1a3 | \\chessid1a3\id1a3 | chess | nightly incrementals, with additional full archive created and stored at end of run | |
| | 100TB | /nfs/chess/previousid1a3 | \\chesssamba\raw | none | none | Read-only file system to store data from previous cycle. |
| CHESS_MAIA | 1TB | /nfs/chess/maia | \\chesssamba\maia | maiagroup on blogd | see Data Archival and Rotation | Underlying file system for blogd service. Previously known and also available as \\samba\chess_maia |
| PREVIOUSMAIA | 1TB | /nfs/chess/previousmaia | \\chesssamba\previousmaia | none | none | |
| CHESS_RAW | 200TB | /nfs/chess/raw | \\chesssamba\raw | none | none | Read-only mount point for all clients. Provides persistent directory structure for accessing data. |
| CHESS_AUX | 100TB | /nfs/chess/auxiliary | \\chesssamba\auxiliary | chess | nightly | Auxiliary metadata and processed raw data |
| CHESS_USER | 30TB | /nfs/chess/user | \\chesssamba\user | chess | nightly | User and project files and data |
| MACCHESS | 10TB | /nfs/chess/macchess | \\chesssamba\macchess | chess | nightly | User filesystem for the macchess group. |
| CHESS_OPT | 500GB | /nfs/chess/opt | | chess | nightly | CHESS Maintained Software |
| CHESS_ADMIN | 500GB | /nfs/chess/admin | \\samba\chess_admin | chessadmin | nightly | For sharing files amongst CHESS Admin staff |
| CHESS_WWW | 50GB | /nfs/chess/www | \\samba\chess_www | classewww | nightly | CHESS websites |
| CHESS_APS | 25TB | /nfs/chess/scratch/aps | | chess | | Processed data by the APS group. |
| CHESS_SCRATCH | 30TB | /nfs/chess/scratch | \\chesssamba\scratch | chess | none | Scratch / temp data that doesn't need to be backed up |

Please note that unless "classe.cornell.edu" is in your search path, you will need to use chesssamba.classe.cornell.edu or chessdaq.classe.cornell.edu for access from Windows.
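As an illustration of the note above, the following sketch rewrites a short Windows (UNC) path so that it uses the fully qualified host name. The helper name qualify_unc is hypothetical and not part of any CLASSE tool; it simply appends the domain when the host part contains no dot.

```python
DOMAIN = "classe.cornell.edu"


def qualify_unc(path, domain=DOMAIN):
    r"""Turn \\host\share into \\host.domain\share when host is unqualified."""
    if not path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    # Split off the host name; everything after the first backslash is kept.
    host, sep, rest = path[2:].partition("\\")
    if "." not in host:
        host = f"{host}.{domain}"
    return "\\\\" + host + sep + rest


if __name__ == "__main__":
    print(qualify_unc(r"\\chesssamba\user"))
    print(qualify_unc(r"\\chessdaq\daq"))
```

Paths that already contain a fully qualified host name are returned unchanged.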

Backend file system organization

  • CHESS_DAQ = file system for most recent raw data, organized as CHESS_DAQ/year-cycle#/station/PI-proposal/
    • CHESS_DAQ/current is a link to appropriate CHESS_DAQ/year-cycle#
  • PREVIOUSDAQ = read-only file system for data from previous cycle, organized as PREVIOUSDAQ/year-cycle#/station/PI-proposal
  • CHESS_RAW = Read-only file system and export providing persistent directory structure for clients, and restore point for archived data.
    • CHESS_RAW/current is a link to CHESS_DAQ/current
    • Current CHESS_RAW/year-cycle# is a link to corresponding CHESS_DAQ/year-cycle#
    • Previous CHESS_RAW/year-cycle# is a link to corresponding PREVIOUSDAQ/year-cycle#
    • All other CHESS_RAW/year-cycle# directories are restore points for archived data
  • CHESS_AUX = auxiliary project data that needs regular backups
    • CHESS_AUX/cycles/year-cycle# = auxiliary data that's related to corresponding raw data
  • CHESS_USER = user and project data that needs nightly backups
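The organization above can be summarized in a short sketch. The function names and the example cycle, station, and PI-proposal values are hypothetical; the code only restates the rules listed above (path layout, and where each CHESS_RAW/year-cycle# entry leads) and does not inspect the real file systems.

```python
from pathlib import PurePosixPath


def daq_path(cycle, station, pi_proposal, fs="CHESS_DAQ"):
    """Compose <fs>/year-cycle#/station/PI-proposal."""
    return PurePosixPath(fs) / cycle / station / pi_proposal


def raw_cycle_target(cycle, current, previous):
    """Describe where CHESS_RAW/<cycle> leads, following the rules above."""
    if cycle == current:
        return f"link to CHESS_DAQ/{cycle}"
    if cycle == previous:
        return f"link to PREVIOUSDAQ/{cycle}"
    return "restore point for archived data"


if __name__ == "__main__":
    # Hypothetical values: 2013-2 is the current cycle, 2013-1 the previous one.
    print(daq_path("2013-2", "id3a", "smith-1234"))
    for cycle in ("2013-2", "2013-1", "2012-3"):
        print(cycle, "->", raw_cycle_target(cycle, "2013-2", "2013-1"))
```

Because clients always go through CHESS_RAW, the same client-side path keeps working whether the data currently lives in CHESS_DAQ, in PREVIOUSDAQ, or has been restored from the archive.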

Client Configuration

  • Detectors and stations that write data mount CHESS_DAQ (or appropriate CHESS_DAQ/current/station).
  • All other clients mount and access data from CHESS_RAW.

For more details, please see DAQClientConfiguration.

Data Archival and Rotation

Throughout each run, we take nightly incremental backups of the DAQ and MAIA filesystems. These incrementals are stored indefinitely.

At the end of each run:
  • DAQ and MAIA backups will be scheduled and announced.
  • The filesystems will be made read-only.
  • One full backup of CHESS_DAQ and CHESS_MAIA will be written to tape and moved offsite. The nightly incremental backups remain in the robot and together effectively serve as the first full backup.

Before each run:
  • CHESS Data Archival and Rotation will be scheduled and announced
    • This announcement will include which directories will be removed from disk. Unless we hear otherwise, we will remove all of the data (but not the top-level directories and directory listings) in previousdaq, previousmaia, raw, and rawmaia. To identify the directories to be removed, use, for example, tree -L 2 -d /mnt/raw.
  • Everything in PREVIOUSDAQ and PREVIOUSMAIA will be deleted from disk.
    • A directory listing of everything removed from disk will be created and stored. For example, CHESS_RAW/2013-1/directorylisting-2014-01-01.txt will list everything that was in the 2013-1 cycle before it was deleted on 2014-01-01.
  • Everything in CHESS_DAQ will move to PREVIOUSDAQ
  • Everything in CHESS_MAIA will move to PREVIOUSMAIA
  • The root directory structure will be created in CHESS_DAQ
  • Links in CHESS_RAW will be updated to point to any moved files.
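The directory-level effect of the rotation steps above can be sketched in a temporary directory. This is a simplified model, not the actual rotation procedure: the real file systems are separate volumes, MAIA is omitted, and the station name, PI-proposal names, and the helper functions relink, rotate, and demo are all hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def relink(link, target):
    """Point symlink `link` at `target`, replacing any existing link."""
    if link.is_symlink():
        link.unlink()
    link.symlink_to(target, target_is_directory=True)


def rotate(root, new_cycle, date):
    daq, prev, raw = root / "CHESS_DAQ", root / "PREVIOUSDAQ", root / "CHESS_RAW"

    # 1. Store a listing of each cycle in PREVIOUSDAQ, then delete its data.
    for cycle_dir in sorted(p for p in prev.iterdir() if p.is_dir()):
        listing = sorted(str(p.relative_to(cycle_dir)) for p in cycle_dir.rglob("*"))
        restore = raw / cycle_dir.name
        if restore.is_symlink():
            restore.unlink()  # the old link becomes a real restore-point directory
        restore.mkdir(exist_ok=True)
        (restore / f"directorylisting-{date}.txt").write_text("\n".join(listing) + "\n")
        shutil.rmtree(cycle_dir)

    # 2. Move everything in CHESS_DAQ to PREVIOUSDAQ and update CHESS_RAW links.
    cycles = sorted(p for p in daq.iterdir() if p.is_dir() and not p.is_symlink())
    for cycle_dir in cycles:
        shutil.move(str(cycle_dir), str(prev / cycle_dir.name))
        relink(raw / cycle_dir.name, prev / cycle_dir.name)

    # 3. Create the new root structure in CHESS_DAQ and the "current" links.
    (daq / new_cycle).mkdir()
    relink(daq / "current", daq / new_cycle)
    relink(raw / new_cycle, daq / new_cycle)
    relink(raw / "current", daq / "current")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("CHESS_DAQ", "PREVIOUSDAQ", "CHESS_RAW"):
            (root / name).mkdir()
        # Hypothetical starting state: 2013-1 is previous, 2013-2 is current.
        (root / "PREVIOUSDAQ/2013-1/a1/doe-1").mkdir(parents=True)
        (root / "PREVIOUSDAQ/2013-1/a1/doe-1/scan.dat").write_text("x")
        (root / "CHESS_DAQ/2013-2/a1/doe-2").mkdir(parents=True)
        relink(root / "CHESS_DAQ/current", root / "CHESS_DAQ/2013-2")
        relink(root / "CHESS_RAW/2013-1", root / "PREVIOUSDAQ/2013-1")
        relink(root / "CHESS_RAW/2013-2", root / "CHESS_DAQ/2013-2")
        relink(root / "CHESS_RAW/current", root / "CHESS_DAQ/current")

        rotate(root, "2013-3", "2014-01-01")

        listing = (root / "CHESS_RAW/2013-1/directorylisting-2014-01-01.txt").read_text()
        return {
            "listing": listing.split(),
            "previous": sorted(p.name for p in (root / "PREVIOUSDAQ").iterdir()),
            "current": os.path.basename(os.path.realpath(root / "CHESS_RAW/current")),
        }


if __name__ == "__main__":
    print(demo())
```

After the rotation in this model, CHESS_RAW/2013-1 is a plain directory holding only the stored listing, CHESS_RAW/2013-2 links into PREVIOUSDAQ, and CHESS_RAW/current resolves to the new, empty cycle in CHESS_DAQ.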
Topic revision: r64 - 16 Sep 2021, WernerSun