
CHESS Data Storage Management

Please see CHESSUsersDataTransfer for instructions on remotely accessing and copying your data.

The CLASSE IT group maintains a redundant 10Gb High-Availability Linux Cluster to serve and secure centralized storage for CHESS. This document outlines procedures for storing, managing, and archiving this data. Please see DataStorageManagement for a higher-level overview of data storage and management at CLASSE.

Introduction

CHESS DAQ Overview (2015-07-23)
CHESS DAQ @ station computers (2015-09-09)

Mailing List

There is a Cornell Lyris mailing list (chess-daqadmin-l@cornell.edu) for CHESS DAQ announcements.

Cluster Overview

The CHESS DAQ Cluster serves several filesystems which are available using NFS and Samba from anywhere in CLASSE. Each CHESS station and detector has a dedicated 10Gb connection to this central storage.

In addition, the new CHESS Cluster can run high-availability services that automatically migrate between cluster members in the event of a failure. For example, the Maia detector requires a binary logger "blogd" daemon that receives data from the detector and writes to disk (and in turn streams that data to clients). If the clustered service generates and writes data itself, in general there will be an underlying file system bound to that service. For example, the "blogd" daemon writes to a local file system which is available remotely at /nfs/chess/maia.

File Systems

Overview

CHESS data falls into five tiers (classes), spread across the following filesystems.

Data Tier	Filesystem	Description	Auto-created	Nightly Incremental Backup	Full Backup	Cleanup / Rotation	Path
REDUCED DATA	CHESS_AUX/reduced_data/user	Reduced data NOT associated w/ BTR	No	No	Per-cycle	No	/nfs/chess/aux/reduced_data/user \\chesssamba\aux\reduced_data\user
USER	CHESS_USER	Daily working area	No	Yes	Monthly	No	/nfs/chess/user \\chesssamba\user
SCRATCH	CHESS_SCRATCH/user	Temporary files NOT associated w/ BTR	No	No	No	No	/nfs/chess/scratch/user \\chesssamba\scratch\user
DAQ / RAW	CHESS_DAQ	Raw data	Yes	Yes	Per-cycle	Per-cycle	/nfs/chess/daq \\chessdaq\daq
METADATA	CHESS_AUX/metadata	Metadata	Yes	Yes	Per-cycle	No	/nfs/chess/aux/metadata \\chesssamba\aux\metadata
REDUCED DATA	CHESS_AUX/reduced_data/cycles	Reduced data associated w/ BTR	Yes	No	Per-cycle	No	/nfs/chess/aux/reduced_data/cycles \\chesssamba\aux\reduced_data\cycles
SCRATCH	CHESS_SCRATCH/cycles	Temporary files associated w/ BTR	Yes	No	No	No	/nfs/chess/scratch/cycles \\chesssamba\scratch\cycles

During each rotation, the following are created automatically:

Per-cycle directories (for the filesystems marked as auto-created above), with a directory structure parallel to that of CHESS_DAQ.
/nfs/chess/aux/cycles, which contains softlinks to the raw, metadata, reduced, and scratch directories for each BTR.
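
As an illustration of the softlink layout described above, the following is a minimal sketch and not the actual rotation script. The link names (raw, metadata, reduced_data, scratch), the cycle/station/BTR nesting under aux/cycles, and the function name create_btr_links are assumptions made for the example; only the tier directories themselves come from the table above.

```python
from pathlib import Path

# Assumed mapping from softlink name to the tier's top-level directory,
# relative to the storage root (/nfs/chess in production).
TIER_DIRS = {
    "raw": "daq",
    "metadata": "aux/metadata",
    "reduced_data": "aux/reduced_data/cycles",
    "scratch": "scratch/cycles",
}


def create_btr_links(root, cycle, station, btr):
    """Create the per-tier directories for one BTR and link them from aux/cycles.

    The layout root/aux/cycles/<cycle>/<station>/<btr>/<link name> is a
    hypothetical choice for this sketch.
    """
    root = Path(root).resolve()  # absolute targets keep the links valid
    link_dir = root / "aux" / "cycles" / cycle / station / btr
    link_dir.mkdir(parents=True, exist_ok=True)

    links = {}
    for name, tier in TIER_DIRS.items():
        # Parallel directory structure in every tier: <tier>/<cycle>/<station>/<btr>
        target = root / tier / cycle / station / btr
        target.mkdir(parents=True, exist_ok=True)

        link = link_dir / name
        if not link.is_symlink():
            link.symlink_to(target, target_is_directory=True)
        links[name] = link
    return links
```

Running this against a scratch directory rather than /nfs/chess shows the resulting tree without touching production storage.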

Current sizes and allocations.

File systems can be expanded or created as needed and as funding allows (~$23K/280TB).

The currently available CHESS file systems include:

Name	Size	Linux Path	Windows Path	Writable by	Backup	Notes
CHESS_DAQ	230TB	/nfs/chess/daq	\\chessdaq\daq	chess	nightly incrementals, with additional full archive created and stored at end of run
PREVIOUSDAQ	230TB	/nfs/chess/previousdaq	\\chesssamba\raw	none	none	Read-only file system to store data from previous cycle.
CHESS_ID3A	100TB	/nfs/chess/id3a	\\chessid3a\id3a	chess	nightly incrementals, with additional full archive created and stored at end of run
PREVIOUSID3A	100TB	/nfs/chess/previousid3a	\\chesssamba\raw	none	none	Read-only file system to store data from previous cycle.
CHESS_ID4B	100TB	/nfs/chess/id4b	\\chessid4b\id4b	chess	nightly incrementals, with additional full archive created and stored at end of run
PREVIOUSID4B	100TB	/nfs/chess/previousid4b	\\chesssamba\raw	none	none	Read-only file system to store data from previous cycle.
CHESS_ID1A3	100TB	/nfs/chess/id1a3	\\chessid1a3\id1a3	chess	nightly incrementals, with additional full archive created and stored at end of run
PREVIOUSID1A3	100TB	/nfs/chess/previousid1a3	\\chesssamba\raw	none	none	Read-only file system to store data from previous cycle.
CHESS_MAIA	1TB	/nfs/chess/maia	\\chesssamba\maia	maiagroup on blogd	see Data Archival and Rotation	Underlying file system for blogd service. Previously known as, and still available as, \\samba\chess_maia
PREVIOUSMAIA	1TB	/nfs/chess/previousmaia	\\chesssamba\previousmaia	none	none
CHESS_RAW	200TB	/nfs/chess/raw	\\chesssamba\raw	none	none	Read-only mount point for all clients. Provides persistent directory structure for accessing data.
CHESS_AUX	100TB	/nfs/chess/auxiliary	\\chesssamba\auxiliary	chess	nightly	auxiliary metadata and processed raw data
CHESS_USER	30TB	/nfs/chess/user	\\chesssamba\user	chess	nightly	User and project files and data
MACCHESS	10TB	/nfs/chess/macchess	\\chesssamba\macchess	chess	nightly	user filesystem for the macchess group.
CHESS_OPT	500GB	/nfs/chess/opt		chess	nightly	CHESS Maintained Software
CHESS_ADMIN	500GB	/nfs/chess/admin	\\samba\chess_admin	chessadmin	nightly	For sharing files amongst CHESS Admin staff
CHESS_WWW	50GB	/nfs/chess/www	\\samba\chess_www	classewww	nightly	CHESS websites
CHESS_APS	25TB	/nfs/chess/scratch/aps		chess		Data processed by the APS group.
CHESS_SCRATCH	30TB	/nfs/chess/scratch	\\chesssamba\scratch	chess	none	scratch / temp data that doesn't need to be backed up

Please note that unless "classe.cornell.edu" is in your search path, you will need to use chesssamba.classe.cornell.edu or chessdaq.classe.cornell.edu for access from Windows.
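
For scripts that build Windows paths, the note above can be sketched as a small helper that appends the domain to a short server name. The function name qualify_unc is hypothetical, and the sketch assumes every short server name in the table lives under classe.cornell.edu; the text above confirms this only for chesssamba and chessdaq.

```python
def qualify_unc(path, domain="classe.cornell.edu"):
    """Return a UNC path whose server name is fully qualified.

    Paths whose server name already contains a dot are returned unchanged.
    """
    if not path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")

    # Split \\server\share\... into the server name and the remainder.
    server, sep, rest = path[2:].partition("\\")
    if "." not in server:
        server = f"{server}.{domain}"
    return "\\\\" + server + sep + rest
```

For example, qualify_unc(r"\\chesssamba\user") returns \\chesssamba.classe.cornell.edu\user.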

Backend file system organization

CHESS_DAQ = file system for most recent raw data, organized as CHESS_DAQ/year-cycle#/station/PI-proposal/
- CHESS_DAQ/current is a link to the appropriate CHESS_DAQ/year-cycle#
PREVIOUSDAQ = read-only file system for data from previous cycle, organized as PREVIOUSDAQ/year-cycle#/station/PI-proposal
CHESS_RAW = Read-only file system and export providing persistent directory structure for clients, and restore point for archived data.
- CHESS_RAW/current is a link to CHESS_DAQ/current
- Current CHESS_RAW/year-cycle# is a link to corresponding CHESS_DAQ/year-cycle#
- Previous CHESS_RAW/year-cycle# is a link to corresponding PREVIOUSDAQ/year-cycle#
- All other CHESS_RAW/year-cycle# directories are restore points for archived data
CHESS_AUX = auxiliary project data that needs regular backups
- CHESS_AUX/cycles/year-cycle# = auxiliary data that's related to corresponding raw data
CHESS_USER = user and project data that needs nightly backups
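
The layout above can be expressed as a short sketch for client-side scripts. The function names, the example station and PI-proposal strings, and the "YYYY-N" pattern for cycle names (inferred from the 2013-1 example later on this page) are assumptions; the mapping from cycle to backing file system follows the CHESS_RAW rules listed above.

```python
import re
from pathlib import PurePosixPath

# Assumed cycle naming: four-digit year, a dash, and the cycle number.
CYCLE_RE = re.compile(r"^\d{4}-\d+$")


def raw_data_dir(cycle, station, pi_proposal, root="/nfs/chess/raw"):
    """Build the client-facing path root/year-cycle#/station/PI-proposal."""
    if cycle != "current" and not CYCLE_RE.match(cycle):
        raise ValueError(f"unexpected cycle name: {cycle!r}")
    return PurePosixPath(root) / cycle / station / pi_proposal


def backing_filesystem(cycle, current_cycle, previous_cycle):
    """Name the file system that a CHESS_RAW/year-cycle# entry resolves to."""
    if cycle in ("current", current_cycle):
        return "CHESS_DAQ"      # link to CHESS_DAQ/year-cycle#
    if cycle == previous_cycle:
        return "PREVIOUSDAQ"    # link to PREVIOUSDAQ/year-cycle#
    return "CHESS_RAW"          # restore point for archived data
```

Because clients always go through CHESS_RAW, a path built this way stays valid after a rotation even though the backing file system changes.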

Client Configuration

Detectors and stations that write data mount CHESS_DAQ (or the appropriate CHESS_DAQ/current/station).
All other clients mount and access data from CHESS_RAW.

For more, please see DAQClientConfiguration.

Data Archival and Rotation

Throughout each run, we take nightly incremental backups of the DAQ and MAIA filesystems. These incrementals are stored indefinitely.

At the end of each run:

DAQ and MAIA backups will be scheduled and announced.
The filesystems will be made read-only.
One full backup of CHESS_DAQ and CHESS_MAIA will be written to tape and moved offsite. The nightly incremental backups remain in the robot and together effectively serve as the first full backup.

Before each run:

CHESS Data Archival and Rotation will be scheduled and announced.
- The announcement will state which directories will be removed from disk. Unless we hear otherwise, we will remove all of the data (but not the top-level directories and directory listings) in previousdaq, previousmaia, raw, and rawmaia. To identify the directories to be removed, use, for example, tree -L 2 -d /mnt/raw.
Everything in PREVIOUSDAQ and PREVIOUSMAIA will be deleted from disk.
- A directory listing of everything removed from disk will be created and stored. For example, CHESS_RAW/2013-1/directorylisting-2014-01-01.txt will list everything that was in the 2013-1 cycle before it was deleted on 2014-01-01.
Everything in CHESS_DAQ will move to PREVIOUSDAQ
Everything in CHESS_MAIA will move to PREVIOUSMAIA
The root directory structure will be created in CHESS_DAQ
Links in CHESS_RAW will be updated to point to any moved files.
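
The rotation steps above can be sketched as follows. This is an illustrative model only, not the script used by CLASSE IT: it treats daq, previousdaq, and raw as subdirectories of a single root, ignores the MAIA file systems, and the function names, the stations argument, and the listing format are assumptions. It should be run only against a test directory.

```python
import os
import shutil
from datetime import date
from pathlib import Path


def _relink(link, target):
    """Point `link` at `target`, replacing an existing (possibly dangling) symlink."""
    if link.is_symlink():
        link.unlink()
    link.symlink_to(target, target_is_directory=True)


def rotate(root, new_cycle, stations, today=None):
    """Model one rotation under `root`, which holds daq/, previousdaq/ and raw/."""
    root = Path(root).resolve()
    daq, prev, raw = root / "daq", root / "previousdaq", root / "raw"
    stamp = (today or date.today()).isoformat()

    # 1. Record a directory listing for each cycle in previousdaq, then delete it.
    for old in [p for p in prev.iterdir() if p.is_dir()]:
        restore = raw / old.name
        if restore.is_symlink():
            restore.unlink()              # was a link into previousdaq
        restore.mkdir(exist_ok=True)      # now a restore point for archived data
        listing = restore / f"directorylisting-{stamp}.txt"
        with listing.open("w") as fh:
            for dirpath, dirnames, filenames in os.walk(old):
                dirnames.sort()           # deterministic traversal order
                rel = Path(dirpath).relative_to(prev)
                fh.write(f"{rel.as_posix()}/\n")
                for name in sorted(filenames):
                    fh.write(f"{(rel / name).as_posix()}\n")
        shutil.rmtree(old)

    # 2. Move every cycle directory from daq to previousdaq and repoint raw.
    for cyc in [p for p in daq.iterdir() if p.is_dir() and not p.is_symlink()]:
        dest = prev / cyc.name
        shutil.move(str(cyc), str(dest))
        _relink(raw / cyc.name, dest)

    # 3. Create the root directory structure for the new cycle.
    for station in stations:
        (daq / new_cycle / station).mkdir(parents=True, exist_ok=True)

    # 4. Update the "current" links and expose the new cycle through raw.
    _relink(daq / "current", daq / new_cycle)
    _relink(raw / new_cycle, daq / new_cycle)
    _relink(raw / "current", daq / "current")
```

Writing the listing before deleting anything means a record of the removed cycle survives in its CHESS_RAW restore point, matching the directorylisting example above.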

Topic revision: r64 - 16 Sep 2021, WernerSun
