Data Transfer Recommendations
Over the past year, the CLASSE IT group has incrementally revamped the computing infrastructure for CHESS data acquisition (DAQ). During the last run of 2015, the new system was used at all CHESS stations (excluding MacCHESS) for the first time. Known as CHESS-DAQ, it consists of a dedicated high-speed network that connects the experimental stations to over 130 terabytes of enterprise-class redundant disk arrays, as well as an offsite magnetic tape library for long-term archival. From September through December 2015, we collected roughly 40 terabytes of data, more than 12 times the volume collected during the first run of 2015.
This rapid growth in data volume raises additional considerations for CHESS users who wish to transfer their data to a portable storage device while onsite. During this past run, we observed that data transfer speeds were limited by the intrinsic speed of the user's portable storage device, which can vary by an order of magnitude, from 10 MB/s to 100 MB/s. At 10 MB/s, a 100 GB dataset would take roughly 3 hours to transfer. Another important variable is the number of files to be transferred: for a given overall dataset size, the transfer speed drops sharply as the number of files increases.
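The transfer-time estimate above is simple arithmetic, and it can be repeated for other dataset sizes and drive speeds. The Python sketch below is only an illustration: the function name `transfer_hours` is hypothetical, it assumes decimal units (1 GB = 1000 MB), and it ignores the per-file overhead mentioned above, so transfers of many small files will likely take longer than it suggests.

```python
def transfer_hours(dataset_gb: float, speed_mb_per_s: float) -> float:
    """Estimate the hours needed to copy dataset_gb gigabytes
    at a sustained rate of speed_mb_per_s megabytes per second.

    Assumes decimal units (1 GB = 1000 MB) and no per-file overhead.
    """
    if speed_mb_per_s <= 0:
        raise ValueError("speed must be positive")
    megabytes = dataset_gb * 1000       # GB -> MB
    seconds = megabytes / speed_mb_per_s
    return seconds / 3600               # seconds -> hours


# The two ends of the speed range quoted above, for a 100 GB dataset.
for speed in (10, 100):
    print(f"100 GB at {speed} MB/s: {transfer_hours(100, speed):.1f} h")
```

For the 100 GB example, this gives about 2.8 hours at 10 MB/s and about 0.3 hours at 100 MB/s.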
Therefore, we would like to make the following recommendations for the upcoming Spring 2016 run:
- If you expect to collect more than approximately 100 GB of data, please make arrangements for data transfer with your staff scientist at least one week before you arrive. We may advise you to transfer your data piecemeal throughout your time onsite, among other suggestions.
- Some considerations when purchasing a USB drive for data transfer:
  - Please check online reviews for performance measurements. If possible, select a drive that is:
    - benchmarked at 100 MB/s or faster for sustained sequential transfer (not burst speed); take care to distinguish megabits (Mb) from megabytes (MB), since 1 MB = 8 Mb
    - USB3 rather than USB2
    - a hard disk drive (HDD) or a solid state drive (SSD), not a so-called thumb drive or memory stick (often sold as a "USB flash drive")
  - You may also wish to consult your local IT support about compatibility requirements with your institution's computers.
- If your dataset is too large for the transfer to complete before you depart, we may:
  - issue you a CLASSE account so you can log in from your home institution and use SFTP/SCP to transfer your data, or
  - ship your hard drive to you, at your expense, after the transfer is complete. Be sure to provide shipping information to the Users' Office.
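As a rough illustration of the megabit/megabyte distinction in the drive recommendations above, the following Python sketch converts an advertised rate in megabits per second to megabytes per second and compares it with the 100 MB/s guideline. The function names are hypothetical, and the 480 Mb/s figure is the nominal USB 2.0 signaling rate, used here only as an example. Sustained speeds measured in reviews are what matter in practice.

```python
RECOMMENDED_MB_PER_S = 100.0  # sustained sequential speed suggested above


def megabits_to_megabytes(mbit_per_s: float) -> float:
    """Convert megabits per second (Mb/s) to megabytes per second (MB/s)."""
    return mbit_per_s / 8.0  # 8 bits per byte


def meets_guideline(mb_per_s: float) -> bool:
    """Return True if a sustained rate in MB/s reaches the 100 MB/s guideline."""
    return mb_per_s >= RECOMMENDED_MB_PER_S


# Example: a rate quoted as "480 Mb/s" is only 60 MB/s.
usb2_nominal = megabits_to_megabytes(480)
print(usb2_nominal, meets_guideline(usb2_nominal))    # 60.0 False

# A rate must be quoted as at least 800 Mb/s to correspond to 100 MB/s.
threshold = megabits_to_megabytes(800)
print(threshold, meets_guideline(threshold))          # 100.0 True
```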
For the future, we are exploring new services that may facilitate both onsite and offsite data transfers.
Also, please note that, thanks to our new data archiving capability, we can make available any dataset collected in Fall 2015 or later. As of now, our data retention policy is:
- Data from the current run and the previous run are immediately available on disk for SFTP/SCP.
- Data from any earlier run can be retrieved from tape storage and restored to disk for SFTP/SCP, typically on the timescale of a week. To request that a dataset be restored, please contact CLASSE IT: https://wiki.classe.cornell.edu/Computing/ServiceRequestTips
Proprietary users who do not want their data archived may contact the Users' Office to request accommodation.