Data Stewardship

It's after ten o'clock. Do you know where your data are?

Each of us is individually responsible for making sure that the information which we have accumulated for CLASSE is properly preserved. If it's valuable enough for the NSF, Cornell and CLASSE to have paid us to collect it, then it's valuable enough for us to take care of it. Whether it's used for Laboratory administration or physics research, the information we record is important.

Cornell's statement about our responsibilities with regard to administrative and other data is available at

The National Academy of Sciences has written a report on the Stewardship of Research Data, which is available here:

Important files must be stored on reliable file servers

CLASSE maintains more than a dozen multi-terabyte RAID file servers. They store our data reliably and protect them against hardware failures. Some, but not all, of these RAID file servers are backed up, so that files deleted long ago, perhaps by accident, can still be recovered. The RAID file servers that are not backed up should be used only for information which can be regenerated in one way or another.

List of File Systems

For a list of file systems available from Linux, Mac OS X, and Windows, please see: https://www.classe.cornell.edu/private/computing/filesystems.html

To check if your file system is currently being backed up, see BackupSchedule. To request changes to this schedule, please open a ServiceRequest.

Please see NetworkedFilesystems for instructions on accessing our central filesystems from Linux, OS X, and Windows.

Video: Data Stewardship and File Access under Windows

A 10-minute video describing Data Stewardship and access to filesystems from Windows is available at DataStewardshipVideo.

The Windows Samba User Disk and the Linux Home Disk

Our personal files should be stored either in our Windows Samba UserDisk directories or our Linux HomeDisk directories. As it is a heavily used critical central resource, only small files should be kept in an individual's home disk area. To help ensure appropriate usage of our home disk, each user has a 1GB quota on their home disk. The Windows Samba UserDisk has a 5GB quota. Please read HomeDisk and UserDisk for more information.
  • From Linux:
    • our Linux Home Disk is available at /home/userid
    • our Samba User Disk is available at /nfs/user/userid
  • From Windows:
    • our Linux Home Disk is available at \\samba.classe.cornell.edu\home\userid
    • our Samba User Disk is available at \\samba.classe.cornell.edu\user\userid
    • Type the above addresses into a Windows Explorer address bar.
  • From Macintosh:
    • our Linux Home Disk is available at cifs://samba.classe.cornell.edu/home/userid
    • our Samba User Disk is available at cifs://samba.classe.cornell.edu/user/userid
    • From the Finder, click on "Go" and select "Connect to Server", then enter the above addresses in "Server Address".

  • From a non-CLASSE network (like RedRover), you either need to connect to the CLASSE virtual private network (see OpenVPN -- CLASSE login required), or you can use Pydio at https://pydio.classe.cornell.edu/
  • From the CHESS network, you do not need the VPN. However, you will be asked to log into samba with your CLASSE credentials. On Windows, you might need to enter your username as "CLASSE\userid" (note backslash, not forward slash), where userid is your CLASSE userid.
  • From a CLASSE system, you may replace samba.classe.cornell.edu above with just samba (for ease of typing).
  • In the past, the Windows User Disk was on PC50. That file server has been decommissioned and its contents were moved to the Windows Samba UserDisk.
  • Files in our Windows Samba User Disk and Linux Home Disk might or might not have access restrictions protecting them from unauthorized access. Historically they have been readable by anyone. If files need to be protected, it is the responsibility of each of us to make sure that they are. Please read the Wiki page UserDisk for more information on protecting data in the Windows Samba User Disk.
  • The Windows Samba User Disk and Linux Home Disk filesystems have relatively small quotas. This is because maintaining long-term backups is quite expensive. We occasionally have to recover files which were created more than a decade ago, so this expense is well justified.
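
Because the home disk quota is small, it can help to check how much space a directory tree uses before adding files to it. The following is a minimal sketch, not an official CLASSE tool. The function names are hypothetical, the 1GB figure is taken from the text above and treated as 1024^3 bytes (an assumption), and the file server's own quota accounting may count space differently (for example by disk blocks rather than file sizes).

```python
import os

# 1GB home disk quota from the text above; treating "GB" as 1024**3 bytes
# is an assumption, and the server's own accounting may differ.
HOME_QUOTA_BYTES = 1 * 1024**3


def directory_usage(root):
    """Return the total size in bytes of the files found under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat measures a symbolic link itself, not its target.
                total += os.lstat(path).st_size
            except OSError:
                # The file vanished or cannot be examined; skip it.
                continue
    return total


def quota_report(root, quota_bytes=HOME_QUOTA_BYTES):
    """Return (bytes used, fraction of quota used) for the tree at root."""
    used = directory_usage(root)
    return used, used / quota_bytes


# Example use (the path follows the /home/userid pattern described above):
#   used, fraction = quota_report(os.path.expanduser("~"))
#   print(f"{used} bytes used, {fraction:.0%} of quota")
```

Walking a large directory tree over the network can be slow, so a check like this is best run occasionally rather than routinely.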

Scratch Disk

Hundreds of gigabytes of space that is not backed up are available for storing temporary, intermediate data files. Please be sure to read the Wiki page TemDisk and understand the automatic deletion policies of these filesystems.
  • From Linux, this scratch disk space is available as /cdat/tem and /cdat/tem2
  • From Windows, the same filesystems are available as \\samba.classe.cornell.edu\tem and \\samba.classe.cornell.edu\tem2
  • See TemDisk for more information.

Cornell Box

Cornell Box is a cloud-based file storage service which is good for collaborating with people who don't have CLASSE accounts or for temporarily storing scratch files. Like the local CLASSE scratch disk space described above, Cornell Box is not backed up, so you will not be able to retrieve files that have been accidentally deleted (and emptied from the Trash folder). More information about Cornell Box is available at https://it.cornell.edu/box

Group and Project Disk

The files which need to be accessed by members of a group or project should be stored in that project's or group's own filesystems.

If you do not know the locations of your project's filesystems, contact your supervisor, project or group leader. They know where the appropriate space is, which filesystems are backed up, and what other special characteristics they might have. For example, some filesystems can be accessed only if special arrangements have been made or if you're a member of a particular group.

Please contact your supervisor, project or group leader with questions about which filesystems are appropriate for your use, and please see each project or group's internal documentation for descriptions of their filesystems' organization. Some examples of this documentation include:

If no RAID disk space has been allocated for your group or project, have your supervisor, group, or project leader submit a service request asking for some.

Backed-up Project Space

  • Files which are difficult to recreate (e.g. program source code) should be kept in your project's backed up filesystem.
  • Backed-up project filesystems usually have significantly larger quotas than your personal home directory, but they are still limited because of the expense of maintaining their backups.
  • For a list of backed-up filesystems, please see BackupSchedule.

Not-Backed-up Project Space

  • Files which are relatively easy to recreate (e.g. by re-running an analysis program) should be kept in your project's non-backed-up filesystem.
  • Project data filesystems usually do not have disk space quotas affecting individuals, but may have a limited total capacity.

AutoDesk Inventor: Use Vault regularly

You should frequently check in to the Vault server the AutoDesk design files and the accompanying documentation that you've been working on.

A note about file permissions

Users who are new to the Linux operating system and networked computing environments may be unfamiliar with how file permissions are handled. When a file or directory is created on our networked filesystems (also called central storage), the default file permissions are set such that it can be seen and read (and executed, for binary files) by anyone with a CLASSE account (including external CHESS users), but it can be written to or modified only by the owner. In particular, this is true of Linux home directories and top-level user directories (/nfs/user or \\samba\user shares). This openness of access is common to computing at large labs (CERN, Fermilab, SLAC, etc.).

Every user has fine-grained control over the permissions on the files and directories that he or she owns; access can be restricted or expanded at will. For assistance, please submit a ServiceRequest.

Note that user directories contain a "private" directory (/nfs/user/_userid_/private or \\samba\user\_userid_\private), which, by default, is readable and writable only by the owner.
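
As an illustration of the permission model described above, here is a minimal Python sketch that reports who can read a file and then restricts it to its owner. The function names are hypothetical. The sketch looks only at the standard POSIX mode bits as seen from Linux; any additional access controls applied by the file server or by Samba are outside its scope.

```python
import os
import stat


def describe_access(path):
    """Report which classes of user may read path, from its mode bits."""
    mode = os.stat(path).st_mode
    return {
        "owner_read": bool(mode & stat.S_IRUSR),
        "group_read": bool(mode & stat.S_IRGRP),
        "other_read": bool(mode & stat.S_IROTH),
    }


def make_owner_only(path):
    """Remove every group and other permission, keeping the owner's bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & stat.S_IRWXU)


# Example use (the file name is hypothetical):
#   print(describe_access("notes.txt"))
#   make_owner_only("notes.txt")
```

Applying make_owner_only to a directory also prevents other users from entering it, which blocks their access to everything beneath it regardless of the permissions on the individual files.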

For more information on Linux file permissions, see, for example, https://www.linux.com/learn/tutorials/309527-understanding-linux-file-permissions. For a list of CLASSE security groups, please see https://wiki.classe.cornell.edu/Main/WikiGroups.

Only temporary scratch files should be kept on your local desktop computer

Your local computer, the one you sit in front of, whether desktop or laptop, should be used to store only short-term information. Some local filesystems automatically delete files that have not been accessed in a certain amount of time.

  • You can store files temporarily in a local scratch directory:
    • Under Windows: in C:\TEMP
    • Under Linux (and Mac OS X): in /tmp and /var/tmp
    • See TemDisk for more information.

  • The local disks on most desktop and laptop computers are not backed up. Their contents might be lost at any time due to software or hardware failures or when your system is replaced or upgraded.
  • If you put files on the local disk for performance reasons, be sure to copy any important files to an appropriate fileserver.
    • CLASSE IT can provide some software for synchronization of files from local to a backed up fileserver.
      • There may be some cost to your account in acquiring the software.
      • This software does not absolve you of making sure your files end up on a fileserver. IT IS NOT A BACKUP of your LOCAL COMPUTER.
  • Too many people have lost a lot of time because files kept on a local system have been lost.
    • Some students have had to rewrite lost theses, adding many months to their stay at Cornell.
    • Researchers have had to redo experiments because data from previous experiments have been lost.
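
For those who work on a local disk for performance reasons, the copy-back step mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not the synchronization software that CLASSE IT provides, the function name is hypothetical, and the destination would be whatever fileserver directory is appropriate for your files. It copies files that are missing or older at the destination and never deletes anything.

```python
import os
import shutil


def copy_newer_files(src_root, dest_root):
    """Copy files from src_root into dest_root when they are missing there
    or older there. Returns the relative paths of the files copied."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dest_dir = os.path.normpath(os.path.join(dest_root, rel_dir))
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dest = os.path.join(dest_dir, name)
            if (not os.path.exists(dest)
                    or os.path.getmtime(src) > os.path.getmtime(dest)):
                # copy2 also copies the modification time, so an unchanged
                # file is not copied again on the next run.
                shutil.copy2(src, dest)
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied


# Example use (both paths are hypothetical):
#   copy_newer_files("/tmp/my_analysis", "/path/to/project/fileserver/my_analysis")
```

A copy like this protects files only from the moment it runs, so it needs to be repeated whenever important local files change.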

Some data must not be kept on any CLASSE computers

Unfortunately, some types of data must not be kept on any CLASSE computers. In particular, information which the University has classified as "Confidential" must not be stored on any CLASSE filesystem. The Laboratory cannot afford the cost of such information being exposed, nor can it afford the kinds of protections required to keep it secure. Also, Cornell's Research Division has mandated that no Confidential data may be kept on any computers in any of its departments. CLASSE is part of the Research Division.

The following types of data are classified as "Confidential." Their protection is mandated by federal and state laws:
  • Social Security Numbers
  • Credit Card Numbers
  • Drivers License Numbers
  • Bank Account Numbers
  • Health or Medical Treatment Records (but not Radiation Badge records)

This kind of information is often found in old student grade files, performance appraisals and employment applications. It is often buried among files which have been on computers for many years. You, personally, are responsible for finding it and getting rid of it.

We all must develop good habits and clean up, remove, or secure confidential data on our computers and in our workplace. The instructions below outline four steps, each fairly simple and straightforward, that should be taken in order to reduce the amount of confidential data we hold and to improve how we handle it. After your first time through the steps, the process will be much easier in the future. For most people this process will be as easy as, or easier than, scanning your computer(s) for viruses or spyware.

Although this description attempts to make this process easy and efficient, each employee must know that securing confidential information is a serious responsibility. Officially, Cornell employees are all now required to comply with the university's new "Data Cleanup and Inventory Initiative." Information about Cornell's Data Cleanup project is available at http://www.cit.cornell.edu/datacleanup/.

FOUR STEPS to Data Cleanup and Inventory

  1. Inventory Locations of Your Files
    • Make a list of all of the Cornell (CHESS, LEPP, MacCHESS, etc.) owned computers that you use and locations where you store files.
    • Make sure you don't forget network folders, webdrive folders, samba shares, nfs mounts, removable disks, USB sticks, etc.
  2. Install Scanning Software
    • Cornell has a site license for a program called "Identity Finder" for Mac and Windows computers. Find_SSNs is available to help Linux users search for confidential data.
      • If Identity Finder is not already installed on a lab-managed computer, submit a ServiceRequest to get Identity Finder installed on CLASSE computers.
  3. Scan All Your Files
    • You must configure Identity Finder to scan all your file locations and look for confidential information.
      • A "walkthrough" describing how to configure Identity Finder is available at https://wiki.lepp.cornell.edu/lepp/bin/view/Computing/RunningIdentityFinder9. It includes a link to a video showing Identity Finder in use. Be sure to scan all the locations you identified in step 1) above.
      • These instructions must be read before you try to run Identity Finder, otherwise the program will make no sense to you or you won't be scanning the right places.
    • For instructions on running Find_SSNs on Linux, please see LnxConfidentialData.
  4. Take Action on Found Violations
    • You must remove any confidential information that the scan locates.
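
To give a sense of what the scanning step looks for, here is a minimal Python sketch that flags lines containing strings shaped like Social Security Numbers. It is an illustration only and is not how Identity Finder or Find_SSNs work; those tools remain the ones to use. The function names are hypothetical, and the sketch assumes plain-text files and the hyphenated NNN-NN-NNNN form, so it will miss other formats and the other confidential data types listed above.

```python
import os
import re

# Matches NNN-NN-NNNN when it is not embedded in a longer run of digits.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def find_ssn_like(text):
    """Return every SSN-shaped string found in text."""
    return SSN_PATTERN.findall(text)


def scan_tree(root):
    """Yield (path, line number) for each line with an SSN-shaped string.

    The matched text is deliberately not returned, so that the scan
    results do not themselves become a copy of the confidential data.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if SSN_PATTERN.search(line):
                            yield path, number
            except OSError:
                # Unreadable file; skip it.
                continue


# Example use (the directory is hypothetical):
#   for path, number in scan_tree("/path/to/old/files"):
#       print(f"{path}: line {number}")
```

A pattern this simple will also flag part numbers and similar strings that merely look like SSNs, so each hit still needs to be reviewed by hand.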

Some places we should NOT be storing files

We should not be storing any files on a CLASSE computer with a name starting with PC or MC, except for temporary scratch files on a local desktop computer.

  • If project-related files are stored on a PC or Mac:
    • Find out from your project leader when your project's files are scheduled to be moved to a project RAID disk filesystem.
    • If no project RAID disk space has been allocated, have your project leader submit a service request asking for some.
Topic revision: r59 - 07 Feb 2019, AdminDevinBougie