Draft Run Metadata Table Structure
----------------------------------

DATASET Definition
------------------

This table maps a dataset name onto one or more runs. There is an open question whether a dataset should be defined by a list of runs, a single run range, or a set of run ranges. To remain moderately general, we will use a set of run ranges. Note that not every run number in a range necessarily exists in the database, because some runs are junk and are not stored.

    DATASET_NAME   STRING
    FIRST_RUN      INTEGER
    LAST_RUN       INTEGER
    PRIMARY KEY(DATASET_NAME, FIRST_RUN)

Since a dataset may consist of several ranges, stored one per row, the dataset name alone cannot be the primary key.

You could go wild with constraints, and it may be worth it. One important constraint is that run ranges must be disjoint.

RAW RUN Properties
------------------

This table contains only non-versioned, immutable information about each raw run. This information is collected from the Online systems.

    RUN_NUMBER            INTEGER PRIMARY KEY
    START_TIME            DATE
    EVENT_COUNT           INTEGER
    LAST_EVENT_NUMBER     INTEGER
    LUMINOSITY            FLOAT
    BEAM_ENERGY           FLOAT
    MAGNETIC_FIELD        FLOAT
    TERMINATION_STATUS    STRING
    DETECTOR_CONFIG_ID    STRING
    TRIGGER_CONFIG        ??? (encoded as a bitmask in the raw data)
    RUNMANAGER_APPROVED   SMALLINT

It may be that we record all runs, even the junk runs that are never populated, since we collect some of this information at a time when the value of a run is not yet known. However, we may delete runs with fewer than some minimum number of events.

TERMINATION_STATUS will probably be restricted to a set of well-known values (and may therefore become an integer).

TODO: additional run conditions, CESR condition data? Pull stuff out of elog?

DETECTOR CONFIGURATION
----------------------

The detector configuration changes very rarely, so it makes sense to break it out into a separate table.

    DETECTOR_CONFIG_ID   STRING
    COMPONENT            STRING
    PRESENT              SMALLINT
    PRIMARY KEY(DETECTOR_CONFIG_ID, COMPONENT)

For each detector configuration, this table holds one record per component, indicating whether that component was present in the configuration.
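The disjointness rule for the DATASET table can be checked with a query rather than a declarative constraint. The sketch below is a minimal illustration, assuming SQLite through Python's standard sqlite3 module, one row per run range keyed by (DATASET_NAME, FIRST_RUN), STRING mapped to TEXT, and the rule applying across all rows (the draft does not say whether it applies within a dataset or globally). The dataset names dsA and dsB and all run numbers are hypothetical. It also shows a dataset expanding to the runs actually stored, so junk runs that were never stored simply drop out.

```python
import sqlite3

# Hypothetical SQLite rendering of the DATASET table: one row per run range,
# keyed by (DATASET_NAME, FIRST_RUN). RAW_RUN is reduced to its key.
SCHEMA = """
CREATE TABLE DATASET (
    DATASET_NAME TEXT    NOT NULL,
    FIRST_RUN    INTEGER NOT NULL,
    LAST_RUN     INTEGER NOT NULL,
    PRIMARY KEY (DATASET_NAME, FIRST_RUN),
    CHECK (FIRST_RUN <= LAST_RUN)
);
CREATE TABLE RAW_RUN (RUN_NUMBER INTEGER PRIMARY KEY);
"""

def overlapping_ranges(conn):
    """Return every pair of stored ranges that share at least one run number."""
    # Closed intervals [f1, l1] and [f2, l2] overlap iff f1 <= l2 and f2 <= l1.
    # Comparing rowids reports each unordered pair exactly once.
    return conn.execute("""
        SELECT a.DATASET_NAME, a.FIRST_RUN, b.DATASET_NAME, b.FIRST_RUN
        FROM DATASET AS a JOIN DATASET AS b
          ON a.rowid < b.rowid
         AND a.FIRST_RUN <= b.LAST_RUN
         AND b.FIRST_RUN <= a.LAST_RUN
        ORDER BY a.rowid, b.rowid
    """).fetchall()

def runs_in_dataset(conn, name):
    """Stored runs covered by any range of the named dataset."""
    # Run numbers that were never stored simply do not match the join.
    rows = conn.execute("""
        SELECT DISTINCT r.RUN_NUMBER
        FROM RAW_RUN AS r JOIN DATASET AS d
          ON r.RUN_NUMBER BETWEEN d.FIRST_RUN AND d.LAST_RUN
        WHERE d.DATASET_NAME = ?
        ORDER BY r.RUN_NUMBER
    """, (name,)).fetchall()
    return [run for (run,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO DATASET VALUES (?, ?, ?)",
                 [("dsA", 100, 105), ("dsA", 200, 202), ("dsB", 104, 110)])
conn.executemany("INSERT INTO RAW_RUN VALUES (?)",
                 [(r,) for r in (100, 101, 103, 105, 107, 200, 202)])
print(runs_in_dataset(conn, "dsA"))  # [100, 101, 103, 105, 200, 202]
print(overlapping_ranges(conn))      # [('dsA', 100, 'dsB', 104)]
```

If the rule is meant to hold only within a single dataset, adding `a.DATASET_NAME = b.DATASET_NAME` to the join condition narrows the check. A trigger evaluating the same predicate on insert could reject violations instead of merely reporting them.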
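Because the DETECTOR CONFIGURATION table stores a row for every component, absent ones included, both "what was in the detector for this run" and "which runs lacked a component" reduce to simple joins on DETECTOR_CONFIG_ID. A minimal sketch, assuming SQLite via Python's sqlite3, with STRING mapped to TEXT and SMALLINT to INTEGER; the configuration IDs cfg1/cfg2 and the component names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RAW_RUN (
    RUN_NUMBER         INTEGER PRIMARY KEY,
    DETECTOR_CONFIG_ID TEXT NOT NULL
);
CREATE TABLE DETECTOR_CONFIGURATION (
    DETECTOR_CONFIG_ID TEXT    NOT NULL,
    COMPONENT          TEXT    NOT NULL,
    PRESENT            INTEGER NOT NULL CHECK (PRESENT IN (0, 1)),
    PRIMARY KEY (DETECTOR_CONFIG_ID, COMPONENT)
);
""")
# Hypothetical configurations: every component gets an explicit row.
conn.executemany("INSERT INTO DETECTOR_CONFIGURATION VALUES (?, ?, ?)", [
    ("cfg1", "CALORIMETER", 1), ("cfg1", "DRIFT_CHAMBER", 1), ("cfg1", "MUON", 0),
    ("cfg2", "CALORIMETER", 1), ("cfg2", "DRIFT_CHAMBER", 0), ("cfg2", "MUON", 1),
])
conn.executemany("INSERT INTO RAW_RUN VALUES (?, ?)", [(1001, "cfg1"), (1002, "cfg2")])

def components_present(conn, run_number):
    """Names of the components present in the configuration used by a run."""
    rows = conn.execute("""
        SELECT c.COMPONENT
        FROM RAW_RUN AS r
        JOIN DETECTOR_CONFIGURATION AS c USING (DETECTOR_CONFIG_ID)
        WHERE r.RUN_NUMBER = ? AND c.PRESENT = 1
        ORDER BY c.COMPONENT
    """, (run_number,)).fetchall()
    return [name for (name,) in rows]

def runs_missing(conn, component):
    """Runs whose configuration explicitly records the component as absent."""
    rows = conn.execute("""
        SELECT r.RUN_NUMBER
        FROM RAW_RUN AS r
        JOIN DETECTOR_CONFIGURATION AS c USING (DETECTOR_CONFIG_ID)
        WHERE c.COMPONENT = ? AND c.PRESENT = 0
        ORDER BY r.RUN_NUMBER
    """, (component,)).fetchall()
    return [run for (run,) in rows]

print(components_present(conn, 1001))  # ['CALORIMETER', 'DRIFT_CHAMBER']
print(runs_missing(conn, "MUON"))      # [1001]
```

If only "present" rows were stored, the second query would need a NOT EXISTS subquery, and a component absent from a configuration could not be told apart from one that was simply never recorded.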

REPAIRED RUN Properties
-----------------------

This table is probably not involved in user queries. However, it is necessary for Reconstruction. It records whether a run was truncated at population time because some fraction of the events were determined to be junk, due to a detector problem that arose after the run was started. (This nasty behavior then has an impact on the interpretation of the detector quality information.) Run repair may also delete a small number of events; if it does, it must update this table.

    RUN_NUMBER          INTEGER
    MODIFICATION_TIME   DATE
    EVENT_COUNT         INTEGER
    LAST_EVENT_NUMBER   INTEGER
    PRIMARY KEY(RUN_NUMBER, MODIFICATION_TIME)

Since a run may be truncated more than once, the primary key must include more than the run number.

RECONSTRUCTED RUN Properties
----------------------------

    RUN_NUMBER            INTEGER
    PROVENANCE_GROUP_ID   INTEGER
    EVENTS_WRITTEN        INTEGER
    START_TIME            DATE
    FINISH_TIME           DATE
    PRIMARY KEY(RUN_NUMBER, PROVENANCE_GROUP_ID)

The start time is the time the first reconstruction (sub)job started. The finish time is the time the last reconstruction (sub)job finished. The time of the collation activity is not recorded, since collation does not modify any of the data; it merely repackages it.

RECONSTRUCTED RUN Event Counts
------------------------------

    RUN_NUMBER            INTEGER
    PROVENANCE_GROUP_ID   INTEGER
    EVENT_CLASS           STRING
    EVENT_COUNT           INTEGER
    PRIMARY KEY(RUN_NUMBER, PROVENANCE_GROUP_ID, EVENT_CLASS)

This table records how many events in each reconstructed run were categorized into particular event classes. The event class count is the number of events written out for the run that are in the specified event class. Counts are kept per reconstruction, so the key includes the provenance group. An event may be (and frequently is) a member of more than one class, so there are no obvious constraints to apply to the event class counts; in particular, they need not sum to EVENTS_WRITTEN.

ENERGY CLASS
------------

This table maps an energy class name onto an energy range. More than one energy class may cover the same energies
(e.g., PSI, PSI-on, and PSI-off will overlap).

    ENERGY_RANGE      STRING PRIMARY KEY
    MIN_BEAM_ENERGY   FLOAT
    MAX_BEAM_ENERGY   FLOAT

Provenance Group
----------------

    PROVENANCE_GROUP_ID   INTEGER PRIMARY KEY
    PARENT_PROV_GROUP     INTEGER
    SOFTWARE_VERSION_ID   STRING
    CONSTANTS_TAG         STRING
    CONFIGURATION_ID      STRING

Processing Change
-----------------

This table records information about a change in processing that resulted in the creation of a new provenance group.

    PROCESSING_CHANGE_ID     STRING PRIMARY KEY
    PROCESSING_CHANGE_TYPE   STRING
    PROCESSING_CHANGE_DESC   STRING
    PROCESSING_CHANGE_DATE   DATE

What needs to be captured to recognize deviations from standard processing:

  - load order of modules, with dates and checksums
  - parameters of, and datatypes produced by, each module
  - load order of, and datatypes produced by, all data sources
  - order and sinks of all paths
  - provenance of all sources
  - platform and operating system

Software Version
----------------

This table records information about a change in the code release used that resulted in the creation of a new provenance group.

    SOFTWARE_VERSION_ID     STRING PRIMARY KEY
    CODE_RELEASE            STRING
    PACKAGE_VERSION_ID      STRING
    PACKAGE_CHANGE_REASON   STRING
    PACKAGE_CHANGE_DATE     DATE

Package Version
---------------

This table records the versions of all the packages that make up a code release.

    PACKAGE_VERSION_ID   STRING
    PACKAGE_NAME         STRING
    PACKAGE_CVS_TAG      STRING
    PRIMARY KEY(PACKAGE_VERSION_ID, PACKAGE_NAME)

RECONSTRUCTION QUALITY
----------------------

This table records the set of reconstruction components that were good (e.g., whether muon identification is good for a run). This is an attribute of the reconstructed run, and ideally would be represented as a SET in the RECONSTRUCTED RUN table rather than as a separate table. The table will contain a lot of repetition, so it may be possible to optimize it.

    RUN_NUMBER            INTEGER
    PROVENANCE_GROUP_ID   INTEGER
    TYPE                  STRING
    ISGOOD                SMALLINT
    PRIMARY KEY(RUN_NUMBER, PROVENANCE_GROUP_ID, TYPE)

CALIBRATED SCALERS
------------------

This table contains the information about a run that is iteratively refined over time (i.e., updates are not necessarily linked to provenance group updates).

    TYPE         STRING?
    RUN_NUMBER   INTEGER
    VALUE        depends on the type, but mostly FLOATs?
    CREATED      DATE
    PRIMARY KEY(TYPE, RUN_NUMBER, CREATED)

CREATED is the date the new value was produced. The TYPE values identified so far are calibrated beam energy, calibrated magnetic field, calibrated luminosity, and a detector-component-quality descriptor. All of them are calculated post-Reconstruction.

We will probably need other tables for constraints (e.g., valid event classes), but they are trivial and we can create them if and when they are needed.
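Reconstruction needs the event counts of a run as they stand after all repairs. Since REPAIRED RUN Properties can hold several rows per run, one per modification, a reasonable reading is that the row with the latest MODIFICATION_TIME wins, with RAW RUN Properties as the fallback for runs that were never repaired. That rule is an assumption about how the table would be consumed. The sketch uses Python's sqlite3, hypothetical table names RAW_RUN and REPAIRED_RUN, dates stored as ISO-8601 text, and invented numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RAW_RUN (
    RUN_NUMBER        INTEGER PRIMARY KEY,
    EVENT_COUNT       INTEGER,
    LAST_EVENT_NUMBER INTEGER
);
CREATE TABLE REPAIRED_RUN (
    RUN_NUMBER        INTEGER NOT NULL,
    MODIFICATION_TIME TEXT    NOT NULL,  -- ISO-8601, so text order is time order
    EVENT_COUNT       INTEGER,
    LAST_EVENT_NUMBER INTEGER,
    PRIMARY KEY (RUN_NUMBER, MODIFICATION_TIME)
);
""")
# Hypothetical data: run 5000 was truncated, then had a few events deleted.
conn.executemany("INSERT INTO RAW_RUN VALUES (?, ?, ?)",
                 [(5000, 12000, 12000), (5001, 8000, 8000)])
conn.executemany("INSERT INTO REPAIRED_RUN VALUES (?, ?, ?, ?)", [
    (5000, "2004-03-01T10:00:00", 9000, 9000),
    (5000, "2004-03-05T16:30:00", 8950, 9000),
])

def effective_counts(conn, run_number):
    """(EVENT_COUNT, LAST_EVENT_NUMBER) after the latest repair, else the raw values."""
    row = conn.execute("""
        SELECT EVENT_COUNT, LAST_EVENT_NUMBER FROM REPAIRED_RUN
        WHERE RUN_NUMBER = ?
        ORDER BY MODIFICATION_TIME DESC LIMIT 1
    """, (run_number,)).fetchone()
    if row is None:  # never repaired: fall back to RAW_RUN (None if unknown run)
        row = conn.execute(
            "SELECT EVENT_COUNT, LAST_EVENT_NUMBER FROM RAW_RUN WHERE RUN_NUMBER = ?",
            (run_number,)).fetchone()
    return row

print(effective_counts(conn, 5000))  # (8950, 9000)
print(effective_counts(conn, 5001))  # (8000, 8000)
```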
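The ENERGY CLASS table lets classes overlap, so one beam energy can belong to several classes at once. The sketch below shows both directions of the lookup, assuming SQLite via Python's sqlite3, closed ranges (MIN_BEAM_ENERGY <= E <= MAX_BEAM_ENERGY), and that runs are classified by the raw BEAM_ENERGY from RAW RUN Properties rather than a calibrated value. The class names WIDE, ON_PEAK, and BELOW_PEAK and their boundaries are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ENERGY_CLASS (
    ENERGY_RANGE    TEXT PRIMARY KEY,  -- the energy class name
    MIN_BEAM_ENERGY REAL NOT NULL,
    MAX_BEAM_ENERGY REAL NOT NULL,
    CHECK (MIN_BEAM_ENERGY <= MAX_BEAM_ENERGY)
);
CREATE TABLE RAW_RUN (RUN_NUMBER INTEGER PRIMARY KEY, BEAM_ENERGY REAL);
""")
# Hypothetical, overlapping classes with invented boundaries.
conn.executemany("INSERT INTO ENERGY_CLASS VALUES (?, ?, ?)", [
    ("WIDE", 1.00, 2.00), ("ON_PEAK", 1.40, 1.60), ("BELOW_PEAK", 1.00, 1.39),
])
conn.executemany("INSERT INTO RAW_RUN VALUES (?, ?)", [(1, 1.50), (2, 1.20), (3, 1.55)])

def energy_classes(conn, beam_energy):
    """All classes whose closed range [MIN, MAX] contains beam_energy."""
    rows = conn.execute("""
        SELECT ENERGY_RANGE FROM ENERGY_CLASS
        WHERE ? BETWEEN MIN_BEAM_ENERGY AND MAX_BEAM_ENERGY
        ORDER BY ENERGY_RANGE
    """, (beam_energy,)).fetchall()
    return [name for (name,) in rows]

def runs_in_energy_class(conn, name):
    """Runs whose raw BEAM_ENERGY falls inside the named class."""
    rows = conn.execute("""
        SELECT r.RUN_NUMBER
        FROM RAW_RUN AS r JOIN ENERGY_CLASS AS e
          ON r.BEAM_ENERGY BETWEEN e.MIN_BEAM_ENERGY AND e.MAX_BEAM_ENERGY
        WHERE e.ENERGY_RANGE = ?
        ORDER BY r.RUN_NUMBER
    """, (name,)).fetchall()
    return [run for (run,) in rows]

print(energy_classes(conn, 1.5))              # ['ON_PEAK', 'WIDE']
print(runs_in_energy_class(conn, "ON_PEAK"))  # [1, 3]
```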
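Because each provenance group records its PARENT_PROV_GROUP, the groups form a tree (or forest), and the full processing history behind a reconstruction can be recovered with a recursive query. A minimal sketch using Python's sqlite3 and a WITH RECURSIVE common table expression, assuming a root group has a NULL parent; the group contents (sw1, tagA, conf1, and so on) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROVENANCE_GROUP (
    PROVENANCE_GROUP_ID INTEGER PRIMARY KEY,
    PARENT_PROV_GROUP   INTEGER,  -- assumed NULL for a root group
    SOFTWARE_VERSION_ID TEXT,
    CONSTANTS_TAG       TEXT,
    CONFIGURATION_ID    TEXT
);
""")
# Hypothetical history: 2 and 4 derive from 1, and 3 derives from 2.
conn.executemany("INSERT INTO PROVENANCE_GROUP VALUES (?, ?, ?, ?, ?)", [
    (1, None, "sw1", "tagA", "conf1"),
    (2, 1, "sw2", "tagA", "conf1"),
    (3, 2, "sw2", "tagB", "conf1"),
    (4, 1, "sw1", "tagA", "conf2"),
])

def lineage(conn, group_id, max_depth=1000):
    """Group IDs from group_id back to its root, nearest first."""
    # max_depth stops the recursion if the parent links ever form a cycle.
    rows = conn.execute("""
        WITH RECURSIVE chain(id, parent, depth) AS (
            SELECT PROVENANCE_GROUP_ID, PARENT_PROV_GROUP, 0
            FROM PROVENANCE_GROUP WHERE PROVENANCE_GROUP_ID = ?
            UNION ALL
            SELECT g.PROVENANCE_GROUP_ID, g.PARENT_PROV_GROUP, c.depth + 1
            FROM PROVENANCE_GROUP AS g JOIN chain AS c
              ON g.PROVENANCE_GROUP_ID = c.parent
            WHERE c.depth < ?
        )
        SELECT id FROM chain ORDER BY depth
    """, (group_id, max_depth)).fetchall()
    return [gid for (gid,) in rows]

print(lineage(conn, 3))  # [3, 2, 1]
print(lineage(conn, 4))  # [4, 1]
```

Comparing SOFTWARE_VERSION_ID, CONSTANTS_TAG, and CONFIGURATION_ID along the returned chain shows which of them changed at each step.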
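The RECONSTRUCTION QUALITY table stores one ISGOOD flag per (run, provenance group, component type), so selecting the runs in which a whole set of components is good is a relational division. One common way to write it is to count matching good rows per run and compare with the size of the required set, sketched here with Python's sqlite3. The TYPE values MUON_ID and TRACKING are hypothetical, and treating a missing flag as "not good" is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RECONSTRUCTION_QUALITY (
    RUN_NUMBER          INTEGER NOT NULL,
    PROVENANCE_GROUP_ID INTEGER NOT NULL,
    TYPE                TEXT    NOT NULL,
    ISGOOD              INTEGER NOT NULL CHECK (ISGOOD IN (0, 1)),
    PRIMARY KEY (RUN_NUMBER, PROVENANCE_GROUP_ID, TYPE)
);
""")
# Hypothetical quality flags for provenance group 7.
conn.executemany("INSERT INTO RECONSTRUCTION_QUALITY VALUES (?, ?, ?, ?)", [
    (10, 7, "MUON_ID", 1), (10, 7, "TRACKING", 1),
    (11, 7, "MUON_ID", 0), (11, 7, "TRACKING", 1),
    (12, 7, "TRACKING", 1),  # no MUON_ID verdict recorded for run 12
])

def good_runs(conn, provenance_group_id, required_types):
    """Runs in which every required component is flagged good.

    A missing row for a required type counts as "not good".
    """
    required = sorted(set(required_types))
    if not required:
        raise ValueError("required_types must not be empty")
    # Only "?" placeholders are interpolated; values are bound as parameters.
    placeholders = ", ".join("?" for _ in required)
    rows = conn.execute(f"""
        SELECT RUN_NUMBER FROM RECONSTRUCTION_QUALITY
        WHERE PROVENANCE_GROUP_ID = ? AND ISGOOD = 1
          AND TYPE IN ({placeholders})
        GROUP BY RUN_NUMBER
        HAVING COUNT(*) = ?
        ORDER BY RUN_NUMBER
    """, (provenance_group_id, *required, len(required))).fetchall()
    return [run for (run,) in rows]

print(good_runs(conn, 7, ["MUON_ID", "TRACKING"]))  # [10]
print(good_runs(conn, 7, ["TRACKING"]))             # [10, 11, 12]
```

COUNT(*) equals the number of distinct required types only because the primary key allows at most one row per (run, group, type).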
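CALIBRATED SCALERS keeps every refinement, since CREATED is part of the key, so readers need a rule for choosing a value. The sketch below returns the most recent value, optionally as of a cut-off date so that an analysis can reproduce the calibrations it originally used. It assumes Python's sqlite3, a numeric VALUE (the draft leaves the type open), and ISO-8601 dates; the TYPE names CAL_BEAM_ENERGY and CAL_LUMINOSITY and all numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CALIBRATED_SCALERS (
    TYPE       TEXT    NOT NULL,
    RUN_NUMBER INTEGER NOT NULL,
    VALUE      REAL,              -- assumed numeric; the draft leaves this open
    CREATED    TEXT    NOT NULL,  -- ISO-8601 date, so text order is time order
    PRIMARY KEY (TYPE, RUN_NUMBER, CREATED)
);
""")
# Hypothetical history: the beam energy of run 200 was recalibrated once.
conn.executemany("INSERT INTO CALIBRATED_SCALERS VALUES (?, ?, ?, ?)", [
    ("CAL_BEAM_ENERGY", 200, 1.80, "2004-01-10"),
    ("CAL_BEAM_ENERGY", 200, 1.85, "2004-02-20"),
    ("CAL_LUMINOSITY", 200, 12.5, "2004-01-10"),
])

def current_value(conn, scaler_type, run_number, as_of=None):
    """Latest VALUE for (TYPE, RUN_NUMBER), optionally as known on date as_of."""
    sql = "SELECT VALUE FROM CALIBRATED_SCALERS WHERE TYPE = ? AND RUN_NUMBER = ?"
    params = [scaler_type, run_number]
    if as_of is not None:  # ignore values produced after the cut-off date
        sql += " AND CREATED <= ?"
        params.append(as_of)
    row = conn.execute(sql + " ORDER BY CREATED DESC LIMIT 1", params).fetchone()
    return None if row is None else row[0]

print(current_value(conn, "CAL_BEAM_ENERGY", 200))                      # 1.85
print(current_value(conn, "CAL_BEAM_ENERGY", 200, as_of="2004-02-01"))  # 1.8
```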