Draft Run Metadata Table Structure
----------------------------------

DATASET Definition
------------------
This table maps a dataset name onto one or more runs. It is still under
discussion whether a dataset is defined by a list of runs, a single run
range, or a set of run ranges. To remain moderately general, we will use a
set of run ranges, with one record per range.

Note that not every run number in the range necessarily exists
in the database. Some runs are junk and are not stored.

DATASET_NAME		STRING
FIRST_RUN		INTEGER
LAST_RUN		INTEGER
PRIMARY KEY(DATASET_NAME, FIRST_RUN)

You could go wild with constraints, and it may be worth it.
One important constraint is that run ranges must be disjoint.
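
The disjointness rule cannot be written as a simple column constraint, so
one option is to check it with a query. The following is a minimal sketch
using Python's sqlite3 module. It assumes a table named dataset with one
record per run range, and that ranges must be disjoint within a single
dataset. The dataset name and run numbers are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dataset (
        dataset_name TEXT    NOT NULL,
        first_run    INTEGER NOT NULL,
        last_run     INTEGER NOT NULL,
        PRIMARY KEY (dataset_name, first_run),  -- assumed: one record per range
        CHECK (first_run <= last_run)
    )
    """
)


def overlapping_ranges(conn):
    """Return pairs of ranges in the same dataset that share at least one run."""
    # With a.first_run < b.first_run, the two ranges overlap exactly when
    # b starts at or before the run where a ends.
    return conn.execute(
        """
        SELECT a.dataset_name, a.first_run, a.last_run, b.first_run, b.last_run
        FROM dataset AS a
        JOIN dataset AS b
          ON a.dataset_name = b.dataset_name
         AND a.first_run < b.first_run
         AND b.first_run <= a.last_run
        ORDER BY a.dataset_name, a.first_run
        """
    ).fetchall()


# Two disjoint ranges for a hypothetical dataset.
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?)",
    [("data10", 100, 199), ("data10", 300, 399)],
)
clean = overlapping_ranges(conn)

# A third range that overlaps the second one.
conn.execute("INSERT INTO dataset VALUES (?, ?, ?)", ("data10", 350, 420))
overlaps = overlapping_ranges(conn)

print(clean)
print(overlaps)
```

The same query could be run from a trigger or from a periodic consistency
check, depending on how strictly the constraint must be enforced.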


RAW RUN Properties
------------------
This table contains only non-versioned, immutable information about each
raw run. This information is collected from the Online systems.

RUN_NUMBER		INTEGER PRIMARY KEY
START_TIME		DATE
EVENT_COUNT		INTEGER
LAST_EVENT_NUMBER	INTEGER
LUMINOSITY		FLOAT
BEAM_ENERGY		FLOAT
MAGNETIC_FIELD		FLOAT
TERMINATION_STATUS	STRING
DETECTOR_CONFIG_ID	STRING
TRIGGER_CONFIG		??? (encoded as a bitmask in the raw data)
RUNMANAGER_APPROVED	SMALLINT

It may be that we record all runs, even the junk runs that are never
populated, since we are collecting some of this information at a time
when the value of a run is unknown.  However, we may delete runs
with fewer than a certain number of events.
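
Because a run number inside a dataset's range may be absent from this
table, expanding a dataset into runs is safer as a join against the stored
runs than as an enumeration of the range. A minimal sketch in Python's
sqlite3 follows. The table names dataset and raw_run are assumptions, only
the columns needed for the join are shown, and all values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset (
        dataset_name TEXT    NOT NULL,
        first_run    INTEGER NOT NULL,
        last_run     INTEGER NOT NULL,
        PRIMARY KEY (dataset_name, first_run)
    );
    CREATE TABLE raw_run (
        run_number  INTEGER PRIMARY KEY,
        event_count INTEGER NOT NULL
    );
    """
)


def runs_in_dataset(conn, name):
    """Return the run numbers of a dataset that actually exist in raw_run."""
    rows = conn.execute(
        """
        SELECT r.run_number
        FROM dataset AS d
        JOIN raw_run AS r
          ON r.run_number BETWEEN d.first_run AND d.last_run
        WHERE d.dataset_name = ?
        ORDER BY r.run_number
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


conn.execute("INSERT INTO dataset VALUES ('data10', 100, 104)")
# Runs 102 and 103 were never stored; run 105 lies outside the range.
conn.executemany(
    "INSERT INTO raw_run VALUES (?, ?)",
    [(100, 52000), (101, 48000), (104, 61000), (105, 50000)],
)

existing = runs_in_dataset(conn, "data10")
print(existing)
```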

TERMINATION_STATUS will probably be restricted to a set of well known
values (and may therefore become an integer).

TODO: additional run conditions, CESR condition data?  Pull stuff out
of elog?


DETECTOR CONFIGURATION
----------------------
This changes very rarely, so it makes sense to break it out into a
separate table.

DETECTOR_CONFIG_ID	STRING
COMPONENT		STRING
PRESENT			SMALLINT
PRIMARY KEY(DETECTOR_CONFIG_ID, COMPONENT)

For each detector configuration, a record will appear in this table
for every component indicating whether it was present in that
configuration.
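
As an illustration of how this table would be used together with the raw
run table, the sketch below looks up the components present for a given
run. It is written with Python's sqlite3; the table names, configuration
ids and component names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE detector_config (
        detector_config_id TEXT    NOT NULL,
        component          TEXT    NOT NULL,
        present            INTEGER NOT NULL CHECK (present IN (0, 1)),
        PRIMARY KEY (detector_config_id, component)
    );
    CREATE TABLE raw_run (
        run_number         INTEGER PRIMARY KEY,
        detector_config_id TEXT NOT NULL
    );
    """
)


def components_present(conn, run_number):
    """Return the names of the components present for the given run."""
    rows = conn.execute(
        """
        SELECT c.component
        FROM raw_run AS r
        JOIN detector_config AS c
          ON c.detector_config_id = r.detector_config_id
        WHERE r.run_number = ? AND c.present = 1
        ORDER BY c.component
        """,
        (run_number,),
    ).fetchall()
    return [row[0] for row in rows]


# Every configuration lists every component, present or not.
conn.executemany(
    "INSERT INTO detector_config VALUES (?, ?, ?)",
    [
        ("cfgA", "DR", 1), ("cfgA", "MUON", 1), ("cfgA", "RICH", 0),
        ("cfgB", "DR", 1), ("cfgB", "MUON", 1), ("cfgB", "RICH", 1),
    ],
)
conn.executemany(
    "INSERT INTO raw_run VALUES (?, ?)", [(1001, "cfgA"), (1002, "cfgB")]
)

print(components_present(conn, 1001))
print(components_present(conn, 1002))
```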


REPAIRED RUN Properties
-----------------------
This table is probably not involved in user queries, but it is necessary
for Reconstruction. It records whether a run was truncated at population
time because some fraction of the events were determined to be junk due to
a problem with the detector that arose after the run was started. (This
nasty behavior then has an impact on the interpretation of the detector
quality information.)
It may also be the case that run repair deleted a small number of events.
Run repair will have to update this table if it deletes any events.

RUN_NUMBER		INTEGER
MODIFICATION_TIME	DATE
EVENT_COUNT		INTEGER
LAST_EVENT_NUMBER	INTEGER
PRIMARY KEY(RUN_NUMBER, MODIFICATION_TIME)

Since more than one truncation may occur, the primary key must include
more than the run number.
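
A sketch of how Reconstruction might read this table, assuming that the
record with the latest MODIFICATION_TIME supersedes earlier ones for the
same run. It uses Python's sqlite3, stores times as ISO 8601 text (which
sorts chronologically), and assumes the table name repaired_run. All
values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE repaired_run (
        run_number        INTEGER NOT NULL,
        modification_time TEXT    NOT NULL,  -- ISO 8601, sorts chronologically
        event_count       INTEGER NOT NULL,
        last_event_number INTEGER NOT NULL,
        PRIMARY KEY (run_number, modification_time)
    )
    """
)


def current_counts(conn, run_number):
    """Return (event_count, last_event_number) from the latest repair,
    or None if the run was never truncated or repaired."""
    return conn.execute(
        """
        SELECT event_count, last_event_number
        FROM repaired_run
        WHERE run_number = ?
        ORDER BY modification_time DESC
        LIMIT 1
        """,
        (run_number,),
    ).fetchone()


# Run 2001 was truncated once and later lost a few more events to repair.
conn.executemany(
    "INSERT INTO repaired_run VALUES (?, ?, ?, ?)",
    [
        (2001, "2003-05-01T10:00:00", 48000, 48210),
        (2001, "2003-06-12T09:30:00", 47950, 48210),
    ],
)

print(current_counts(conn, 2001))
print(current_counts(conn, 9999))
```

A run with no record here would fall back to the counts in the raw run
table.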


RECONSTRUCTED RUN Properties
----------------------------

RUN_NUMBER		INTEGER
PROVENANCE_GROUP_ID	INTEGER
EVENTS_WRITTEN		INTEGER
START_TIME		DATE
FINISH_TIME		DATE
PRIMARY KEY(RUN_NUMBER, PROVENANCE_GROUP_ID)

The start time is the time the first reconstruction (sub)job started.
The finish time is the time the last reconstruction (sub)job finished.
The time of the collation activity is not recorded, since it does not
modify any of the data but merely repackages it.

The event class count, recorded in the RECONSTRUCTED RUN Event Counts
table, is the number of events written out for this run that are in the
specified event class. An event may be (and frequently is) a member of more
than one class, so there are no obvious constraints to apply to the event
class counts.


RECONSTRUCTED RUN Event Counts
------------------------------

RUN_NUMBER		INTEGER
PROVENANCE_GROUP_ID	INTEGER
EVENT_CLASS		STRING
EVENT_COUNT		INTEGER
PRIMARY KEY(RUN_NUMBER, PROVENANCE_GROUP_ID, EVENT_CLASS)

This table records how many events in the run were categorized into each
event class.
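
The sketch below illustrates why the class counts cannot be constrained to
sum to the number of events written: an event counted in two classes
contributes twice. It uses Python's sqlite3 and assumes that the counts are
keyed per provenance group as well as per run. The table names and event
class names are hypothetical, and the numbers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reconstructed_run (
        run_number          INTEGER NOT NULL,
        provenance_group_id INTEGER NOT NULL,
        events_written      INTEGER NOT NULL,
        PRIMARY KEY (run_number, provenance_group_id)
    );
    CREATE TABLE reconstructed_run_event_count (
        run_number          INTEGER NOT NULL,
        provenance_group_id INTEGER NOT NULL,
        event_class         TEXT    NOT NULL,
        event_count         INTEGER NOT NULL,
        PRIMARY KEY (run_number, provenance_group_id, event_class)
    );
    """
)


def class_count_summary(conn, run_number, group_id):
    """Return (events_written, sum of class counts, largest class count)."""
    return conn.execute(
        """
        SELECT r.events_written, SUM(c.event_count), MAX(c.event_count)
        FROM reconstructed_run AS r
        JOIN reconstructed_run_event_count AS c
          ON c.run_number = r.run_number
         AND c.provenance_group_id = r.provenance_group_id
        WHERE r.run_number = ? AND r.provenance_group_id = ?
        GROUP BY r.run_number, r.provenance_group_id
        """,
        (run_number, group_id),
    ).fetchone()


conn.execute("INSERT INTO reconstructed_run VALUES (3001, 7, 1000)")
conn.executemany(
    "INSERT INTO reconstructed_run_event_count VALUES (?, ?, ?, ?)",
    [(3001, 7, "hadron", 700), (3001, 7, "tau", 450)],
)

written, total, largest = class_count_summary(conn, 3001, 7)
print(written, total, largest)
```

If EVENTS_WRITTEN counts every event written for the run and provenance
group, the one check that does seem safe is that no single class count
exceeds it.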

ENERGY CLASS
------------

This table maps an energy class name onto an energy range.
More than one energy class may cover a particular energy range
(e.g., PSI, PSI-on, and PSI-off will overlap).

ENERGY_RANGE		STRING	PRIMARY KEY
MIN_BEAM_ENERGY		FLOAT
MAX_BEAM_ENERGY		FLOAT
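
Since energy classes may overlap, a lookup by beam energy can return more
than one class. A minimal sketch in Python's sqlite3 follows. The table
name energy_class is an assumption, and the numeric ranges are invented
purely to show overlapping classes; they are not real class definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE energy_class (
        energy_range    TEXT PRIMARY KEY,  -- holds the energy class name
        min_beam_energy REAL NOT NULL,
        max_beam_energy REAL NOT NULL,
        CHECK (min_beam_energy <= max_beam_energy)
    )
    """
)


def classes_for_energy(conn, beam_energy):
    """Return every energy class whose range contains the beam energy."""
    rows = conn.execute(
        """
        SELECT energy_range
        FROM energy_class
        WHERE ? BETWEEN min_beam_energy AND max_beam_energy
        ORDER BY energy_range
        """,
        (beam_energy,),
    ).fetchall()
    return [row[0] for row in rows]


# Invented ranges: a broad class plus two narrower classes inside it.
conn.executemany(
    "INSERT INTO energy_class VALUES (?, ?, ?)",
    [
        ("PSI", 1.80, 1.90),
        ("PSI-on", 1.84, 1.86),
        ("PSI-off", 1.80, 1.83),
    ],
)

print(classes_for_energy(conn, 1.85))
print(classes_for_energy(conn, 1.82))
print(classes_for_energy(conn, 2.50))
```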


Provenance Group
----------------

PROVENANCE_GROUP_ID	INTEGER PRIMARY KEY
PARENT_PROV_GROUP	INTEGER
SOFTWARE_VERSION_ID	STRING
CONSTANTS_TAG		STRING
CONFIGURATION_ID	STRING


Processing Change
-----------------

This table records information about a change in processing that
resulted in the creation of a new provenance group.

PROCESSING_CHANGE_ID	STRING PRIMARY KEY
PROCESSING_CHANGE_TYPE	STRING
PROCESSING_CHANGE_DESC	STRING
PROCESSING_CHANGE_DATE	DATE

What needs to be captured to recognize deviations from standard
processing:
 - load order of modules, with date and checksums
 - parameters of each module and the datatypes it produces
 - load order of all data sources and the datatypes they produce
 - order and sinks of all paths
 - provenance of all sources
 - platform and operating system

Software Version
----------------

This table records information about a change in the code release used
that resulted in the creation of a new provenance group.

SOFTWARE_VERSION_ID	STRING PRIMARY KEY
CODE_RELEASE		STRING
PACKAGE_VERSION_ID	STRING
PACKAGE_CHANGE_REASON	STRING
PACKAGE_CHANGE_DATE	DATE


Package Version
---------------

This table records the versions of all the packages which compose a
code release.

PACKAGE_VERSION_ID	STRING
PACKAGE_NAME		STRING
PACKAGE_CVS_TAG		STRING
PRIMARY KEY(PACKAGE_VERSION_ID, PACKAGE_NAME)


RECONSTRUCTION QUALITY
----------------------

This table records which reconstruction components were good for a run
(e.g., whether the muon identification is good). This is an attribute of
the reconstructed run, and ideally would be represented as a SET in the
reconstructed run table rather than as a separate table. The table will
contain a lot of repetition, so it may be possible to optimize it.

RUN_NUMBER		INTEGER
PROVENANCE_GROUP_ID	INTEGER
TYPE			STRING
ISGOOD			SMALLINT
PRIMARY KEY(RUN_NUMBER, PROVENANCE_GROUP_ID, TYPE)
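
A typical use of this table would be to select the runs in which a given
set of components is good. The sketch below, in Python's sqlite3, relies on
the primary key to guarantee at most one record per run, provenance group
and type. The table name reconstruction_quality and the TYPE values are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reconstruction_quality (
        run_number          INTEGER NOT NULL,
        provenance_group_id INTEGER NOT NULL,
        type                TEXT    NOT NULL,
        isgood              INTEGER NOT NULL CHECK (isgood IN (0, 1)),
        PRIMARY KEY (run_number, provenance_group_id, type)
    )
    """
)


def runs_with_all_good(conn, group_id, required_types):
    """Return runs where every type in required_types is marked good.

    required_types must not contain duplicates, since the match is done
    by counting the good records found per run.
    """
    placeholders = ", ".join("?" for _ in required_types)
    rows = conn.execute(
        f"""
        SELECT run_number
        FROM reconstruction_quality
        WHERE provenance_group_id = ?
          AND type IN ({placeholders})
          AND isgood = 1
        GROUP BY run_number
        HAVING COUNT(*) = ?
        ORDER BY run_number
        """,
        (group_id, *required_types, len(required_types)),
    ).fetchall()
    return [row[0] for row in rows]


conn.executemany(
    "INSERT INTO reconstruction_quality VALUES (?, ?, ?, ?)",
    [
        (4001, 7, "muon_id", 1), (4001, 7, "tracking", 1),
        (4002, 7, "muon_id", 0), (4002, 7, "tracking", 1),
        (4003, 7, "tracking", 1),  # no muon_id record at all
    ],
)

print(runs_with_all_good(conn, 7, ["muon_id", "tracking"]))
print(runs_with_all_good(conn, 7, ["tracking"]))
```

Note that a run with no record for a required type is treated the same as
a run where that type is marked bad.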


CALIBRATED SCALERS
------------------

This table contains the information about a run that is iteratively refined
over time (i.e., updates are not necessarily linked to provenance group
updates).

TYPE			STRING?
RUN_NUMBER		INTEGER
VALUE			depends on the type, but mostly FLOATS?
CREATED			DATE
PRIMARY KEY(TYPE, RUN_NUMBER, CREATED)

CREATED is the date the new value was produced.

The current values for TYPE are:
 - calibrated beam energy
 - calibrated magnetic field
 - calibrated luminosity
 - a detector-component-quality descriptor

All of the types identified so far are calculated post-Reconstruction.
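
Because each refinement adds a new record rather than overwriting the old
one, readers of this table need the most recently created value for each
type. A minimal sketch in Python's sqlite3 follows, assuming VALUE is a
float, CREATED is stored as ISO 8601 text, and the table is named
calibrated_scaler. The TYPE strings and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE calibrated_scaler (
        type       TEXT    NOT NULL,
        run_number INTEGER NOT NULL,
        value      REAL    NOT NULL,  -- assumed float for this sketch
        created    TEXT    NOT NULL,  -- ISO 8601, sorts chronologically
        PRIMARY KEY (type, run_number, created)
    )
    """
)


def current_scalers(conn, run_number):
    """Return {type: value} using the latest CREATED record for each type."""
    rows = conn.execute(
        """
        SELECT s.type, s.value
        FROM calibrated_scaler AS s
        WHERE s.run_number = ?
          AND s.created = (
              SELECT MAX(t.created)
              FROM calibrated_scaler AS t
              WHERE t.type = s.type AND t.run_number = s.run_number
          )
        """,
        (run_number,),
    ).fetchall()
    return dict(rows)


# The beam energy for run 5001 was refined once; luminosity has one value.
conn.executemany(
    "INSERT INTO calibrated_scaler VALUES (?, ?, ?, ?)",
    [
        ("beam_energy", 5001, 1.8430, "2003-05-01"),
        ("beam_energy", 5001, 1.8434, "2003-07-15"),
        ("luminosity", 5001, 12.5, "2003-05-02"),
    ],
)

print(current_scalers(conn, 5001))
```

Keeping the older records means that an analysis can still be reproduced
with the values that were current at the time it was run.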

We probably will need other tables for constraints (e.g., valid event classes),
but they are trivial and we can create them if and when they are needed.