Domain for CMS DBS Use Cases
The basic processes of detector physics help name datasets and parts of datasets manipulated by the DBS. We provide a glossary here.
Very little about the physics is relevant to dataset naming or structure.
In a high-energy collision, the likelihood that two particles will collide depends on the cross-section, measured in barns. The detector takes data every 20ns, and we call each such recording an event, but the number of collisions is only some percentage of the number of events, roughly scaled by cross-section. Physics events are those with actual collisions during the event. They are also called crossings or interactions.
Luminosity is the number of particles per unit area per unit time times the opacity of the target, usually expressed in cm^-2 s^-1. The integrated luminosity is the integral of the luminosity with respect to time. The luminosity is an important value for characterizing the performance of an accelerator.
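The relationship between luminosity, integrated luminosity, and cross-section can be sketched numerically. The following is a minimal illustration, not DBS code: the function names and all sample values are hypothetical, and it assumes the standard relation that the expected number of interactions is the cross-section times the integrated luminosity.

```python
BARN_IN_CM2 = 1e-24  # 1 barn = 1e-24 cm^2


def integrated_luminosity(times_s, luminosity_cm2_s):
    """Trapezoidal integral of luminosity (cm^-2 s^-1) over time (s).

    Returns the integrated luminosity in cm^-2.
    """
    if len(times_s) != len(luminosity_cm2_s):
        raise ValueError("times and luminosity samples must have equal length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (luminosity_cm2_s[i] + luminosity_cm2_s[i - 1]) * dt
    return total


def expected_interactions(cross_section_barn, integrated_lumi_cm2):
    """Expected number of interactions: cross-section times integrated luminosity."""
    return cross_section_barn * BARN_IN_CM2 * integrated_lumi_cm2


# Illustrative numbers only (not CMS values): constant luminosity for 20 s.
times = [0.0, 10.0, 20.0]
lumi = [1.0e30, 1.0e30, 1.0e30]
int_lumi = integrated_luminosity(times, lumi)

# A hypothetical process with a cross-section of 1 nanobarn (1e-9 barn).
n_expected = expected_interactions(1.0e-9, int_lumi)
print(int_lumi, n_expected)
```

The sketch shows why a process with a small cross-section needs a large integrated luminosity before many physics events of that kind are recorded.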
After particles collide, they form new particles which often decay into other new particles. These fly off from the collision into the detector. Some particles are charged and others, such as K shorts, are neutral. Showers are particle interactions that generate a slew of charged particles. Jets are particle interactions that generate a slew of charged and neutral particles.
Particular particle decays are called physics processes.
The purpose of a high-energy experiment is to measure these physics processes, for example Higgs → Z → µ⁻µ⁺.
We measure physics processes using a detector. It has many pieces (sometimes called sub-detectors) to measure different parts of a physics event.
There are many different types of detectors capable of measuring the trajectories of particles. Typical examples are:
- Wire chambers (or drift chambers, DC) are sheets of wires designed to record when a charged particle passes through them.
- A silicon strip detector is a set of detectors which have strips on a silicon target.
- A semiconductor detector is a device that uses a semiconductor (usually silicon) to detect traversing charged particles or the absorption of photons.
- A central fiber detector is a set of scintillating fibers mounted on concentric cylinders.
Each time a particle crosses a detector, the detector records the passing trajectory of the particle as a set of hits (the set of measurements in a given detector).
Muons tend to pass through most inner detectors. They are identified in a special detector called the muon detector (muon tracking system). Such detectors usually cover the outer regions of a physics detector.
Because many important interactions produce neutral particles, which are invisible to tracking detectors, they are measured instead in the calorimeter, which records the amount of energy deposited.
Each sub-detector detects the trajectories of passing particles with a different precision. Such trajectories are called tracks. The tracks in a particular sub-detector are given detector-specific names, e.g., cft tracks.
Tracks can also be built from the sum of all tracking information, which excludes calorimeter information.
A detector is a large system with its own properties. Any detector property relevant to understanding its data is called a detector condition.
For instance, pedestals are properties required to decode the meaning of hits. An important detector condition is which parts of the detector are working, either because only a percentage is complete or because a part is not operable.
A run is a short time interval in human terms, such as a day, of taking data from the detector. How runs are numbered depends on the experiment. For CMS, the years of data-taking are separated into several (about 50) high-level triggers, and runs are unique within a high-level trigger.
The steps necessary for analysis are very much a part of the physicists' domain language. There are two kinds of analysis data, Monte Carlo and detector data. The detector produces data, while Monte Carlo, or MC data, is generated to mimic the data.
A known set of analysis steps on the data is called a pass of the data. For instance, when there is a revision to a reconstruction algorithm, the production coordinator must initiate another pass of the reconstruction.
The first analysis step happens in the data acquisition system of the detector. Certain physics processes produce identifiable signatures in the detector, for instance a high-energy electron with little around it, or a large, low-angle energy deposition in the calorimeter. These signatures are signs that an event (a particular 20ns recording time of the detector) may contain data relevant to a set of physics processes.
A trigger line is a set of criteria for selecting events where a particular set of physics processes may have occurred. The selected events themselves are called a trigger stream. Two trigger streams may contain the same event.
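As a rough sketch of how trigger lines relate to trigger streams, the snippet below treats each trigger line as a predicate over events and collects the matching events into one stream per line. The `Event` fields, the thresholds, and the line names are hypothetical illustrations, not actual CMS trigger definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    run: int
    event_id: int
    max_electron_energy: float    # hypothetical summary quantity
    low_angle_calo_energy: float  # hypothetical summary quantity


TriggerLine = Callable[[Event], bool]


def build_trigger_streams(
    events: List[Event], trigger_lines: Dict[str, TriggerLine]
) -> Dict[str, List[Event]]:
    """Apply every trigger line to every event; one stream per line."""
    streams: Dict[str, List[Event]] = {name: [] for name in trigger_lines}
    for event in events:
        for name, passes in trigger_lines.items():
            if passes(event):
                streams[name].append(event)
    return streams


# Thresholds are made up for illustration.
TRIGGER_LINES: Dict[str, TriggerLine] = {
    "high_energy_electron": lambda ev: ev.max_electron_energy > 20.0,
    "low_angle_calo": lambda ev: ev.low_angle_calo_energy > 50.0,
}

EVENTS = [
    Event(1, 1, 25.0, 10.0),
    Event(1, 2, 5.0, 80.0),
    Event(1, 3, 30.0, 90.0),  # satisfies both lines
    Event(1, 4, 1.0, 1.0),    # satisfies neither line
]

STREAMS = build_trigger_streams(EVENTS, TRIGGER_LINES)
for name, stream in STREAMS.items():
    print(name, [ev.event_id for ev in stream])
```

Because each line is evaluated independently, an event that satisfies two lines lands in both streams, which is why any bookkeeping over streams has to tolerate duplicate events.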
Two kinds of data leave the detector: detector conditions and event data. Event data is any data associated just with a particular 20ns firing of the detector. Among other things, it contains different types of data objects, e.g. silicon hit objects, calorimetry objects, etc. These data objects, and any added by later analysis, are all called event objects, and they are all associated with the same event.
The first data to leave the detector is called raw data. Then data is reconstructed using reconstruction software, where a variety of different algorithms are applied to identify tracks in the detectors and build different event objects. The reconstruction software is used to reconstruct raw as well as MC data.
There is an early analysis of the data to look for obvious problems with the detector or analysis. Data is called unchecked before and checked after this analysis.
Monte Carlo is an algorithm used to generate approximations to data from real physics processes. MC can be categorized according to which subset of physics processes it mimics or according to what steps of data analysis it approximates.
The physics processes a physicist wants to study may be very rare. If the Monte Carlo algorithm generates physics processes with statistics in accordance with the actual detector, then it is called generic MC.
If the algorithm generates only a subset of those physics processes, in order for the physicist to test a detection algorithm, then it is called signal MC.
There are some instances where it may be useful to generate particular physics processes that are nearby, but not exactly, those the physicist's algorithm is searching for. While the name signal is then a misnomer, it is good enough for our purposes.
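The generic versus signal distinction can be illustrated with a toy sampler. This is only a sketch under stated assumptions: the process names and relative rates are invented, and real generators are far more involved than drawing labels from a weighted list.

```python
import random

# Hypothetical processes with made-up relative rates.
PROCESS_RATES = {
    "common_background": 0.90,
    "less_common": 0.099,
    "rare_signal": 0.001,
}


def generate_generic_mc(n_events, rates, rng):
    """Draw processes in proportion to their rates, as the detector would see them."""
    names = list(rates)
    weights = [rates[name] for name in names]
    return rng.choices(names, weights=weights, k=n_events)


def generate_signal_mc(n_events, rates, signal_processes, rng):
    """Draw only from a chosen subset of processes, keeping their relative rates."""
    names = [name for name in rates if name in signal_processes]
    if not names:
        raise ValueError("no known process selected as signal")
    weights = [rates[name] for name in names]
    return rng.choices(names, weights=weights, k=n_events)


rng = random.Random(12345)  # fixed seed so the sketch is repeatable
generic = generate_generic_mc(1000, PROCESS_RATES, rng)
signal = generate_signal_mc(1000, PROCESS_RATES, {"rare_signal"}, rng)
print(generic.count("rare_signal"), signal.count("rare_signal"))
```

In the generic sample the rare process appears only occasionally, while the signal sample consists entirely of it, which is what makes signal MC practical for testing a detection algorithm.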
A second, simultaneous, distinction is at what level of analysis Monte Carlo is used to approximate the data.
- Full Monte Carlo starts from first principles to simulate the physics process, then uses GEANT to watch that process interact with the detector, then reconstructs detector hits to form tracks. This is slow.
- Fast Monte Carlo comes in two flavors, used either individually or together. There are also other ways to speed up MC.
  - Generate particles, but parameterize detector response.
  - Make tracks directly so that you parameterize reconstruction.
The various flavors of Fast MC skip steps that a data analysis requires. They may even do some of the same processing steps, skip a couple, then do some more that are the same.
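One way to picture the difference is as an ordered list of processing steps in which Fast MC swaps some steps for parameterized stand-ins. The step names below are hypothetical labels chosen for this sketch, and the one-for-one substitution is a simplification of what real Fast MC does.

```python
# Hypothetical step labels for a Full Monte Carlo chain.
FULL_MC_STEPS = [
    "generate_physics_process",
    "simulate_detector",      # e.g. with GEANT
    "reconstruct_tracks",
]


def fast_mc_steps(parameterize_detector=False, parameterize_reconstruction=False):
    """Return the step list with the chosen steps replaced by parameterizations.

    The two flags correspond to the two Fast MC flavors and may be combined.
    """
    steps = []
    for step in FULL_MC_STEPS:
        if step == "simulate_detector" and parameterize_detector:
            steps.append("parameterized_detector_response")
        elif step == "reconstruct_tracks" and parameterize_reconstruction:
            steps.append("parameterized_tracks")
        else:
            steps.append(step)
    return steps


print(fast_mc_steps())
print(fast_mc_steps(parameterize_detector=True))
print(fast_mc_steps(parameterize_detector=True, parameterize_reconstruction=True))
```

Recording which steps were parameterized matters for dataset bookkeeping, since two MC datasets for the same physics process may not have gone through the same processing history.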
General Data Breakdown
However datasets are constructed for a project, physicists understand the data objects within them according to a few parameters. The first is the id of the event within a run. Then it is possible to distinguish events according to when they are produced during the analysis process. For instance, hits and calorimetry data come before tracks during processing stages. You can think of these two as axes for the event object domain.
Lastly, physicists are aware that some data is on tape and some readily at-hand. The distinction is not an implementation detail. You might think of the left axis of the preceding graph as how accessible certain data objects are, hopefully arranged according to how frequently they are used and how expensive it is not to have them close at hand.
Together, these make three axes for a physicist's understanding of how to select event objects. A single event contains many event objects which can be categorized according to access cost and analysis history.
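The three axes can be sketched as fields on an event object record. The class and field names here are hypothetical and only illustrate the idea of selecting event objects by event identity, processing stage, and access cost; they do not reflect the DBS schema.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    """Position in the analysis history; lower values are produced earlier."""
    HITS_AND_CALORIMETRY = 1
    TRACKS = 2


class Access(IntEnum):
    """Access cost; lower values are cheaper to reach."""
    AT_HAND = 1
    ON_TAPE = 2


@dataclass(frozen=True)
class EventObject:
    run: int
    event_id: int
    stage: Stage
    access: Access
    kind: str


def select_event_objects(objects, run, event_id, max_access=Access.ON_TAPE):
    """Objects of one event, no costlier than max_access, ordered by stage."""
    matching = (
        obj for obj in objects
        if obj.run == run and obj.event_id == event_id and obj.access <= max_access
    )
    return sorted(matching, key=lambda obj: obj.stage)


OBJECTS = [
    EventObject(1, 7, Stage.TRACKS, Access.AT_HAND, "track"),
    EventObject(1, 7, Stage.HITS_AND_CALORIMETRY, Access.ON_TAPE, "silicon hit"),
    EventObject(1, 7, Stage.HITS_AND_CALORIMETRY, Access.AT_HAND, "calorimetry"),
    EventObject(1, 8, Stage.TRACKS, Access.AT_HAND, "track"),
]

print([obj.kind for obj in select_event_objects(OBJECTS, 1, 7)])
print([obj.kind for obj in select_event_objects(OBJECTS, 1, 7, Access.AT_HAND)])
```

Restricting `max_access` to `Access.AT_HAND` returns only what can be read without going to tape, mirroring the choice a physicist makes between convenience and completeness.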
The CMS collaboration distinguishes types of data by data tiers. Each tier represents a stage of processing, and many common datafiles contain data belonging to several tiers.
Samples and Skims
A sample is a set of data all relevant to a single physics process. A trigger stream is a specific kind of sample related to data acquisition, but there are analyses later in data processing that produce more specific samples. A physicist might ask their data manager, "Do we have any of this sample?" If not, the data manager might say, "We'll make more tomorrow, and you should see even more next week." Receiving new data which meets certain criteria increases your sample.
A skim is a set of data for a particular physics process, where the data is seen as a subset of another dataset. As such, a skim would be a sample.
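A skim can be pictured as a filter over a parent dataset, and a sample as something that grows as new data meeting the same criteria arrives. The sketch below assumes events are plain dictionaries with a hypothetical `n_muons` field; the selection criterion is invented for illustration.

```python
def make_skim(dataset, criteria):
    """Return the events of `dataset` that meet `criteria`, preserving order."""
    return [event for event in dataset if criteria(event)]


def has_two_muons(event):
    """Hypothetical selection for a dimuon physics process."""
    return event["n_muons"] >= 2


# A hypothetical parent dataset, e.g. one trigger stream.
TRIGGER_STREAM = [
    {"run": 1, "event_id": 1, "n_muons": 2},
    {"run": 1, "event_id": 2, "n_muons": 0},
    {"run": 1, "event_id": 3, "n_muons": 3},
]

# The skim is a subset of its parent dataset.
dimuon_skim = make_skim(TRIGGER_STREAM, has_two_muons)

# New data arriving later increases the sample.
NEW_DATA = [
    {"run": 2, "event_id": 1, "n_muons": 2},
    {"run": 2, "event_id": 2, "n_muons": 1},
]
dimuon_sample = dimuon_skim + make_skim(NEW_DATA, has_two_muons)

print(len(dimuon_skim), len(dimuon_sample))
```

Keeping the criteria as a named function makes it possible to re-run the same skim when a new pass of the parent dataset becomes available.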
-- Drew Dolgert - 26 Apr 2006