We are building a prototype for histogram visualization in OpenDX that reads its data from XML files. The XML files are transliterated from ROOT format.
The XML structure, using < for "contains", looks like:
- cmsdata < event
- event < product
- product < object
- object < datamember | container
- datamember < datamember | container
- container < object
These tags mirror the ROOT instance hierarchy: events contain products, products contain objects, and objects contain datamembers, which may in turn hold further datamembers or containers of objects.
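The containment rules above can be written down as a small lookup table and checked against a document. This is only a sketch: the tag names come from the list above, but the sample document, its lack of attributes, and the helper name check_nesting are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Parent tag -> child tags it may contain, copied from the list above.
ALLOWED_CHILDREN = {
    "cmsdata": {"event"},
    "event": {"product"},
    "product": {"object"},
    "object": {"datamember", "container"},
    "datamember": {"datamember", "container"},
    "container": {"object"},
}


def check_nesting(element):
    """Return every (parent_tag, child_tag) pair that breaks the rules."""
    problems = []
    allowed = ALLOWED_CHILDREN.get(element.tag, set())
    for child in element:
        if child.tag not in allowed:
            problems.append((element.tag, child.tag))
        problems.extend(check_nesting(child))
    return problems


# Hypothetical sample: real files presumably carry names and values as well.
SAMPLE = (
    "<cmsdata><event><product><object><container>"
    "<object><datamember/><datamember/></object>"
    "</container></object></product></event></cmsdata>"
)

good_root = ET.fromstring(SAMPLE)
bad_root = ET.fromstring("<cmsdata><product/></cmsdata>")

print(check_nesting(good_root))
print(check_nesting(bad_root))
```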
Searches of the data are specified with strings such as "(recoCaloJets,caloJetICone5,)[*].p4_". The part in parentheses names the product. The brackets indicate that the product is a container of members, and ".p4_" means that each member has a p4_ datamember, which is what the search retrieves.
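To make the shape of a search string concrete, here is a minimal sketch that splits one into steps. The grammar is inferred from the single example above (a parenthesized product, then bracketed indices and dotted datamember names), so the regular expressions and the name parse_query are assumptions, not the prototype's actual parser.

```python
import re

# Assumed grammar: "(product fields)" followed by "[index]" and ".member" steps.
QUERY_RE = re.compile(r"^\((?P<product>[^)]*)\)(?P<rest>.*)$")
STEP_RE = re.compile(r"\[(?P<index>[^\]]*)\]|\.(?P<member>[A-Za-z_]\w*)")


def parse_query(query):
    """Split a search string into a list of (kind, text) steps."""
    head = QUERY_RE.match(query)
    if head is None:
        raise ValueError("query must start with a parenthesized product")
    steps = [("product", head.group("product"))]
    rest = head.group("rest")
    pos = 0
    while pos < len(rest):
        step = STEP_RE.match(rest, pos)
        if step is None:
            raise ValueError("cannot parse query at: " + rest[pos:])
        if step.group("member") is not None:
            steps.append(("member", step.group("member")))
        else:
            steps.append(("index", step.group("index")))
        pos = step.end()
    return steps


steps = parse_query("(recoCaloJets,caloJetICone5,)[*].p4_")
print(steps)
```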
For visualization, we often want to retrieve the p4_ members and compare them with, say, vertex_ members from the same event, product, and member of the container. We implemented that by recording the indices of the event, product, container member, and datamember in the XML file as a "locator". Indices are zero-based. For instance, the third event yields a 2. Its first product is a 0. The eighth jet in that product is a 7, and the p4_ is the first datamember, for a 0. Together the locator is "2 0 7 0". Matching it with a vertex_ member means matching only the first three components, "2 0 7", because the vertex_ is the third datamember in that jet and so has the locator "2 0 7 2".
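The prefix comparison in this example can be sketched as follows. Locators are represented here as tuples of integers, which is an assumption about the in-memory form, and the names locators_match and pair_by_prefix are hypothetical.

```python
def locators_match(a, b, depth):
    """True if two locators agree on their first `depth` components."""
    return a[:depth] == b[:depth]


def pair_by_prefix(first, second, depth):
    """Pair each locator in `first` with those in `second` sharing its prefix."""
    by_prefix = {}
    for loc in second:
        by_prefix.setdefault(loc[:depth], []).append(loc)
    return [(a, b) for a in first for b in by_prefix.get(a[:depth], [])]


p4 = (2, 0, 7, 0)      # third event, first product, eighth jet, p4_
vertex = (2, 0, 7, 2)  # same jet, vertex_ is its third datamember

print(locators_match(p4, vertex, 3))  # compare event, product, jet only
print(locators_match(p4, vertex, 4))  # the datamember index differs
print(pair_by_prefix([p4], [vertex, (2, 0, 6, 2)], 3))
```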
How many components of the locators need to match? It depends on how similar the initial query and the brushed query are. In the above example, we compared "(recoCaloJets,caloJetICone5,)[*].p4_" to "(recoCaloJets,caloJetICone5,)[*].vertex_". What if we compare "(recoCaloJets,caloJetICone5,)[*].p4_" to "(recoCaloMETs,Met,)[*].vertex_"? Then we want to see all vertex_ from the specified product that are in the same event as the previously found p4_. One way to say this is that locators from the initial and brushed queries must agree, at a minimum, on the components that correspond to the maximal overlap of the two search strings.
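One possible reading of the overlap rule, sketched under two assumptions: the event index must always match, and each leading step the two search strings share (product, then brackets, then each datamember) adds one more locator component to compare. The tokenizer and the name match_depth are hypothetical.

```python
import re

# Assumed tokens: "(product)", "[index]", ".member".
TOKEN_RE = re.compile(r"\([^)]*\)|\[[^\]]*\]|\.\w+")


def tokenize(query):
    """Split a search string into its product, index, and member tokens."""
    return TOKEN_RE.findall(query)


def match_depth(initial, brushed):
    """Number of leading locator components the two queries must share."""
    shared = 0
    for a, b in zip(tokenize(initial), tokenize(brushed)):
        if a != b:
            break
        shared += 1
    return 1 + shared  # the event component is assumed to always be compared


jets_p4 = "(recoCaloJets,caloJetICone5,)[*].p4_"
jets_vertex = "(recoCaloJets,caloJetICone5,)[*].vertex_"
mets_vertex = "(recoCaloMETs,Met,)[*].vertex_"

print(match_depth(jets_p4, jets_vertex))  # same product and container member
print(match_depth(jets_p4, mets_vertex))  # different product: event only
```

With these assumptions the first pair compares event, product, and container member, while the second pair compares only the event, which agrees with the two cases described above.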
We can also compress what a query sends. If we have a set of locators for p4_ such as [3 0 1 4, 3 0 2 4, 3 0 3 4], and only the event needs to match, then only the event indices [3,3,3] matter, so we can just transfer a [3] to the program.
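The compression step amounts to truncating each locator to the components that must match and dropping duplicates. A minimal sketch, again assuming locators are tuples of integers and using the hypothetical name compress:

```python
def compress(locators, depth):
    """Keep only the first `depth` components and remove duplicates."""
    return sorted({loc[:depth] for loc in locators})


p4_locators = [(3, 0, 1, 4), (3, 0, 2, 4), (3, 0, 3, 4)]

print(compress(p4_locators, 1))  # only the event index is needed
print(compress(p4_locators, 3))  # event, product, and container member
```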
--
AndrewDolgert - 22 Nov 2006