The following is the original description of the system that I sent out.
--------------------------------
Principles
--------------------------------
We want users to concentrate on physics rather than debugging or bookkeeping. We also want to make the best use of computer resources in order to minimize the time a user has to wait for a result (user time is more important than CPU time). We want to promote data exploration, which means the user has to feel that the system will let them safely back out of a choice (say, a series of selection criteria), go a different way, and maybe come back to the first set of choices.
--------------------------------
Possible system design
--------------------------------
Layered approach (the farther up the system the user works the more the system will do for them)
goal: allow users to choose the level at which they are comfortable, and allow the system to be built incrementally with early user feedback
immediate feedback using preliminary results
goal: requests that take a long time (e.g. making a plot over a large dataset) show their progress so users can diagnose problems (e.g. bad intervals for the plot) early and abort the request. Also, users are happier if they know things are proceeding
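As a rough illustration of the preliminary-results idea, here is a minimal Python sketch in which a long-running histogram fill hands back a snapshot at regular intervals. The function name `fill_histogram` and the `report_every` parameter are assumptions made for this example, not part of any existing system.

```python
from typing import Iterable, Iterator, List, Tuple


def fill_histogram(values: Iterable[float], low: float, high: float,
                   nbins: int, report_every: int = 1000
                   ) -> Iterator[Tuple[int, List[int]]]:
    """Yield (events_processed, bin_counts_so_far) while filling a histogram."""
    counts = [0] * nbins
    width = (high - low) / nbins
    processed = 0
    for value in values:
        processed += 1
        if low <= value < high:
            # min() guards against rounding pushing a value into bin nbins.
            counts[min(int((value - low) / width), nbins - 1)] += 1
        if processed % report_every == 0:
            yield processed, list(counts)  # preliminary snapshot
    yield processed, list(counts)  # final result (may repeat the last snapshot)
```

A client could redraw the plot on each snapshot and simply stop iterating to abort the request, for instance when the snapshots show that the chosen interval is wrong.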
pre-emptive construction of event lists
goal: while users are entering their request (e.g. 'want 3 jets in event') we start processing the data and finding those events which pass the list of criteria the user has already specified. Since computers work faster than people can type, it may be possible to do some of the work of answering the user's request while the user is still specifying it
auto-generation of event tags
goal: users typically vary only a few selection criteria at one time. We could create a coarse 'event tag' which holds coarse results from the typical selection criteria (e.g. this event has '0 jets', '#jets = 1', '#jets = 2', '3 < #jets < 5', '#jets > 5'). Then, when processing a request, we first look at the tags to see if the selection criteria can be met, and only if they can do we look at that event in more detail
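One possible shape for such a tag, sketched in Python: each event carries a small bitmask saying which coarse jet-multiplicity bin it falls in, and a request is first tested against the mask before the event itself is examined. The bin edges in `JET_BINS`, the event layout, and all function names are assumptions made for this example; the bins were chosen so that every multiplicity falls in exactly one bin.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical coarse bins (inclusive edges; None means no upper limit).
JET_BINS: List[Tuple[int, Optional[int]]] = [(0, 0), (1, 1), (2, 2), (3, 5), (6, None)]


def jet_tag(n_jets: int) -> int:
    """Return a bitmask with the single bit of the bin containing n_jets."""
    for bit, (lo, hi) in enumerate(JET_BINS):
        if n_jets >= lo and (hi is None or n_jets <= hi):
            return 1 << bit
    raise ValueError(f"no bin for n_jets={n_jets}")


def tag_mask_for_min_jets(min_jets: int) -> int:
    """Bits of every bin that could hold an event with >= min_jets jets."""
    mask = 0
    for bit, (lo, hi) in enumerate(JET_BINS):
        if hi is None or hi >= min_jets:
            mask |= 1 << bit
    return mask


def select(events: List[Dict], min_jets: int) -> List[Dict]:
    """Use the coarse tag first, then the detailed check only when needed."""
    mask = tag_mask_for_min_jets(min_jets)
    selected = []
    for event in events:
        if not (event["tag"] & mask):
            continue  # the tag already rules this event out
        if len(event["jets"]) >= min_jets:  # detailed look at the event
            selected.append(event)
    return selected
```

With these bins, a request for at least 4 jets skips every event tagged with 0, 1 or 2 jets without reading its jet list, while an event with 3 jets passes the tag test and is rejected only by the detailed check.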
domain specific declarative selection syntax
goal: allow users to use a selection syntax which is 'natural' to them. In addition, a declarative syntax allows the system to know what you want done, so the system can better optimize your request (e.g. reorder the criteria so the fastest checks are done first). It also allows one to capture better provenance information about the event selection
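To make the reordering point concrete, here is a minimal Python sketch. It assumes each criterion is declared as data with a relative cost estimate; the `Criterion` class, its `cost` field, and the function names are hypothetical. Because the request is a list of declared criteria rather than user code, the system is free to sort them before running.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Criterion:
    description: str                 # human-readable form, usable as provenance
    cost: float                      # assumed relative cost of one evaluation
    passes: Callable[[Dict], bool]   # the actual check on one event


def optimize(criteria: List[Criterion]) -> List[Criterion]:
    """Order criteria so the cheapest checks run first."""
    return sorted(criteria, key=lambda c: c.cost)


def select(events: List[Dict], criteria: List[Criterion]) -> List[Dict]:
    """Keep events passing every criterion; all() stops at the first failure."""
    ordered = optimize(criteria)
    return [e for e in events if all(c.passes(e) for c in ordered)]
```

Sorting by cost alone is the simplest policy; a fuller optimizer might also weigh how many events each criterion is expected to reject.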
use previous 'event lists' if selections are tightened
goal: when a selection is tightened we know that all events that did not pass the previous selection will not pass the new selection, so only the events in the previous event list need to be checked again
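A small Python sketch of this reuse, restricted to a single cut of the form `value >= threshold` to keep it short. The class name `MinCutCache` and its `events_examined` counter are assumptions for the example. Every event list is kept, so going back to an earlier, looser threshold is also answered from the cache.

```python
from typing import Dict, List


class MinCutCache:
    """Caches the event ids passing `value >= threshold` for one quantity."""

    def __init__(self, events: Dict[int, float]) -> None:
        self._events = events                    # event id -> value
        self._cache: Dict[float, List[int]] = {}  # threshold -> passing ids
        self.events_examined = 0                  # bookkeeping for the demo

    def select(self, threshold: float) -> List[int]:
        # Any cached list made with a looser (or equal) threshold is a
        # superset of the answer; start from the tightest such list.
        looser = [t for t in self._cache if t <= threshold]
        if looser:
            candidates = self._cache[max(looser)]
        else:
            candidates = list(self._events)  # nothing reusable: full scan
        self.events_examined += len(candidates)
        passed = [i for i in candidates if self._events[i] >= threshold]
        self._cache[threshold] = passed
        return passed
```

For example, after a first request with threshold 20 over four events, tightening to 30 examines only the events that survived the first request.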
warn if new selection criteria are looser than original skim
goal: avoid the problem of thinking your skim is looser than it actually is
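A minimal Python sketch of such a check, assuming both the skim and the request can be summarized as minimum required values per quantity. That representation and the function name `check_against_skim` are assumptions for this example.

```python
import warnings
from typing import Dict, List


def check_against_skim(skim_min: Dict[str, float],
                       requested_min: Dict[str, float]) -> List[str]:
    """Return (and warn about) quantities whose requested cut is looser than the skim."""
    looser = []
    for name, skim_value in skim_min.items():
        requested = requested_min.get(name)
        if requested is None:
            # The request does not cut on this quantity at all.
            looser.append(name)
            warnings.warn(f"no cut on {name}, but the skim required {name} >= {skim_value}")
        elif requested < skim_value:
            looser.append(name)
            warnings.warn(f"requested {name} >= {requested} is looser than "
                          f"the skim's {name} >= {skim_value}")
    return looser
```

In either case the events that the looser cut would have admitted are already gone from the skim, which is exactly what the warning tells the user.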
auto-setup of distributed processing
goal: if one has a local cluster available for analysis, the system will automatically move your data to the cluster and run the requests there. Users should not have to do anything special to make this work. To make clusters easy to set up we can use ZeroConf (what Apple calls 'Bonjour') to do auto-discovery over the network.
tracking provenance
goal: provide all the bookkeeping needed to understand one's results (e.g. all cuts that were applied when making a plot)
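One way this bookkeeping could look, as a Python sketch: every result carries an immutable record of the dataset and the steps that produced it. The `Provenance` class and its methods are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Provenance:
    dataset: str
    steps: Tuple[str, ...] = ()

    def then(self, step: str) -> "Provenance":
        """Return a new record with one more step; the old record is unchanged."""
        return Provenance(self.dataset, self.steps + (step,))

    def describe(self) -> str:
        """One line per step, suitable for attaching to a plot."""
        return "\n".join((f"dataset: {self.dataset}",) + self.steps)
```

Because records are never modified in place, backing out of a choice amounts to continuing from an earlier record, and two branches of an exploration can share a common history.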
job completion time estimates with ability to migrate processing to the Grid
goal: users only have to learn one analysis tool in order to do work on their local machine, a local cluster or the Grid. If job completion is estimated to exceed a certain amount of time, the system could ask the user if they would like the job to be submitted to the Grid.
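A deliberately simple Python sketch of such an estimate, assuming the processing rate seen so far stays constant for the rest of the job. The function names and the `limit_seconds` parameter are assumptions for this example; a real estimator would likely need to account for varying event sizes and machine load.

```python
def estimate_remaining_seconds(events_done: int, events_total: int,
                               elapsed_seconds: float) -> float:
    """Extrapolate the remaining time from the average rate so far."""
    if events_done <= 0 or elapsed_seconds <= 0:
        raise ValueError("need some completed work before estimating")
    seconds_per_event = elapsed_seconds / events_done
    return (events_total - events_done) * seconds_per_event


def should_offer_grid(remaining_seconds: float, limit_seconds: float) -> bool:
    """True when the estimate exceeds the limit, so the user should be asked."""
    return remaining_seconds > limit_seconds
```

For instance, 1000 of 10000 events processed in 20 seconds extrapolates to 180 seconds remaining, which would trigger the question for a 60 second limit.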
ability to detach client and then reattach without losing work
goal: processing some requests can take longer than I'm willing to stay in one place, but I do not want to lose my work if I stop it (e.g. I close my laptop when I leave work and then reconnect when I get home). Therefore processing should be allowed (in a controlled manner) to complete even without the client around.
-------------------------------------------------
Possible highest layer user interface
-------------------------------------------------
Notebook style
full undo/redo
highly interactive data interaction
e.g. dragging in the plot window will change characteristics of the plot
command completions
build 'macros' from history
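The undo/redo and macro items above could share a single command history. The following Python sketch shows one possible arrangement; the `History` class and the use of plain strings as commands are assumptions for this example.

```python
from typing import List, Optional, Tuple


class History:
    """Command history with undo/redo; a macro is a snapshot of the history."""

    def __init__(self) -> None:
        self._done: List[str] = []
        self._undone: List[str] = []

    def execute(self, command: str) -> None:
        self._done.append(command)
        self._undone.clear()  # a new command discards the redo branch

    def undo(self) -> Optional[str]:
        if not self._done:
            return None
        command = self._done.pop()
        self._undone.append(command)
        return command

    def redo(self) -> Optional[str]:
        if not self._undone:
            return None
        command = self._undone.pop()
        self._done.append(command)
        return command

    def macro(self) -> Tuple[str, ...]:
        """The commands currently in effect, in order, ready to replay."""
        return tuple(self._done)
```

Discarding the redo branch on a new command is the simplest policy. Keeping it instead would let a user return to an abandoned set of choices, at the cost of a tree-shaped history.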
--
ChrisDJones - 11 Aug 2006