Event Store Administration
Event Store (ES) administration is a complicated and challenging task.
It needs to deal with different underlying databases, a variety of file formats and
use cases. It should support data and DB integrity checks, DB transactions,
data movement, DB upgrades, etc.
This document is a first attempt to summarize some of the common use
cases and to describe the existing tools.
Event Store notations:
- a timeStamp is a date in YYYYMMDD format assigned to data by the czar
- a grade is a label assigned to data by the czar
Please note that we distinguish between "writable" and "readable" grades:
- a writable grade (e.g. physics-unchecked) is used for injection
- a readable grade (e.g. daq) is used by users for data access
- dataVersionName is a description of how/when data were produced
Please note: its aliases are specificVersionName and svName.
Example: P2-20040312-Feb13_04_P2-data32. Event Store does not restrict the format of an assigned
dataVersionName, but for convenience the production system follows this convention:
P2-dateWhenWeStartP2-ReleaseName-dataset.
- graphid is an id which is formed by the combination of a timeStamp and a grade
- svid is an id associated with a dataVersionName
A full description of Event Store graph and injection policies can be found on
the web:
https://wiki.lepp.cornell.edu/CleoSWIG/bin/view/Main/EventStore
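The notation above can be illustrated with a short python sketch. The helper names (is_valid_timestamp, make_data_version_name) are hypothetical and are not part of the toolkit. The sketch only checks the 8-digit date form of a timeStamp (it does not cover the special value 0 used for raw data in later examples) and assembles a dataVersionName following the production convention.

```python
from datetime import datetime


def is_valid_timestamp(stamp):
    """Return True if stamp is an 8-digit YYYYMMDD calendar date."""
    # strptime alone would accept shorter strings such as "2004312",
    # so the length and digit checks come first.
    if len(stamp) != 8 or not stamp.isdigit():
        return False
    try:
        datetime.strptime(stamp, "%Y%m%d")
    except ValueError:
        return False
    return True


def make_data_version_name(start_date, release, dataset):
    """Assemble P2-dateWhenWeStartP2-ReleaseName-dataset (production convention)."""
    if not is_valid_timestamp(start_date):
        raise ValueError("start_date must be in YYYYMMDD format: %r" % start_date)
    return "P2-%s-%s-%s" % (start_date, release, dataset)


# Reproduces the example name quoted in the notation list.
print(make_data_version_name("20040312", "Feb13_04_P2", "data32"))
```

Since Event Store itself does not restrict the format of a dataVersionName, a check like this would belong in the production scripts rather than in the DB layer.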
Event Store software:
The administrative toolkit is written in python and relies on specific python modules to access the underlying DB.
We use MySQL-python (http://sourceforge.net/projects/mysql-python) to access the MySQL DB and
pysqlite (http://initd.org/tracker/pysqlite) to access the SQLite DB.
SQLite version 3 and MySQL version 4.xx have been used, which provide full 64-bit support and autoincrement.
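As a minimal sketch of the two features mentioned above (autoincrement and 64-bit integers), the following uses the sqlite3 module from the python standard library, which is the descendant of pysqlite. The table demo_sync and its columns are made up for illustration and are not the Event Store layout.

```python
import sqlite3


def demo_autoincrement_and_64bit():
    """Insert two rows and return (first_id, second_id, stored_value)."""
    conn = sqlite3.connect(":memory:")  # throw-away in-memory database
    conn.execute(
        "CREATE TABLE demo_sync ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " sync_value INTEGER)"
    )
    big_value = 2**62 + 5  # needs more than 32 bits
    first_id = conn.execute(
        "INSERT INTO demo_sync (sync_value) VALUES (?)", (big_value,)
    ).lastrowid
    second_id = conn.execute(
        "INSERT INTO demo_sync (sync_value) VALUES (?)", (big_value + 1,)
    ).lastrowid
    stored = conn.execute(
        "SELECT sync_value FROM demo_sync WHERE id = ?", (first_id,)
    ).fetchone()[0]
    conn.close()
    return first_id, second_id, stored


print(demo_autoincrement_and_64bit())
```

The ids are assigned by the DB, and the large value is read back unchanged.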
Event Store databases:
Currently for 'Personal Event Store size' we use SQLite as the underlying DB. It doesn't
require any administration and it is available for a variety of OSes. The original
Event Store used the SQLite DB which can be found at
/nfs/objy/EventStoreDB/ESDBDir/sqlite.db
For 'Group Event Store size' we use the MySQL DB with the following configuration:
- lnx151 is a dedicated master ES server. This server is used for all injection
into Event Store.
- lnx150 is a dedicated slave ES server. This server is used by all clients
(users and pass2) to access data from Event Store.
On a master ES server the following DBs are available:
- EventStore is the main Event Store DB
- EventStoreTest is a complete copy of the main Event Store DB (which is kept in
sync with EventStore by a nightly crontab job). The purpose
of this DB is to test injection.
- EventStoreTMP is a temporary DB dedicated to development
- EventStoreUnionTest is a temporary DB for union tests of
GroupEventStoreToolkit
- EventStoreCalibrate is intended to be used for calibration, but currently is an empty DB
- EventStoreMC is intended to be used for MC data. A first attempt to
inject MC data was made previously, but we later decided to
inject MC into the main Event Store DB
Event Store tables:
The layout of all tables can be found in the attachment.
The administrative tools must handle transactions to allow parallel injection into Event Store.
Because the underlying DBs differ in type names and in transaction support, the transactions and table layouts
differ from one DB to another. As an example,
SQLite doesn't provide row locking and instead uses a table-locking mechanism for transactions. While such a transaction is in progress, a separate cursor needs to be used to access data from the DB.
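The SQLite case can be sketched as follows, again with the standard sqlite3 module. This is not the toolkit's code: the table demo_files, the helper names and the choice of BEGIN IMMEDIATE (which takes the write lock up front) are assumptions made for illustration. The point is that an injection either commits completely or is rolled back, and that reads go through their own cursor.

```python
import sqlite3


def open_demo_db():
    """Open an in-memory DB with a hypothetical table of unique file names."""
    # isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE demo_files (name TEXT UNIQUE)")
    return conn


def inject_files(conn, file_names):
    """Insert all names in one transaction; return False and undo on error."""
    writer = conn.cursor()
    writer.execute("BEGIN IMMEDIATE")  # acquire the write lock before inserting
    try:
        for name in file_names:
            writer.execute("INSERT INTO demo_files (name) VALUES (?)", (name,))
    except sqlite3.Error:
        writer.execute("ROLLBACK")  # nothing from this injection is kept
        return False
    writer.execute("COMMIT")
    return True


def count_files(conn):
    """Read through a separate cursor, independent of the writer cursor."""
    reader = conn.cursor()
    reader.execute("SELECT COUNT(*) FROM demo_files")
    return reader.fetchone()[0]


conn = open_demo_db()
print(inject_files(conn, ["a.pds", "b.pds"]))  # first injection
print(inject_files(conn, ["c.pds", "a.pds"]))  # repeats a.pds, so it is undone
print(count_files(conn))
```

For MySQL the same pattern would go through the MySQLdb module, where row-level behaviour depends on the table type; that case is not shown here.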
Integrity checks:
Below we list common use cases which have been used so far. For each of them several DB and data integrity checks are performed before the injection step. Among them:
- All input files should have non-overlapping data
- input files may have overlapping events, but those events should contain different data, e.g.
fileA and fileB are post-p2 corrections, fileA contains BeamEnergy and fileB contains Dedx data.
Examples: inject pass2 output
- 2photon_hot_runXXX.pds
- qcd_hot_runXXX.pds
- bhagam_hot_runXXX.pds
- unknown_hot_runXXX.pds
Problem: suppose we injected only three of them (qcd, 2photon and bhagam) and later, when the problem was discovered, we try to inject unknown_hot_runXXX.pds. Since the key/location files for the first three files were already created and are present in ES, injecting with the same grade/timeStamp is forbidden.
Solution: inject with a new grade/timeStamp, or clean up the ES DB and reinject all the data.
- Added data should not overlap with data existing in ESDB
- You cannot inject the same data twice
- File names should be unique (you cannot inject a file with the same file name twice)
The following algorithm is applied to do integrity checks:
if supplied files contain different dataTypes:
    fail to inject
else:
    if supplied files contain overlapping events:
        fail to inject
    if supplied timeStamp/grade/svName is present in ESDB:
        if run is present in ESDB:
            if data from supplied files are present in ESDB:
                # scan location file for dataTypes and key file for sync. values
                if dataTypes from supplied files are the same as in the loc. file found in ESDB:
                    if syncValues from supplied files overlap with syncValues from key file found in ESDB:
                        fail to inject
                    else:
                        allow to inject
                else:
                    allow to inject
            else:
                allow to inject
        else:
            allow to inject
    else:
        allow to inject
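The decision tree above can be sketched in python as follows. This is only an illustration under stated assumptions, not the toolkit's implementation: FileInfo, may_inject and the dictionary standing in for ESDB are hypothetical, sync values are modelled as (run, event) pairs, and the "data are present in ESDB" step is folded into "files are stored for this run".

```python
from collections import namedtuple

# Hypothetical summary of one data file: the data types it carries and
# the sync values, modelled here as (run, event) pairs, that it covers.
FileInfo = namedtuple("FileInfo", "name run data_types sync_values")


def may_inject(supplied, esdb, graph_key):
    """Return True if the supplied files pass the checks, False otherwise.

    supplied  -- list of FileInfo for the files to be injected
    esdb      -- dict: (timeStamp, grade, svName) -> {run: [FileInfo, ...]}
    graph_key -- the (timeStamp, grade, svName) requested for this injection
    """
    # Supplied files must all carry the same set of data types.
    if len({f.data_types for f in supplied}) > 1:
        return False
    # Supplied files must not overlap in events among themselves.
    seen = set()
    for f in supplied:
        if seen & f.sync_values:
            return False
        seen |= f.sync_values
    # Nothing stored yet under this timeStamp/grade/svName: allow.
    runs = esdb.get(graph_key)
    if runs is None:
        return True
    for f in supplied:
        # A run unknown to ESDB yields an empty list, i.e. nothing to compare.
        for stored in runs.get(f.run, []):
            same_types = stored.data_types == f.data_types
            if same_types and (stored.sync_values & f.sync_values):
                return False
    return True


key = ("20090215", "physics", "P2-data99_vs1")
qcd = FileInfo("qcd_hot.pds", 1, frozenset({"tracks"}), frozenset({(1, 1), (1, 2)}))
bha = FileInfo("bhagam_hot.pds", 1, frozenset({"tracks"}), frozenset({(1, 3)}))
print(may_inject([qcd, bha], {}, key))            # fresh grade/timeStamp
print(may_inject([bha], {key: {1: [bha]}}, key))  # same data injected twice
```

In the real tools the stored data types and sync values come from the location and key files respectively, as the comment in the algorithm indicates.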
Use cases:
- Case 1) Create a new fresh ESDB and add new data to ESDB
Requirements:
- notify admin. tool that a new DB needs to be created
- specify the timeStamp and grade for the new data
- specify the location of your data
- specify the output location for storing the auxiliary key and location
files
Optional:
- specify which DB to use (default: SQLite)
- specify which DB name to use (default: EventStoreTMP)
- location of log files (default: current directory)
- transaction log (default: current directory)
- DB authentication
- verbose level
- Case 2) Add data to ESDB.
Requirements:
- specify the timeStamp and grade for the new data
- specify the location of your data
- specify the output location for storing the auxiliary key and location
files
- specify a list of parents (in the form of dataVersionNames)
Optional: see Case1.
- write out the original data file to another location (used by
populator)
Below I will list concrete examples of these cases:
- a) add raw data
- b) add new pass2 output (a dataset)
- c) add post-p2 corrections
- d) add Dskims
- Case 3) Move data from one grade to another
Requirements:
- old and new grade's name
- old and new time stamps
Optional: see Case1.
- list of runs which need to be excluded
- run range to allow movement of only specified range
- dataVersionName
- Case 4) Move files from one location to another
Requirements:
- list of files to be moved
- destination directory
- Case 5) Delete an existing grade
Requirements:
- grade and timeStamp
We will discuss details of how to perform these tasks below.
It is desirable to keep a complete history of user commands as well as a
transaction log to allow debugging and DB recovery.
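A history of user commands can be as simple as an append-only text file. The sketch below is one possible form and not the format used by the toolkit: the function record_command and the "date time command" line layout are assumptions.

```python
import shlex
import time


def record_command(history_file, argv, now=None):
    """Append one time-stamped command line to history_file and return it."""
    when = time.localtime() if now is None else now
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", when)
    # Quote each argument so that the recorded line can be replayed in a shell.
    command = " ".join(shlex.quote(arg) for arg in argv)
    line = "%s %s\n" % (stamp, command)
    with open(history_file, "a") as handle:
        handle.write(line)
    return line
```

Appending, rather than rewriting, keeps earlier entries intact if a job dies half-way, which is what one wants from a file used for debugging and recovery.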
Right now all administrative tools are written in python and grouped in the
GroupEventStoreToolkit package which is available from cleo CVS. The toolkit is not
associated with any particular release, although in order to run it the
MySQLdb and sqlite python modules need to be set up in your environment. By default
the underlying OS does not provide those modules and they're installed in
other_sources together with python. To avoid these complications a set of
wrappers around the actual python scripts is provided. As an example,
ESBuilder is a wrapper which sets up all the
environment variables necessary to use the injection script
ESBuilder.py (among them the location of the MySQLdb and SQLite modules,
PYTHONPATH, etc.).
The toolkit contains documentation (which needs to be updated) and
a description of all options along with some examples.
Below you can find a list of administrative tasks which can be done using the
GroupEventStoreToolkit:
- create a new Event Store
- currently two DBs are supported: SQLite and MySQL (a third one, BerkeleyDB, has been dropped).
- add a file or set of files to DB
the following formats are supported:
- input : PDS, binary, IDXA
- output: key file, pds location, binary location
- add data from an event list based on input IDXA file
- move file(s) inside of Event Store
- delete a grade from Event Store
- perform updates of Event Store tables
- keep history of user commands
- print contents of underlying DB tables
- print content of data (pds and binary), key and location files
- convert DB content from MySQL<->SQLite
It also has the following features:
- transaction log
- history log, to keep a history of user requests
There are several tools available to perform various tasks:
- ESBuilder - is an injection tool
- ESDump - dumps the content of ESDB tables
- ESFileContent - prints the content of pds/binary/key/(pds/binary)-location files
- ESVersionManager - is an ES Version management tool
- ESAddComment and ESGetComment are new additions which add/get comments from ESDB.
All tools contain a standard set of options, such as -mysql or -log; their description and usage can be found by invoking 'ESXxxxxxxx -help'. In order to run any tool you need to set the ESTOOLKIT environment variable to point to the location of
GroupEventStoreToolkit.
Below you can find the description of the ESBuilder options as they are listed when
the '-help', '--help' and '-examples' options are specified
==================================================================================
ESBuilder -help
==================================================================================
Usage: Usage: ESBuilder [ -help ] [ --help ] [ -examples ] [ -profile ]
[ -verbose ] [ -historyFile < fileName > ]
[ -esdb < dbName > ] [ -sqlite < fileName > ]
[ -mysql host [ -user < userName > -password < password > ] ]
[ -log < fileName or %stdout or %stderr > ] [ -logDir < dir > ]
[ -add < dir or file or pattern of files > ]
[ -grade < grade > ] [ -time < timeStamp > ]
[ -dataVersionName < name > ] [ -view < skim > ] [ -no-key ]
[ -listOfParents < dataVersionName's > ] [ -output < dir > ]
[ -esdb < whichDBToUse > ] [ -oBinFile < fileName > ]
Options can be specified in any order.
For option description please run 'ESBuilder --help'
For use cases please run 'ESBuilder -examples'
Contact: Valentin Kuznetsov, vk@mail.lepp.cornell.edu
==================================================================================
ESBuilder --help
==================================================================================
Option description:
* -grade: specifies the grade, e.g. "physics", "p2-unchecked"
* -time: specifies the timeStamp, e.g. 20090227
* -add: adds data file(s) to the EventStore
You may specify: directory, file name or a list of files
For patterns use '*', e.g MC*tau*.pds
* -output: output location for storing key/location files
* -dataVersionName: specifies the data version name (aka svName)
-view: specifies the view, e.g. "tau"
-no-key: do not generate a key file (e.g. when adding post-p2 corrections)
-listOfParents specify list of parents for given injection,
e.g. while injecting p2-unchecked grade its parent is 'daq'.
-esdb: specify which DB to use, default is EventStoreTMP
-newDB: force the creation of a new EventStore
-sqlite use the SQLite version of EventStore
default sqlite.db, otherwise a fileName needs to be provided
-mysql use the MySQL version of EventStore. In order to access MySQL
you need either provide login/password through the -user/-password
options or create $HOME/.esdb.conf with user:password entry
-verbose: verbose mode, a lot of useful printout
-idleMode when this flag is specified, no key/location file will be
generated (useful once you have them and want reproduce DB
content). But content of DB will be updated. USE WITH CAUTION.
-delete delete a grade from EventStore. USE WITH CAUTION.
You need to provide the grade and the timeStamp.
-oBinFile this option is designed for use by the populator HSM script. You need to
provide the fileName of the binaryFile which you intend to write.
-log specify the log file. You may either provide a file name or
'%stdout' or '%stderr'. Please note that '%' is required in front of
stdout or stderr to distinguish them from a fileName.
The default log name is constructed as
esdb.log.YYYYMMDD_HHMMSS_PID and the log file is written to the directory
where the script is invoked. Please note that once a job successfully
finishes, the esdb.log.YYYYMMDD_HHMMSS_PID is copied to the global
log file "esdb.log" and is removed from the local (logDir) directory,
otherwise esdb.log.YYYYMMDD_HHMMSS_PID remains.
-logDir specify the output log directory.
-profile perform internal profiling.
Please note: required parameters are marked with (*). All options can be
specified in any order. By default: view='all', EventStoreTMP DB is used and key/location
files are generated.
For a complete discussion of EventStore, see
http://www.lns.cornell.edu/restricted/CLEO/meetings/2005/jan05.html.files/kuznetsov_28.pdf
==================================================================================
ESBuilder -examples
==================================================================================
Adding all files from /cdat/tem/myData directory into physics grade using 20090215
timeStamp and P2-data99_vs1 data version name. The key/location files will be written
into the /cdat/tem/myData directory. All data will be injected on MySQL running
on lnxXXX into EventStoreTMP (which is default DB name):
ESBuilder -add /cdat/tem/myData -grade physics -time 20090215
-dataVersionName P2-data99_vs1 -mysql lnxXXX
Adding pattern (My*.pds) from /cdat/tem/myData directory into physics grade using 20090215
timeStamp, P2-data99_vs1 data version name and qcd view. Put output files (key/location)
into /cdat/tem/output. At this time we inject into sqlite.db
ESBuilder -add /cdat/tem/myData/My*.pds -grade physics -time 20090215
-dataVersionName P2-data99_vs1 -output /cdat/tem/output -view qcd
-sqlite /cdat/tem/sqlite.db
Injection of raw data:
ESBuilder -mysql lnxXXX -output /cdat/tem/index -time 0 -grade daq
-dataVersionName daq -add /cdat/cleo/r205114.bin
Add file /cdat/tem/myData/DTag.pds into the physics grade using 20090215 as the
timeStamp. Associate this data with P2-data99_vs1 (parent graph) and assign
P2-data99-DTag data version name. Put output files (key/location)
into /cdat/tem/output. Here we also specify the concrete DB we are going to use, EventStoreDB.
ESBuilder -add /cdat/tem/myData/DTag.pds -grade physics -time 20090215
-dataVersionName P2-data99-DTag -listOfParents P2-data99_vs1
-output /cdat/tem/output -mysql lnxXXX -esdb EventStoreDB
==================================================================================
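The -mysql help text above mentions that credentials can be kept in $HOME/.esdb.conf as a user:password entry. A minimal sketch of reading such a file is given below. The function name is hypothetical, and the exact parsing rules (first non-empty line wins, the password may itself contain ':') are assumptions, since the file format is not specified beyond "user:password".

```python
import os


def read_esdb_credentials(path=None):
    """Return (user, password) from a user:password entry in the given file."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".esdb.conf")
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and ":" in line:
                # Split only once so that a ':' inside the password survives.
                user, password = line.split(":", 1)
                return user, password
    raise ValueError("no user:password entry found in %s" % path)
```

Keeping the password out of the command line also keeps it out of any history or log file which records the invoked command.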
The examples listed above cover some of the common use cases. To be
specific, I will provide the exact list of options for every use case we have discussed so far.
Please note that the
-add option accepts the following combinations:
-add /some/path/my/dir
-add /some/path/my/dir/file.pds
-add /some/path/my/dir/*file*.pds
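The three accepted forms (directory, single file, pattern) can be resolved into a list of files roughly as follows. This is a sketch of the idea, not the code in ESBuilder.py; the function name and the decision to return a sorted list are assumptions.

```python
import glob
import os


def expand_add_argument(arg):
    """Turn a directory, a file name or a wildcard pattern into a file list."""
    if os.path.isdir(arg):
        # A directory: take every regular file directly inside it.
        names = [os.path.join(arg, name) for name in os.listdir(arg)]
        return sorted(name for name in names if os.path.isfile(name))
    if any(ch in arg for ch in "*?["):
        # A pattern such as /some/path/my/dir/*file*.pds
        return sorted(glob.glob(arg))
    if os.path.isfile(arg):
        return [arg]
    raise ValueError("no such file or directory: %s" % arg)
```

Note that on the command line a pattern should be quoted if the tool, rather than the shell, is meant to expand it.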
Also there are some default settings:
- EventStoreTMP is used as the default DB name
- SQLite is used as the default DB
- all logs are written into the current location unless -log and -logDir are provided
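The log defaults can be sketched in a few lines. The naming pattern esdb.log.YYYYMMDD_HHMMSS_PID and the special values %stdout and %stderr come from the --help output; the two function names, and the way -log and -logDir are combined into one path, are assumptions made for this sketch.

```python
import os
import sys
import time


def default_log_name(now=None, pid=None):
    """Build esdb.log.YYYYMMDD_HHMMSS_PID for the given time and process id."""
    when = time.localtime() if now is None else now
    pid = os.getpid() if pid is None else pid
    return "esdb.log.%s_%d" % (time.strftime("%Y%m%d_%H%M%S", when), pid)


def resolve_log_target(log_option=None, log_dir="."):
    """Return sys.stdout, sys.stderr or the path of the log file to write."""
    # The leading '%' distinguishes the streams from ordinary file names.
    if log_option == "%stdout":
        return sys.stdout
    if log_option == "%stderr":
        return sys.stderr
    name = log_option if log_option else default_log_name()
    return os.path.join(log_dir, name)
```

Including the PID in the default name lets several injection jobs started in the same second write separate log files.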
Use cases:
- Case 1: Create a new fresh ESDB and add new data to ESDB
ESBuilder -add /some/path/my/dir -grade physics-unchecked -time 20090215
-dataVersionName P2-data99_vs1 -newDB
optionally you may add the following:
-logDir /my/log/dir -log %stdout -esdb MyEventStore (-mysql lnx151 or -sqlite
sqlite.db)
- Case 2a: add raw data
ESBuilder -add /some/path/my/dir/runXXXXXX.bin -grade daq-unchecked
-time 20090215 -dataVersionName P2-data99_vs1 -mysql lnx151 -esdb
EventStoreTest -oBinFile /hsm/location/file.bin
- Case 2b: add new pass2 output
ESBuilder -mysql lnx151 -esdb EventStoreTest -output /my/output/dir
-time 20041217 -grade p2-unchecked
-dataVersionName P2-20041203-20041104_P2-data34_vs5
-add /cdat/sol409/disk2/pass2/data34_vs5/run204327/*_hot_*.pds
- Case 2c: add post-p2 corrections
ESBuilder -add /cdat/sol514/disk2/pass2/data35_vs1/run205085/p2post_205085_vs2.pds
-grade physics-unchecked -time 20090215
-dataVersionName P2-20050320-20050120_FULL-data35_vs1-PP2-EBeam
-listOfParents P2-20041223-20041104_P2-data35_vs1
-output /my/output/dir -esdb EventStoreTest -mysql lnx151 -no-key
- Case 2d: add Dskims
ESBuilder -mysql lnx151 -esdb EventStoreTest -output /my/output/dir
-logDir /my/output/dir/logs -time 20090215
-grade dtag-unchecked -dataVersionName DSkim-20050323-20050316_FULL
-add /nfs/cleoc/data1/event/dir/dskim_data35_205079.pds
-listOfParents P2-20050320-20050120_FULL-data35_vs1-PP2-EBeam
- Case 3: Move data from one grade to another
To add a new version with the same run range:
ESVersionManager -grade p2-unchecked p2-postprocess -time 0 0
-runRange 111222 111333
If the -runRange option is omitted, all runs which belong to the p2-unchecked
grade with timeStamp=0 will be moved.
To add a new version excluding some runs:
ESVersionManager -grade p2-unchecked p2-postprocess -time 0 0 -runList runlist.txt
runlist.txt is a required file containing the runs which need to be excluded
- Case 4: Move files from one location to another
ESBuilder -move file.txt /new/location/dir -esdb EventStoreTest
- Case 5: Delete an existing grade
ESBuilder -delete physics 20091205 -esdb EventStoreTest
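The run selection described for Case 3 (an optional run range plus an optional list of runs to exclude) can be sketched as below. This is not the ESVersionManager code: the function names are hypothetical, the run range is assumed to be inclusive at both ends, and runlist.txt is assumed to contain whitespace-separated run numbers.

```python
def read_run_list(path):
    """Read run numbers from a text file (assumed whitespace-separated)."""
    with open(path) as handle:
        return [int(token) for line in handle for token in line.split()]


def select_runs(available_runs, run_range=None, excluded=()):
    """Return the sorted runs that would be moved to the new grade."""
    excluded = set(excluded)
    selected = []
    for run in sorted(available_runs):
        # Without a run range every run of the old grade is a candidate.
        if run_range is not None and not (run_range[0] <= run <= run_range[1]):
            continue
        if run in excluded:
            continue
        selected.append(run)
    return selected


runs = [111200, 111222, 111300, 111333, 111400]
print(select_runs(runs, run_range=(111222, 111333)))
print(select_runs(runs, excluded=[111300]))
```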
Miscellaneous
The
GroupEventStoreToolkit contains a Test subdirectory with several tests.
All use cases listed above with their variations can be found there.
Every tool can be run in profiler mode by specifying the '-profile' option.
A new tool
feedMetaDataDB.py has been developed to add new data to an existing Meta Data DB.
It is based on httplib and uses the underlying HTTP protocol to contact the CS DB rather than the buggy SOAP.
There are CGI scripts for a Web-based interface, although no one looks at them.
There is a tool to find a particular file based on run/grade/timeStamp info.
On-going activities
Here is an incomplete list of tasks which I need to implement or fix in the existing version of
GroupEventStoreToolkit:
- Rewrite the algorithm for adding post-p2 like data. The existing algorithm only accepts a single file, but we may want to provide a list of files.
- Code reorganization. Due to the fast development cycle, some obsolete and/or unoptimized code exists.
- Finalize and document API and workflow.
- Finalize a useful set of options based on user feedback
- Add new code for merging one DB with another. Example: MN may create a new DB which needs to be merged with Cornell.
- Add more union tests
- Fix permission of key/location files (make it 0444)
- 64-bit case with SQLite
- Try to resolve SOAP problems and use SOAPpy
- Create tools for separate DB integrity checks (needs to be done separately from the injection process)
- Adjust CGI scripts to the new API and if necessary tune Web-based interface
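For the permission item in the list above, a minimal sketch of making key/location files read-only (mode 0444) could look like this. The function name is hypothetical, and how the toolkit would collect the list of files is left open.

```python
import os
import stat


def make_read_only(paths):
    """Set mode 0444 (read-only for user, group and others) on each file."""
    mode = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH  # equals 0o444
    for path in paths:
        os.chmod(path, mode)
```

Read-only files protect against accidental overwrites by a later injection, although the owner can still change the mode back when a clean-up is really needed.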
--
ValentinKuznetsov - 05 Apr 2005