Event Store Administration

Event Store (ES) administration is a complicated and challenging task. It has to deal with different underlying databases and a variety of file formats and use cases. It should support data and DB integrity checks, DB transactions, data movement, DB upgrades, etc.

This document is a first attempt to summarize some of the common use cases and to describe the existing tools.

Event Store notations:

  • a timeStamp is a date in YYYYMMDD format assigned to data by the czar
  • a grade is a grade assigned to data by the czar
    Please note that we distinguish between "writable" and "readable" grades:
    • a writable grade (e.g. physics-unchecked) is used for the injection
    • a readable grade (e.g. daq) is used by users for data access
  • dataVersionName is a description of how/when data were produced
    Please note: aliases are specificVersionName or svName. Example: P2-20040312-Feb13_04_P2-data32. Event Store does not restrict the format of the assigned dataVersionName, but for convenience the production system follows this convention: P2-dateWhenWeStartP2-ReleaseName-dataset.
  • graphid is an id formed by the combination of a timeStamp and a grade
  • svid is an id associated with a dataVersionName
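
The production naming convention above can be illustrated with a small parsing sketch. This is not part of the toolkit: the class and function names are hypothetical, and the sketch assumes that only the first three dashes are separators, so that everything after the release name is treated as the dataset.

```python
from datetime import datetime
from typing import NamedTuple


class DataVersionName(NamedTuple):
    prefix: str      # e.g. "P2"
    start_date: str  # date when processing started, YYYYMMDD
    release: str     # release name, e.g. "Feb13_04_P2"
    dataset: str     # dataset name, e.g. "data32" (may itself contain dashes)


def parse_data_version_name(name: str) -> DataVersionName:
    """Split a name that follows the production convention into its parts."""
    parts = name.split("-", 3)
    if len(parts) != 4:
        raise ValueError(f"{name!r} does not follow prefix-date-release-dataset")
    prefix, start_date, release, dataset = parts
    if len(start_date) != 8 or not start_date.isdigit():
        raise ValueError(f"{start_date!r} is not in YYYYMMDD format")
    # strptime raises ValueError for impossible dates such as 20041341.
    datetime.strptime(start_date, "%Y%m%d")
    return DataVersionName(prefix, start_date, release, dataset)


example = parse_data_version_name("P2-20040312-Feb13_04_P2-data32")
print(example.start_date, example.release, example.dataset)
```

Since Event Store itself does not restrict the format, a name such as "daq" is perfectly valid for injection; the sketch rejects it only because it does not follow the production convention.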

A full description of Event Store graph and injection policies can be found on the web: https://wiki.lepp.cornell.edu/CleoSWIG/bin/view/Main/EventStore

Event Store software:

The administrative toolkit is written in Python and relies on specific Python modules to access the underlying DB. We use MySQL-python (http://sourceforge.net/projects/mysql-python) to access the MySQL DB and pysqlite (http://initd.org/tracker/pysqlite) to access the SQLite DB. SQLite version 3 and MySQL version 4.xx have been used; both provide full 64-bit support and autoincrement.

Event Store databases:

Currently, for 'Personal Event Store size' we use SQLite as the underlying DB. It doesn't require any administration and is available for a variety of OSs. The original Event Store used the SQLite DB, which can be found at /nfs/objy/EventStoreDB/ESDBDir/sqlite.db

For 'Group Event Store size' we use the MySQL DB with the following configuration:
  • lnx151 is a dedicated master ES server. This server is used for all injection into Event Store.
  • lnx150 is a dedicated slave ES server. This server is used by all clients (users and pass2) to access data from Event Store.

On a master ES server the following DBs are available:
  • EventStore is the main Event Store DB
  • EventStoreTest is a complete copy of the main Event Store DB (kept in sync with EventStore by a nightly crontab job). The purpose of this DB is to test injection.
  • EventStoreTMP is a temporary DB dedicated to development
  • EventStoreUnionTest is a temporary DB for union tests of GroupEventStoreToolkit
  • EventStoreCalibrate is intended to be used for calibration, but currently is an empty DB
  • EventStoreMC is intended to be used for MC data. A first attempt to inject MC data was made previously, but later we decided to inject MC into the main Event Store DB.

Event Store tables:

The layout of all tables can be found in the attachment. The administrative tools must handle transactions to allow parallel injection into Event Store. Because the underlying DBs differ in type names and supported features, the transactions and table layouts differ from one DB to another. For example, SQLite doesn't provide row locking and instead uses a table locking mechanism for transactions. While a transaction is in progress, a separate cursor needs to be used to access data from the DB.
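
As an illustration of the transaction handling described above, here is a minimal sketch using the sqlite3 module from the Python standard library (an assumption: the toolkit itself uses pysqlite). The table demo_files and the function names are hypothetical and do not reflect the real Event Store table layout. Writes go through an explicit transaction on one cursor, reads through a separate cursor.

```python
import sqlite3


def open_demo_db(path=":memory:"):
    # isolation_level=None disables the module's implicit transaction
    # handling, so the explicit BEGIN/COMMIT below are the only ones issued.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS demo_files ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT UNIQUE NOT NULL)"
    )
    return conn


def register_file(conn, name):
    """Insert one file name inside an explicit transaction; return its id."""
    writer = conn.cursor()
    writer.execute("BEGIN IMMEDIATE")  # acquire the write lock up front
    try:
        writer.execute("INSERT INTO demo_files (name) VALUES (?)", (name,))
        new_id = writer.lastrowid
        writer.execute("COMMIT")
    except sqlite3.Error:
        # Undo the partial work so that the DB stays consistent.
        writer.execute("ROLLBACK")
        raise
    return new_id


def list_files(conn):
    # Reads use their own cursor, separate from the writer's cursor.
    reader = conn.cursor()
    rows = reader.execute("SELECT name FROM demo_files ORDER BY id")
    return [row[0] for row in rows]


conn = open_demo_db()
register_file(conn, "qcd_hot_run1.pds")
print(list_files(conn))
```

Taking the write lock at BEGIN rather than at the first INSERT is a design choice of this sketch: a competing injection then fails or waits at the start of its transaction instead of halfway through it.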

Integrity checks:

Below we list the common use cases which have been used so far. For each of them several DB and data integrity checks are performed before the injection step. Among them:
  • All input files should have non-overlapping data
    • input files may have overlapping events, but those events should contain different data, e.g. fileA and fileB are post-p2 corrections, fileA contains BeamEnergy and fileB contains Dedx data.
    Examples: inject pass2 output
    • 2photon_hot_runXXX.pds
    • qcd_hot_runXXX.pds
    • bhagam_hot_runXXX.pds
    • unknown_hot_runXXX.pds
    Problem: suppose we injected only three of them (qcd, 2photon and bhagam) and later, when the problem was discovered, we try to inject unknown_hot_runXXX.pds. Since the key/location files for the first three files were already created and are present in ES, injecting with the same grade/timeStamp is forbidden. Solution: inject with a new grade/timeStamp, or clean up the ES DB and reinject all data.
  • Added data should not overlap with data existing in ESDB
  • You cannot inject the same data twice
  • File names should be unique (you cannot inject a file with the same file name twice)

The following algorithm is applied to do integrity checks:
if supplied files contain different dataTypes:
   fail to inject
else:
   if supplied files contain overlapping events:
      fail to inject

if supplied timeStamp/grade/svName is present in ESDB:
   if run is present in ESDB:
      if data from supplied files are present in ESDB:
         # scan location file for dataTypes and key file for sync. values
         if dataTypes from supplied files are the same as in the loc. file found in ESDB:
            if syncValues from supplied files overlap with syncValues from key file found in ESDB:
               fail to inject
            else:
               allow to inject
         else:
            allow to inject
      else:
         allow to inject
   else:
      allow to inject
else:
   allow to inject
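
The decision logic above can be sketched in Python as follows. This is only an illustration that follows the pseudocode literally; it is not the toolkit's implementation. All names (FileSummary, may_inject, the in-memory ESDB index) are hypothetical, sync values are assumed to be (run, event) pairs, and "data are present in ESDB" is simplified to "at least one entry exists for that run".

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

SyncValue = Tuple[int, int]      # assumed to be a (run, event) pair
GraphKey = Tuple[int, str, str]  # (timeStamp, grade, svName)


@dataclass(frozen=True)
class FileSummary:
    run: int
    data_types: FrozenSet[str]         # what a location file would list
    sync_values: FrozenSet[SyncValue]  # what a key file would list


# Hypothetical in-memory stand-in for ESDB: graph -> run -> known files.
EsdbIndex = Dict[GraphKey, Dict[int, List[FileSummary]]]


def check_supplied_files(files: List[FileSummary]) -> Optional[str]:
    """First half of the pseudocode: checks among the supplied files."""
    if len({f.data_types for f in files}) > 1:
        return "supplied files contain different dataTypes"
    seen = set()
    for f in files:
        if seen & f.sync_values:
            return "supplied files contain overlapping events"
        seen |= f.sync_values
    return None


def check_against_esdb(files, graph: GraphKey, esdb: EsdbIndex) -> Optional[str]:
    """Second half: compare the supplied files with what ESDB already holds."""
    runs = esdb.get(graph)
    if runs is None:  # timeStamp/grade/svName not present in ESDB
        return None
    for f in files:
        for existing in runs.get(f.run, []):  # empty if the run is absent
            same_types = existing.data_types == f.data_types
            if same_types and existing.sync_values & f.sync_values:
                return "syncValues overlap with data already in ESDB"
    return None


def may_inject(files, graph: GraphKey, esdb: EsdbIndex):
    """Return (allowed, reason); reason is None when injection is allowed."""
    reason = check_supplied_files(files) or check_against_esdb(files, graph, esdb)
    return reason is None, reason
```

Injection is refused only when every nested condition of the pseudocode holds at once, which is why the second half collapses into a single combined test per existing file.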

Use cases:

  • Case 1) Create a new fresh ESDB and add new data to ESDB
      Requirements:
    • notify admin. tool that a new DB needs to be created
    • specify the timeStamp and grade for the new data
    • specify the location of your data
    • specify the output location for storing the auxiliary key and location files
      Optional:
    • specify which DB to use (default: SQLite)
    • specify which DB name to use (default: EventStoreTMP)
    • location of log files (default: current directory)
    • transaction log (default: current directory)
    • DB authentication
    • verbose level
  • Case 2) Add data to ESDB.
      Requirements:
    • specify the timeStamp and grade for the new data
    • specify the location of your data
    • specify the output location for storing the auxiliary key and location files
    • specify a list of parents (in the form of dataVersionNames)
      Optional: see Case1.
    • write out the original data file to another location (used by populator)
      Below I will list concrete examples of these cases:
    • a) add raw data
    • b) add new pass2 output (a dataset)
    • c) add post-p2 corrections
    • d) add Dskims
  • Case 3) Move data from one grade to another
      Requirements:
    • old and new grade's name
    • old and new time stamps
      Optional: see Case1.
    • list of runs which need to be excluded
    • run range to allow movement of only specified range
    • dataVersionName
  • Case 4) Move files from one location to another
      Requirements:
    • list of files to be moved
    • destination directory
  • Case 5) Delete an existing grade
      Requirements:
    • grade and timeStamp

We will discuss details of how to perform these tasks below. It is desirable to keep a complete history of user commands as well as a transaction log to allow debugging and DB recovery.

Administrative toolkit:

Right now all administrative tools are written in Python and grouped in the GroupEventStoreToolkit package, which is available from the CLEO CVS. The toolkit is not associated with any particular release, although in order to run it the MySQLdb and sqlite Python modules need to be set up in your environment. By default the underlying OS does not provide those modules; they are installed in other_sources together with Python. To avoid these complications, a set of wrappers around the actual Python scripts is provided. For example, ESBuilder is a wrapper which sets up all environment variables necessary to use the injection script ESBuilder.py (among them the location of the MySQLdb and SQLite modules, PYTHONPATH, etc.). The toolkit contains documentation (which needs to be updated) and a description of all options along with some examples.

Below you can find a list of administrative tasks which can be done using the GroupEventStoreToolkit:
  • create a new Event Store
    • currently two DBs are supported: SQLite and MySQL (a third one, BerkeleyDB, has been dropped).
  • add a file or set of files to DB
    the following formats are supported:
    • input : PDS, binary, IDXA
    • output: key file, pds location, binary location
  • add data from an event list based on input IDXA file
  • move file(s) inside of Event Store
  • delete a grade from Event Store
  • perform updates of Event Store tables
  • keep history of user commands
  • print contents of underlying DB tables
  • print content of data (pds and binary), key and location files
  • convert DB content from MySQL<->SQLite

    It also has the following features:
  • transaction log
  • history log, to keep history of users requests

There are several tools available to perform various tasks:
  • ESBuilder - is an injection tool
  • ESDump - dumps the content of ESDB tables
  • ESFileContent - prints the content of pds/binary/key/(pds/binary)-location files
  • ESVersionManager - is an ES Version management tool
  • ESAddComment and ESGetComment are new additions which add/get comments from ESDB.
All tools contain a standard set of options, such as -mysql or -log; their description and usage can be found by invoking 'ESXxxxxxxx -help'. In order to run any tool you need to set the ESTOOLKIT environment variable to point to the location of GroupEventStoreToolkit.

Below you can find the description of the ESBuilder options as they are listed when the '-help', '--help' and '-examples' options are specified.
==================================================================================
ESBuilder -help
==================================================================================
Usage: ESBuilder [ -help ] [ --help ] [ -examples ] [ -profile ]
                 [ -verbose ] [ -historyFile < fileName > ]
                 [ -esdb < dbName > ] [ -sqlite < fileName > ]
                 [ -mysql host [ -user < userName > -password < password > ] ]
                 [ -log < fileName or %stdout or %stderr > ] [ -logDir < dir > ]
                 [ -add < dir or file or pattern of files > ]
                 [ -grade < grade > ] [ -time < timeStamp > ]
                 [ -dataVersionName < name > ]  [ -view < skim > ] [ -no-key ]
                 [ -listOfParents < dataVersionName's > ] [ -output < dir > ]
                 [ -esdb < whichDBToUse > ] [ -oBinFile < fileName > ]

Options can be specified in any order.
For option description please run 'ESBuilder --help'
For use cases please run 'ESBuilder -examples'
Contact: Valentin Kuznetsov, vk@mail.lepp.cornell.edu

==================================================================================
ESBuilder --help
==================================================================================
Option description:
*       -grade:   specifies the grade, e.g. "physics", "p2-unchecked"
*       -time:    specifies the timeStamp, e.g. 20090227
*       -add:     adds data file(s) to the EventStore
                  You may specify: directory, file name or a list of files
                  For patterns use '*', e.g MC*tau*.pds
*       -output:  output location for storing key/location files
*       -dataVersionName: specifies the data version name (aka svName)

        -view:    specifies the view, e.g. "tau"
        -no-key:  do not generate a key file (e.g. when adding post-p2 corrections)
        -listOfParents specify list of parents for given injection,
                  e.g. while injecting p2-unchecked grade its parent is 'daq'.
        -esdb:    specify which DB to use, default is EventStoreTMP
        -newDB:   force the creation of a new EventStore
        -sqlite   use the SQLite version of EventStore
                  default sqlite.db, otherwise a fileName needs to be provided
        -mysql    use the MySQL version of EventStore. In order to access MySQL
                  you need either provide login/password through the -user/-password
                  options or create $HOME/.esdb.conf with user:password entry
        -verbose: verbose mode, a lot of useful printout
        -idleMode when this flag is specified, no key/location file will be
                  generated (useful once you have them and want reproduce DB
                  content). But content of DB will be updated. USE WITH CAUTION.
        -delete   delete a grade from EventStore. USE WITH CAUTION.
                  You need to provide the grade and the timeStamp.
        -oBinFile this option designed for use by the populator HSM script. You need to
                  provide the fileName of the binaryFile which you intend to write.
        -log      specify the log file. You may either provide a file name or
                  '%stdout' or '%stderr'. Please note that '%' is required in front of
                  stdout or stderr to distinguish them from a fileName.
                  The default log name is constructed as
                  esdb.log.YYYYMMDD_HHMMSS_PID and the log file is written to the directory
                  where the script is invoked. Please note that once a job successfully
                  finishes, the esdb.log.YYYYMMDD_HHMMSS_PID is copied to the global
                  log file "esdb.log" and is removed from the local (logDir) directory,
                  otherwise esdb.log.YYYYMMDD_HHMMSS_PID remains.
        -logDir   specify the output log directory.
        -profile  perform internal profiling.

Please note: required parameters are marked with (*). All options can be
specified in any order. By default: view='all', EventStoreTMP DB is used and key/location
files are generated.

For a complete discussion of EventStore, see
http://www.lns.cornell.edu/restricted/CLEO/meetings/2005/jan05.html.files/kuznetsov_28.pdf

==================================================================================
ESBuilder -examples
==================================================================================
Adding all files from /cdat/tem/myData directory into physics grade using 20090215
timeStamp and P2-data99_vs1 data version name. The key/location files will be written
into the /cdat/tem/myData directory. All data will be injected on MySQL running
on lnxXXX into EventStoreTMP (which is default DB name):
ESBuilder -add /cdat/tem/myData -grade physics -time 20090215
          -dataVersionName P2-data99_vs1 -mysql lnxXXX

Adding pattern (My*.pds) from /cdat/tem/myData directory into physics grade using 20090215
timeStamp, P2-data99_vs1 data version name and qcd view. Put output files (key/location)
into /cdat/tem/output. At this time we inject into sqlite.db
ESBuilder -add /cdat/tem/myData/My*.pds -grade physics -time 20090215
          -dataVersionName P2-data99_vs1 -output /cdat/tem/output -view qcd
          -sqlite /cdat/tem/sqlite.db

Injection of raw data:
ESBuilder -mysql lnxXXX -output /cdat/tem/index -time 0 -grade daq
          -dataVersionName daq -add /cdat/cleo/r205114.bin

Add file /cdat/tem/myData/DTag.pds into the physics grade using 20090215 as the
timeStamp. Associate this data with P2-data99_vs1 (parent graph) and assign
P2-data99-DTag data version name. Put output files (key/location)
into /cdat/tem/output. Here we also specify the concrete DB we are going to use, EventStoreDB.
ESBuilder -add /cdat/tem/myData/DTag.pds -grade physics -time 20090215
          -dataVersionName P2-data99-DTag -listOfParents P2-data99_vs1
          -output /cdat/tem/output -mysql lnxXXX -esdb EventStoreDB
==================================================================================
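
Two conventions from the help text above, the user:password entry in $HOME/.esdb.conf and the default log name esdb.log.YYYYMMDD_HHMMSS_PID, can be sketched as follows. The function names are hypothetical and the toolkit's own code may differ. In particular, skipping blank lines, taking the first entry and using local time for the log name are assumptions of this sketch.

```python
import os
import time
from pathlib import Path


def read_esdb_credentials(conf_path=None):
    """Return (user, password) from the first user:password line."""
    if conf_path is None:
        conf_path = Path.home() / ".esdb.conf"
    for line in Path(conf_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skipping blank lines is an assumption of this sketch
        # Split on the first colon only, so a password may contain colons.
        user, sep, password = line.partition(":")
        if not sep or not user:
            raise ValueError("expected a user:password entry")
        return user, password
    raise ValueError("no user:password entry found")


def default_log_name(when=None, pid=None):
    """Build a name of the form esdb.log.YYYYMMDD_HHMMSS_PID."""
    stamp = time.strftime("%Y%m%d_%H%M%S", time.localtime(when))
    return "esdb.log.%s_%d" % (stamp, os.getpid() if pid is None else pid)


print(default_log_name())
```

Because the file holds a password in plain text, it is sensible to make it readable by its owner only.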

The examples listed above cover some of the common use cases. To be specific, I provide below the exact list of options for every use case discussed so far. Please note that the -add option accepts the following combinations:
-add /some/path/my/dir
-add /some/path/my/dir/file.pds
-add /some/path/my/dir/*file*.pds
Also there are some default settings:
  • EventStoreTMP is used as the default DB name
  • SQLite is used as the default DB
  • all logs are written into the current location unless -log and -logDir are provided
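
The three forms accepted by -add (a directory, a single file, or a pattern) could be resolved to a list of files as sketched below. This is a hypothetical helper, not the code used by ESBuilder. It assumes that a directory means "every regular file directly inside it", without filtering by extension and without descending into subdirectories.

```python
import glob
import os


def expand_add_argument(arg):
    """Return the sorted list of files selected by one -add argument."""
    if os.path.isdir(arg):
        candidates = [os.path.join(arg, name) for name in os.listdir(arg)]
        files = [path for path in candidates if os.path.isfile(path)]
    else:
        # glob.glob returns [arg] for an existing plain file name and the
        # matching paths for a pattern such as *file*.pds.
        files = [path for path in glob.glob(arg) if os.path.isfile(path)]
    if not files:
        raise FileNotFoundError("no files match %r" % arg)
    return sorted(files)
```

Note that an unquoted pattern is normally expanded by the shell before the tool ever sees it, so a helper like this matters mainly when the pattern is quoted on the command line.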

Use cases:

  • Case 1: Create a new fresh ESDB and add new data to ESDB
    ESBuilder -add /some/path/my/dir -grade physics-unchecked -time 20090215 
    -dataVersionName P2-data99_vs1 -newDB
    

    optionally you may add the following:
    -logDir /my/log/dir -log %stdout -esdb MyEventStore (-mysql lnx151 or -sqlite
    sqlite.db)
    

  • Case 2a: add raw data
    ESBuilder -add /some/path/my/dir/runXXXXXX.bin -grade daq-unchecked
    -time 20090215 -dataVersionName P2-data99_vs1 -mysql lnx151 -esdb
    EventStoreTest -oBinFile /hsm/location/file.bin
    

  • Case 2b: add new pass2 output
    ESBuilder -mysql lnx151 -esdb EventStoreTest -output /my/output/dir
    -time 20041217 -grade p2-unchecked 
    -dataVersionName P2-20041203-20041104_P2-data34_vs5
    -add /cdat/sol409/disk2/pass2/data34_vs5/run204327/*_hot_*.pds
    

  • Case 2c: add post-p2 corrections
    ESBuilder -add /cdat/sol514/disk2/pass2/data35_vs1/run205085/p2post_205085_vs2.pds
    -grade physics-unchecked -time 20090215
    -dataVersionName P2-20050320-20050120_FULL-data35_vs1-PP2-EBeam
    -listOfParents P2-20041223-20041104_P2-data35_vs1
    -output /my/output/dir -esdb EventStoreTest -mysql lnx151 -no-key
    

  • Case 2d: add Dskims
    ESBuilder -mysql lnx151 -esdb EventStoreTest -output /my/output/dir
    -logDir /my/output/dir/logs -time 20090215
    -grade dtag-unchecked -dataVersionName DSkim-20050323-20050316_FULL
    -add /nfs/cleoc/data1/event/dir/dskim_data35_205079.pds
    -listOfParents P2-20050320-20050120_FULL-data35_vs1-PP2-EBeam
    

  • Case 3: Move data from one grade to another
    To add a new version with same run range:
    ESVersionManager -grade p2-unchecked p2-postprocess -time 0 0 
    -runRange 111222 111333
    
    If the -runRange option is omitted, all run ranges which belong to the p2-unchecked
    grade with timeStamp=0 will be moved.
    
    To add a new version excluding some runs:
    ESVersionManager -grade p2-unchecked p2-postprocess -time 0 0 -runList runlist.txt
    runlist.txt is a required file containing the runs which need to be excluded
    

  • Case 4: Move files from one location to another
    ESBuilder -move file.txt /new/location/dir -esdb EventStoreTest
    

  • Case 5: Delete an existing grade
    ESBuilder -delete physics 20091205 -esdb EventStoreTest
    

Miscellaneous

The GroupEventStoreToolkit contains a Test subdirectory with several tests. All use cases listed above with their variations can be found there.

Every tool can be run in profiler mode by specifying the '-profile' option.

A new tool, feedMetaDataDB.py, has been developed to add new data to an existing Meta Data DB. It is based on httplib and uses the underlying HTTP protocol to contact the CS DB rather than the buggy SOAP.

There are CGI scripts for a Web-based interface, although no one looks at them.

There is a tool to find a particular file based on run/grade/timeStamp info.

On-going activities

Here is an incomplete list of tasks which I need to implement or fix in the existing version of GroupEventStoreToolkit:
  • Rewrite the algorithm for adding post-p2-like data. The existing algorithm accepts only a single file, whereas we may want to provide a list of files.
  • Code reorganization. Due to the fast development cycle, some obsolete and/or unoptimized code exists.
  • Finalize and document API and workflow.
  • Finalize a useful set of options based on user feedback
  • Add new code for merging one DB with another. Example: MN may create a new DB which needs to be merged with Cornell.
  • Add more union tests
  • Fix permission of key/location files (make it 0444)
  • 64-bit case with SQLite
  • Try to resolve SOAP problems and use SOAPpy
  • Create tools for separate DB integrity checks (needs to be done separately from the injection process)
  • Adjust CGI scripts to the new API and if necessary tune Web-based interface
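
For the permission item in the list above, a minimal sketch of making key/location files read-only (mode 0444) is shown here. The function name is hypothetical; where the toolkit would call it (for example right after a key file is written) is left open.

```python
import os
import stat

# Read permission for owner, group and others: octal 0444.
READ_ONLY = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH


def make_read_only(paths):
    """Set mode 0444 on every given key/location file."""
    for path in paths:
        os.chmod(path, READ_ONLY)
```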

-- ValentinKuznetsov - 05 Apr 2005
Topic revision: r3 - 20 Apr 2005, ValentinKuznetsov