XRootD Buffer

From DUNE

UNDER CONSTRUCTION

General Design

Note: the definitive source of data rate and volume numbers is DUNE DocDB #1086. The numbers extracted below may become out of date.

  • Key numbers:
    • 230 MB per readout (all 6 APAs, 5 ms window)
    • Lossless compression factor: 4
    • File size determines the number of parallel writes, and vice versa
    • Nominal "max top instantaneous rate": 3 GB/s
    • Nominal spill length: 4.5 s
    • Spill cycle: 22.5 s
    • Nominal data in spill: 13.5 GB
    • Nominal data out of spill: 13.5 GB
    • Average sustained rate: 1.2 GB/s
  • A proposal for the online buffer: DUNE DocDB #1628
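As a quick consistency check of the nominal numbers above (a sketch, assuming the out-of-spill data are spread over the 18 s of each cycle that fall outside the spill): 3 GB/s for 4.5 s gives the quoted 13.5 GB per spill, the out-of-spill 13.5 GB then arrives at 0.75 GB/s, and 27 GB per 22.5 s cycle reproduces the 1.2 GB/s sustained average.

```python
# Nominal figures quoted above; DocDB #1086 remains the authoritative source.
PEAK_RATE_GB_S = 3.0     # nominal "max top instantaneous rate"
SPILL_S = 4.5            # nominal spill length
CYCLE_S = 22.5           # spill cycle
OUT_OF_SPILL_GB = 13.5   # nominal data out of spill

# Data written while the spill is on, at the peak rate.
in_spill_gb = PEAK_RATE_GB_S * SPILL_S
# Rate needed to deliver the out-of-spill data in the rest of the cycle
# (assumes it is spread evenly over those 18 s).
out_of_spill_rate = OUT_OF_SPILL_GB / (CYCLE_S - SPILL_S)
# Total data per cycle divided by the cycle length.
average_rate = (in_spill_gb + OUT_OF_SPILL_GB) / CYCLE_S

print(f"in spill: {in_spill_gb} GB")                    # in spill: 13.5 GB
print(f"out-of-spill rate: {out_of_spill_rate} GB/s")   # out-of-spill rate: 0.75 GB/s
print(f"average: {average_rate:.2f} GB/s")              # average: 1.20 GB/s
```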

XRootD-FTS Interface

Design Considerations

  • The XRootD ofs.notify directive can send ASCII messages either to a FIFO or to a user application/script
  • The format and content of each message can be configured by the developer
  • Consider how a slow or absent FIFO reader could block XRootD
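A minimal sketch of one way a FIFO reader can avoid stalling XRootD, assuming XRootD writes newline-terminated messages to a named FIFO. A blocking open of a FIFO waits until the other end is opened, and a blocking writer stalls once the kernel pipe buffer is full, so the reader here opens its end non-blocking, holds a spare write end open so it never sees end-of-file between messages, and drains whatever is waiting. The helper names (open_fifo_reader, read_available) and the sample message are hypothetical; only the os calls are real.

```python
import os
import tempfile

def open_fifo_reader(path):
    """Hypothetical helper: open `path` (creating the FIFO if needed) for reading.

    The read end is opened O_NONBLOCK so open() does not wait for a writer.
    A keepalive write end is then held open so the reader never sees EOF
    when the real writer closes its end between messages.
    """
    if not os.path.exists(path):
        os.mkfifo(path, 0o600)
    rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    # Does not block (or fail with ENXIO) because a reader, rfd, is already open.
    keepalive_fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    return rfd, keepalive_fd

def read_available(rfd, bufsize=65536):
    """Return the bytes currently waiting in the FIFO, or b"" if there are none."""
    try:
        return os.read(rfd, bufsize)
    except BlockingIOError:
        # EAGAIN: nothing to read yet, but a writer (at least keepalive_fd) is open.
        return b""

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "xrd_notify.fifo")
    rfd, keepalive_fd = open_fifo_reader(path)
    # Stand-in for XRootD: open the FIFO, write one message, close it.
    wfd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    os.write(wfd, b"closew /store/run1.hdf5 12345\n")
    os.close(wfd)
    print(read_available(rfd))  # b'closew /store/run1.hdf5 12345\n'
    print(read_available(rfd))  # b'' (no data, but no EOF either)
```

Whether XRootD opens the FIFO in blocking mode is not stated here; the point of the sketch is only that the reader side should never be the reason a writer waits.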

Required Functionality

  • Poll the FIFO file descriptor for waiting data.
  • Parse the message.
  • Record the message in persistent storage (e.g., a database).
  • Promptly notify the tasks that must handle the message.
  • Return to polling as soon as possible.
  • Support multiple message handlers:
    • notify F-FTS when both the DAQ data file and its metadata file are ready to read
    • possibly run the metadata file producer (unless the DAQ handles this)
    • notify shifters of errors via run control
    • notify experts via various channels (email, SMS)
  • Provide a recovery mode that replays notifications selected by some criteria.
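The poll, parse, record, hand-off cycle listed above can be sketched as a single loop. This is a minimal sketch under several assumptions: messages are newline-terminated in a hypothetical "<event> <lfn> [<size>]" layout (the real layout is whatever ofs.notify is configured to emit), the read end is non-blocking with a write end held open elsewhere so the FIFO never reports EOF, and Python's sqlite3 stands in for the SQLAlchemy-backed local DB. The function names are illustrative only. Each message is recorded before it is handed on, so anything lost after that point can be replayed from the DB.

```python
import os
import select
import sqlite3
import time

# HYPOTHETICAL message layout: one line per event, "<event> <lfn> [<size>]".
# The real layout is whatever the ofs.notify configuration emits.
def parse_message(line):
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"malformed notification: {line!r}")
    size = int(fields[2]) if len(fields) > 2 else None  # int() may also raise ValueError
    return {"event": fields[0], "lfn": fields[1], "size": size}

def open_db(path=":memory:"):
    """sqlite3 stand-in for the SQLAlchemy-backed local DB."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS message (id INTEGER PRIMARY KEY,"
               " received REAL, raw TEXT, event TEXT, lfn TEXT)")
    return db

def record(db, raw, msg):
    """Persist the message first, so it can be replayed if a later step fails."""
    cur = db.execute(
        "INSERT INTO message (received, raw, event, lfn) VALUES (?, ?, ?, ?)",
        (time.time(), raw, msg["event"], msg["lfn"]),
    )
    db.commit()
    return cur.lastrowid

def serve(rfd, db, work, timeout_ms=1000, max_idle=None):
    """Poll the FIFO, record each complete line, hand it to `work`, repeat.

    Stops after `max_idle` consecutive empty polls (None: run forever) or on EOF.
    """
    poller = select.poll()
    poller.register(rfd, select.POLLIN)
    pending = b""
    idle = 0
    while max_idle is None or idle < max_idle:
        if not poller.poll(timeout_ms):
            idle += 1
            continue
        idle = 0
        try:
            chunk = os.read(rfd, 65536)
        except BlockingIOError:
            continue
        if not chunk:
            break  # EOF: every writer closed; the caller should reopen the FIFO
        pending += chunk
        *lines, pending = pending.split(b"\n")  # keep any partial last line
        for raw in lines:
            text = raw.decode("ascii", errors="replace").strip()
            if not text:
                continue
            try:
                msg = parse_message(text)
            except ValueError:
                continue  # a real Handler would record and report this
            msg["id"] = record(db, text, msg)
            work.put_nowait(msg)  # hand off without blocking the poll loop
```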

Proposed, High-Level Design

The figure provides a cartoon of the actors involved in buffering the DAQ data files and associated metadata files.

Figure: Notifier.svg

Explanation:

  • Multiple DAQ Event Builder (EB) nodes write to XRootD
  • The XRootD redirector directs each transfer to a specific XRootD server on a specific storage node (not shown)
  • The XRootD redirector writes a message to the FIFO on file state changes
  • The Handler calls poll() to be notified when the FIFO is ready to read
  • The Handler records the message in the local DB, forwards it to the Process Dispatcher, and returns to poll() as soon as possible. Internally, this may involve dispatching the message to threaded workers over a thread-safe message queue.
  • The Process Dispatcher actually handles the message by running one or more processes for each message. A map from message type to processes is part of its configuration.
  • The Notifier is responsible for telling FTS, via HTTP POST, that files are ready to transfer, and for recording this in the DB. The notification consists mainly of the XRootD URLs of the files to transfer. Note: this is not a persistent HTTP request; the response says nothing about whether the transfer succeeded (to be confirmed).
  • If the buffer nodes are made responsible for producing the metadata file, the Process Dispatcher starts that process here. Note: this design does not yet address how to run this process on the storage node that hosts the DAQ data file.
  • The Recovery process is a command-line program that can query SAM and the local DB to determine which files entered the buffer but never reached EOS. It can also resend messages to the Process Dispatcher to replay their actions and attempt to reprocess them.
  • Agents access the local DB via SQLAlchemy, using a shared code base for the ORM models. A central table holds information about each incoming XRootD message, and associated tables hold the results of processing that message by the other agents, including replays.
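One possible shape for the central message table and its associated results table, with the local-DB half of the Recovery query. This is a sketch only: it uses Python's stdlib sqlite3 with plain SQL in place of the SQLAlchemy ORM models described above, and every table, column, and function name is hypothetical. A real Recovery program would also cross-check SAM and EOS, which is not shown.

```python
import sqlite3

# Hypothetical layout: one row per incoming XRootD message, plus one row per
# processing attempt by an agent (first attempt or replay).
SCHEMA = """
CREATE TABLE IF NOT EXISTS message (
    id       INTEGER PRIMARY KEY,
    received REAL NOT NULL,            -- UNIX time the Handler read it
    raw      TEXT NOT NULL,            -- message exactly as read from the FIFO
    event    TEXT NOT NULL,
    lfn      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS result (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES message(id),
    agent      TEXT NOT NULL,          -- e.g. 'notifier', 'metadata'
    status     TEXT NOT NULL,          -- e.g. 'ok', 'failed'
    is_replay  INTEGER NOT NULL DEFAULT 0,
    finished   REAL NOT NULL
);
"""

def create_tables(db):
    db.executescript(SCHEMA)

def missing_success(db, agent="notifier"):
    """Messages with no successful result from `agent`, counting replays."""
    return db.execute(
        """SELECT m.id, m.lfn FROM message AS m
           WHERE NOT EXISTS (SELECT 1 FROM result AS r
                             WHERE r.message_id = m.id
                               AND r.agent = ? AND r.status = 'ok')
           ORDER BY m.id""",
        (agent,),
    ).fetchall()
```

Keeping results in a separate table, rather than as status columns on the message row, lets each replay add a row instead of overwriting the history of earlier attempts.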

Options:

The Process Dispatcher could instead be responsible for writing the message to the DB.


The agents shown here may be implemented as separate processes or combined to some extent. For example, the Process Dispatcher may absorb the Handler and Notifier, but the Metadata File Producer and Recovery should probably remain external. The main point is that each "bubble" in the drawing should act asynchronously and never block the others. For example, when the Handler passes a notification to the Process Dispatcher, that call should return immediately.
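A minimal sketch of that non-blocking hand-off, assuming a Python implementation with threads: submit() only places the message on an unbounded queue.Queue and returns, and worker threads look up the message type in a configuration map and run each configured command. The class name, the map layout, and the "{lfn}" placeholder convention are hypothetical; a real dispatcher would also record each result in the DB.

```python
import queue
import subprocess
import sys
import threading

class ProcessDispatcher:
    """Hypothetical dispatcher: runs configured commands per message type.

    `dispatch_map` maps a message type (msg["event"]) to a list of argv
    templates; "{lfn}" and other "{key}" fields are filled from the message.
    submit() only enqueues, so the caller (e.g. the Handler) returns at once.
    """

    def __init__(self, dispatch_map, workers=2):
        self.dispatch_map = dispatch_map
        self.work = queue.Queue()      # unbounded: put_nowait never raises Full
        self.results = queue.Queue()   # (msg, argv, returncode) per command
        self.threads = [threading.Thread(target=self._run, daemon=True)
                        for _ in range(workers)]
        for t in self.threads:
            t.start()

    def submit(self, msg):
        self.work.put_nowait(msg)

    def _run(self):
        while True:
            msg = self.work.get()
            if msg is None:            # shutdown sentinel
                return
            for template in self.dispatch_map.get(msg["event"], []):
                argv = list(template)
                try:
                    argv = [arg.format(**msg) for arg in template]
                    rc = subprocess.run(argv, capture_output=True).returncode
                except (KeyError, OSError):
                    rc = None          # bad template or command not runnable
                self.results.put((msg, argv, rc))

    def shutdown(self):
        """Let queued messages finish, then stop the workers."""
        for _ in self.threads:
            self.work.put(None)
        for t in self.threads:
            t.join()

if __name__ == "__main__":
    demo_map = {"closew": [[sys.executable, "-c", "import sys; print(sys.argv[1])", "{lfn}"]]}
    d = ProcessDispatcher(demo_map, workers=1)
    d.submit({"event": "closew", "lfn": "/store/run1.hdf5"})  # returns immediately
    d.shutdown()
    print(d.results.get_nowait()[2])  # 0
```

The unbounded queue means the Handler can never be blocked by a slow dispatcher; the trade-off is that a stalled worker shows up as a growing queue rather than as back-pressure, so queue depth is worth monitoring.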