Prompt Data Process Consensus System

From DUNE

Overview

Data streaming out of the DAQ will be placed in a buffer before being passed on to EOS storage. Some of it may be used locally for quick, basic monitoring. The data in EOS will in turn have more than one "consumer":

  • CASTOR, the CERN tape archival system.
  • Express, calibration and "deep monitoring" streams to be run at CERN.
  • The system tasked with transferring the data to FNAL.

A system is needed that allows multiple independent processes to run promptly on the raw data.

Use Case

A likely scenario:

1) Run control initiates a run

2) DAQ opens file 1 for writing

3) DAQ closes file 1 and opens file 2

4) N prompt processors open file 1 for reading and run in parallel:

a) copy to CERN EOS (big disk)
b) online monitoring
c) prompt processing
d) ad-hoc (human driven) usage of the file

For the CERN EOS copy, a checksum of the copy should be calculated and compared with one calculated on the source.
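
The checksum comparison can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names are hypothetical, SHA-256 is an assumed algorithm choice (the page does not say which checksum is used), and both files are assumed to be readable through ordinary file paths.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_verified(source_path, copy_path, algorithm="sha256"):
    """True only if the source and the copy have identical checksums."""
    return file_checksum(source_path, algorithm) == file_checksum(copy_path, algorithm)
```

In practice the source checksum would probably be computed once, when the DAQ closes the file, and stored, so that the source does not have to be read again for each comparison.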

A third level of workflow is likely to be started once the EOS copy (4a) checksum is shown to be correct:

e) copy file from CERN EOS to CERN CASTOR (tape)
f) copy file from CERN EOS to FNAL SAM dropbox

Only after all registered users (a-f) of the raw data file have reported success may the file be purged.

g) purge file 1
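
The purge rule above amounts to a consensus check over the registered users of a file. A minimal in-memory sketch, assuming each user reports a simple success or failure; the class name and the user names are hypothetical and chosen only to mirror items a-f:

```python
class FileConsensus:
    """Tracks which registered users have reported on one raw data file."""

    def __init__(self, file_name, users):
        self.file_name = file_name
        # None = not reported yet, True = success, False = failure
        self.status = {name: None for name in users}

    def report(self, user, success):
        """Record the outcome reported by one registered user."""
        if user not in self.status:
            raise KeyError(f"{user!r} is not registered for {self.file_name}")
        self.status[user] = bool(success)

    def pending(self):
        """Names of users that have not (yet) reported success."""
        return sorted(name for name, ok in self.status.items() if ok is not True)

    def may_purge(self):
        """The file may be purged only when every user has reported success."""
        return all(ok is True for ok in self.status.values())


# Hypothetical user names corresponding to items a-f
USERS = ["eos_copy", "online_monitoring", "prompt_processing",
         "ad_hoc", "castor_copy", "fnal_copy"]
```

A purge process would call may_purge() before deleting the file; a failed or missing report from any single user keeps the file on disk.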

5) Run control stops run

6) Some run summary process runs on all files of the run.

7) Some online monitoring process runs on all output of 4b.

8) All registered processes report success and a purge process deletes file 1.


A possible solution

There is more than one way to consistently manage the collections of processes described above, as well as the state of the data being stored, transmitted and transformed. One of the more straightforward approaches is to use a database with a schema designed to support the use cases envisioned for the protoDUNE systems.
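
To make the database approach concrete, here is one possible shape for such a schema, sketched with SQLite from the Python standard library. Everything in it is an assumption made for illustration: the table and column names, the status values, and the choice of SQLite itself are not taken from the protoDUNE design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE raw_file (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    run_number INTEGER NOT NULL
);
CREATE TABLE consumer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE consumer_report (
    file_id     INTEGER NOT NULL REFERENCES raw_file(id),
    consumer_id INTEGER NOT NULL REFERENCES consumer(id),
    status      TEXT NOT NULL DEFAULT 'pending'
                CHECK (status IN ('pending', 'success', 'failed')),
    PRIMARY KEY (file_id, consumer_id)
);
"""

# A file is purgeable when it has registered consumers and none of
# their reports is anything other than 'success'.
PURGEABLE = """
SELECT f.name FROM raw_file f
WHERE EXISTS (SELECT 1 FROM consumer_report r WHERE r.file_id = f.id)
  AND NOT EXISTS (SELECT 1 FROM consumer_report r
                  WHERE r.file_id = f.id AND r.status != 'success')
ORDER BY f.name
"""


def open_db(path=":memory:"):
    """Open a database and create the (hypothetical) schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register_file(conn, name, run_number, consumers):
    """Record a closed file and one pending report row per consumer."""
    cur = conn.execute(
        "INSERT INTO raw_file (name, run_number) VALUES (?, ?)", (name, run_number))
    file_id = cur.lastrowid
    for consumer in consumers:
        conn.execute("INSERT OR IGNORE INTO consumer (name) VALUES (?)", (consumer,))
        conn.execute(
            "INSERT INTO consumer_report (file_id, consumer_id) "
            "SELECT ?, id FROM consumer WHERE name = ?", (file_id, consumer))
    conn.commit()
    return file_id


def report(conn, file_name, consumer, status):
    """Update the status one consumer reports for one file."""
    conn.execute(
        "UPDATE consumer_report SET status = ? "
        "WHERE file_id = (SELECT id FROM raw_file WHERE name = ?) "
        "AND consumer_id = (SELECT id FROM consumer WHERE name = ?)",
        (status, file_name, consumer))
    conn.commit()


def purgeable_files(conn):
    """Names of files for which every registered consumer reported success."""
    return [row[0] for row in conn.execute(PURGEABLE)]
```

Keeping one report row per (file, consumer) pair means that the purge decision becomes a single query, and a new consumer can be registered without changing the schema.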