CERN Prototype

Materials and Meetings

Infrastructure

Expected Data Volume and Rates

Estimates in this area were developed over an extended period. Both the data rate and the data volume are determined primarily by the number of cosmic-ray muon tracks recorded within the readout window, which is commensurate with the electron collection time in the TPC (~2 ms).

A few numbers:

  • Planned trigger rate: 200 Hz
  • Instantaneous data rate in the DAQ: 1 GB/s
  • Sustained average: 200 MB/s

Based on this, the nominal network bandwidth required to link the DAQ to the CERN storage elements is ~2 GB/s. This figure rests on the essential assumption that zero suppression will be used in all measurements. There are also considerations for taking some portion of the data in non-zero-suppressed mode, which would require approximately 20 GB/s of connectivity. Since WA105 has specified this as its requirement, DUNE-PT may be able to obtain a link in that range.
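
As a quick sanity check, the quoted figures are mutually consistent. In the sketch below the zero-suppressed event size, the headroom factor and the non-ZS inflation factor are illustrative assumptions inferred from the numbers above, not official parameters.

  # Back-of-the-envelope bandwidth check (Python; illustrative numbers only).
  TRIGGER_RATE_HZ = 200     # planned trigger rate
  EVENT_SIZE_ZS_MB = 5.0    # assumed ZS event size: 200 Hz x 5 MB = 1 GB/s
  DUTY_FACTOR = 0.2         # assumed sustained/instantaneous ratio
  HEADROOM = 2.0            # assumed safety factor on the instantaneous rate
  NON_ZS_FACTOR = 10        # assumed inflation without zero suppression

  instantaneous_gb_s = TRIGGER_RATE_HZ * EVENT_SIZE_ZS_MB / 1000.0  # 1.0 GB/s
  sustained_mb_s = instantaneous_gb_s * DUTY_FACTOR * 1000.0        # 200 MB/s
  nominal_link_gb_s = instantaneous_gb_s * HEADROOM                 # 2 GB/s
  non_zs_link_gb_s = nominal_link_gb_s * NON_ZS_FACTOR              # 20 GB/s

  print(instantaneous_gb_s, sustained_mb_s, nominal_link_gb_s, non_zs_link_gb_s)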

The measurement program is still being updated; the total volume of data to be taken will be on the order of 1 PB. Brief notes on the statistics can be found in Appendix II of the "Materials" page.
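
For scale, at the sustained average rate quoted above, accumulating 1 PB takes roughly two months of continuous running; the sketch below only restates this page's numbers.

  # Time to accumulate ~1 PB at a sustained 200 MB/s (Python).
  SUSTAINED_MB_S = 200
  TOTAL_MB = 1.0e9                          # 1 PB expressed in MB
  days = TOTAL_MB / SUSTAINED_MB_S / 86400  # 86400 seconds per day
  print(round(days))                        # ~58 days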

Software and Computing

Intro

Due to the short time available for data taking, the data collected during the experiment are considered "precious" (impossible or hard to reproduce), and redundant storage must be provided for them. One primary copy will be stored on tape at CERN and another at FNAL, while auxiliary copies will be shared between US sites, e.g. BNL and NERSC. There is a proposal to reuse software proven in the IceCube and Daya Bay experiments ("Spade") to move data between CERN and the US with an appropriate degree of automation, error checking and correction, and monitoring. Other systems may be utilized as well.
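
The transfer machinery is not specified here; the following is a minimal sketch of the error-checking pattern such a system applies (checksum at the source, checksum at the destination, retry on mismatch). The function names are hypothetical and shutil.copyfile stands in for the actual WAN transfer; this is not Spade's interface.

  import hashlib
  import shutil

  def sha256_of(path):
      # Stream the file so arbitrarily large raw files fit in memory.
      h = hashlib.sha256()
      with open(path, "rb") as f:
          for chunk in iter(lambda: f.read(1024 * 1024), b""):
              h.update(chunk)
      return h.hexdigest()

  def transfer_with_verification(src, dst, retries=3):
      # Copy src to dst, verify the checksum, and retry on mismatch.
      src_sum = sha256_of(src)
      for _ in range(retries):
          shutil.copyfile(src, dst)     # stand-in for the real WAN transfer
          if sha256_of(dst) == src_sum:
              return True               # checksums agree; transfer is good
      raise IOError("checksum mismatch after %d attempts: %s" % (retries, src))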

An effort will be made to implement near-time processing and monitoring of data quality, including full tracking in express production streams. This can be done on a subset of the raw data in order to process it at roughly the same speed as it is collected. At the same time, a very rough estimate indicates that for off-line processing, ~5000 cores would be sufficient to make a first reconstruction pass over the data at the same rate as it is received.
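
The per-core speed implied by that estimate can be backed out from the numbers above; the event size is the same illustrative assumption used earlier, so the result is only indicative.

  # Implied per-core budget for a keep-up reconstruction pass (Python).
  CORES = 5000
  SUSTAINED_MB_S = 200                           # average input rate
  EVENT_SIZE_MB = 5.0                            # assumed ZS event size
  events_per_s = SUSTAINED_MB_S / EVENT_SIZE_MB  # ~40 events/s arriving
  budget_s = CORES / events_per_s                # per-event time budget per core
  print(budget_s)                                # ~125 s/event keeps pace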

Handling the data

Storage at CERN

EOS is a high-performance distributed disk storage system based on XRootD. It is used by the major LHC experiments as the destination to which the DAQ writes raw data.
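
Because EOS speaks the XRootD protocol, a DAQ machine can write raw files with the standard xrdcp client; the endpoint and destination path below are hypothetical placeholders, not the real protoDUNE namespace.

  import subprocess

  src = "run000123_evb01.raw"
  dst = ("root://eosdune.cern.ch//eos/experiment/dune/prototype/raw/"
         "run000123_evb01.raw")              # hypothetical EOS endpoint/path

  # --posc ("persist on successful close") discards partially written files.
  subprocess.run(["xrdcp", "--posc", src, dst], check=True)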

CASTOR is the principal tape storage system at CERN. It has a built-in disk layer that was previously utilized in production and other activities, but this is no longer the case, since that functionality is handled more efficiently by EOS. For that reason, the disk storage that exists in CASTOR now serves as a buffer for I/O and system functions.

Historical references and more recent writeups can be found in the ATLAS section of the CERN Data Handling page.

DAQ Interface

ATLAS example

Useful links can be found on the page describing the handling of LHC data at CERN. "SFO" stands for SubFarm Output processor. Its function is to receive event data that has passed all trigger levels and to assemble it before committing it to mass storage managed by CERN central services. SFO nodes use the XFS file system for data storage.
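
As a rough illustration of that pattern, the sketch below appends incoming events to a local file and rolls the file over at a size threshold, at which point it becomes a candidate for migration to mass storage. All names and the threshold are illustrative, not the ATLAS implementation.

  import os

  CLOSE_THRESHOLD = 2 * 1024**3   # roll files over at ~2 GB (assumed)

  class SfoLikeWriter:
      # Append triggered events to local files; a closed file is ready
      # for hand-off to mass storage.
      def __init__(self, out_dir):
          self.out_dir = out_dir
          self.seq = 0
          self.fh = None

      def _open_next(self):
          self.seq += 1
          path = os.path.join(self.out_dir, "data_%06d.raw" % self.seq)
          self.fh = open(path, "wb")

      def write_event(self, event_bytes):
          if self.fh is None:
              self._open_next()
          self.fh.write(event_bytes)
          if self.fh.tell() >= CLOSE_THRESHOLD:
              self.fh.close()       # file complete: eligible for migration
              self.fh = None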

CMS example