CERN Prototype
Materials and Meetings
- CERN Prototype Materials - collection of reference materials and history of this subject
- Current series of meetings at CERN (as of 2015)
- Measurements/analysis group
- DocDB 9989: Beam Group presentation 11/18/14.
- EHN1 Extension Coordination - CERN Neutrino Platform Project (SharePoint pages at CERN)
Expected Data Volume and Rates
Estimates in this area were developed over a period of time. Both the data rate and the data volume are determined primarily by the number of cosmic ray muon tracks recorded within the readout window, which is commensurate with the electron collection time in the TPC (~2 ms).
For a quick summary of the data rates, data volume and related requirements see this presentation. A few numbers:
- Planned trigger rate: 200 Hz
- Instantaneous data rate in DAQ: 1 GB/s
- Sustained average: 200 MB/s
The measurement program is still being updated; the total volume of data to be taken is expected to be ~O(1 PB).
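The figures quoted above can be cross-checked with simple arithmetic. The sketch below is not taken from the referenced presentation. It assumes that the instantaneous DAQ rate equals the trigger rate times a fixed event size, that data are written continuously at the sustained average rate, and that decimal units apply (1 GB = 10^9 bytes). All function names are illustrative.

```python
# Back-of-envelope cross-check of the quoted rates.
# Assumptions (not from the source): instantaneous rate = trigger rate x event
# size; continuous running at the sustained rate; decimal units (1 GB = 1e9 B).

TRIGGER_RATE_HZ = 200.0      # planned trigger rate
INSTANTANEOUS_BPS = 1.0e9    # instantaneous DAQ rate, bytes per second
SUSTAINED_BPS = 200.0e6      # sustained average rate, bytes per second
TOTAL_VOLUME_B = 1.0e15      # total volume of order 1 PB


def implied_event_size(instantaneous_bps, trigger_rate_hz):
    """Event size in bytes implied by the instantaneous rate and trigger rate."""
    return instantaneous_bps / trigger_rate_hz


def duty_factor(sustained_bps, instantaneous_bps):
    """Fraction of time the DAQ would run at the instantaneous rate."""
    return sustained_bps / instantaneous_bps


def days_to_collect(total_bytes, sustained_bps):
    """Days of continuous running needed to accumulate total_bytes."""
    return total_bytes / sustained_bps / 86400.0


if __name__ == "__main__":
    size_mb = implied_event_size(INSTANTANEOUS_BPS, TRIGGER_RATE_HZ) / 1e6
    print(f"implied event size: {size_mb:.1f} MB")
    print(f"duty factor: {duty_factor(SUSTAINED_BPS, INSTANTANEOUS_BPS):.2f}")
    print(f"days for 1 PB: {days_to_collect(TOTAL_VOLUME_B, SUSTAINED_BPS):.1f}")
```

Under these assumptions the implied event size is about 5 MB, the duty factor is 0.2, and roughly 58 days of continuous running at the sustained rate would accumulate 1 PB.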
Energy Scale and Resolution
In terms of detector characterization, some of the important parameters include energy scale and resolution for both single tracks and showers - hadronic and EM. Let's consider them first (using comments from T.Junk):
- Energy scale: for a Gaussian distribution, the uncertainty on the mean is sigma/sqrt(n). Assuming a resolution of 1% and aiming for ±0.1% precision, only 100 events would be needed.
- Hadronic showers: older calorimeters had a resolution of 80%/sqrt(E). Since the sampling fraction in a LAr TPC is higher, we are likely to do better than this, but still conservatively assume O(10%)/sqrt(E). Qualitatively, we can follow arguments similar to those of the previous item. It follows that O(10^3-10^4) events will be enough for the purposes of this measurement. Indeed, looking at typical test beam and calibration practices (per published papers), we see that 10^4 events is the typical statistics for a given incident beam momentum.
In summary, depending on the case, this part of the measurement program can be accomplished with an event sample of size ~O(10^3-10^4), and in some cases less.
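The sigma/sqrt(n) argument can be turned into a small calculation. The sketch below assumes Gaussian statistics, takes the hadronic resolution as 10%/sqrt(E) with E in GeV (the unit is an assumption, since the text does not state it), and applies the same ±0.1% target to showers as to the energy scale. Function names are illustrative.

```python
import math


def events_needed(resolution_pct, target_precision_pct):
    """Events n such that resolution / sqrt(n) equals the target precision."""
    return (resolution_pct / target_precision_pct) ** 2


def hadronic_resolution_pct(energy_gev, stochastic_term_pct=10.0):
    """Resolution a / sqrt(E) in percent; a = 10% is the assumed stochastic term."""
    return stochastic_term_pct / math.sqrt(energy_gev)


if __name__ == "__main__":
    # Energy scale: 1% resolution, 0.1% target precision.
    print("energy scale:", round(events_needed(1.0, 0.1)))
    # Hadronic showers at a few illustrative beam energies (assumed values).
    for energy in (1.0, 4.0, 9.0):
        res = hadronic_resolution_pct(energy)
        print(f"E = {energy} GeV:", round(events_needed(res, 0.1)))
```

With these inputs the energy scale case gives 100 events, and the hadronic case ranges from about 10^4 events at 1 GeV down to about 10^3 at 9 GeV, consistent with the O(10^3-10^4) estimate.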
Measuring the "fake" rate, i.e. particle mis-identification, is important for certain physics to be addressed by the experiment (cf. proton decay). If the probability of mis-PID is "p", then the statistical uncertainty on it, measured from n events, can be expressed as sqrt(p*(1-p)/n). This can also be understood in terms of precise measurements of "tails" in certain distributions. If we are looking at probabilities of mis-identification of the order of 10^-6, this translates into quite substantial statistics. At the time of writing, we need more guidance in this area, but in general it appears that in this case we would indeed be motivated to take as much data as practically feasible. This means that we will aim to take a few million events in each of a few momentum bins (TBD).
- Reading: Nucleon Decay Searches
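To see why a mis-identification probability of order 10^-6 calls for millions of events, the binomial expression sqrt(p*(1-p)/n) can be evaluated directly. The sketch below is illustrative only. The sample sizes are assumed values, and the normal approximation behind the formula is itself rough when the expected number of mis-identified events n*p is only a few.

```python
import math


def misid_uncertainty(p, n):
    """Absolute statistical uncertainty on a mis-ID probability p from n events."""
    return math.sqrt(p * (1.0 - p) / n)


def misid_relative_uncertainty(p, n):
    """Uncertainty on p expressed as a fraction of p."""
    return misid_uncertainty(p, n) / p


def events_for_relative_precision(p, rel_precision):
    """Events needed so that the relative uncertainty on p equals rel_precision."""
    return (1.0 - p) / (p * rel_precision ** 2)


if __name__ == "__main__":
    p = 1.0e-6
    for n in (1.0e6, 4.0e6, 1.0e8):  # assumed sample sizes per momentum bin
        rel = misid_relative_uncertainty(p, n)
        print(f"n = {n:.0e}: relative uncertainty = {rel:.2f}")
```

For p = 10^-6, a sample of 4 million events gives a relative uncertainty of about 50%, while a 10% measurement would need about 10^8 events. This is why this item dominates the data volume.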
Let's assume we want to measure a signal with 5% precision and the signal to background ratio is 1:10. To provide this scale of accuracy, we will need ~10^5 events in the momentum bin of interest.
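One way to arrive at this order of magnitude is a simple counting estimate. The sketch below assumes Poisson statistics and a background level known well enough that the uncertainty on the signal S is sqrt(S+B). This is an assumption made for illustration, not necessarily the reasoning behind the original estimate.

```python
def events_for_signal_precision(rel_precision, background_per_signal):
    """Return (signal, total) event counts such that sqrt(S + B) / S = rel_precision.

    With B = r * S, the condition becomes sqrt((1 + r) / S) = rel_precision,
    so S = (1 + r) / rel_precision**2 and the total is S * (1 + r).
    """
    ratio = 1.0 + background_per_signal
    signal = ratio / rel_precision ** 2
    total = signal * ratio
    return signal, total


if __name__ == "__main__":
    # 5% precision, signal to background ratio of 1:10.
    signal, total = events_for_signal_precision(0.05, 10.0)
    print(f"signal events: {signal:.0f}, total events: {total:.0f}")
```

Under these assumptions about 4400 signal events are needed, which corresponds to roughly 5 x 10^4 events in total, i.e. of the order of 10^5 once some margin is included.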
By looking at the statistics needed for each bin and type of measurement (see the table above) and the respective event size estimate, we can estimate the total volume of data to be handled.
It will be possible to meet the requirements of most items in the measurement program by collecting data on the scale of a few hundred TB. There is one challenging item on the list, which is the "fake" particle rate (i.e. the rate of mis-identification). It is likely that to perform a satisfactory measurement of this type, we will need to collect a few hundred more TB of data. The current estimate, still imprecise, is then of the order of 1 PB.
Software and Computing
As of March 2015, this is work in progress. In accordance with common requirements, we anticipate preserving three copies of the "precious" data to be collected during the experiment. One primary copy would be stored on tape at CERN, another at FNAL, and auxiliary copies would be shared between US sites, e.g. BNL and NERSC. There is a proposal to reuse software which was proven in the IceCube and Daya Bay experiments to move data between CERN and the US with an appropriate degree of automation, error checking and correction, and monitoring.
The salient point of the Software and Computing plan is near-time processing and monitoring of data quality, including full tracking in express production streams. This can be done on a subset of the raw data. At the same time, a rough estimate indicates that for off-line processing, ~5000 cores will be sufficient to process data at about the same speed as it is collected.
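The ~5000 core figure can be related to a per-event processing budget. The sketch below is a rough illustration, not the estimate referred to above. It assumes a sustained data rate of 200 MB/s and an event size of 5 MB (the latter inferred from 1 GB/s at 200 Hz, and not stated in the text), and it assumes that each core processes one event at a time.

```python
def cpu_seconds_per_event(n_cores, sustained_bps, event_size_bytes):
    """Core-seconds available per event when processing keeps pace with data taking."""
    events_per_second = sustained_bps / event_size_bytes
    return n_cores / events_per_second


def cores_needed(cpu_seconds_event, sustained_bps, event_size_bytes):
    """Cores required to keep pace, given the CPU time spent on one event."""
    events_per_second = sustained_bps / event_size_bytes
    return cpu_seconds_event * events_per_second


if __name__ == "__main__":
    budget = cpu_seconds_per_event(5000, 200.0e6, 5.0e6)  # assumed 5 MB events
    print(f"CPU budget per event: {budget:.0f} core-seconds")
```

With these assumed inputs, 5000 cores correspond to a budget of about 125 core-seconds per event. A different event size or processing model would change this number proportionally.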
Handling the data
Storage at CERN
In the early 2000s, the CASTOR system was deployed at CERN. It provides a front-end to mass storage, in the form of both tape and disk pools. In the early 2010s, the disk pools were largely migrated to EOS, a newer, high-performance system which has better functionality for managing large disk pools. CASTOR is still used for custodial data on tape.
EOS is derived from XRootD, and ROOT files are accessible natively.
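In practice, native access means that a file on EOS can be opened through an XRootD URL of the form root://host//path. The sketch below is a minimal illustration. The helper names, the default host and the file path are assumptions made for the example, and opening the file requires a working PyROOT installation with XRootD support and suitable access rights.

```python
def eos_url(path, host="eospublic.cern.ch"):
    """Build an XRootD URL for an absolute EOS path (host is an assumed default)."""
    if not path.startswith("/"):
        raise ValueError("EOS path must be absolute")
    # XRootD URLs use a double slash between the host and an absolute path.
    return f"root://{host}/{path}"


def open_remote_root_file(url):
    """Open a remote ROOT file via PyROOT; ROOT is imported only when called."""
    import ROOT  # requires a ROOT installation with Python bindings
    return ROOT.TFile.Open(url)


if __name__ == "__main__":
    # Hypothetical path, used only to show the URL format.
    print(eos_url("/eos/hypothetical/prototype/run000001.root"))
```

The URL builder is kept separate from the file access so that the URL format can be checked without ROOT being installed.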
Links of interest
Note: some of these links may be restricted to users associated with respective LHC experiments. This will be resolved at a later date (i.e. relevant and public information extracted, reduced and systematized).
- Overview of LHC storage operations at CERN
- Beginner's Tutorial for EOS
- Technology and Storage Infrastructure group at CERN
- CDR (Central Data Recording, not to be confused with Conceptual Design Report)
- Misc
- Some Older but informative links