CERN Prototype

Select Documents and Meetings

  • 2015
    • DocDB 10385: Brief review of Computing Requirements for the test
    • DocDB 10428: Draft proposal for the test and comments on the document

Expected Data Volume

Event Size Estimates

Measurement categories are based on slide 9 in DocDB 9993. Estimated event sizes are based on Monte Carlo simulations run previously for the 10 kt version of the Far Detector and are only meant to be accurate to within an order of magnitude. However, simple channel-count and ADC-digitization-rate considerations confirm that this is the right scale for the data produced per event (with zero suppression). Based on a MC data point for electrons (including showers) and experience in MicroBooNE, we can also conclude that the event sizes for single tracks and for showers of comparable total energy (contained in the detector) would not differ widely.

Particle Type   Momentum Range (GeV/c)   Bin (MeV/c)   Approx. event size (MB)
p               0.1-2.0                  100           1
p               2.0-10.0                 200           5
π±              0.1-2.0                  100           1
π±              2.0-10.0                 200           5
μ±              0.1-1.0                  50            1
μ±              1.0-10.0                 200           5
e±              0.1-2.0                  100           1
e±              2.0-10.0                 200           4
K+              0.1-1.0                  100           1
γ (π0)          0.1-2.0                  100           1
γ (π0)          2.0-5.0                  200           5
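
The channel-count cross-check mentioned above can be illustrated with a short Python sketch. Every input below (channel count, digitization rate, readout window, ADC word size, zero-suppression occupancy) is an illustrative assumption rather than a parameter taken from the proposal; the point is only that a zero-suppressed event of O(1 MB) is the right scale.

  # Rough order-of-magnitude check of the per-event data size for a
  # zero-suppressed LArTPC readout. All numbers are illustrative
  # assumptions, not values from the proposal.

  n_channels = 15000         # assumed number of TPC readout channels
  sample_rate_hz = 2.0e6     # assumed digitization rate (2 MHz)
  readout_window_s = 2.3e-3  # assumed readout window per event (2.3 ms)
  bytes_per_sample = 2       # 12-bit ADC words packed into 2 bytes
  occupancy = 0.02           # assumed fraction of samples kept after zero suppression

  samples_per_channel = sample_rate_hz * readout_window_s
  raw_size_mb = n_channels * samples_per_channel * bytes_per_sample / 1.0e6
  zs_size_mb = raw_size_mb * occupancy

  print(f"Uncompressed event size:    ~{raw_size_mb:.0f} MB")
  print(f"Zero-suppressed event size: ~{zs_size_mb:.1f} MB")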


Statistics

Energy Scale and Resolution

In terms of detector characterization, the important parameters include the energy scale and resolution for single tracks and for showers, both hadronic and EM. Let us consider these first (using comments from T. Junk):

  • Energy scale: for a Gaussian distribution, the uncertainty on the mean scales as sigma/sqrt(n). Assuming a resolution of 1% and aiming for ±0.1% precision, only about 100 events would be needed.
  • Hadronic showers: older calorimeters had a resolution of 80%/sqrt(E). Since the sampling fraction in a LAr TPC is higher, we are likely to do better than this, but we still conservatively assume O(10%)/sqrt(E). Qualitatively, the same argument as in the previous item applies, and it follows that O(10^3-10^4) events will be enough for the purposes of this measurement. Indeed, typical test-beam and calibration practices (per published papers) use about 10^4 events for a given incident beam momentum.

In summary, depending on the case, this part of the measurement program can be accomplished with event samples of size ~O(10^3-10^4), and in some cases fewer, as illustrated by the sketch below.
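
A minimal numerical sketch of the sigma/sqrt(n) argument, using the illustrative resolutions and target precisions quoted above (the 0.2% target for the shower response is an assumption added for illustration):

  def events_needed(resolution, target_precision):
      # Number of events n such that resolution / sqrt(n) = target_precision
      # (Gaussian statistics on the mean).
      return (resolution / target_precision) ** 2

  # Energy scale: 1% resolution measured to +-0.1% -> O(100) events.
  print(f"Energy scale:     ~{events_needed(0.01, 0.001):.0f} events")   # ~100

  # Hadronic showers: ~10% per-event resolution (O(10%)/sqrt(E) near 1 GeV),
  # measured to an assumed 0.2% -> a few thousand events, i.e. O(10^3-10^4).
  print(f"Hadronic showers: ~{events_needed(0.10, 0.002):.0f} events")   # ~2500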

PID

Measuring the "fake" rate, i.e. particle mis-identification, is important for certain physics to be addressed by the experiment (cf. proton decay). If the probability of mis-PID is "p", then the statistical uncertainty can be expressed as sqrt(p*(1-p)/n). This can also be understood in terms of precise measurements of "tails" in certain distributions. If we are looking at probabilities of mis-identification of the order of 10-6, this translates into quite substantial statistics. At the time of writing, we need more guidance in this area, but in general it appears that in this case we would indeed be motivated to take as much data as practically feasible. This does mean that we will aim to take a few million events in each of a few momentum bins (TBD).

Systematics

Let us assume we want to measure a signal with 5% precision and that the signal-to-background ratio is 1:10. To reach this level of accuracy, we will need ~10^5 events in the momentum bin of interest.
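
One way to arrive at the ~10^5 figure is a simple counting-experiment model in which the background under the signal is subtracted and sqrt(S+B) is taken as the statistical uncertainty on the signal; this model is an assumption for illustration, not spelled out above.

  # Counting-experiment model (assumption for illustration):
  # relative precision on the signal S is sqrt(S + B) / S, with B = 10*S.

  s_over_b = 0.1               # signal-to-background ratio 1:10
  target_precision = 0.05      # 5% relative precision on the signal

  # sqrt(S + B)/S = sqrt((1 + B/S)/S) = target  ->  S = (1 + B/S)/target^2
  signal_events = (1.0 + 1.0 / s_over_b) / target_precision**2
  total_events = signal_events * (1.0 + 1.0 / s_over_b)

  print(f"Signal events needed:    ~{signal_events:.0f}")   # ~4400
  print(f"Total events in the bin: ~{total_events:.0f}")    # ~48400, i.e. O(10^5)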

Summary

By combining the statistics needed for each type of measurement with the respective event size estimates (see the table above), we can estimate the total volume of data to be handled.
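
The arithmetic is a simple product of bins, events per bin, and event size; a minimal helper is sketched below. The fake-rate example uses an assumed 10 momentum bins, 3×10^6 events per bin, and 5 MB per event, which are illustrative choices rather than a fixed run plan.

  def sample_volume_tb(n_bins, events_per_bin, event_size_mb):
      # Data volume in TB for a set of momentum bins with a common
      # event count and event size.
      return n_bins * events_per_bin * event_size_mb * 1.0e-6

  # Mis-identification ("fake" rate) sample: a few million events in
  # each of a few momentum bins at ~5 MB/event (assumed values).
  print(f"Fake-rate sample: ~{sample_volume_tb(10, 3.0e6, 5.0):.0f} TB")   # ~150 TB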

It will be possible to meet the requirements of most items in the measurement program by collecting data on the scale of a few hundred TB. The one challenging item on the list is the "fake" particle rate (i.e. the rate of mis-identification). To perform a satisfactory measurement of this type, we will likely need to collect a few hundred TB of additional data. The current estimate, still imprecise, is therefore of the order of 1 PB.

Software and Computing

Intro

As of March 2015, this is work in progress. In accordance with common requirements, we anticipate preserving three copies of the "precious" data to be collected during the experiment. One primary copy would be stored on tape at CERN, another at FNAL, and auxiliary copies would be shared among US sites, e.g. BNL and NERSC. There are proposals to reuse software proven in the IceCube and Daya Bay experiments to move data between CERN and the US with an appropriate degree of automation, error checking and correction, and monitoring.

The salient point of the Software and Computing plan is near-time processing and monitoring of data quality, including full tracking in express production streams. This can be done on a subset of the raw data. At the same time, a rough estimate indicates that ~5000 cores will be sufficient for offline processing to keep up with the rate at which data are collected.
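
A back-of-the-envelope version of the core-count estimate, assuming an average event rate and a per-event reconstruction time; both inputs are illustrative assumptions rather than values from the plan.

  # Cores needed to process data at roughly the speed it is collected.
  # Both inputs are illustrative assumptions.

  event_rate_hz = 25.0        # assumed average event rate during data taking
  cpu_sec_per_event = 200.0   # assumed single-core reconstruction time per event

  cores_needed = event_rate_hz * cpu_sec_per_event
  print(f"Cores needed to keep up: ~{cores_needed:.0f}")   # ~5000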

Handling the data

Links of interest

Note: some of these links may be restricted to users associated with the respective LHC experiments. This will be resolved at a later date (i.e. the relevant public information will be extracted, reduced, and systematized).

  • ATLAS documentation
    • Tier-0 expert on call notes: https://twiki.cern.ch/twiki/bin/view/AtlasComputing/TierZeroExpertOnCallNotes
  • CDR (Central Data Recording, not to be confused with Conceptual Design Report)
    • CDR Troubleshooting: https://twiki.cern.ch/twiki/bin/view/FIOgroup/CDR_Problem
    • CDR Configuration: https://twiki.cern.ch/twiki/bin/view/FIOgroup/CDR_Config
    • An older CDR configuration example: https://twiki.cern.ch/twiki/bin/view/P326/CDR