Stephanie's Homepage
Level 5
Level 5 cuts are done in ROOT, and are run by me using local computers at Canterbury.
Cut variables
Level 5 uses TMVA, the multivariate analysis (machine learning) toolkit included with ROOT. Three precuts are applied before the boosted decision tree (BDT) is trained at this level:
- Z vertex position: -450 metres < CredoFit4_Pos_Z < 450 metres
- String containment: → This precut is motivated by the IC40 geometry, because many background events (especially corner clippers) survive outside the detector volume.
- DOM charge containment: → The DOM with the largest charge must not be on an outer string.
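As a rough illustration of how two of these precuts could be applied per event, here is a minimal Python sketch. It is not the ROOT code actually used: the event layout (a dict with a `dom_charges` list of `(string_number, charge)` pairs), the function name and the `outer_strings` argument are all assumptions made for the example, and the string numbers in the usage lines are invented. The string containment precut is left out because its exact criterion is not spelled out above.

```python
Z_MIN_M = -450.0  # lower bound on CredoFit4_Pos_Z (metres), from the precut list
Z_MAX_M = 450.0   # upper bound on CredoFit4_Pos_Z (metres), from the precut list


def passes_precuts(event, outer_strings):
    """Apply the z vertex and DOM charge containment precuts to one event.

    `event` is a hypothetical dict holding "CredoFit4_Pos_Z" and
    "dom_charges", a list of (string_number, charge) pairs.
    `outer_strings` is the set of string numbers on the detector edge;
    the real IC40 list is not given in the text and must be supplied.
    """
    z = event["CredoFit4_Pos_Z"]
    if not (Z_MIN_M < z < Z_MAX_M):
        return False

    # Find the string that carries the DOM with the largest charge.
    string_of_max_charge, _ = max(event["dom_charges"], key=lambda pair: pair[1])
    return string_of_max_charge not in outer_strings


# Illustrative usage with made-up numbers.
example_event = {
    "CredoFit4_Pos_Z": 120.0,
    "dom_charges": [(5, 42.0), (1, 3.5)],
}
print(passes_precuts(example_event, outer_strings={1, 2}))
```

Passing the outer-string set in as an argument keeps the geometry separate from the cut logic, so the same function works for any detector configuration.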
Development of cuts
Figure 1 shows the first precut for TMVA, the reconstructed z vertex position. Figure 2 shows an aerial view of the reconstructed x and y vertex positions, which illustrates the effect of the two containment precuts for TMVA. After these precuts, the TMVA training and testing are run. The cut on the BDT output is applied at the next level.
Figure 1: Reconstructed z vertex position using the 4-iteration Credo reconstruction. The cuts at CredoFit4_Pos_Z > -450 metres and CredoFit4_Pos_Z < 450 metres are shown in black.
Figure 2: Reconstructed x and y vertex positions using the 4-iteration Credo reconstruction. The cut on the outer strings is shown in black. a) Before precuts. b) After string containment. c) After DOM charge containment.
The passing rates for level 5 are shown in Table 1.
| Sample | Trigger Rate (Hz) | Level 2 Rate (Hz) | Level 3 Rate (Hz) | Level 4 Rate (Hz) | Level 5 Rate (Hz) |
| Experimental data | 1500 | 16.3 (1.1%) | 1.75 (10.7%) | 2.54 × 10⁻² (1.5%) | 2.09 × 10⁻³ (8.21%) |
| Monte Carlo | 1270 | 12.5 (1.0%) | 0.92 (7.4%) | 3.30 × 10⁻² (3.6%) | 2.49 × 10⁻³ (7.54%) |
| E⁻² signal | 2.55 × 10⁻⁴ | 1.48 × 10⁻⁴ (58.0%) | 1.15 × 10⁻⁴ (77.9%) | 5.55 × 10⁻⁵ (48.2%) | 1.83 × 10⁻⁵ (39.97%) |
Table 1: Passing rates for level 5.
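The percentages in parentheses appear to be each level's rate as a fraction of the previous level's rate. Under that assumption (it is inferred from the numbers, not stated in the table), a short Python sketch of the arithmetic looks like this. The function name is made up for the example, and the input is the experimental data row of Table 1.

```python
def passing_fractions(rates_hz):
    """Return each level's rate as a percentage of the previous level's rate."""
    return [100.0 * after / before for before, after in zip(rates_hz, rates_hz[1:])]


# Experimental data row of Table 1: trigger, level 2, 3, 4 and 5 rates in Hz.
experimental_rates_hz = [1500.0, 16.3, 1.75, 2.54e-2, 2.09e-3]

for level, fraction in zip(range(2, 6), passing_fractions(experimental_rates_hz)):
    print(f"Level {level}: {fraction:.2f}% of the previous level")
```

Because the table rates are already rounded, the recomputed percentages agree with the quoted ones only approximately.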
The cut variables used by TMVA (run after the precuts) are as follows:
- Z Vertex Position: CredoFit4_Pos_Z
- Zenith Track Direction: SPEFit32_Zenith
- Track Reduced Log Likelihood: SPEFit32_rlogl
- Linefit Velocity: LineFit_LFVel
- Eigenvalue Ratio: PoleToI_evalratio
- Fill Ratio: SDM1_FillRatioFromMeanPlusRMS
- Time Vertex Split: SplitSPECascadeLlhVertex2_Time - SplitSPECascadeLlhVertex1_Time
- Split Containment: √[(SplitSPECascadeLlhVertex1_Pos_X)² + (SplitSPECascadeLlhVertex1_Pos_Y)² + (SplitSPECascadeLlhVertex1_Pos_Z)²]
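The last two variables in the list are derived from the split-cascade reconstruction rather than read directly. A minimal Python sketch of both is given below, assuming each event is available as a dict keyed by the variable names above. The dict layout and function names are assumptions for illustration, and the numbers in the usage lines are invented.

```python
import math


def time_vertex_split(event):
    """Time difference between the second and first split-cascade vertices."""
    return event["SplitSPECascadeLlhVertex2_Time"] - event["SplitSPECascadeLlhVertex1_Time"]


def split_containment(event):
    """Distance of the first split-cascade vertex from the coordinate origin."""
    x = event["SplitSPECascadeLlhVertex1_Pos_X"]
    y = event["SplitSPECascadeLlhVertex1_Pos_Y"]
    z = event["SplitSPECascadeLlhVertex1_Pos_Z"]
    return math.sqrt(x * x + y * y + z * z)


# Illustrative usage with made-up values.
example_event = {
    "SplitSPECascadeLlhVertex1_Time": 9800.0,
    "SplitSPECascadeLlhVertex2_Time": 10150.0,
    "SplitSPECascadeLlhVertex1_Pos_X": 3.0,
    "SplitSPECascadeLlhVertex1_Pos_Y": 4.0,
    "SplitSPECascadeLlhVertex1_Pos_Z": 12.0,
}
print(time_vertex_split(example_event), split_containment(example_event))
```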
All available standard CORSIKA (excluding two-component) is used in the training, testing and evaluation stages of TMVA. For the electron neutrino E⁻¹ signal, 2,000 files (datasets 2182 and 2510) are used for training and testing, and the remaining 6,000 files (dataset 3221) are used in the evaluation. Below are the plots from TMVA, including input variables, correlation matrices, overtraining checks and cut efficiencies.
Figure 3: Variables used in TMVA.
Figure 4: Variables used in TMVA in both log and linear (normalised to one) scale. The distributions are shown after the precuts.
Figure 5: Correlation matrices for TMVA. a) Signal. b) Background.
Figure 6: Other TMVA plots. a) Overtraining check. b) Cut efficiencies. c) ROC curve.
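To make the cut efficiency and ROC panels concrete, the sketch below shows how signal efficiency and background rejection can be computed from classifier scores for a given threshold. It is plain Python, not TMVA output. The function names are hypothetical and the scores are toy values invented for the example, not results from this analysis.

```python
def cut_efficiencies(signal_scores, background_scores, threshold):
    """Return (signal efficiency, background rejection) for score > threshold."""
    signal_eff = sum(s > threshold for s in signal_scores) / len(signal_scores)
    background_eff = sum(b > threshold for b in background_scores) / len(background_scores)
    return signal_eff, 1.0 - background_eff


def roc_points(signal_scores, background_scores, thresholds):
    """Evaluate the efficiencies at each threshold to trace out a ROC curve."""
    return [cut_efficiencies(signal_scores, background_scores, t) for t in thresholds]


# Toy classifier scores, invented for illustration only.
toy_signal = [0.9, 0.8, 0.1]
toy_background = [0.2, -0.5, 0.85, -0.1]

for point in roc_points(toy_signal, toy_background, thresholds=[-0.2, 0.0, 0.5]):
    print(point)
```

Scanning the threshold from low to high moves along the ROC curve from high signal efficiency towards high background rejection.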