Next: Error messages of Up: Bin by bin Previous: Other notes:

Statistical methods and numerical notes:

(For simplicity, this is written as if the N option were in effect.)

The methods used for the S and C options are correct for unweighted events and Poisson statistics in 1- or 2-dimensional histograms. Errors may occur with either the S or the C option at small tolerances if bin contents are greater than the largest allowed integer.

For the S option with unweighted events, the test (which is uniformly most powerful) treats each of the N events, where N is the sum of the two bin contents, as having chosen via a binomial distribution which histogram to enter. The binomial parameter p is given by the relative normalization of the histograms (0.5 if the total number of entries in each histogram is the same). For DIFFS values greater than TOL, the first two digits are correct. For values less than TOL, the two digits to the right of the first non-zero TOL digit are significant; for example, with TOL=0.0001 the digits marked x in 0.000xxx are significant. One can force higher accuracy by setting TOL smaller (or even 0), but calculation time will increase and warning messages will be issued. A Gaussian approximation is used when there are 25 or more events in each bin and TOL>0.001.
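The binomial picture above can be illustrated with a short sketch. It is not the HDIFFB code: the function names are hypothetical, and the two-sided convention used here (summing every split that is no more likely than the observed one) is an assumption, since the text does not say how the tail is defined. It also ignores TOL and the Gaussian approximation.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k of n events entering histogram 1 (log-space)."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def same_parent_probability(n1, n2, entries1, entries2):
    """Two-sided binomial probability for bin contents n1 and n2.

    p is the chance that an event lands in histogram 1, taken from the
    relative normalization (0.5 when both histograms have equal entries).
    The two-sided convention is an assumption made for this sketch.
    """
    n = n1 + n2
    if n == 0:
        return 1.0
    p = entries1 / (entries1 + entries2)
    observed = binomial_pmf(n1, n, p)
    # Sum every split that is no more likely than the observed one;
    # the small factor absorbs rounding differences between equal terms.
    total = sum(pk for pk in (binomial_pmf(k, n, p) for k in range(n + 1))
                if pk <= observed * (1.0 + 1e-9))
    return min(1.0, total)
```

With equal normalization, an even split such as (5, 5) gives a probability of 1, while a split of (10, 0) gives 2/1024, the chance that all ten events land in the same histogram.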

The C option for unweighted events in the data histogram simply calculates the Poisson probability of finding n, the ID2 bin value, given a mean equal to the bin value of ID1. A Gaussian approximation is used when the mean is 10^6 or larger and TOL is 0.001 or larger. Given the expected mean, the choice of TOL implies bounds (n_<, n_>) on n (i.e. n within these bounds passes). An error occurs when the approximations used in calculating DIFFS give an incorrect value for n_< or n_>. No such errors occur for mean < 10^5 and TOL > 10^-15. The errors in n_< or n_> are less than 2 for mean < 10^6, TOL > 10^-6, or for mean < 10^7, TOL > 10^-5. There is a maximum n beyond which DIFFS returns zero, so bins with n > n_max always fail. For mean < 10^7, this is irrelevant for values of TOL > 10^-9.
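How a choice of TOL turns into lower and upper bounds on n can be sketched as follows. This is an illustration only: the function names are hypothetical, the one-tail definition of the probability is an assumption, and the brute-force sums are practical only for small means (HDIFFB itself uses approximations, which is where the errors described above come from).

```python
import math

def poisson_pmf(n, mean):
    """Poisson probability of exactly n counts, computed in log-space."""
    if mean <= 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def poisson_tail(n, mean):
    """One-tail probability of a count at least as far from the mean as n."""
    if n <= mean:
        # Lower tail: P(X <= n).
        return min(1.0, sum(poisson_pmf(k, mean) for k in range(n + 1)))
    # Upper tail: P(X >= n), summed directly to keep precision for tiny values.
    total, k = 0.0, n
    while True:
        term = poisson_pmf(k, mean)
        total += term
        if term <= total * 1e-17:  # further terms no longer change the sum
            break
        k += 1
    return min(1.0, total)

def passing_bounds(mean, tol):
    """Smallest and largest n whose tail probability is at least tol.

    Assumes the bin value nearest the mean itself passes.
    """
    lo = hi = int(mean)
    while lo > 0 and poisson_tail(lo - 1, mean) >= tol:
        lo -= 1
    while poisson_tail(hi + 1, mean) >= tol:
        hi += 1
    return lo, hi
```

For a mean of 10 and a tolerance of 0.05, this sketch accepts n from 5 to 15. Note that for very large n the individual terms underflow and the tail evaluates to exactly zero, which is the same kind of effect as the maximum n mentioned above.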

For the profile histogram S option, HDIFFB calculates the t-test probability that both bin means were produced from a population with the same mean. The C option calculates the probability of finding the value in ID1 given a Gaussian with μ and σ given by the ID2 contents. Small numbers of entries for either test give DIFFS values which are too large, and HDIFFB will reject too many bins in profile histograms.
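The Gaussian probability used by the profile C option can be sketched in a few lines. The function name is hypothetical and the two-sided definition is an assumption; the text does not state which tail convention HDIFFB uses.

```python
import math

def gaussian_probability(value, mu, sigma):
    """Two-sided probability of a deviation at least as large as |value - mu|.

    value plays the role of the ID1 bin mean; mu and sigma stand for the
    ID2 bin contents. The two-sided convention is assumed for this sketch.
    """
    if sigma <= 0.0:
        return 1.0 if value == mu else 0.0
    z = abs(value - mu) / sigma
    # erfc(z / sqrt(2)) is the area in both tails beyond z standard deviations.
    return math.erfc(z / math.sqrt(2.0))
```

A value sitting exactly on μ gives 1, and a value 1.96σ away gives about 0.05.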

For weighted events, the S and C options use a Gaussian approximation. This results in DIFFS values which are too low. HDIFFB rejects too many bins for weighted events, particularly for small numbers of equivalent events.
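The text does not define "equivalent events". A minimal sketch, assuming the usual definition (sum of weights squared divided by the sum of squared weights), shows why a bin filled with very unequal weights behaves like a bin with few entries. The function name is hypothetical, and whether HDIFFB uses exactly this quantity internally is an assumption.

```python
def equivalent_events(weights):
    """Number of equivalent unweighted events for a list of event weights.

    Uses (sum of w)^2 / (sum of w^2), the common definition; assumed here.
    """
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    if sum_w2 == 0.0:
        return 0.0
    return sum_w * sum_w / sum_w2
```

Four events of equal weight count as four equivalent events, whereas two events with weights 1 and 3 count as only 1.6.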



Janne Saarela
Tue May 16 09:09:27 METDST 1995