The calculations in routine HDIFF are based on the Kolmogorov Test (see, e.g., [11], pages 269-270). It is usually superior to the better-known Chisquare Test for the following reasons:
In discussing the Kolmogorov test, we must distinguish between the two most important properties of any test: its power and the calculation of its confidence level.
The job of a statistical test is to distinguish between a null hypothesis (in this case: that the two histograms are compatible) and the alternative hypothesis (in this case: that the two are not compatible). The power of a test is defined as the probability of rejecting the null hypothesis when the alternative is true. In our case, the alternative is not well-defined (it is simply the ensemble of all hypotheses except the null) so it is not possible to tell whether one test is more powerful than another in general, but only with respect to certain particular deviations from the null hypothesis. Based on considerations such as those given above, as well as considerable computational experience, it is generally believed that tests like the Kolmogorov or Smirnov-Cramer-Von-Mises (which is similar but more complicated to calculate) are probably the most powerful for the kinds of phenomena generally of interest to high-energy physicists. This is especially true for two-dimensional data where the Chisquare Test is of little practical use since it requires either enormous amounts of data or very big bins.
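The distance underlying the Kolmogorov test is simply the largest absolute difference between the two empirical cumulative distributions. The following sketch is our own illustration in Python (HDIFF itself is a Fortran routine; the function name here is hypothetical):

```python
def kolmogorov_distance(x, y):
    """Two-sample Kolmogorov distance: the maximum absolute
    difference between the empirical CDFs of samples x and y."""
    x, y = sorted(x), sorted(y)
    nx, ny = len(x), len(y)
    ix = iy = 0
    d = 0.0
    # Walk through the merged, ordered samples, advancing past ties
    # in both samples before comparing the two cumulative fractions.
    while ix < nx and iy < ny:
        v = min(x[ix], y[iy])
        while ix < nx and x[ix] == v:
            ix += 1
        while iy < ny and y[iy] == v:
            iy += 1
        d = max(d, abs(ix / nx - iy / ny))
    return d

# Identical samples are at zero distance; disjoint ones at distance 1.
print(kolmogorov_distance([1, 2, 3], [1, 2, 3]))    # 0.0
print(kolmogorov_distance([1, 2, 3], [10, 20, 30]))  # 1.0
```

Note that only the ordering of the merged samples matters, not their scale, which is why the statistic is well-defined for any one-dimensional data.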
Using the terms introduced above, the confidence level is just the probability of rejecting the null hypothesis when it is in fact true. That is, if you accept the two histograms as compatible whenever the value of PROB is greater than 0.05, then truly compatible histograms should fail the test exactly 5% of the time.

The value of PROB returned by HDIFF is calculated such that it will be uniformly distributed between zero and one for compatible histograms, provided the data are not binned (or the number of bins is very large compared with the number of events).
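The conversion from a Kolmogorov distance to a confidence level uses the asymptotic distribution of the distance. A widely used approximation (found in standard numerical texts; we are not reproducing the actual HDIFF source) evaluates the series Q(lambda) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lambda^2), with a small-sample correction to the scale factor:

```python
import math

def kolmogorov_prob(d, n_eff):
    """Approximate confidence level for a Kolmogorov distance d.
    n_eff is the effective number of events; for two samples of
    sizes n1 and n2 it is n1*n2/(n1+n2).  Uses the asymptotic
    series with a common small-sample correction to the scale."""
    sqn = math.sqrt(n_eff)
    lam = (sqn + 0.12 + 0.11 / sqn) * d
    # Alternating series 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2);
    # the terms die off very quickly, so 100 terms are ample.
    total = 0.0
    for j in range(1, 101):
        term = 2.0 * (-1.0) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)

# A tiny distance gives a probability near 1 (compatible histograms);
# a large distance gives a probability near 0 (incompatible).
print(kolmogorov_prob(0.01, 250))  # close to 1
print(kolmogorov_prob(0.5, 250))   # close to 0
```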
Users who have access to unbinned data and wish exact confidence levels should therefore not put their data into histograms, but should save them in ordinary Fortran arrays and call the routine TKOLMO, which is being introduced into the Program Library.
On the other hand, since HBOOK
is a convenient way of collecting
data and saving space, the routine HDIFF has been provided,
and we believe it is the best test for comparison even on binned
data. However, the values of PROB
for binned data will be shifted
slightly higher than expected, depending on the effects of the
binning.
For example, when comparing two uniform distributions of 500 events in 100 bins, the values of PROB, instead of being exactly uniformly distributed between zero and one, have a mean value of about 0.56.
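This binning bias can be reproduced with a toy simulation. The sketch below is our own illustration, not HBOOK code, and it uses the assumed asymptotic formula of the previous example, so the exact mean may differ slightly from 0.56; the shift above the 0.5 expected for unbinned data is what it demonstrates. Two histograms of 500 uniform events are filled into 100 bins, the Kolmogorov distance is taken from the cumulative bin contents, and the resulting probabilities are averaged over many trials:

```python
import math
import random

def kolmogorov_prob(d, n_eff):
    """Asymptotic Kolmogorov confidence level (see the text above)."""
    sqn = math.sqrt(n_eff)
    lam = (sqn + 0.12 + 0.11 / sqn) * d
    total = 0.0
    for j in range(1, 101):
        term = 2.0 * (-1.0) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)

def binned_prob(n_events=500, n_bins=100):
    """Fill two histograms with uniform events, then compute the
    Kolmogorov distance from their cumulative bin contents."""
    h1 = [0] * n_bins
    h2 = [0] * n_bins
    for _ in range(n_events):
        h1[random.randrange(n_bins)] += 1
        h2[random.randrange(n_bins)] += 1
    c1 = c2 = 0
    d = 0.0
    for b in range(n_bins):
        c1 += h1[b]
        c2 += h2[b]
        d = max(d, abs(c1 - c2) / n_events)
    # Effective number of events for two samples of equal size n is n/2.
    return kolmogorov_prob(d, n_events / 2.0)

random.seed(12345)
probs = [binned_prob() for _ in range(300)]
mean = sum(probs) / len(probs)
print(mean)  # shifted above the 0.5 expected for unbinned data
```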
Since we are physicists, we can apply a useful rule: as long as the bin width is small compared with any significant physical effect (for example the experimental resolution), the binning cannot have an important effect.
Therefore, we believe that for all practical purposes, the probability value PROB is calculated correctly provided the user is aware that:

- PROB should not be expected to have exactly the correct distribution for binned data.
- For binned data, the value of PROB will be slightly too big. That is, setting an acceptance criterion of (PROB > 0.05) will assure that at most 5% of truly compatible histograms are rejected, and usually somewhat less.
The Kolmogorov Test for 2-dimensional data is not as well
understood as for one dimension.
The basic problem is that it requires the unbinned data to be
ordered, which is easy in one dimension, but is not
well-defined
(i.e. not scale-invariant) in higher dimensions.
Paradoxically, the binning which was a nuisance in one dimension
is now very useful, since it enables us to define
an obvious ordering.
In fact there are two obvious orderings (horizontal and vertical)
which give rise to two (in general different) Kolmogorov
distance measures.
Routine HDIFF takes the average of the two distances to calculate the probability value PROB, which gives very satisfactory results.
The precautions necessary for 1-dimensional data also apply to this case.
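The two orderings and their average can be sketched as follows for a pair of 2-dimensional binned histograms: one Kolmogorov distance is computed from the cumulative bin contents scanned row-by-row (horizontal ordering), the other scanned column-by-column (vertical ordering), and the two distances are averaged. This is our own illustration of the idea, not the HDIFF source:

```python
def kolmogorov_distance_2d(h1, h2):
    """Average of the two Kolmogorov distances obtained by ordering
    the bins of two 2-D histograms row-by-row and column-by-column.
    h1 and h2 are equal-shaped lists of lists of bin contents."""
    n1 = sum(map(sum, h1))
    n2 = sum(map(sum, h2))
    ny = len(h1)      # number of rows
    nx = len(h1[0])   # number of columns

    def distance(order):
        # Accumulate bin contents in the given ordering and track the
        # largest difference between the two cumulative fractions.
        c1 = c2 = 0
        d = 0.0
        for (i, j) in order:
            c1 += h1[i][j]
            c2 += h2[i][j]
            d = max(d, abs(c1 / n1 - c2 / n2))
        return d

    row_major = [(i, j) for i in range(ny) for j in range(nx)]
    col_major = [(i, j) for j in range(nx) for i in range(ny)]
    return 0.5 * (distance(row_major) + distance(col_major))

# Identical histograms are at distance zero.
a = [[5, 3], [2, 10]]
print(kolmogorov_distance_2d(a, a))  # 0.0
```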