Next: Bin by bin Up: Statistical differences between Previous: Weights and Saturation

Statistical Considerations

 

The Kolmogorov Test

The calculations in routine HDIFF are based on the Kolmogorov test (see, e.g., [11], pages 269-270). It is usually superior to the better-known chi-square test, for the reasons discussed below.

In discussing the Kolmogorov test, we must distinguish between the two most important properties of any test: its power and the calculation of its confidence level.

The Power

  The job of a statistical test is to distinguish between a null hypothesis (in this case: that the two histograms are compatible) and the alternative hypothesis (in this case: that they are not). The power of a test is defined as the probability of rejecting the null hypothesis when the alternative is true. In our case the alternative is not well defined (it is simply the ensemble of all hypotheses other than the null), so it is not possible to say in general whether one test is more powerful than another, only with respect to particular deviations from the null hypothesis.

Based on such considerations, as well as considerable computational experience, it is generally believed that tests like the Kolmogorov or the Smirnov-Cramér-von Mises (which is similar but more complicated to calculate) are the most powerful for the kinds of phenomena generally of interest to high-energy physicists. This is especially true for two-dimensional data, where the chi-square test is of little practical use since it requires either enormous amounts of data or very big bins.
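To make the statistic concrete (this is an illustrative sketch, not the actual HDIFF code, which works on histograms): for unbinned data, the two-sample Kolmogorov distance is simply the largest absolute difference between the two empirical cumulative distribution functions.

```python
import bisect

def kolmogorov_distance(sample1, sample2):
    """Two-sample Kolmogorov statistic: max |F1(x) - F2(x)| over all x,
    where F1 and F2 are the empirical CDFs of the two samples."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    # The maximum difference can only occur at an observed data point.
    for x in sorted(set(s1) | set(s2)):
        f1 = bisect.bisect_right(s1, x) / n1   # empirical CDF of sample 1 at x
        f2 = bisect.bisect_right(s2, x) / n2   # empirical CDF of sample 2 at x
        d = max(d, abs(f1 - f2))
    return d
```

For identical samples the distance is 0; for completely disjoint samples it reaches 1.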

The Confidence Level for One-dimensional Data

  Using the terms introduced above, the confidence level is just the probability of rejecting the null hypothesis when it is in fact true. That is, if you accept the two histograms as compatible whenever the value of PROB is greater than 0.05, then truly compatible histograms should fail the test exactly 5% of the time. The value of PROB returned by HDIFF is calculated such that it will be uniformly distributed between zero and one for compatible histograms, provided the data are not binned (or the number of bins is very large compared with the number of events). Users who have access to unbinned data and wish exact confidence levels should therefore not put their data into histograms, but should save them in ordinary Fortran arrays and call the routine TKOLMO, which is being introduced into the Program Library.

On the other hand, since HBOOK is a convenient way of collecting data and saving space, the routine HDIFF has been provided, and we believe it is the best test for comparison even on binned data. However, the values of PROB for binned data will be shifted slightly higher than expected, depending on the effects of the binning. For example, when comparing two uniform distributions of 500 events in 100 bins, the values of PROB, instead of being uniformly distributed between zero and one, have a mean value of about 0.56. Since we are physicists, we can apply a useful rule: as long as the bin width is small compared with any significant physical effect (for example the experimental resolution), the binning cannot have an important effect. Therefore we believe that, for all practical purposes, the probability value PROB is calculated correctly, provided the user bears these binning effects in mind.
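The conversion of a Kolmogorov distance into a probability such as PROB is conventionally done with the asymptotic Kolmogorov tail distribution. The sketch below shows that standard formula; it illustrates the mathematics only and is not the Program Library source.

```python
import math

def kolmogorov_prob(z):
    """Asymptotic Kolmogorov tail probability
    Q(z) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 z^2),
    i.e. the probability of a scaled distance at least as large as z
    for two samples drawn from the same distribution."""
    if z < 1e-10:
        return 1.0
    total = 0.0
    for k in range(1, 101):
        term = (-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
        total += term
        if abs(term) < 1e-12:   # series converges very quickly
            break
    return max(0.0, min(1.0, 2.0 * total))
```

In the usual large-sample approximation, the argument z is the Kolmogorov distance scaled by the effective number of events, e.g. `kolmogorov_prob(d * math.sqrt(n1 * n2 / (n1 + n2)))` for samples of sizes n1 and n2.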

The Confidence Level for Two-dimensional Data

The Kolmogorov test for two-dimensional data is not as well understood as for one dimension. The basic problem is that it requires the unbinned data to be ordered, which is easy in one dimension but not well defined (i.e. not scale-invariant) in higher dimensions. Paradoxically, the binning that was a nuisance in one dimension is now very useful, since it enables us to define an ordering. In fact there are two natural orderings (horizontal and vertical), which give rise to two (in general different) Kolmogorov distance measures. Routine HDIFF takes the average of the two distances to calculate the probability value PROB, which gives very satisfactory results. The precautions necessary for one-dimensional data also apply to this case.
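The averaging of the two orderings can be sketched as follows. This is an illustrative reconstruction of the idea, not the HDIFF source: the two hypothetical histograms are nested lists of bin contents, and the cumulative distributions are built by scanning the bins row-wise and column-wise.

```python
def kolmogorov_2d_distance(h1, h2):
    """Average of the two Kolmogorov distances obtained by accumulating
    the bins of two 2-D histograms in row order and in column order."""
    n1 = float(sum(sum(row) for row in h1))   # total events in histogram 1
    n2 = float(sum(sum(row) for row in h2))   # total events in histogram 2
    nrow, ncol = len(h1), len(h1[0])

    def max_dist(order):
        # Maximum difference between the two cumulative fractions
        # as the bins are visited in the given order.
        c1 = c2 = d = 0.0
        for i, j in order:
            c1 += h1[i][j] / n1
            c2 += h2[i][j] / n2
            d = max(d, abs(c1 - c2))
        return d

    row_order = [(i, j) for i in range(nrow) for j in range(ncol)]  # horizontal scan
    col_order = [(i, j) for j in range(ncol) for i in range(nrow)]  # vertical scan
    return 0.5 * (max_dist(row_order) + max_dist(col_order))
```

The averaged distance could then be converted to a probability with the same tail formula as in one dimension, with the caveats about binning noted above.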



Janne Saarela
Tue May 16 09:09:27 METDST 1995