The calculations in routine HDIFF are based on the Kolmogorov Test (see, e.g., [11], pages 269-270). It is usually superior to the better-known Chisquare Test for the following reasons:
In discussing the Kolmogorov test, we must distinguish between the two most important properties of any test: its power and the calculation of its confidence level.
The job of a statistical test is to distinguish between a null hypothesis (in this case: that the two histograms are compatible) and the alternative hypothesis (in this case: that the two are not compatible). The power of a test is defined as the probability of rejecting the null hypothesis when the alternative is true. In our case, the alternative is not well-defined (it is simply the ensemble of all hypotheses except the null) so it is not possible to tell whether one test is more powerful than another in general, but only with respect to certain particular deviations from the null hypothesis. Based on considerations such as those given above, as well as considerable computational experience, it is generally believed that tests like the Kolmogorov or Smirnov-Cramer-Von-Mises (which is similar but more complicated to calculate) are probably the most powerful for the kinds of phenomena generally of interest to high-energy physicists. This is especially true for two-dimensional data where the Chisquare Test is of little practical use since it requires either enormous amounts of data or very big bins.
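The distance underlying the Kolmogorov test is simply the largest absolute difference between the two empirical cumulative distributions. The following sketch is our own illustration in Python (HDIFF itself is a Fortran routine; the function name here is hypothetical):

```python
def kolmogorov_distance(x, y):
    """Two-sample Kolmogorov distance: the maximum absolute
    difference between the empirical CDFs of samples x and y."""
    x, y = sorted(x), sorted(y)
    nx, ny = len(x), len(y)
    ix = iy = 0
    d = 0.0
    # Walk through the merged, ordered samples, advancing past ties
    # in both samples before comparing the two cumulative fractions.
    while ix < nx and iy < ny:
        v = min(x[ix], y[iy])
        while ix < nx and x[ix] == v:
            ix += 1
        while iy < ny and y[iy] == v:
            iy += 1
        d = max(d, abs(ix / nx - iy / ny))
    return d

# Identical samples are at zero distance; disjoint ones at distance 1.
print(kolmogorov_distance([1, 2, 3], [1, 2, 3]))    # 0.0
print(kolmogorov_distance([1, 2, 3], [10, 20, 30]))  # 1.0
```

Note that only the ordering of the merged samples matters, not their scale, which is why the statistic is well-defined for any one-dimensional data.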
Using the terms introduced above, the confidence level is just the probability of rejecting the null hypothesis when it is in fact true. That is, if you accept the two histograms as compatible whenever the value of PROB is greater than 0.05, then truly compatible histograms should fail the test exactly 5% of the time.

The value of PROB returned by HDIFF is calculated such that it will be uniformly distributed between zero and one for compatible histograms, provided the data are not binned (or the number of bins is very large compared with the number of events).
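The conversion from a Kolmogorov distance to a confidence level uses the asymptotic distribution of the distance. A widely used approximation (found in standard numerical texts; we are not reproducing the actual HDIFF source) evaluates the series Q(lambda) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lambda^2), with a small-sample correction to the scale factor:

```python
import math

def kolmogorov_prob(d, n_eff):
    """Approximate confidence level for a Kolmogorov distance d.
    n_eff is the effective number of events; for two samples of
    sizes n1 and n2 it is n1*n2/(n1+n2).  Uses the asymptotic
    series with a common small-sample correction to the scale."""
    sqn = math.sqrt(n_eff)
    lam = (sqn + 0.12 + 0.11 / sqn) * d
    # Alternating series 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2);
    # the terms die off very quickly, so 100 terms are ample.
    total = 0.0
    for j in range(1, 101):
        term = 2.0 * (-1.0) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)

# A tiny distance gives a probability near 1 (compatible histograms);
# a large distance gives a probability near 0 (incompatible).
print(kolmogorov_prob(0.01, 250))  # close to 1
print(kolmogorov_prob(0.5, 250))   # close to 0
```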
Users who have access to unbinned data and wish exact confidence levels should therefore not put their data into histograms, but should save them in ordinary Fortran arrays and call the routine TKOLMO, which is being introduced into the Program Library.
On the other hand, since HBOOK
is a convenient way of collecting
data and saving space, the routine HDIFF has been provided,
and we believe it is the best test for comparison even on binned
data. However, the values of PROB
for binned data will be shifted
slightly higher than expected, depending on the effects of the
binning.
For example, when comparing two uniform distributions of 500 events in 100 bins, the values of PROB, instead of being exactly uniformly distributed between zero and one, have a mean value of about 0.56.
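This binning bias can be reproduced with a toy simulation. The sketch below is our own illustration, not HBOOK code, and it uses the assumed asymptotic formula of the previous example, so the exact mean may differ slightly from 0.56; the shift above the 0.5 expected for unbinned data is what it demonstrates. Two histograms of 500 uniform events are filled into 100 bins, the Kolmogorov distance is taken from the cumulative bin contents, and the resulting probabilities are averaged over many trials:

```python
import math
import random

def kolmogorov_prob(d, n_eff):
    """Asymptotic Kolmogorov confidence level (see the text above)."""
    sqn = math.sqrt(n_eff)
    lam = (sqn + 0.12 + 0.11 / sqn) * d
    total = 0.0
    for j in range(1, 101):
        term = 2.0 * (-1.0) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)

def binned_prob(n_events=500, n_bins=100):
    """Fill two histograms with uniform events, then compute the
    Kolmogorov distance from their cumulative bin contents."""
    h1 = [0] * n_bins
    h2 = [0] * n_bins
    for _ in range(n_events):
        h1[random.randrange(n_bins)] += 1
        h2[random.randrange(n_bins)] += 1
    c1 = c2 = 0
    d = 0.0
    for b in range(n_bins):
        c1 += h1[b]
        c2 += h2[b]
        d = max(d, abs(c1 - c2) / n_events)
    # Effective number of events for two samples of equal size n is n/2.
    return kolmogorov_prob(d, n_events / 2.0)

random.seed(12345)
probs = [binned_prob() for _ in range(300)]
mean = sum(probs) / len(probs)
print(mean)  # shifted above the 0.5 expected for unbinned data
```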
Since we are physicists, we can apply a useful rule: as long as the bin width is small compared with any significant physical effect (for example the experimental resolution), the binning cannot have an important effect.
Therefore, we believe that for all practical purposes, the probability value PROB is calculated correctly provided the user is aware that:

- PROB should not be expected to have exactly the correct distribution for binned data.
- For binned data, the value of PROB will be slightly too big. That is, setting an acceptance criterion of (PROB > 0.05) will assure that at most 5% of truly compatible histograms are rejected, and usually somewhat less.
The Kolmogorov Test for 2-dimensional data is not as well
understood as for one dimension.
The basic problem is that it requires the unbinned data to be
ordered, which is easy in one dimension, but is not
well-defined
(i.e. not scale-invariant) in higher dimensions.
Paradoxically, the binning which was a nuisance in one dimension
is now very useful, since it enables us to define
an obvious ordering.
In fact there are two obvious orderings (horizontal and vertical)
which give rise to two (in general different) Kolmogorov
distance measures.
Routine HDIFF takes the average of the two distances to calculate the probability value PROB, which gives very satisfactory results.
The precautions necessary for 1-dimensional data also apply to this case.
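The two orderings and their average can be sketched as follows for a pair of 2-dimensional binned histograms: one Kolmogorov distance is computed from the cumulative bin contents scanned row-by-row (horizontal ordering), the other scanned column-by-column (vertical ordering), and the two distances are averaged. This is our own illustration of the idea, not the HDIFF source:

```python
def kolmogorov_distance_2d(h1, h2):
    """Average of the two Kolmogorov distances obtained by ordering
    the bins of two 2-D histograms row-by-row and column-by-column.
    h1 and h2 are equal-shaped lists of lists of bin contents."""
    n1 = sum(map(sum, h1))
    n2 = sum(map(sum, h2))
    ny = len(h1)      # number of rows
    nx = len(h1[0])   # number of columns

    def distance(order):
        # Accumulate bin contents in the given ordering and track the
        # largest difference between the two cumulative fractions.
        c1 = c2 = 0
        d = 0.0
        for (i, j) in order:
            c1 += h1[i][j]
            c2 += h2[i][j]
            d = max(d, abs(c1 / n1 - c2 / n2))
        return d

    row_major = [(i, j) for i in range(ny) for j in range(nx)]
    col_major = [(i, j) for j in range(nx) for i in range(ny)]
    return 0.5 * (distance(row_major) + distance(col_major))

# Identical histograms are at distance zero.
a = [[5, 3], [2, 10]]
print(kolmogorov_distance_2d(a, a))  # 0.0
```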