In some problems it is necessary to apply weights to the Monte Carlo data before comparing it with the real data.
An example occurs when events are detected with some efficiency, which differs from bin to bin, and the form of which is known exactly. Rather than reject MC events on a random basis, it is more effective to include them all, weighted by the appropriate efficiency.
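The two approaches can be contrasted in a short sketch (Python, with a made-up efficiency function standing in for the exactly known bin-to-bin efficiency):

```python
import random

random.seed(12345)

def efficiency(x):
    # Hypothetical, exactly known detection efficiency.
    return 0.5 + 0.4 * x

events = [random.uniform(0.0, 1.0) for _ in range(100_000)]

# Rejection: keep each MC event with probability efficiency(x),
# discarding the rest at random.
kept = sum(1 for x in events if random.random() < efficiency(x))

# Weighting: keep every MC event, weighted by efficiency(x).
# Same expectation, but no events (and no information) are thrown away.
weighted = sum(efficiency(x) for x in events)
```

Both quantities estimate the same expected detected yield (here about 0.7 per generated event), but the weighted sum has the smaller variance because no events are discarded.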
Another such instance arises if the MC data have been generated according to one function, but another one is desired. For example, the data may have been generated using some form of the Bethe-Bloch equation,
and with hindsight it is realised that some other form
is more correct. This can be accommodated by weighting each bin by the ratio of the desired function to the one actually used.
In such a case the predicted number of events in each bin is modified and Equation becomes
The likelihood function of Equation is unchanged. The differentials of Equation become
and the differentials with respect to the give the equivalents of Equations and .
The solution of these four sets of equations proceeds as before. Notice that, as one would expect, if the weight is the same for all i, then this merely amounts to a scaling of the Monte Carlo strength.
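The reweighting from one generating function to another can be sketched as follows; the two functional forms here are hypothetical stand-ins, not the actual Bethe-Bloch variants:

```python
def reweight(x_values, f_old, f_new):
    """Weights converting MC generated under f_old into a sample
    distributed (in expectation) according to f_new."""
    return [f_new(x) / f_old(x) for x in x_values]

# Hypothetical stand-ins for the old and new functional forms.
f_old = lambda x: 1.0 / x**2
f_new = lambda x: 1.0 / x**1.9

weights = reweight([1.0, 2.0, 4.0], f_old, f_new)
# At x = 1 the two forms agree, so the weight there is exactly 1.
```

If the two functions differed only by an overall constant, every weight would be that constant, reproducing the pure scaling of the Monte Carlo strength noted above.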
So far this assumes that the weight is the same for all events from a given source in a given bin: the quantity . This may not be the case if either (a) the bin size is so large that the weight factor varies significantly within the bin or (b) the weight factor depends not only on the variable(s) x used in making the comparison but also on some other variable(s) -- call it z -- which is not binned and used in the comparison; perhaps it does not exist in the real data. In either case the weights of different events from the same source in the same bin can be different.
In such a case the equations above still apply, with the weight taken as the ideal average weight for source j in bin i. This may be a known quantity: more likely it has to be estimated using the actual weights attached to the MC data values themselves.
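Estimating that average from the event weights themselves can be sketched as below, assuming a hypothetical event layout of one (source, bin, weight) tuple per MC event:

```python
from collections import defaultdict

def average_weights(events):
    """Mean weight per (source, bin), estimated from the MC events.
    Each event is a hypothetical (source, bin_index, weight) tuple."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for source, bin_index, weight in events:
        totals[(source, bin_index)] += weight
        counts[(source, bin_index)] += 1
    return {key: totals[key] / counts[key] for key in totals}

events = [(0, 3, 0.5), (0, 3, 1.5), (1, 3, 2.0)]
wbar = average_weights(events)  # {(0, 3): 1.0, (1, 3): 2.0}
```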
At this point one has to worry whether the discrepancy between the average actual weight and the true average weight should be included in the fitting procedure, estimation and errors. In practice this method of weighting only works satisfactorily if the weights do not differ very much. The variance of a sum of weights from Poisson sources is the sum of the squared weights [18], and thus the proportional error on the bin contents is greater than that obtained from unweighted Poisson statistics, and this effect gets worse as the spread of weights gets larger. Fluctuations in a small number of events with large weights will swamp the information obtained from low-weight events. Thus in any application the spread in weights for a source in a bin should be small, and this means that the resulting uncertainty in its value will also be small.
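The effect of a weight spread on the proportional error can be checked directly; the sketch below compares uniform weights with a spread of weights over the same number of events:

```python
import math

def proportional_error(weights):
    """Proportional error on a sum of weights from Poisson sources:
    sqrt(sum w^2) / sum w."""
    return math.sqrt(sum(w * w for w in weights)) / sum(weights)

uniform = [2.0] * 100              # same weight for every event
spread = [0.5] * 50 + [3.5] * 50   # same event count, wide weight spread

print(proportional_error(uniform))  # 0.1, i.e. the 1/sqrt(100) of unweighted statistics
print(proportional_error(spread))   # 0.125, worse despite the same number of events
```

Note that an overall rescaling of all the weights leaves this proportional error unchanged; only the spread matters.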
Some insight can be gained by noting that in the set of equations the weights always appear together with the strengths. Thus if the weights are all too high by some factor, the strengths will be low by exactly the same factor. So the error in the
estimates resulting from uncertainties in the weights is of the same order as that uncertainty, and in any application this should be small.
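This degeneracy between an overall weight scale and the fitted strength can be seen in a toy fit; the closed-form least-squares estimator below is an illustrative stand-in for the maximum-likelihood fit of the text, not the actual procedure:

```python
def fit_strength(data, weights, mc):
    """Least-squares estimate of a single strength p in d_i = p * w_i * a_i
    (illustrative stand-in for the maximum-likelihood fit)."""
    numerator = sum(d * w * a for d, w, a in zip(data, weights, mc))
    denominator = sum((w * a) ** 2 for w, a in zip(weights, mc))
    return numerator / denominator

data = [10.0, 20.0, 15.0]      # made-up "real data" bin contents
mc = [12.0, 18.0, 16.0]        # made-up MC bin contents
weights = [1.0, 1.2, 0.9]      # made-up per-bin weights

p = fit_strength(data, weights, mc)
# Make every weight 5% too high: the fitted strength comes out 5% too low,
# because only the product weight * strength enters the prediction.
p_biased = fit_strength(data, [1.05 * w for w in weights], mc)
```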
In HMCINI, the weight distributions provided are normalised, and the normalisation factors (these should always be 1 unless there is an efficiency component to the distribution) are preserved. Inclusion of weights is then regarded as a two-stage problem, implemented as follows. The user (or HMCMLL) calls HMCLNL with the original strengths. These are transformed into effective strengths within HMCLNL, using the normalisation factors calculated by HMCINI, and HMCLNL then calculates the correct log-likelihood using these effective strengths and the normalised weight distributions.
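The two-stage scheme can be sketched as follows, assuming for illustration that "normalised" means rescaled to unit average weight, with the rescaling factor kept as the normalisation factor. The Python helpers are hypothetical; HMCINI and HMCLNL themselves are HBOOK Fortran routines:

```python
def normalise_weights(weights):
    """Stage 1 (HMCINI-like, assumed scheme): rescale a weight
    distribution to unit average, preserving the normalisation factor."""
    factor = sum(weights) / len(weights)
    return [w / factor for w in weights], factor

def effective_strengths(strengths, factors):
    """Stage 2 (HMCLNL-like, assumed scheme): fold the preserved factors
    back into the strengths, so each product strength * weight,
    and hence the log-likelihood, is unchanged."""
    return [p * f for p, f in zip(strengths, factors)]

w_norm, factor = normalise_weights([0.4, 0.5, 0.6])  # factor = 0.5
p_eff = effective_strengths([2.0], [factor])         # [1.0]
```

A weight distribution with average 1 (no efficiency component) gives a factor of exactly 1, leaving the strengths untouched, consistent with the remark above.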