next up previous contents index
Next: HBOOK routines Up: Fitting with finite Previous: Other points concerning

Weighted Events

In some problems it is necessary to apply weights to the Monte Carlo data before comparing them with the real data.

An example occurs when events are detected with some efficiency, which differs from bin to bin, and the form of which is known exactly. Rather than reject MC events on a random basis, it is more effective to include them all, weighted by the appropriate efficiency.
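As a toy illustration of why weighting is preferable to rejection, consider the following Python sketch (all numbers invented; this is not part of HBOOK). The rejection approach discards MC events at random and so adds statistical noise, while weighting keeps every generated event:

```python
import random

random.seed(1)

# Hypothetical per-bin detection efficiencies, assumed known exactly.
efficiency = [0.9, 0.5, 0.2]
mc_counts = [300, 300, 300]          # generated MC events per bin

# Rejection: keep each event with probability equal to the efficiency.
rejected = [sum(1 for _ in range(n) if random.random() < e)
            for n, e in zip(mc_counts, efficiency)]

# Weighting: keep every event, weighted by the efficiency -- no extra
# randomness is introduced, so the full MC sample contributes.
weighted = [n * e for n, e in zip(mc_counts, efficiency)]

print(weighted)   # exactly n * efficiency per bin
print(rejected)   # fluctuates binomially around the same values
```

The weighted contents reproduce the expected yields exactly, whereas the rejected counts carry an extra binomial fluctuation on top of the MC statistics.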

Another such instance arises if MC data have been generated according to one function, but another is desired. For example, data on $p$, $dE/dx$ and $\cos\theta$ may have been generated using some form of the Bethe-Bloch equation

$$\frac{dE}{dx} = F_0(p, \theta, m_j)$$

and with hindsight it is realised that some other form $F_1(p, \theta, m_j)$ is more correct. This can be accommodated by weighting each bin by

$$w_{ji} = F_1 / F_0$$

In such a case the predicted number of events in each bin is modified, and the expression for $f_i$ becomes

$$f_i = \sum_{j=1}^{m} p_j w_{ji} A_{ji}$$

The likelihood function is unchanged. The differentials with respect to the $p_j$ become

$$\sum_{i=1}^{n} \left( \frac{d_i}{f_i} - 1 \right) w_{ji} A_{ji} = 0 \qquad \forall j$$
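For concreteness, the modified prediction can be evaluated directly; a small Python sketch with invented strengths, weights and expectations (none of these numbers come from the text):

```python
# Hypothetical example: 2 sources, 3 bins.
p = [1.2, 0.8]                 # source strengths p_j
w = [[1.0, 1.1, 0.9],          # w[j][i]: mean weight of source j in bin i
     [1.0, 1.0, 1.0]]
A = [[50.0, 40.0, 30.0],       # A[j][i]: true MC expectations per bin
     [20.0, 25.0, 35.0]]

# Predicted bin contents: f_i = sum_j p_j * w_ji * A_ji
f = [sum(p[j] * w[j][i] * A[j][i] for j in range(len(p)))
     for i in range(len(A[0]))]
print(f)
```

With all weights equal to 1 this reduces to the unweighted prediction $f_i = \sum_j p_j A_{ji}$.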

and the differentials with respect to the $A_{ji}$ give the equivalents of the earlier results:

$$A_{ji} = \frac{a_{ji}}{1 + p_j w_{ji} t_i}$$

$$\frac{d_i}{1 - t_i} = f_i = \sum_j \frac{p_j w_{ji} a_{ji}}{1 + p_j w_{ji} t_i}$$

The solution of these four sets of equations proceeds as before. Notice that, as one would expect, if $w_{ji}$ is the same for all $i$, then this merely amounts to a scaling of the Monte Carlo strength $p_j$.
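As an illustration of how such a solution might proceed, here is a Python sketch (hypothetical names, not the HBOOK implementation) that solves the single-bin equation $d_i/(1 - t_i) = \sum_j p_j w_{ji} a_{ji}/(1 + p_j w_{ji} t_i)$ for $t_i$ by bisection and then recovers the $A_{ji}$:

```python
# Numerical sketch: given fixed strengths p_j, solve for t in one bin, then
# A_j = a_j / (1 + p_j w_j t).  Assumes d > 0 so a root exists in the interval.

def solve_bin(d, p, w, a, iters=200):
    """d: data count; p, w, a: strength, mean weight, MC count per source."""
    def g(t):
        # g(t) = d/(1-t) - sum_j p_j w_j a_j/(1 + p_j w_j t); g is increasing
        # in t on the allowed interval, so its root is the desired t.
        return d / (1.0 - t) - sum(
            pj * wj * aj / (1.0 + pj * wj * t)
            for pj, wj, aj in zip(p, w, a))

    # t must keep every denominator positive: -1/max(p_j w_j) < t < 1.
    lo = -1.0 / max(pj * wj for pj, wj in zip(p, w)) + 1e-9
    hi = 1.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    A = [aj / (1.0 + pj * wj * t) for pj, wj, aj in zip(p, w, a)]
    return t, A

# Sanity check: if d equals sum_j p_j w_j a_j exactly, then t = 0 and A_j = a_j.
t, A = solve_bin(d=76.0, p=[1.2, 0.8], w=[1.0, 1.0], a=[50.0, 20.0])
print(t, A)
```

In a full fit this bin-by-bin solution sits inside an outer loop that adjusts the $p_j$, exactly as in the unweighted case.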

So far this assumes that the weight is the same for all events from a given source in a given bin: the quantity $w_{ji}$. This may not be the case if either (a) the bin size is so large that the weight factor varies significantly within the bin, or (b) the weight factor depends not only on the variable(s) $x$ used in making the comparison but also on some other variable(s) -- call it $z$ -- which is neither binned nor used in the comparison; perhaps it does not even exist in the real data. In either case the weights of different events from the same source in the same bin can differ.

In such a case the equations above still apply, with $w_{ji}$ equal to the ideal average weight for source $j$ in bin $i$. This may be a known quantity; more likely it has to be estimated using the actual weights attached to the MC data values themselves.

At this point one has to worry whether the discrepancy between the average actual weight and the true average weight should be included in the fitting procedure, estimation and errors. In practice this method of weighting only works satisfactorily if the weights do not differ very much. The variance of a sum of weights from Poisson sources is $\sum_i w_i^2$ [18], and thus the proportional error on the bin contents, $\sqrt{\sum_i w_i^2}/\sum_i w_i$, is greater than the $1/\sqrt{N}$ obtained from unweighted Poisson statistics; this effect gets worse as the spread of weights, $\overline{w^2} - \overline{w}^2$, gets larger. Fluctuations in a small number of events with large weights will swamp the information obtained from low-weight events. Thus in any application the spread in weights for a source in a bin should be small, and this means that the resulting uncertainty in its value will also be small.
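The loss of precision from a wide weight spread is easy to demonstrate numerically; a Python sketch with invented weights:

```python
import math

def rel_error(weights):
    # Proportional error sqrt(sum w_i^2) / sum w_i on a weighted bin content.
    return math.sqrt(sum(wt * wt for wt in weights)) / sum(weights)

uniform = [1.0] * 100                  # unweighted: error 1/sqrt(100) = 0.1
mixed = [0.1] * 90 + [10.0] * 10       # same 100 events, widely spread weights

print(rel_error(uniform))
print(rel_error(mixed))    # much larger: the 10 heavy events dominate
```

Both samples contain 100 events, but the mixed-weight bin is nearly three times less precise, because the few large-weight events dominate $\sum_i w_i^2$.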

Some insight can be gained by noting that in this set of equations the weights $w_{ji}$ always appear multiplied by the $p_j$. (The equation for $A_{ji}$ can be multiplied by $p_j$ to make this explicit.) Thus if the weights are all too high by some factor, the strengths will be low by exactly the same factor. So the error in the $p_j$ estimates resulting from uncertainties in the weights is of the same order as that uncertainty, and in any application this should be small.

In HMCINI, the weight distributions provided are normalised so that

$$\sum_i w_{ji} a_{ji} = \sum_i a_{ji} = N_j$$

and the normalisation factors (these should always be 1 unless there is an efficiency component to the distribution) are preserved. Inclusion of weights is then regarded as a two-stage problem.

  1. The fitting of the reweighted Monte Carlo distributions to the data distribution, in which the Monte Carlo normalisation is preserved, to find a set of fractions $P_j'$. These correspond to the fractions of each source actually present in the data sample you provide.
  2. The transformation of the $P_j'$ into the $P_j$, the fractions of each source which were present in the data sample before the efficiencies were applied to the data.

This is implemented as follows. The user (or HMCMLL) calls HMCLNL with the $P_j$. These are transformed into the $P_j'$ within HMCLNL using the normalisation factors $\alpha_j$ calculated by HMCINI.

$$P_j' = \frac{\sum_k P_k}{\sum_k \alpha_k P_k}\,\alpha_j P_j$$

where

$$\alpha_j = \frac{\sum_i w_{ji} a_{ji}}{N_j}$$

and HMCLNL calculates the correct log-likelihood using the $P_j'$ and the normalised weight distributions.
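The two-stage bookkeeping can be sketched as follows in Python (hypothetical names; the actual HMCINI/HMCLNL routines are Fortran and are not reproduced here):

```python
# Sketch of the normalisation factors and the P_j -> P'_j transformation:
#   alpha_j = sum_i w_ji a_ji / N_j
#   P'_j    = (sum_k P_k / sum_k alpha_k P_k) * alpha_j * P_j

def alphas(w, a):
    """alpha_j for each source j, from weights w[j][i] and MC counts a[j][i]."""
    return [sum(wi * ai for wi, ai in zip(wj, aj)) / sum(aj)
            for wj, aj in zip(w, a)]

def to_pprime(P, alpha):
    """Map pre-efficiency fractions P_j to detected-sample fractions P'_j."""
    scale = sum(P) / sum(ak * pk for ak, pk in zip(alpha, P))
    return [scale * aj * pj for aj, pj in zip(alpha, P)]

# Source 1 has efficiency-like weights below 1; source 2 is unweighted.
alpha = alphas(w=[[0.6, 0.4], [1.0, 1.0]],
               a=[[100.0, 100.0], [100.0, 100.0]])
Pp = to_pprime([0.7, 0.3], alpha)
print(alpha)   # source 1 has mean weight 0.5, source 2 has 1.0
print(Pp)      # sums to sum(P); the low-efficiency source's share drops
```

The scale factor keeps the $P_j'$ summing to the same total as the $P_j$, so the transformation only reshuffles the fractions according to the mean weights $\alpha_j$.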



Janne Saarela
Tue May 16 09:09:27 METDST 1995