next up previous contents index
Next: HBOOK routines Up: Fitting with finite Previous: Other points concerning

Weighted Events

In some problems it is necessary to apply weights to the Monte Carlo data before comparing them with the real data.

An example occurs when events are detected with some efficiency, which differs from bin to bin, and the form of which is known exactly. Rather than reject MC events on a random basis, it is more effective to include them all, weighted by the appropriate efficiency.
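As a toy illustration of why weighting is preferable to rejection, consider the following Python sketch (all numbers invented; this is not part of HBOOK). The rejection approach discards MC events at random and so adds statistical noise, while weighting keeps every generated event:

```python
import random

random.seed(1)

# Hypothetical per-bin detection efficiencies, assumed known exactly.
efficiency = [0.9, 0.5, 0.2]
mc_counts = [300, 300, 300]          # generated MC events per bin

# Rejection: keep each event with probability equal to the efficiency.
rejected = [sum(1 for _ in range(n) if random.random() < e)
            for n, e in zip(mc_counts, efficiency)]

# Weighting: keep every event, weighted by the efficiency -- no extra
# randomness is introduced, so the full MC sample contributes.
weighted = [n * e for n, e in zip(mc_counts, efficiency)]

print(weighted)   # exactly n * efficiency per bin
print(rejected)   # fluctuates binomially around the same values
```

The weighted contents reproduce the expected yields exactly, whereas the rejected counts carry an extra binomial fluctuation on top of the MC statistics.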

Another such instance arises if MC data have been generated according to one function, but another is desired. For example, data on $p$, $dE/dx$ and $\cos\theta$ may have been generated using some form of the Bethe-Bloch equation

$$\frac{dE}{dx} = F_0(p, \theta, m_j)$$

and with hindsight it is realised that some other form $F_1(p, \theta, m_j)$ is more correct. This can be accommodated by weighting each bin by

$$w_{ji} = F_1 / F_0$$

In such a case the predicted number of events in each bin is modified, and the expression for $f_i$ becomes

$$f_i = \sum_{j=1}^{m} p_j w_{ji} A_{ji}$$

The likelihood function is unchanged. The differentials with respect to the $p_j$ become

$$\sum_{i=1}^{n} \left( \frac{d_i}{f_i} - 1 \right) w_{ji} A_{ji} = 0 \qquad \forall j$$
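For concreteness, the modified prediction can be evaluated directly; a small Python sketch with invented strengths, weights and expectations (none of these numbers come from the text):

```python
# Hypothetical example: 2 sources, 3 bins.
p = [1.2, 0.8]                 # source strengths p_j
w = [[1.0, 1.1, 0.9],          # w[j][i]: mean weight of source j in bin i
     [1.0, 1.0, 1.0]]
A = [[50.0, 40.0, 30.0],       # A[j][i]: true MC expectations per bin
     [20.0, 25.0, 35.0]]

# Predicted bin contents: f_i = sum_j p_j * w_ji * A_ji
f = [sum(p[j] * w[j][i] * A[j][i] for j in range(len(p)))
     for i in range(len(A[0]))]
print(f)
```

With all weights equal to 1 this reduces to the unweighted prediction $f_i = \sum_j p_j A_{ji}$.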

and the differentials with respect to the $A_{ji}$ give the equivalents of the earlier results:

$$A_{ji} = \frac{a_{ji}}{1 + p_j w_{ji} t_i}$$

$$\frac{d_i}{1 - t_i} = f_i = \sum_j \frac{p_j w_{ji} a_{ji}}{1 + p_j w_{ji} t_i}$$

The solution of these four sets of equations proceeds as before. Notice that, as one would expect, if $w_{ji}$ is the same for all $i$, then this merely amounts to a scaling of the Monte Carlo strength $p_j$.
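As an illustration of how such a solution might proceed, here is a Python sketch (hypothetical names, not the HBOOK implementation) that solves the single-bin equation $d_i/(1 - t_i) = \sum_j p_j w_{ji} a_{ji}/(1 + p_j w_{ji} t_i)$ for $t_i$ by bisection and then recovers the $A_{ji}$:

```python
# Numerical sketch: given fixed strengths p_j, solve for t in one bin, then
# A_j = a_j / (1 + p_j w_j t).  Assumes d > 0 so a root exists in the interval.

def solve_bin(d, p, w, a, iters=200):
    """d: data count; p, w, a: strength, mean weight, MC count per source."""
    def g(t):
        # g(t) = d/(1-t) - sum_j p_j w_j a_j/(1 + p_j w_j t); g is increasing
        # in t on the allowed interval, so its root is the desired t.
        return d / (1.0 - t) - sum(
            pj * wj * aj / (1.0 + pj * wj * t)
            for pj, wj, aj in zip(p, w, a))

    # t must keep every denominator positive: -1/max(p_j w_j) < t < 1.
    lo = -1.0 / max(pj * wj for pj, wj in zip(p, w)) + 1e-9
    hi = 1.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    A = [aj / (1.0 + pj * wj * t) for pj, wj, aj in zip(p, w, a)]
    return t, A

# Sanity check: if d equals sum_j p_j w_j a_j exactly, then t = 0 and A_j = a_j.
t, A = solve_bin(d=76.0, p=[1.2, 0.8], w=[1.0, 1.0], a=[50.0, 20.0])
print(t, A)
```

In a full fit this bin-by-bin solution sits inside an outer loop that adjusts the $p_j$, exactly as in the unweighted case.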

So far this assumes that the weight is the same for all events from a given source in a given bin: the quantity $w_{ji}$. This may not be the case if either (a) the bin size is so large that the weight factor varies significantly within the bin, or (b) the weight factor depends not only on the variable(s) $x$ used in making the comparison but also on some other variable(s) -- call it $z$ -- which is neither binned nor used in the comparison; perhaps it does not even exist in the real data. In either case the weights of different events from the same source in the same bin can differ.

In such a case the equations above still apply, with $w_{ji}$ equal to the ideal average weight for source $j$ in bin $i$. This may be a known quantity; more likely it has to be estimated using the actual weights attached to the MC data values themselves.

At this point one has to worry whether the discrepancy between the average actual weight and the true average weight should be included in the fitting procedure, estimation and errors. In practice this method of weighting only works satisfactorily if the weights do not differ very much. The variance of a sum of weights from Poisson sources is $\sum_i w_i^2$ [18], and thus the proportional error on the bin contents, $\sqrt{\sum_i w_i^2}/\sum_i w_i$, is greater than the $1/\sqrt{N}$ obtained from unweighted Poisson statistics; this effect gets worse as the spread of weights, $\overline{w^2} - \overline{w}^2$, gets larger. Fluctuations in a small number of events with large weights will swamp the information obtained from low-weight events. Thus in any application the spread in weights for a source in a bin should be small, and this means that the resulting uncertainty in its value will also be small.
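The loss of precision from a wide weight spread is easy to demonstrate numerically; a Python sketch with invented weights:

```python
import math

def rel_error(weights):
    # Proportional error sqrt(sum w_i^2) / sum w_i on a weighted bin content.
    return math.sqrt(sum(wt * wt for wt in weights)) / sum(weights)

uniform = [1.0] * 100                  # unweighted: error 1/sqrt(100) = 0.1
mixed = [0.1] * 90 + [10.0] * 10       # same 100 events, widely spread weights

print(rel_error(uniform))
print(rel_error(mixed))    # much larger: the 10 heavy events dominate
```

Both samples contain 100 events, but the mixed-weight bin is nearly three times less precise, because the few large-weight events dominate $\sum_i w_i^2$.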

Some insight can be gained by noting that in this set of equations the weights $w_{ji}$ always appear multiplied by the $p_j$. (The equation for $A_{ji}$ can be multiplied by $p_j$ to make this explicit.) Thus if the weights are all too high by some factor, the strengths will be low by exactly the same factor. So the error in the $p_j$ estimates resulting from uncertainties in the weights is of the same order as that uncertainty, and in any application this should be small.

In HMCINI, the weight distributions provided are normalised so that

$$\sum_i w_{ji} a_{ji} = \sum_i a_{ji} = N_j$$

and the normalisation factors (these should always be 1 unless there is an efficiency component to the distribution) are preserved. Inclusion of weights is then regarded as a two-stage problem.

  1. The fitting of the reweighted Monte Carlo distributions to the data distribution, in which the Monte Carlo normalisation is preserved, to find a set of fractions $P_j'$. These correspond to the fractions of each source actually present in the data sample you provide.
  2. The transformation of the $P_j'$ into the $P_j$, the fractions of each source which were present in the data sample before the efficiencies were applied to the data.

This is implemented as follows. The user (or HMCMLL) calls HMCLNL with the $P_j$. These are transformed into the $P_j'$ within HMCLNL using the normalisation factors $\alpha_j$ calculated by HMCINI.

$$P_j' = \frac{\sum_k P_k}{\sum_k \alpha_k P_k}\,\alpha_j P_j$$

where

$$\alpha_j = \frac{\sum_i w_{ji} a_{ji}}{N_j}$$

and HMCLNL calculates the correct log-likelihood using the $P_j'$ and the normalised weight distributions.
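The two-stage bookkeeping can be sketched as follows in Python (hypothetical names; the actual HMCINI/HMCLNL routines are Fortran and are not reproduced here):

```python
# Sketch of the normalisation factors and the P_j -> P'_j transformation:
#   alpha_j = sum_i w_ji a_ji / N_j
#   P'_j    = (sum_k P_k / sum_k alpha_k P_k) * alpha_j * P_j

def alphas(w, a):
    """alpha_j for each source j, from weights w[j][i] and MC counts a[j][i]."""
    return [sum(wi * ai for wi, ai in zip(wj, aj)) / sum(aj)
            for wj, aj in zip(w, a)]

def to_pprime(P, alpha):
    """Map pre-efficiency fractions P_j to detected-sample fractions P'_j."""
    scale = sum(P) / sum(ak * pk for ak, pk in zip(alpha, P))
    return [scale * aj * pj for aj, pj in zip(alpha, P)]

# Source 1 has efficiency-like weights below 1; source 2 is unweighted.
alpha = alphas(w=[[0.6, 0.4], [1.0, 1.0]],
               a=[[100.0, 100.0], [100.0, 100.0]])
Pp = to_pprime([0.7, 0.3], alpha)
print(alpha)   # source 1 has mean weight 0.5, source 2 has 1.0
print(Pp)      # sums to sum(P); the low-efficiency source's share drops
```

The scale factor keeps the $P_j'$ summing to the same total as the $P_j$, so the transformation only reshuffles the fractions according to the mean weights $\alpha_j$.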



Janne Saarela
Tue May 16 09:09:27 METDST 1995