%!TEX root=manual
% ulrich_y committed Jul 03, 2020
\subsection{Statistics}
\label{sec:stat}

\mcmule{} is a Monte Carlo program. This means it samples the
integrand at $N$ (pseudo-)random points to get an estimate for the
integral. However, because it uses the adaptive Monte Carlo
integration routine {\tt vegas}~\cite{Lepage:1980jk}, we split
$N=i\times n$ into $i$ iterations ({\tt itmx}), each with $n$ points
({\tt nenter}). After each iteration, {\tt vegas} changes the way it
will sample the next iteration based on the results of the previous
one. Hence, the performance of the integration is a subtle interplay
between $i$ and $n$ -- it is not sufficient any more to consider
their product $N$.

Further, we always perform the integration in two steps: a
pre-conditioning with $i_\text{ad}\times n_\text{ad}$ points
({\tt itmx\_ad} and {\tt nenter\_ad}, respectively), which is used to
optimise the integration strategy and after which the result is
discarded, and a main integration that benefits from the integrator's
understanding of the integrand.

Of course there are no one-size-fits-all rules for how to choose the
$i$ and $n$ for the pre-conditioning and the main run. However, the
following heuristics have proven helpful:
\begin{itemize}

    \item
    $n$ is always much larger than $i$. For very simple integrands,
    $n=\mathcal{O}(10\cdot 10^3)$ and $i=\mathcal{O}(10)$.

    \item
    Increasing $n$ reduces errors that can be thought of as
    systematic because it allows the integrator to `discover' new
    features of the integrand. Increasing $i$, on the other hand, will
    rarely have that effect and only improves the statistical error.
    This is especially true for distributions.

    \item
    There is no real limit on $n$, except that it has to fit into the
    datatype used -- integrations with $n=\mathcal{O}(2^{31}-1)$ are
    not too uncommon -- while $i$ is rarely (much) larger than 100.

    \item
    For very stringent cuts it can happen that typical values of
    $n_\text{ad}$ are too small for any point to pass the cuts.
    % This situation is referred to as a \aterm{Scarcity Condition due
    % to Restrictions in the Evaluation With User Problems}{SCREW-UP}.
    In this case {\tt vegas} will return {\tt NaN}, indicating that
    no events were found. Barring mistakes in the definition of the
    cuts, a pre-pre-conditioning with extremely large $n$ but
    $i=1\!-\!2$ can be helpful.

    \item
    $n$ also needs to be large enough for {\tt vegas} to reliably
    find all features of the integrand. It is rarely obvious that it
    did, though sometimes it becomes clear when increasing $n$ or
    looking at intermediary results as a function of the
    already-completed iterations.

    \item
    The main run should always have larger $i$ and $n$ than the
    pre-conditioning. Judging how much larger is a delicate game,
    though $i/i_\text{ad} = \mathcal{O}(5)$ and
    $n/n_\text{ad} = \mathcal{O}(10\!-\!50)$ have proven helpful.

    \item
    If, once the integration is completed, the result is
    unsatisfactory, consider the following strategies:
    \begin{itemize}

        \item
        A large $\chi^2/\text{d.o.f.}$ indicates that $n$ is too
        small. Try to increase $n_\text{ad}$ and, to a perhaps lesser
        extent, $n$.

        \item
        Increase $i$. Often it is a good idea to consciously set $i$
        to a value so large that the integrator will never reach it
        and to keep looking at `intermediary' results.

        \item
        If the error is small enough for the application but the
        result seems incorrect (for example because the $\xc$
        dependence does not vanish), massively increase $n$.

    \end{itemize}

    \item
    Real corrections need much more statistics in both $i$ and $n$
    ($\mathcal{O}(10)$ times more for $n$, $\mathcal{O}(2)$ for $i$)
    than the corresponding \ac{LO} calculations because of the
    higher-dimensional phase-space.

    \item
    Virtual corrections have the same number of dimensions as the
    \ac{LO} calculation and can get by with only a modest increase to
    account for the added functional complexity.

    \item
    {\tt vegas} tends to underestimate the numerical error.

\end{itemize}

These guidelines are often helpful but should not be considered
infallible as they are just that -- guidelines.

\mcmule{} is not parallelised; however, because Monte Carlo
integrations require a random seed anyway, it is possible to
calculate multiple estimates of the same integral using different
random seeds $z_1$ and to combine the results obtained this way. This
also allows for a better, more reliable understanding of the error
estimate.
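
As a sketch of how such a combination might look (an illustration
under stated assumptions, not necessarily the procedure implemented
in \mcmule{}'s own tools), suppose that $K$ runs with different
random seeds yield the estimates $I_k\pm\sigma_k$ with
$k=1,\dots,K$. The symbols $K$, $I_k$, and $\sigma_k$ are introduced
here for illustration only. Treating the runs as independent and
their errors as approximately Gaussian, one common choice is the
inverse-variance weighted average
\begin{align}
    \bar I &= \frac{\sum_{k=1}^K I_k/\sigma_k^2}
                   {\sum_{k=1}^K 1/\sigma_k^2}\,,
    \\
    \bar\sigma^2 &= \frac{1}{\sum_{k=1}^K 1/\sigma_k^2}\,,
    \\
    \frac{\chi^2}{\text{d.o.f.}} &= \frac{1}{K-1}
        \sum_{k=1}^K \frac{(I_k-\bar I)^2}{\sigma_k^2}\,.
\end{align}
If all $\sigma_k$ are equal to a common $\sigma$, this reduces to
the plain mean with $\bar\sigma=\sigma/\sqrt{K}$. A
$\chi^2/\text{d.o.f.}$ well above one would suggest that the
individual $\sigma_k$ are too small to explain the spread between
the runs, in which case $\bar\sigma$ should not be trusted at face
value.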