%!TEX root=manual
\section{General aspects of using \mcmule}
\label{sec:general}
In this section, we will collect a few general points of interest regarding \mcmule{}. In particular, we will discuss heuristics on how much statistics is necessary for the different contributions in Section~\ref{sec:stat}. This is followed by a more in-depth discussion of the analysis strategy in Section~\ref{sec:analysis}.
\input{general/stat}
\input{general/analysis}
%!TEX root=manual
\subsection{Analysis}
\label{sec:analysis}
Once the Monte Carlo has run, an offline analysis of the results is required. This entails loading, averaging, and combining the data. This is automatised in {\tt pymule} but the basic steps are
\begin{enumerate}
\setcounter{enumi}{-1}
\item Load the data into a suitable analysis framework such as {\tt python}.
\item Combine the different random seeds into one result per contribution and $\xc$. The $\chi^2/{\rm d.o.f.}$ of this merging must be small. Otherwise, try to increase the statistics or choose a different phase-space parametrisation.
\item Add all contributions that combine into one of the physical contributions~\eqref{eq:nellocomb:b}. This includes any partitioning done in Section~\ref{sec:ps}.
\item (optional) At N$^\ell$LO, perform a fit\footnote{Note that it is important to perform the fit after combining the phase-space partitionings (cf. Section~\ref{sec:ps}) but before adding~\eqref{eq:nellocomb:a}, as this model is only valid for the terms of~\eqref{eq:nellocomb:b}.}
\begin{align}
\sigma_{n+j}^{(\ell)}
= c_0^{(j)} + c_1^{(j)} \log\xc + c_2^{(j)} \log^2\xc + \cdots + c_\ell^{(j)} \log^\ell\xc
= \sum_{i=0}^\ell c_i^{(j)}\log^i\xc\,.
\label{eq:xifit}
\end{align}
This has the advantage that it very clearly quantifies any residual $\xc$ dependence. We will come back to this issue in Section~\ref{sec:xicut}.
\item Combine all physical contributions of~\eqref{eq:nellocomb:a} into $\sigma^{(\ell)}(\xc)$, which has to be $\xc$ independent.
\item Perform detailed checks of the $\xc$ independence. This is especially important the first time a particular configuration is run. Beyond \ac{NLO}, it is also extremely helpful to check whether the sum of the fits~\eqref{eq:xifit} is compatible with a constant, i.e.
whether for all $1\le i\le\ell$
\begin{align}
\Bigg| \frac{\sum_{j=0}^\ell c_i^{(j)}}{\sum_{j=0}^\ell \delta c_i^{(j)}} \Bigg| < 1\,,
\label{eq:xifitsum}
\end{align}
where $\delta c_i^{(j)}$ is the error estimate on the coefficient $c_i^{(j)}$.\footnote{Note that the error estimate on the sum of the coefficients in \eqref{eq:xifitsum} is rather poor and does not include correlations between different $c_i$.} {\tt pymule}'s {\tt mergefkswithplot} can be helpful here. If \eqref{eq:xifitsum} is not satisfied, or only very poorly, try to run the Monte Carlo again with an increased $n$.
\item Merge the different estimates of~\eqref{eq:nellocomb:a} from the different $\xc$ into one final number $\sigma^{(\ell)}$. The $\chi^2/{\rm d.o.f.}$ of this merging must be small.
\item Repeat the above for any distributions produced, though bin-wise fitting as in Point 3 is rarely necessary or helpful. If a total cross section is $\xc$ independent but the distributions (or a cross section obtained after applying cuts) are not, this is a hint that the distribution (or the applied cuts) is not IR safe.
\end{enumerate}
These steps have been almost completely automatised in {\tt pymule} and Mathematica. However, all steps of this pipeline could easily be implemented in any other language by following the specification of the file format below (Section~\ref{sec:vegasff}).
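The fit of Point 3 can be sketched without {\tt pymule}. The following Python fragment is only an illustration: the $\xc$ values and cross sections are made-up toy data (constructed to be exactly linear in $\log\xc$, as for a single NLO contribution with $\ell=1$), and the fit uses {\tt numpy.polyfit} rather than \mcmule{}'s actual tooling.

```python
import numpy as np

# Toy data standing in for the merged per-xi_c results of one contribution
# sigma_{n+j}^{(1)}; in practice these come from the Monte Carlo output.
xic = np.array([0.1, 0.2, 0.4, 0.6, 0.8])
L = np.log(xic)
sigma = 1.5 - 0.3 * L                 # exactly linear: c0 = 1.5, c1 = -0.3
err = np.full_like(sigma, 0.01)       # illustrative Monte Carlo errors

# Weighted least-squares fit sigma = c0 + c1 log(xi_c).
# polyfit returns the highest power first; its covariance matrix provides
# the error estimates delta c_i on the coefficients.
coeff, cov = np.polyfit(L, sigma, deg=1, w=1 / err, cov=True)
c1, c0 = coeff
dc1, dc0 = np.sqrt(np.diag(cov))

# After fitting every contribution j, the check corresponding to the sum
# criterion is |sum_j c1^{(j)}| / |sum_j dc1^{(j)}| < 1, i.e. the summed
# log coefficients should be compatible with zero.
```

Since the toy data are exactly linear in $\log\xc$, the fit recovers $c_0=1.5$ and $c_1=-0.3$ up to floating-point precision; with real Monte Carlo data the coefficients carry the errors $\delta c_i$ used in the compatibility check.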
%!TEX root=manual
\subsection{Statistics}
\label{sec:stat}
\mcmule{} is a Monte Carlo program. This means it samples the integrand at $N$ (pseudo-)random points to obtain an estimate for the integral. However, because it uses the adaptive Monte Carlo integration routine {\tt vegas}~\cite{Lepage:1980jk}, we split $N=i\times n$ into $i$ iterations ({\tt itmx}), each with $n$ points ({\tt nenter}). After each iteration, {\tt vegas} changes the way it will sample the next iteration based on the results of the previous one. Hence, the performance of the integration is a subtle interplay between $i$ and $n$ -- it is no longer sufficient to consider only their product $N$. Further, we always perform the integration in two steps: a pre-conditioning with $i_\text{ad}\times n_\text{ad}$ ({\tt itmx\_ad} and {\tt nenter\_ad}, respectively), which is used to optimise the integration strategy and after which the result is discarded, and a main integration that benefits from the integrator's understanding of the integrand.

Of course there are no one-size-fits-all rules for choosing $i$ and $n$ for the pre-conditioning and the main run. However, the following heuristics have proven helpful:
\begin{itemize}
\item $n$ is always much larger than $i$. For very simple integrands, $n=\mathcal{O}(10\cdot 10^3)$ and $i=\mathcal{O}(10)$.
\item Increasing $n$ reduces errors that can be thought of as systematic because it allows the integrator to `discover' new features of the integrand. Increasing $i$, on the other hand, will rarely have that effect and only improves the statistical error. This is especially true for distributions.
\item There is no real limit on $n$, except that it has to fit into the datatype used -- integrations with $n=\mathcal{O}(2^{31}-1)$ are not too uncommon -- while $i$ is rarely (much) larger than 100.
\item For very stringent cuts it can happen that typical values of $n_\text{ad}$ are too small for any point to pass the cuts.
% This situation is referred to as a \aterm{Scarcity Condition due
% to Restrictions in the Evaluation With User Problems}{SCREW-UP}.
In this case {\tt vegas} will return {\tt NaN}, indicating that no events were found. Barring mistakes in the definition of the cuts, a pre-pre-conditioning with extremely large $n$ but $i=1\!-\!2$ can be helpful.
\item $n$ also needs to be large enough for {\tt vegas} to reliably find all features of the integrand. It is rarely obvious that it did, though sometimes it becomes clear when increasing $n$ or looking at intermediate results as a function of the already-completed iterations.
\item The main run should always have larger $i$ and $n$ than the pre-conditioning. Judging how much larger is a delicate game, though $i/i_\text{ad} = \mathcal{O}(5)$ and $n/n_\text{ad} = \mathcal{O}(10\!-\!50)$ have proven helpful.
\item If, once the integration is completed, the result is unsatisfactory, consider the following strategies:
\begin{itemize}
\item A large $\chi^2/{\rm d.o.f.}$ indicates a too small $n$. Try to increase $n_\text{ad}$ and, to a perhaps lesser extent, $n$.
\item Increase $i$. Often it is a good idea to consciously set $i$ to a value so large that the integrator will never reach it and to keep looking at `intermediate' results.
\item If the error is small enough for the application but the result seems incorrect (for example because the $\xc$ dependence does not vanish), massively increase $n$.
\end{itemize}
\item Real corrections need much more statistics in both $i$ and $n$ ($\mathcal{O}(10)$ times more for $n$, $\mathcal{O}(2)$ for $i$) than the corresponding \ac{LO} calculations because of the higher-dimensional phase space.
\item Virtual corrections have the same number of dimensions as the \ac{LO} calculation and can get by with only a modest increase to account for the added functional complexity.
\item {\tt vegas} tends to underestimate the numerical error.
\end{itemize}
These guidelines are often helpful but should not be considered infallible as they are just that -- guidelines.

\mcmule{} is not parallelised; however, because Monte Carlo integrations require a random seed anyway, it is possible to calculate multiple estimates of the same integral using different random seeds $z_1$ and to combine the results obtained this way. This also allows for a better, more reliable understanding of the error estimate.
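The seed-based strategy of the last paragraph can be sketched in plain Python. This is not \mcmule{} or {\tt pymule} code: the integrand is a toy example, the estimator is a flat (non-adaptive) Monte Carlo rather than {\tt vegas}, and the function names are purely illustrative. It shows the essential step of combining the per-seed results with inverse-variance weights and checking the $\chi^2/{\rm d.o.f.}$ of the merge.

```python
import math
import random

def mc_estimate(f, n, seed):
    """One flat Monte Carlo estimate of the integral of f over [0, 1]
    with n points and the given random seed (stand-in for one run)."""
    rng = random.Random(seed)
    vals = [f(rng.random()) for _ in range(n)]
    mean = sum(vals) / n
    # standard error of the mean
    var = sum((v - mean) ** 2 for v in vals) / (n - 1) / n
    return mean, math.sqrt(var)

def merge(results):
    """Inverse-variance weighted combination of (mean, error) pairs,
    returning the combined mean, error, and chi^2/d.o.f. of the merge."""
    w = [1 / e**2 for _, e in results]
    mean = sum(wi * m for wi, (m, _) in zip(w, results)) / sum(w)
    err = 1 / math.sqrt(sum(w))
    chi2 = sum(((m - mean) / e) ** 2 for m, e in results)
    return mean, err, chi2 / (len(results) - 1)

# Eight 'runs' with different seeds z_1; the exact integral of 3x^2 is 1.
results = [mc_estimate(lambda x: 3 * x**2, 10_000, z1) for z1 in range(1, 9)]
mean, err, chi2dof = merge(results)
```

As in the offline analysis, a $\chi^2/{\rm d.o.f.}$ much larger than one would indicate that the per-seed error estimates are unreliable and more statistics are needed.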