Commit 96bcd867 authored by ulrich_y's avatar ulrich_y
Browse files

Added technical section

parent 9d726720
Pipeline #1204 passed with stage
in 13 seconds
%!TEX root=thesis
FUNCTION SIGMA_1(x, wgt, ndim)
! The first random number x(1) is xi.
arr = x
! Generate momenta for the event using the function pointer ps
call gen_mom_fks(ps, x, masses(1:nparticle), vecs, weight)
! Whether unphysical or not, take the value of xi
xifix = xiout
! Check if the event is physical ...
if(weight > zero ) then
! and whether is passes the cuts
var = quant(vecs(:,1), vecs(:,2), vecs(:,3), vecs(:,4), ...)
cuts = any(pass_cut)
if(cuts) then
! Calculate the xi**2 * M_{n+1}^0 using the pointer matel
mat = matel(vecs(:,1), vecs(:,2), vecs(:,3), vecs(:,4), ...)
mat = xifix*weight*mat
sigma_1 = mat
end if
end if
! Check whether soft subtraction is required
if(xifix < xicut1) then
! Implement the delta function and regenerate events
arr(1) = 0._prec
call gen_mom_fks(ps, arr, masses(1:nparticle), vecs, weight)
! Check whether to include the counter event
if(weight > zero) then
var = quant(vecs(:,1), vecs(:,2), vecs(:,3), vecs(:,4), ...)
cuts = any(pass_cut)
if(cuts) then
mat = matel_s(vecs(:,1), vecs(:,2), vecs(:,3), vecs(:,4), ...)
mat = weight*mat/xifix
sigma_1 = sigma_1 - mat
%!TEX root=thesis
xi5 = ra(1)
y2 = 2*ra(2) - 1.
! generate electron q2 and photon q5 s.t. that the
! photon goes into z diractions
eme = energy*ra(3)
pme = sqrt(eme**2-m2**2)
q2 = (/ 0., pme*sqrt(1. - y2**2), pme*y2, eme /)
q5 = (/ 0., 0. , 1. , 1. /)
q5 = 0.5*energy*xi5*q5
! generate euler angles and rotate all momenta
euler_mat = get_euler_mat(ra(4:6))
q2 = matmul(euler_mat,q2)
q5 = matmul(euler_mat,q5)
qq34 = q1-q2-q5
minv34 = sqrt(sq(qq34))
! The event weight, note that a factor xi5**2 has been ommited
weight = energy**3*pme/(4.*(2.*pi)**4)
! generate remaining neutrino momenta
call pair_dec(ra(7:8),minv34,q3,m3,q4,m4,enough_energy)
weight = weight*0.125*sq_lambda(minv34**2,m3,m4)/minv34**2/pi
q3 = boost_back(qq34, q3)
q4 = boost_back(qq34, q4)
%!TEX root=thesis
! use a random number to decide how much energy should
! go into the first particle
minv3 = ra(1)*energy
! use two random numbers to generate the momenta of
! particles 1 and the remainder in the CMS frame
call pair_dec(ra(2:3),energy,q2,m2,qq3,minv3)
! adjust the Jacobian
weight = minv3*energy/pi
weight = weight*0.125*sq_lambda(energy**2,m2,minv3)/energy**2/pi
! use a random number to decide how much energy should
! go into the second particle
minv4 = ra(4)*energy
! use two random numbers to generate the momenta of
! particles 2 and the remainder in their rest frame
call pair_dec(ra(5:6),minv3,q3,m3,qq4,minv4)
! adjust the Jacobian
weight = weight*minv4*energy/pi
weight = weight*0.125*sq_lambda(minv3**2,m3,minv4)/minv3**2/pi
! repeat this process until all particles are generated
! boost all generated particles back into the CMS frame
q4 = boost_back(qq4, q4)
q5 = boost_back(qq4, q5)
q3 = boost_back(qq3, q3)
q4 = boost_back(qq3, q4)
q5 = boost_back(qq3, q5)
%!TEX root=thesis
\bf Off & \bf Len & \bf Type & \bf Var. & \bf Comment\\\hline
\tt 0030 &\tt 000C & \tt integer & \tt it & the current iteration\\\hline
\tt 003C &\tt 000C & \tt integer & \tt ndo & subdiv. on an axis\\\hline
\tt 0048 &\tt 0010 & \tt real & \tt si &$\sigma/(\delta\sigma)^2$\\\hline
\tt 0058 &\tt 0010 & \tt real & \tt swgt &$1/(\delta\sigma)^2$\\\hline
\tt 0068 &\tt 0010 & \tt real & \tt schi &$(1-{\tt it})\chi + \sigma^2/(\delta\sigma)^2$\\\hline
\tt 0078 &\tt 1A98 & \tt real(50,17) & \tt xi & the integration grid\\\hline
\tt 1B10 &\tt 000C & \tt integer & \tt randy & the current random number seed\\\hline
\tt 1B1C &\tt 0014 & \tt integer & $n_q$ & number of histograms\\
\tt & & \tt integer & $n_b$ & number of bins\\
\tt & & \tt integer & $n_s$ & len. histogram name\\\hline
\tt 1B30 &$10n_q+8$ & \tt real($n_q$) & \tt minv & lower bounds \\
\tt & & \tt real($n_q$) & \tt maxv & upper bounds \\\hline
\tt &$n_sn_q+8$ & \tt character($n_s$,$n_q$) & \tt names & names of $S$\\\hline
\tt &$10n_q(n_b+2)+8$& \tt real($n_q$,$n_b$+2) & \tt quantsum & accu. histograms\\
\tt & & \tt real($n_q$,$n_b$+2) & \tt quantsumsq& accu. histograms squared\\\hline
\tt-0144 &\tt 0010 & \tt real & \tt time & current runtime in seconds\\\hline
\tt-0134 &\tt 0134 & \tt character(300) & \tt msg & any message\\\hline\hline
\tt-0000 &\multicolumn{4}{c}{\tt EOF}
%!TEX root=thesis
\def\at{\textbackslash t}
\def\an{\textbackslash n}
\bf offset &00 &01 &02 &03 &04 &05 &06 &07 &08 &09 &0A &0B &0C &0D &0E &0F\\\hline
\bf hex &09 &00 &00 &00 &20 &4D &63 &4D &75 &6C &65 &20 &20 &09 &00 &00\\
\bf ASCII &\at& & & &' '&M &c &M &u &l &e &' '&' '&\at& &
\bf offset &10 &11 &12 &13 &14 &15 &16 &17 &18 &19 &1A &1B &1C &1D &1E &1F\\\hline
\bf hex &00 &0A &00 &00 &00 &76 &xx &xx &20 &20 &20 &20 &20 &20 &20 &0A\\
\bf ASCII & &\an& & & &v &\v1&\v2&' '&' '&' '&' '&' '&' '&' '&\an
\bf offset &20 &21 &22 &23 &24 &25 &26 &27 &28 &29 &2A &2B &2C &2D &2E &2F\\\hline
\bf hex &00 &00 &00 &05 &00 &00 &00 &xx &xx &xx &xx &xx &05 &00 &00 &00\\
\bf ASCII & & & &\af& & & &\s1&\s2&\s3&\s4&\s5&\af& & &
......@@ -48,6 +48,7 @@ is based on~\cite{Ulrich:2020phd} for detailed background information.}
%!TEX root=manual
\section{Technical aspects of \mcmule{}}
In this section, we will review the very technical details of the
implementation. This is meant for those readers, who wish to truly
understand the nuts and bolts holding the code together. We begin by
discussing the phase-space generation and potential pitfalls in
Section~\ref{sec:ps}. Next, in Section~\ref{sec:fksfor}, we discuss
how the \ac{FKS} scheme~\cite{Frixione:1995ms, Frederix:2009yq,
Ulrich:2019fks, Engel:2019nfw, Ulrich:2020phd} (cf.
Appendix~\ref{sec:fks} for a review) is implemented in Fortran code.
This is followed by a brief review of the random number generator used
in \mcmule{} in Section~\ref{sec:rng}. Finally, we give an account of
how the statefiles work and how they are used to store distributions
in Section~\ref{sec:vegasff}.
%!TEX root=manual
\subsection{Basics of containerisation}
\mcmule{} is Docker-compatible. Production runs should be performed
with Docker~\cite{Merkel:2014}, or its user-space complement {\sl
udocker}~\cite{Gomes:2017hct}, to facilitate reproducibility and data
retention. On Linux, Docker uses {\tt chroot} to simulate an operating
system with \mcmule{} installed. In our case, the underlying system is
Alpine Linux, a Linux distribution that is approximately 5MB in size.
To understand Docker, we need to introduce some terms
An image is a representation of the system's 'hard disk'. One host
system can have multiple images. In (u)Docker, the images can be
listed with {\tt docker image ls} ({\tt udocker images}).
Images can be have names, called tags, otherwise Docker assigns a
name as the SHA256 hash.
Because keeping multiple full file systems is rather wasteful,
images are split into layers that can be shared among images. In
{\sl uDocker}, these are tar files containing the changes made to
the file system.
To execute an image, a container needs to be generated.
Essentially, this involves uncompressing all layers into a
directory an {\tt chroot}ing into said directory.
It is important to note, that containers are ephemeral, i.e. changes
made to the container are not stored unless explicitly requested. This
is usually not required anyway.
For external interfacing, folders of the host system are mounted into
the container.
\subsubsection{Building images}
Docker images are built using Dockerfiles, a set of instruction on how
to create the image from external information and a base image. To
speed up building of the image, \mcmule{} uses a custom base image
called {\tt mcmule-pre} that is constructed as follows
FROM alpine:3.11
LABEL maintainer=""
LABEL version="1.0"
LABEL description="The base image for the full McMule suite"
# Install a bunch of things
RUN apk add py3-numpy py3-scipy ipython py3-pip git tar gfortran gcc make curl musl-dev
RUN echo "" >> /etc/apk/repositories && \
apk add py3-matplotlib && \
sed -i '$ d' /etc/apk/repositories
On top of this, \mcmule{} is build
FROM yulrich/mcmule-pre:1.0.0
LABEL maintainer=""
LABEL version="1.0"
LABEL description="The full McMule suite"
RUN pip3 install
COPY . /monte-carlo
WORKDIR /monte-carlo
RUN ./configure
RUN make
To build this image, run
monte-carlo$ docker build -t $mytagname . # Using Docker
monte-carlo$ udocker build -t=$mytagname . # Using udocker
The CI system uses {\sl udocker} to perform builds after each push.
Note that using {\sl udocker} for building requires a patched version
of the code that is available from the \ac{MMCT}.
\subsubsection{Creating containers and running}
In Docker, containers are usually created and run in one command
$ docker run --rm $imagename $cmd
The flag {\tt --rm} makes sure the container is deleted after it is
completed. If the command is a shell (usually {\tt ash}), the flag
{\tt -i} also needs to be provided.
For {\sl udocker}, creation and running can be done in two steps
$ udocker create $imagename
# this prints the container id
$ udocker run $containerid $cmd
# work in container
$ udocker rm $containerid
or in one step
$ udocker run --rm $imagename $cmd
Running containers can be listed with {\tt udocker ps} and {\tt docker
ps}. For further details, the reader is pointed to the manuals of
Docker and {\sl udocker}.
%!TEX root=manual
\caption{An example implementation of the \ac{FKS} scheme in Fortran.
Not shown are various checks performed, the binning as well as
initialisation blocks.}
\subsection{Implementation of FKS schemes}
Now that we have a phase-space routine that has $\xi_i$ as variables
of the integration, we can start implementing the relevant
\pref1\ \cdis{\xi_1}
= \D\xi_1\frac1{\xi_1}\Bigg(
\pref1\ \Big(\xi_1^2\M{n+1}0\Big)
-\pref1\ \Big(\eik\M{n}0\Big) \ \theta(\xc-\xi_1)
We refer to the first term as the \term{event} and the second as the
Note that, due to the presence of $\delta(\xi_1)$ in the counter-event
(that is implemented through the eikonal factor $\eik$) the momenta
generated by the phase-space $\pref1$ are different. Thus, it is
possible that the momenta of the event pass the cuts or on-shell
conditions, while those of the counter event fail, or vice versa.
This subtlety is extremely important to properly implement the
\ac{FKS} scheme and many problems fundamentally trace back to this.
Finally, we should note that, in order to increase numerical
stability, we introduce cuts on $\xi$ and sometimes also on a
parameter that encodes the \ac{PCS} such as $y={\tt y2}$
in~\eqref{eq:kdef} and Listing~\ref{lst:psmu}. Events that have values
of $\xi$ smaller than this \term{soft cut} are discarded immediately
and no subtraction is considered. The dependence on this slicing
parameter is not expected to drop out completely and hence, the soft
cut has to be chosen small enough to not influence the result.
An example implementation can be found in Listing~\ref{lst:fks}.
%!TEX root=manual
\subsection{Phase-space generation}\label{sec:ps}
We use the {\tt vegas} algorithm for numerical
integration~\cite{Lepage:1980jk}. As {\tt vegas} only works on the
hypercube, we need a routine that maps $[0,1]^{3n-4}$ to the momenta
of an $n$-particle final state, including the corresponding Jacobian.
The simplest way to do this uses iterative two-particle phase-spaces
and boosting the generated momenta all back into the frame under
consideration. An example of how this is done is shown in
As soon as we start using \ac{FKS}, we cannot use this simplistic
approach any longer. The $c$-distributions of \ac{FKS} require the
photon energies $\xi_i$ to be variables of the integration. We can fix
this by first generating the photon explicitly as
k_1 = p_{n+1} = \frac{\sqrt s}2\xi_1 (1,\sqrt{1-y_1 ^2}\vec e_\perp,y_1 )\,,
where $\vec e_\perp$ is a $(d-2)$ dimensional unit vector and the
ranges of $y_1$ (the cosine of the angle) and $\xi_1$ (the scaled
energy) are $-1\le y_1 \le 1$ and $0\le\xi_1 \le\xi_\text{max}$,
respectively. The upper bound $\xi_\text{max}$ depends on the masses
of the outgoing particles. Following \cite{Frederix:2009yq} we
\xi_\text{max} = 1-\frac{\Big(\sum_i m_i\Big)^2}{s}\,.
Finally, the remaining particles are generated iteratively again. This
can always be done and is guaranteed to work.
For processes with one or more \ac{PCS}s this approach is suboptimal.
The numerical integration can be improved by orders of magnitude by
aligning the pseudo-singular contribution to one of the variables of
the integration, as this allows {\tt vegas} to optimise the
integration procedure accordingly. As an example, consider once again
$\mu\to\nu\bar\nu e\gamma$. The \ac{PCS} comes from
\M{n+1}\ell \propto \frac{1}{q\cdot k} = \frac1{\xi^2}\frac1{1-y\beta}
where $y$ is the angle between photon ($k$) and electron ($q$). For
large velocities $\beta$ (or equivalently small masses), this becomes
almost singular as $y\to1$. If now $y$ is a variable of the
integration this can be mediated. An example implementation is shown
in Listing~\ref{lst:psmu}.
The approach outlined above is very easy to do in the case of the muon
decay as the neutrinos can absorb any timelike four-momentum. This is
because the $\delta$ function of the phase-space was solved through
the neutrino's {\tt pair\_dec}. However, for scattering processes
where all final state leptons could be measured, this fails. Writing a
routine for $\mu$-$e$-scattering
e(p_1)+\mu(p_2) \to e(p_3)+\mu(p_4) + \gamma(p_5)\,,
that optimises on the incoming electron is rather trivial because its
direction stays fixed s.t. the photon just needs to be generated
according to~\eqref{eq:kdef}. The outgoing electron $p_3$ is more
complicated. Writing the $p_4$-phase-space four- instead of
\D\Phi_5 &= \delta^{(4)}(p_1+p_2-p_3-p_4-p_5)
\delta(p_4^2-M^2) \Theta(E_4)
\frac{\D^4\vec p_4}{(2\pi)^4}
\frac{\D^3\vec p_3}{(2\pi)^32E_3}
\frac{\D^3\vec p_5}{(2\pi)^32E_5}\,,
we can solve the four-dimensional $\delta$ function for $p_4$ and
proceed for the generation $p_3$ and $p_5$ almost as for the muon
decay above. Doing this we obtain for the final $\delta$ function
\delta(p_4^2-M^2) =
When solving this for $E_3$, we need to take care to avoid extraneous
solutions of this radical equation~\cite{Gkioulekas:2018}. We have now
obtained our phase-space parametrisation, albeit with one caveat: for
anti-collinear photons, i.e. $-1<y<0$ with energies
\xi_1 = 1-\frac{m}{\sqrt{s}}+\frac{M^2}{\sqrt{s}(m-\sqrt{s}} < \xi <
there are still two solutions. One of these corresponds to very
low-energy electron that are almost produced at rest. This is rather
fortunate as most experiments will have an electron detection
threshold higher that this. Otherwise, phase-spaces optimised this way
also define a {\tt which\_piece} for this \term{corner region}.
There is one last subtlety when it comes to these type of phase-space
optimisations. Optimising the phase-space for emission from one leg
often has adverse effects on terms with dominant emission from another
leg. In other words, the numerical integration works best if there is
only one \ac{PCS} on which the phase-space is tuned. As most
processes have more than one \ac{PCS} we need to resort to something
that was already discussed in the original \ac{FKS}
paper~\cite{Frixione:1995ms}. Scattering processes that involve
multiple massless particles have overlapping singular regions. The
\ac{FKS} scheme now mandates that the phase-space is partitioned in
such a way as to isolate at most one singularity per region with each
region having its own phase-space parametrisation. Similarly we have
to split the phase-space to contain at most one \ac{PCS} as well as
the soft singularity. In \mcmule{} $\mu$-$e$ scattering for instance
is split as follows\footnote{When implementing this, care must be
taken to ensure that the split is also well defined if the photon is
soft, i.e. if $\xi=0$.}
1 = \theta\big( s_{15} > s_{35} \big)
+ \theta\big( s_{15} < s_{35} \big)\,,
with $s_{ij} = 2p_i\cdot p_j$ as usual. The integrand of the first
$\theta$ function has a final-state \ac{PCS} and hence we use the
parametrisation obtained by solving~\eqref{eq:psdel}. The second
$\theta$ function, on the other hand, has an initial-state \ac{PCS}
which can be treated by just directly parametrising the photon in the
centre-of-mass frame as per~\eqref{eq:kdef}. This automatically makes
$s_{15}\propto(1-\beta_\text{in}y_1)$ a variable of the integration.
For the double-real corrections of $\mu$-$e$ scattering, we proceed
along the same lines except now the argument of the $\delta$ function
is more complicated.
\caption{Example implementation of iterative phase-space. Not shown
are the checks to make sure that all particles have at least enough
energy for their mass, i.e. that $E_i\ge m_i$.}
\caption{Example implementation of a so-called \ac{FKS} phase-space
where the fifth particle is an \ac{FKS} photon that may becomes soft.
Not shown are checks whether $E_i\ge m_i$.}
%!TEX root=manual
\subsection{Random number generation}
A Monte Carlo integrator relies on a (pseudo) \emph{random number
generator} (\ac{RNG} or PRNG) to work. The pseudo-random numbers need
to be of high enough quality, i.e. have no discernible pattern and a
long period, to consider each point of the integration independent but
the \ac{RNG} needs to be simple enough to be called many billion
times without being a significant source of runtime. \ac{RNG}s used
in Monte Carlo applications are generally poor in quality and often
predictable s.t. they could not be used for cryptographic
A commonly used trade-off between unpredictability and simplicity,
both in speed and implementation, is the Park-Miller \ac{RNG}, also
known as {\tt minstd}~\cite{Park:1988RNG}. As a linear congruential
generator, its $(k+1)$th output $x_{k+1}$ can be found as
z_{k+1} = a\cdot z_k \ \text{mod}\ m = a^{k+1} z_1 \ \text{mod}\ m
x_k = z_k / m \in (0,1)\,,
where $m$ is a large, preferably prime, number and $2<a<m-1$ an
integer. The initial value $z_1$ is called the \term{random seed} and
is chosen integer between 1 and $m-1$. It can easily be seen that any
such \ac{RNG} has a fixed period\footnote{Note that, because of the
simple recursion the \ac{RNG} will not repeat any number until the
full period is complete} $p<m$ s.t. $z_{k+p} = z_k$ because any
$z_{k+1}$ only depends on $z_k$ and there are finitely many possible
$z_k$. We call the \ac{RNG} attached to $(m,a)$ to be of \term{full
period} if $p=m-1$, i.e. all integers between 1 and $m-1$ appear in
the sequence $z_k$.
Assuming $z_1=1$ then the existence of $p$ s.t. $z_{p+1}=1$ is
guaranteed by Fermat's Theorem\footnote{If $p$ is prime, for any
integer $a$, $a^p-a$ is a multiple of $p$.}. This means that the
\ac{RNG} is of full period iff $a$ is a primitive root modulo $m$,
\forall g\ \text{co-prime to $m$}\quad
\exists k\in\mathbb{Z}
a^k\equiv g\ (\text{mod}\ m)\,.
Park and Miller suggest to use the Mersenne prime $m=2^{31}-1$, noting
that there are 534,600,000 primitive roots of which 7 is the smallest.
Because $7^b\ \text{mod}\ m$ is also a primitive root as long as $b$
is co-prime to $(m\!-\!1)$, \cite{Park:1988RNG} settled on $b=5$, i.e.
$a=16807$ as a good choice for the multiplier that, per construction,
has full period and passes certain tests of randomness.
The points generated by any such \ac{RNG} will fall into
$\sqrt[n]{n!\cdot m}$ hyperplanes if scattered in an $n$ dimensional
space~\cite{Marsaglia25}. However, for bad choices of the multiplier
$a$ the number of planes can be a lot smaller\footnote{An infamous
example is {\tt randu} that used $a=2^{16}+3$ and $m=2^{31}$ that in
three dimension produces only 15 planes instead of the maximum 2344.}.
Presently, the period length of $p=m-1=2^{31}-2$ is believed to be
sufficient though detailed studies quantifying this would be welcome.
%!TEX root=manual
\subsection{Differential distributions and intermediary state files}
Distributions are always calculated as histograms by binning each
event according to its value for the observable $S$. This is done by
having an $(n_b\times n_q)$-dimensional array\footnote{To be precise,
the actual dimensions are $(n_b+2)\times n_q$ to accommodate under-
and overflow bins} {\tt quant} where $n_q$ is the number of histograms
to be calculated ({\tt nr\_q}) and $n_b$ is the number of bins used
({\tt nr\_bins}). The weight of each event $\D\Phi\times\mathcal{M}
\times w$ is added to the correct entry in {\tt bit\_it} where $w={\tt
wgt}$ is the event weight assigned by {\tt vegas}.
After each iteration of {\tt vegas} we add {\tt quant} (${\tt
quant}^2$) to an accumulator of the same dimensions called {\tt
quantsum} ({\tt quantsumsq}). After $i$ iterations, we can calculate
the value and error as
\frac{\D\sigma}{\D S} \approx \frac{\tt quantsum}{\Delta\times i}
\delta\bigg(\frac{\D\sigma}{\D S}\bigg)\approx \frac1\Delta \sqrt{
\frac{{\tt quantsumsq}-{\tt quantsum}^2/i}{i(i-1)}
where $\Delta$ is the bin-size.
Related to this discussion is the concept of intermediary state
files. Their purpose is to record the complete state of the integrator
after every iteration in order to recover should the program crash --
or more likely be interrupted by a batch system. \mcmule{} uses a
custom file format {\tt .vegas} for this purpose which uses
Fortran's record-based (instead of stream- or byte-based) format. This
means that each entry starts with 32bit unsigned integer, i.e. 4 byte,
indicating the record's size and ends with the same 32bit integer. As
this is automatically done for each record, it minimises the amount of
metadata that have to be written.
The current version ({\tt v3}) must begin with the magic header and
version self-identification shown in Figure~\ref{fig:vegf:head}. The
latter includes file version information and the first five characters
the source tree's \ac{SHA1} hash, obtained using {\tt make hash}.
The header is followed by records describing the state of the
integrator as shown in Figure~\ref{fig:vegf:body}. Additionally to
information required to continue integration such as the current value
and grid information, this file also has 300 bytes for a message. This
is usually set by the routine to store information on the fate of the
integration such as whether it was so-far uninterrupted or whether
there is reason to believe it to be inconsistent.