Commit acea809e authored by muntwiler_m

update public distribution

based on internal repository c9a2ac8 2019-01-03 16:04:57 +0100
tagged rev-master-2.0.0
parent bbd16d0f
@@ -13,3 +14,4 @@ lib/*
@@ -27,7 +27,7 @@ Highlights
PMSCO is written in Python 2.7.
PMSCO is written in Python 3.6 and compatible with Python 2.7.
The code will run in any recent Linux environment on a workstation or in a virtual machine.
Scientific Linux, CentOS7, [Ubuntu](
and [Lubuntu]( (recommended for virtual machine) have been tested.
@@ -61,7 +61,7 @@ Matthias Muntwiler, <>
Copyright 2015-2017 by [Paul Scherrer Institut](
Copyright 2015-2018 by [Paul Scherrer Institut](
Release Notes
@@ -120,7 +120,7 @@ python -m compileall projects
if [ -n "$PMSCO_SCAN_FILES" ]; then
@@ -3,9 +3,10 @@
# submission script for PMSCO calculations on the Ra cluster
if [ $# -lt 1 ]; then
echo ""
echo " NOSUB (optional): do not submit the script to the queue. default: submit."
echo " DESTDIR: destination directory. must exist. a sub-dir \$JOBNAME is created."
echo " JOBNAME (text): name of job. use only alphanumeric characters, no spaces."
echo " NODES (integer): number of computing nodes. (1 node = 24 or 32 processors)."
echo " do not specify more than 2."
@@ -20,7 +21,7 @@ if [ $# -lt 1 ]; then
echo " MODE: PMSCO calculation mode (single|swarm|gradient|grid)."
echo " ARGS (optional): any number of further PMSCO or project arguments (except mode and time)."
echo ""
echo "the job script complete with the program code and input/output data is generated in ~/jobs/\$JOBNAME"
echo "the job script is written to \$DESTDIR/\$JOBNAME which is also the destination of calculation output."
exit 1
@@ -37,6 +38,9 @@ else
@@ -73,11 +77,7 @@ PMSCO_LOGLEVEL=""
# set up working directory
cd ~
if [ ! -d "jobs" ]; then
mkdir jobs
fi
cd jobs
cd "$DEST_DIR"
if [ ! -d "$PMSCO_JOBNAME" ]; then
@@ -38,13 +38,13 @@ PROJECT_NAME = "PEARL MSCO"
# could be handy for archiving the generated documentation or if some version
# control system is used.
# Using the PROJECT_BRIEF tag one can provide an optional one line description
# for a project that appears at the top of each page and should give viewer a
# quick idea about the purpose of the project. Keep the description short.
PROJECT_BRIEF = "PEARL multiple scattering calculations and optimizations"
PROJECT_BRIEF = "PEARL multiple scattering calculation and optimization"
# With the PROJECT_LOGO tag one can specify a logo or an icon that is included
# in the documentation. The maximum height of the logo should not exceed 55
@@ -228,7 +228,7 @@ TAB_SIZE = 4
# "Side Effects:". You can put \n's in the value part of an alias to insert
# newlines.
ALIASES = "raise=@exception"
# This tag can be used to specify a number of word-keyword mappings (TCL only).
# A mapping has the form "name=value". For example adding "class=itcl::class"
@@ -597,19 +597,19 @@ STRICT_PROTO_MATCHING = NO
# list. This list is created by putting \todo commands in the documentation.
# The default value is: YES.
# The GENERATE_TESTLIST tag can be used to enable (YES) or disable (NO) the test
# list. This list is created by putting \test commands in the documentation.
# The default value is: YES.
# The GENERATE_BUGLIST tag can be used to enable (YES) or disable (NO) the bug
# list. This list is created by putting \bug commands in the documentation.
# The default value is: YES.
# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or disable (NO)
# the deprecated list. This list is created by putting \deprecated commands in
@@ -761,9 +761,12 @@ WARN_LOGFILE =
src/introduction.dox \
src/concepts.dox \
src/concepts-tasks.dox \
src/concepts-emitter.dox \
src/installation.dox \
src/execution.dox \
src/commandline.dox \
src/optimizers.dox \
../pmsco \
../projects \
@@ -859,7 +862,7 @@ EXAMPLE_RECURSIVE = NO
# that contain images that are to be included in the documentation (see the
# \image command).
IMAGE_PATH = src/images
# The INPUT_FILTER tag can be used to specify a program that doxygen should
# invoke to filter for each input file. Doxygen will invoke the filter program
@@ -876,7 +879,7 @@ IMAGE_PATH =
# code is scanned, but not when the output code is generated. If lines are added
# or removed, the anchors will not be placed correctly.
INPUT_FILTER = /usr/bin/doxypy
# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern
# basis. Doxygen will compare the file name with each pattern and apply the
@@ -885,7 +888,7 @@ INPUT_FILTER = /usr/bin/doxypy
# filters are used. If the FILTER_PATTERNS tag is empty or if none of the
# patterns match the file name, INPUT_FILTER is applied.
FILTER_PATTERNS = *.py=/usr/bin/doxypy
# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using
# INPUT_FILTER) will also be used to filter the input files that are used for
@@ -2328,7 +2331,7 @@ DIAFILE_DIRS =
# generate a warning when it encounters a \startuml command in this case and
# will not generate output for the diagram.
# When using plantuml, the specified paths are searched for files specified by
# the !include statement in a plantuml block.
@@ -2,6 +2,11 @@ SHELL=/bin/sh
# makefile for PMSCO documentation
# requirements
# 1) doxygen
# 2) /usr/bin/doxypy
# 3) PLANTUML_JAR_PATH environment variable must point to plantUML jar.
.SUFFIXES: .c .cpp .cxx .exe .f .h .i .o .py .pyf .so .html
@@ -11,6 +16,9 @@ DOX=doxygen
REVISION=$(shell git describe --always --tags --dirty --long || echo "unknown, "`date +"%F %T %z"`)
all: docs
docs: doxygen pdf
@@ -22,5 +30,6 @@ pdf: doxygen
-rm -rf latex/*
-rm -rf html/*
-rm -r latex/*
-rm -r html/*
@@ -11,14 +11,19 @@ it is recommended to adhere to the standard syntax described below.
The basic command line is as follows:
[mpiexec -np NPROCESSES] python [common args] [project args]
[mpiexec -np NPROCESSES] python path/to/pmsco path/to/ [common args] [project args]
Include the first portion between square brackets if you want to run parallel processes.
Specify the number of processes as the @c -np option.
@c should be the path and name to your project module.
@c path/to/pmsco is the directory where <code>__main__.py</code> is located.
Do not include the extension <code>.py</code> or a trailing slash.
@c path/to/ should be the path and name to your project module.
Common args and project args are described below.
Note: In contrast to earlier versions, the project module is not executed directly any more.
Rather, it is loaded by the main pmsco module as a 'plug-in'.
\subsection sec_common_args Common Arguments
@@ -30,7 +35,7 @@ The following table is ordered by importance.
| Option | Values | Description |
| --- | --- | --- |
| -h , --help | | Display a command line summary and exit. |
| -m , --mode | single (default), grid, swarm | Operation mode. |
| -m , --mode | single (default), grid, swarm, genetic | Operation mode. |
| -d, --data-dir | file system path | Directory path for experimental data files (if required by project). Default: current working directory. |
| -o, --output-file | file system path | Base path and/or name for intermediate and output files. Default: pmsco_data |
| -t, --time-limit | decimal number | Wall time limit in hours. The optimizers try to finish before the limit. Default: 24.0. |
@@ -39,6 +44,8 @@ The following table is ordered by importance.
| --log-file | file system path | Name of the main log file. Under MPI, the rank of the process is inserted before the extension. Default: output-file + log, or pmsco.log. |
| --log-disable | | Disable logging. By default, logging is on. |
| --pop-size | integer | Population size (number of particles) in swarm optimization mode. The default value is the greater of 4 or two times the number of calculation processes. |
| --seed-file | file system path | Name of the population seed file. Population data of previous optimizations can be used to seed a new optimization. The file must have the same structure as the .pop or .dat files. See @ref pmsco.project.Project.seed_file. |
| --table-file | file system path | Name of the model table file in table scan mode. |
| -c, --code | edac (default) | Scattering code. At the moment, only edac is supported. |
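The default stated for the --pop-size option can be written as a one-line rule. The following is a minimal sketch and is not taken from the PMSCO source: the function name default_pop_size is hypothetical, and whether the master process counts as a calculation process is an open assumption.

```python
def default_pop_size(calc_processes):
    """Return the greater of 4 or two times the number of calculation processes.

    Hypothetical helper that restates the documented default of --pop-size.
    """
    return max(4, 2 * calc_processes)
```

With a single calculation process the lower bound of 4 applies; with eight processes the rule gives 16 particles.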
@@ -49,13 +56,14 @@ Multiple names can be specified and must be separated by spaces.
| Category | Description | Default Action |
| --- | --- | --- |
| all | shortcut to include all categories | |
| input | raw input files for calculator, including cluster and phase files in custom format | delete |
| output | raw output files from calculator | delete |
| phase | phase files in portable format for report | delete |
| cluster | cluster files in portable XYZ format for report | keep |
| debug | debug files | delete |
| model | output files in ETPAI format: complete simulation (a_-1_-1_-1_-1) | keep |
| scan | output files in ETPAI format: scan (a_b_-1_-1_-1) | delete |
| scan | output files in ETPAI format: scan (a_b_-1_-1_-1) | keep |
| symmetry | output files in ETPAI format: symmetry (a_b_c_-1_-1) | delete |
| emitter | output files in ETPAI format: emitter (a_b_c_d_-1) | delete |
| region | output files in ETPAI format: region (a_b_c_d_e) | delete |
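The index patterns in the table can be read as a five-level hierarchy in which -1 marks a level that has not been resolved. The following sketch classifies an index accordingly. It is not part of PMSCO: the function name output_category and the plain tuple layout are assumptions made for illustration.

```python
# Hypothetical helper, not part of PMSCO: map a five-part task index
# (model, scan, symmetry, emitter, region) to the category names of the table.
LEVELS = ("model", "scan", "symmetry", "emitter", "region")


def output_category(index):
    """Return the name of the deepest level whose index component is not -1."""
    if len(index) != len(LEVELS):
        raise ValueError("index must have five components")
    if index[0] < 0:
        raise ValueError("the model component must be set")
    category = None
    for name, value in zip(LEVELS, index):
        if value < 0:
            break
        category = name
    # every component after the first -1 must be -1 as well
    depth = LEVELS.index(category) + 1
    if any(value >= 0 for value in index[depth:]):
        raise ValueError("components after the first -1 must also be -1")
    return category
```

For example, the pattern a_b_-1_-1_-1 corresponds to a call such as output_category((3, 0, -1, -1, -1)), which yields the scan category.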
@@ -84,36 +92,11 @@ This way, the file names and photoelectron parameters are versioned with the cod
whereas command line arguments may easily get forgotten in the records.
\subsection sec_project_example Example Argument Handling
An example for handling the command line in a project module can be found in the demo project.
The following code snippet shows how the common and project arguments are separated and handled.
def main():
    # have the pmsco module parse the common arguments.
    args, unknown_args = pmsco.pmsco.parse_cli()
    # pass any arguments not handled by pmsco
    # to the project-defined parse_project_args function.
    # unknown_args can be passed to argparse.ArgumentParser.parse_args().
    if unknown_args:
        project_args = parse_project_args(unknown_args)
    else:
        project_args = None
\subsection sec_project_example Argument Handling
    # create the project object
    project = create_project()
    # apply the common arguments on the project
    pmsco.pmsco.set_common_args(project, args)
    # apply the specific arguments on the project
    set_project_args(project, project_args)
    # run the project
To handle command line arguments in a project module,
the module must define a <code>parse_project_args</code> and a <code>set_project_args</code> function.
An example can be found in the demo project.
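As a rough illustration, the two functions could be written with argparse as follows. This is a sketch under assumptions, not the code of the demo project: the signatures are assumed to take the list of unparsed arguments and the project object, respectively, and the --scan-file option, the scan_file attribute and the DemoProject class are hypothetical.

```python
import argparse


class DemoProject:
    """Stand-in for a project class; holds only the attribute used below."""

    def __init__(self):
        self.scan_file = ""


def parse_project_args(unknown_args):
    """Parse the arguments that the common parser did not recognize."""
    parser = argparse.ArgumentParser(description="project arguments (hypothetical)")
    # hypothetical project option, used only for illustration
    parser.add_argument("--scan-file", default="", help="path of the scan file")
    return parser.parse_args(unknown_args)


def set_project_args(project, project_args):
    """Copy the parsed project arguments onto the project object."""
    if project_args is not None and project_args.scan_file:
        project.scan_file = project_args.scan_file
```

Keeping parsing and application separate means that the project object can also be configured without a command line, for example from a test.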
\section sec_slurm Slurm Job Submission
@@ -122,23 +105,24 @@ The command line of the Slurm job submission script for the Ra cluster at PSI is
This script is specific to the configuration of the Ra cluster but may be adapted to other Slurm-based queues.
Here, the first few arguments are positional and their order must be strictly adhered to.
After the positional arguments, optional arguments of the PMSCO project command line can be added in arbitrary order.
If you execute the script without arguments, it displays a short summary.
The job script is written to @c ~/jobs/\$JOBNAME.
The job script is written to @c $DESTDIR/$JOBNAME which is also the destination of calculation output.
| Argument | Values | Description |
| --- | --- | --- |
| NOSUB (optional) | NOSUB or omitted | If NOSUB is present as the first argument, create the job script but do not submit it to the queue. Otherwise, submit the job script. |
| DESTDIR | file system path | Destination directory. Must exist. A sub-directory $JOBNAME is created. |
| JOBNAME | text | Name of job. Use only alphanumeric characters, no spaces. |
| NODES | integer | Number of computing nodes. (1 node = 24 or 32 processors). Do not specify more than 2. |
| TASKS_PER_NODE | 1...24, or 32 | Number of processes per node. 24 or 32 for full-node allocation. 1...23 for shared node allocation. |
| WALLTIME:HOURS | integer | Requested wall time. 1...24 for day partition, 24...192 for week partition, 1...192 for shared partition. This value is also passed on to PMSCO as the @c --time-limit argument. |
| PROJECT | file system path | Python module (file path) that declares the project and starts the calculation. |
| MODE | single, swarm, grid | PMSCO operation mode. This value is passed on to PMSCO as the @c --mode argument. |
| MODE | single, swarm, grid, genetic | PMSCO operation mode. This value is passed on to PMSCO as the @c --mode argument. |
| ARGS (optional) | | Any further arguments are passed on verbatim to PMSCO. You don't need to specify the mode and time limit here. |
\ No newline at end of file
/*! @page pag_concepts_emitter Emitter configurations
\section sec_emitters Emitter configurations
\subsection sec_emit_intro Introduction
Since emitters contribute incoherently to the diffraction pattern,
it should make no difference how the emitters are grouped and calculated.
This fact can be used to distribute a calculation over multiple parallel processes
if each process calculates the diffraction pattern coming from one particular emitter atom.
In fact, some calculation codes are implemented for a single emitter per calculation.
With PMSCO, it is easy to distribute the emitters over parallel processes.
The project just declares the number of emitters and returns one specific cluster per emitter.
In the simplest case, this means that the emitter attribute of the cluster atoms is set differently,
while the atomic coordinates are the same for all clusters generated.
PMSCO takes care of dispatching the clusters to multiple calculation processes
depending on the number of allocated MPI processes
as well as summing up the resulting diffraction patterns.
In addition, the emitter framework also supports clusters that are tailored to a specific emitter configuration.
Suppose that the unit cell contains a large number of inequivalent emitters.
If all emitters had to be included in a single calculation,
the cluster would grow very large and the calculation would include many long scattering paths
that effectively did not contribute intensity to the final result.
Splitting a large cluster into small ones built locally around one emitter
can provide a significant performance gain in complex systems.
Note that the emitter framework does not require that an emitter _configuration_ contains only one emitter _atom_.
It is up to the project to define how many emitter configurations there are and what they encompass.
This should, however, normally not be necessary.
To avoid confusion, it is recommended to declare exactly one emitter atom per configuration.
\subsection sec_emit_implement Implementation
There are several implementation routes with varying complexity.
Which route to take can depend on the complexity of the system and/or the programming skills of the user.
The following class diagram illustrates the classes and packages involved in cluster generation.
@startuml "class diagram for cluster generation"
package pmsco {
class Project {
}
abstract class ClusterGenerator {
{abstract} count_emitters()
{abstract} create_cluster()
}
class LegacyClusterGenerator {
}
}
package "user project" {
class UserClusterGenerator {
}
note bottom : for complex cluster
class UserProject {
}
note bottom : for simple cluster
}
Project <|-- UserProject
ClusterGenerator <|-- LegacyClusterGenerator
ClusterGenerator <|-- UserClusterGenerator
Project *-- ClusterGenerator
UserProject .> LegacyClusterGenerator
UserProject .> UserClusterGenerator
@enduml
In general, the cluster is generated by calls to the project's cluster_generator object.
This can be either a custom generator class derived from pmsco.cluster.ClusterGenerator
or the default pmsco.cluster.LegacyClusterGenerator which calls the UserProject.
For simple clusters, it may be sufficient to implement the cluster directly in the user project class
(UserProject in the diagram).
For more complex systems, it is recommended to implement a custom cluster generator class
(UserClusterGenerator in the diagram).
\subsubsection sec_emit_implement_legacy Static cluster implemented in project methods
This is the simplest route as it requires the implementation of only one or two methods of the user project class.
It can be used for single-emitter and multi-emitter problems.
This implementation is active while a pmsco.cluster.LegacyClusterGenerator
is assigned to the project's cluster_generator attribute.
1. Implement a count_emitters method in your project class
if the project uses more than one emitter configuration.
It must have the same method contract as pmsco.cluster.ClusterGenerator.count_emitters.
Specifically, it must return the number of emitter configurations of a given model, scan and symmetry.
If there is only one configuration, the method does not need to be implemented.
2. Implement a create_cluster method in your project class.
It must have the same method contract as pmsco.cluster.ClusterGenerator.create_cluster.
Specifically, it must return a cluster.Cluster object for the given model, scan, symmetry and emitter configuration.
The emitter atoms must be marked according to the emitter configuration specified by the index argument.
Note that, depending on the index.emit argument, either all emitter atoms must be marked
or only those of the corresponding emitter configuration.
3. (Optionally) override the pmsco.project.Project.combine_emitters method
if the emitters should be added with non-uniform weights.
Although it's possible to produce emitter-dependent clusters using this approach,
this is usually not recommended.
Rather, the generator approach described below should be followed in this case.
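The two methods of steps 1 and 2 might look as in the following sketch. It runs without PMSCO, so Cluster and TaskIndex are simplified stand-ins rather than the real classes, the coordinates are made up, and the convention that index.emit equal to -1 requests all emitters is an assumption.

```python
from collections import namedtuple

# Stand-ins so that the sketch runs without PMSCO (assumptions, not the real classes).
TaskIndex = namedtuple("TaskIndex", ["model", "scan", "sym", "emit", "region"])


class Cluster:
    """Minimal stand-in for a cluster object: a list of atom records."""

    def __init__(self):
        self.atoms = []

    def add_atom(self, element, position, is_emitter=False):
        self.atoms.append({"element": element, "position": position, "emitter": is_emitter})


class UserProject:
    # made-up coordinates of two inequivalent emitters and two scatterers
    EMITTER_SITES = [(0.0, 0.0, 0.0), (1.8, 0.0, -2.1)]
    SCATTERER_SITES = [(1.8, 0.0, 0.0), (0.0, 0.0, -2.1)]

    def count_emitters(self, model, index):
        """Return the number of emitter configurations."""
        return len(self.EMITTER_SITES)

    def create_cluster(self, model, index):
        """Return the same atoms for every index; only the emitter marks differ."""
        cluster = Cluster()
        for config, site in enumerate(self.EMITTER_SITES):
            # assumption: index.emit == -1 requests all emitters at once
            marked = index.emit == -1 or index.emit == config
            cluster.add_atom("N", site, is_emitter=marked)
        for site in self.SCATTERER_SITES:
            cluster.add_atom("Ni", site)
        return cluster
```

Because the atomic coordinates do not depend on the emitter index, this corresponds to the static cluster case of this section.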
\subsubsection sec_emit_implement_generator Static cluster implemented by generator class
The preferred way of creating clusters is to implement a _generator_ class
because it is the most scalable way from simple to complex systems.
In addition, one cluster generator class can be quickly exchanged for another
if there are multiple possibilities.
1. Implement a cluster generator class which inherits from pmsco.cluster.ClusterGenerator
in your project module.
2. Implement the create_cluster and count_emitters methods of the generator.
The method contracts are the same as the ones described in the previous paragraph,
just in the context of a separate class.
3. Initialize an instance of the generator and assign it to the project.cluster_generator attribute
in the initialization of your project.
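The three steps can be sketched as follows. The ClusterGenerator base class shown here is a simplified stand-in so that the example runs without PMSCO; its constructor argument, the dictionary returned by create_cluster and the model parameter dlat are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class ClusterGenerator(ABC):
    """Stand-in for pmsco.cluster.ClusterGenerator (assumed interface)."""

    def __init__(self, project):
        self.project = project

    @abstractmethod
    def count_emitters(self, model, index):
        """Return the number of emitter configurations."""

    @abstractmethod
    def create_cluster(self, model, index):
        """Return the cluster for the given model and index."""


class UserClusterGenerator(ClusterGenerator):
    """Steps 1 and 2: a generator class that implements both methods."""

    def count_emitters(self, model, index):
        return 2

    def create_cluster(self, model, index):
        # placeholder result; a real implementation returns a cluster object
        return {"index": index, "dlat": model["dlat"]}


class UserProject:
    """Step 3: the project assigns a generator instance during initialization."""

    def __init__(self):
        self.cluster_generator = UserClusterGenerator(self)
```

Exchanging one generator for another then only requires changing the assignment in the project's initialization.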
\subsubsection sec_emit_implement_local Local clusters implemented by generator class
The basic method contract outlined in the previous paragraph is equally applicable to the case
where a local cluster is generated for each emitter configuration.
Again, the generator class with the two methods (count_emitters and create_cluster) is the minimum requirement.
However, for ease of code maintenance and/or for improved performance of large clusters,
some internal structure may be helpful.
Suppose that the system consists of a large supercell containing many emitters
and that a small cluster shall be built for each emitter configuration.
During the calculations, the generator will receive several calls to the count_emitters and create_cluster methods.
Every time the model and index are the same, the functions must return the same result.
Thus, most importantly, the implementation must make sure that the results are fully deterministic.
Second, depending on the complexity, it could be more efficient to cache a cluster for later use.
One way to reduce the complexity is to introduce a _master cluster_
from which the emitter configurations and individual clusters are derived.
1. Implement a master_cluster method with the same arguments and result types as create_cluster.
The method returns a full cluster of the supercell and its neighbouring cells.
All inequivalent emitters are marked (which determines the number of emitter configurations).
2. Decorate the master_cluster with pmsco.dispatch.CachedCalculationMethod.
This pre-defined decorator transparently caches the cluster
so that subsequent calls with the same arguments do not re-create the cluster but return the cached one.
3. The count_emitters method can simply return the emitter count of the master cluster.
4. The create_cluster method calls master_cluster() and extracts the region
corresponding to the requested emitter configuration.
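The master cluster pattern of steps 1 to 4 can be sketched as follows. To keep the example runnable without PMSCO, functools.lru_cache is used in place of pmsco.dispatch.CachedCalculationMethod, the cluster is a plain tuple of (x, y, z, is_emitter) records, and create_cluster receives the emitter number directly instead of an index object. The class name, the chain geometry and the model parameter a are all made up.

```python
import math
from functools import lru_cache


class LocalClusterGenerator:
    """Hypothetical generator that cuts local clusters out of a cached master cluster."""

    RADIUS = 1.5  # cut-off radius in units of the lattice constant

    def __init__(self):
        self.master_builds = 0  # counts how often the master cluster is really built

    def master_cluster(self, model):
        # dicts are not hashable, so the cache key is a sorted tuple of the items
        return self._master_cluster(tuple(sorted(model.items())))

    @lru_cache(maxsize=8)
    def _master_cluster(self, model_items):
        self.master_builds += 1
        a = dict(model_items)["a"]
        # linear chain of 9 atoms; the atoms at i = -2 and i = 2 are emitters
        return tuple((i * a, 0.0, 0.0, i in (-2, 2)) for i in range(-4, 5))

    def count_emitters(self, model):
        return sum(1 for atom in self.master_cluster(model) if atom[3])

    def create_cluster(self, model, emit):
        master = self.master_cluster(model)
        emitters = [atom for atom in master if atom[3]]
        ex, ey, ez, _ = emitters[emit]
        radius = self.RADIUS * model["a"]
        local = []
        for x, y, z, _ in master:
            if math.dist((x, y, z), (ex, ey, ez)) <= radius:
                # mark only the emitter of the requested configuration
                local.append((x, y, z, (x, y, z) == (ex, ey, ez)))
        return local
```

Since every call derives its result from the same cached master cluster, repeated calls with the same model and emitter number return the same cluster, which is the determinism required above.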
\subsection sec_emit_report Reporting
The pmsco.project.Project class implements a method that saves a cluster to two XYZ files,
one containing the coordinates of all atoms
and one containing only the coordinates of the emitters.
The method is called for each cluster that is passed to the calculator, i.e., each emitter index.
You may override the method in your project to alter the reporting.
\ No newline at end of file
/*! @page pag_concepts_model Model
/*! @page pag_concepts_region Region
\ No newline at end of file
/*! @page pag_concepts_scan Scans
\section sec_scanning Scanning
PMSCO with EDAC currently supports the following scan axes.
- kinetic energy E
- polar angle theta T
- azimuthal angle phi P
- analyser angle alpha A
The following combinations of these scan axes are allowed (see
- E
- E-T
- E-A
- T-P (hemispherical or hologram scan)
@attention The T and A axes cannot be combined.
If a scan of one of them is specified, the other is assumed to be fixed at zero!
This assumption may change in the future,
so it is best to explicitly set the fixed angle to zero in the scan file.
@remark According to the measurement geometry at PEARL,
alpha scans are implemented in EDAC as theta scans at phi = 90 in fixed cluster mode.
The switch to fixed cluster mode is made by PMSCO internally;
no change of angles or other parameters is necessary in the scan or project files
besides filling the alpha instead of the theta column.
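The list of allowed axis combinations can be expressed as a small lookup. The following sketch is not part of PMSCO; the names ALLOWED_SCANS and is_allowed_scan are hypothetical, and the axes are identified by the single letters used in the list above.

```python
# Allowed combinations of scan axes: E, E-T, E-A and T-P.
ALLOWED_SCANS = {
    frozenset("E"),
    frozenset("ET"),
    frozenset("EA"),
    frozenset("TP"),
}


def is_allowed_scan(axes):
    """Check a collection of scanned axis letters, e.g. "ET" or ["T", "P"]."""
    return frozenset(axis.upper() for axis in axes) in ALLOWED_SCANS
```

A scan that varies both T and A, such as is_allowed_scan("TA"), is rejected, in line with the restriction that these two axes cannot be combined.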
/*! @page pag_concepts_symmetry Symmetry
\section sec_symmetry Symmetry and Domain Averaging
A _symmetry_ under PMSCO is a discrete variant of a set of calculation parameters (including the atomic cluster)
that is derived from the same set of model parameters
and that contributes incoherently to the measured diffraction pattern.
A symmetry may be represented by a special symmetry parameter which is not subject to optimization.
For instance, a real sample may have additional rotational domains that are not present in the cluster,
increasing the symmetry from three-fold to six-fold.
Or, an adsorbate may be present in a number of different lateral configurations on the substrate.
In the first case, it may be sufficient to fold calculated data in the proper way to generate the same symmetry as in the measurement.
In the latter case, it may be necessary to execute a scattering calculation for each possible orientation or a representative number of possible orientations.
PMSCO provides the basic framework to spawn multiple calculations according to the number of symmetries (cf. \ref sec_tasks).
The actual data reduction from multiple symmetries to one measurement needs to be implemented on the project level.
This section explains the necessary steps.
1. Your project needs to populate the pmsco.project.Project.symmetries list.
For each symmetry, add a dictionary of symmetry parameters, e.g. <code>{'angle_azi': 15.0}</code>.
There must be at least one symmetry in a project, otherwise no calculation is executed.
2. The project may apply the symmetry of a task to the cluster and parameter file if necessary.
The pmsco.project.Project.create_cluster and pmsco.project.Project.create_params methods receive the index of the particular symmetry in addition to the model parameters.
3. The project combines the results of the calculations for the various symmetries into one dataset that can be compared to the measurement.
The default method implemented in pmsco.project.Project just adds up all calculations with equal weight.
If you need more control, you need to override the pmsco.project.Project.combine_symmetries method and implement your own algorithm.
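The difference between the default combination and a custom one can be illustrated with plain lists of intensities. This is a sketch only: the function names are hypothetical, the actual signature of pmsco.project.Project.combine_symmetries is not reproduced here, and real datasets carry more than a bare intensity column.

```python
def combine_equal(datasets):
    """Add up the intensities of all symmetries with equal weight."""
    return [sum(values) for values in zip(*datasets)]


def combine_weighted(datasets, weights):
    """Add up the intensities of all symmetries with individual weights."""
    if len(datasets) != len(weights):
        raise ValueError("need exactly one weight per symmetry")
    return [sum(w * v for w, v in zip(weights, values)) for values in zip(*datasets)]
```

A weighted variant is useful, for instance, if two rotational domains are known to cover unequal fractions of the sample surface.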
/*! @page pag_concepts_tasks Task concept
\section sec_tasks Calculation tasks
A _calculation task_ defines a concrete set of model parameters, atomic coordinates, emitter configuration,
experimental reference and meta-data (such as file names)
that completely defines how to produce the input data for the scattering program (the _calculator_).
For each task, the calculator is executed once and produces one result dataset.
In a typical optimization project, however, the calculator is executed multiple times for various reasons
mandated by the project but also by the need for efficient calculations in a multi-process environment:
1. The calculation must be repeated under variation of parameters.
A concrete set of parameters is called a @ref sec_task_model.
2. The sample was measured multiple times or under different conditions (initial states, photon energy, emission angle).
Each contiguous measured dataset is called a @ref sec_task_scan.
3. The measurement averages over multiple inequivalent domains, cf. @ref sec_task_symmetry.
4. The measurement includes multiple geometrically inequivalent emitters, cf. @ref sec_task_emitter.
5. The calculation should be distributed over multiple processes that run in parallel to reduce the wall time, cf. @ref sec_task_region.
In PMSCO, these aspects are modelled as attributes of a calculation task
as shown schematically in the following diagram.
@startuml "attributes of a calculation task"
class CalculationTask {
class Model {