High-Performance I/O Library for Particle-based Simulations
Figure 1. A Beam-beam collision
Motivation & Introduction
Particle-based simulations running on large high-performance computing systems over many time steps can generate an enormous amount of particle- and field-based data for post-processing and analysis. Achieving high-performance I/O for this data, effectively managing it on disk, and interfacing it with analysis and visualization tools can be challenging, especially for domain scientists who do not have I/O and data management expertise. The H5hut library is an implementation of several data models for particle-based simulations that encapsulates the complexity of parallel HDF5 and is simple to use, yet does not compromise performance.
H5hut is tuned for writing collectively from all processors to a single, shared file. Although collective I/O performance is typically (but not always) lower than that of file-per- processor, having a shared file simplifies scientific workflows in which simulation data needs to be analyzed or visualized. In this scenario, the file-per-processor approach leads to data management headaches because large collections of files are unwieldy to manage from a file system standpoint. On a parallel file system like Lustre, even the ls utility will break when presented with tens of thousands of files, and performance begins to degrade with this number of files because of contention at the metadata server. Often a post-processing step is necessary to refactor file-per-processor data into a format that is readable by the analysis tool. In contrast, H5hut files can be directly loaded in parallel by visualization tools like VisIt and ParaView.
H5hut is a veneer API for HDF5: H5hut files are also valid HDF5 files and are compatible with other HDF5-based interfaces and tools. For example, the h5dump tool that comes standard with HDF5 can export H5hut files to ASCII or XML for additional portability. H5hut also includes tools to convert H5hut data to the Visualization ToolKit (VTK) format and to generate scripts for the GNUplot data plotting tool.
H5Part: Variable-length 1D particle arrays.
H5Block: Rectilinear 3D scalar and vector fields.
H5Fed: Adaptively refined tetrahedral and triangle meshes.