IO Unit
Flash-X uses parallel input/output (IO) libraries to simplify and manage the output of the large amounts of data it typically produces. Besides keeping the output in a standard format, the parallel IO libraries also ensure that files are portable across platforms. The mapping of Flash-X data structures to records in these files is controlled by the Flash-X IO unit. Flash-X can output data with the HDF5 parallel IO library.

Several techniques can be used to write data to disk in a parallel simulation. The first is to move all the data to a single processor for output; this technique is known as serial IO. Second, each processor can write to a separate file, known as direct IO. Third, each processor can use parallel access to write to a single file, a technique known as parallel IO. Finally, a hybrid method can be used in which clusters of processors write to the same file, with different clusters writing to different files.

In general, parallel access to a single file provides the best parallel IO performance unless the number of processors is very large. Some platforms, such as Linux clusters, may not have a parallel file system, so moving all the data to a single processor is the only option. Flash-X therefore supports the HDF5 library in both serial and parallel forms: the serial version collects data to one processor before writing it, while the parallel version has every processor write its data to the same file.
IO Implementations
Flash-X supports multiple IO implementations: direct, serial, and parallel, as well as support for different parallel libraries. In addition, Flash-X supports multiple Grid implementations. As a consequence, there are many permutations of the IO API implementation, and the selected implementation must match not only the correct IO library but also the correct grid. Although there are many IO options, the setup script in Flash-X is quite 'smart' and will not let the user set up a problem with incompatible IO and Grid unit implementations. The table below summarizes the different implementations of the Flash-X IO unit in the current release.
| Implementation | Description |
|---|---|
| IO/IOMain/hdf5/parallel/PM | Hierarchical Data Format (HDF) 5 output. A single HDF5 file is created, with each processor writing its data to the same file simultaneously. Works only with the PARAMESH grid package. |
| IO/IOMain/hdf5/parallel/AM | HDF5 output, parallel, as above. Works only with the AMReX grid package. |
| IO/IOMain/hdf5/parallel/UG | HDF5 output, parallel, as above. Works only with the Uniform Grid. |
| IO/IOMain/hdf5/parallel/NoFbs | HDF5 output, parallel, as above, except that all data is written out as one block. Works only with the Uniform Grid. |
| IO/IOMain/hdf5/serial/PM | HDF5 output. Each processor passes its data to processor 0 through explicit MPI sends and receives, and processor 0 does all of the writing. The resulting file format is identical to the parallel version; the only difference is how the data is moved during the writing. Works only with the PARAMESH grid package. |
| IO/IOMain/hdf5/serial/AM | HDF5 output, serial, as above. Works only with the AMReX grid package. |
| IO/IOMain/hdf5/serial/UG | HDF5 output, serial, as above. Works only with the Uniform Grid. |

Flash-X also comes with some predefined setup shortcuts, which make choosing the correct IO significantly easier; see the setup documentation for more details about shortcuts. In Flash-X, HDF5 serial IO is included by default. Since PARAMESH 4.0 is the default grid, the included IO implementations are compatible with PARAMESH 4.0. For clarity, a number of examples are shown below.
An example of a basic setup with HDF5 serial IO and the PARAMESH grid (both defaults) is:

./setup Sod -2d -auto

To include a parallel implementation of HDF5 for a PARAMESH grid, the setup syntax is:

./setup Sod -2d -auto -unit=IO/IOMain/hdf5/parallel/PM

Using the predefined shortcuts, the setup line can be shortened to
./setup Sod -2d -auto +parallelio
To set up a problem with the Uniform Grid and HDF5 serial IO, the
setup
line is:
./setup Sod -2d -auto -unit=Grid/GridMain/UG -unit=IO/IOMain/hdf5/serial/UG
Using the predefined shortcuts, the setup line can be shortened to
./setup Sod -2d -auto +ug
To set up a problem with the Uniform Grid and HDF5 parallel IO, the
complete setup
line is:
./setup Sod -2d -auto -unit=Grid/GridMain/UG -unit=IO/IOMain/hdf5/parallel/UG
Using the predefined shortcuts, the setup line can be shortened to
./setup Sod -2d -auto +ug +parallelio
If you do not want to use IO, you need to explicitly specify on the
setup
line that it should not be included, as in this example:
./setup Sod -2d -auto +noio
To set up a problem using the Parallel-NetCDF library, the user should add either -unit=IO/IOMain/pnetcdf/PM or -unit=IO/IOMain/pnetcdf/UG to the setup line. The predefined shortcut for including the Parallel-NetCDF library is

+pnetcdf

Note that the Parallel-NetCDF IO unit does not have a serial implementation. If you are running in non-fixed blocksize mode, the shortcut

+nofbs

will bring in the Uniform Grid, set the mode to non-fixed blocksize, and choose the appropriate IO.
In keeping with the Flash-X code architecture, the F90 module IO_data stores all the data with IO unit scope. The routine IO/IO_init is called once by Driver/Driver_initFlash; it initializes IO data and stores any runtime parameters.
Output Files
The IO unit can output four different types of files: checkpoint files, plotfiles, particle files, and flash.dat, a text file holding the integrated grid quantities. Flash-X also outputs a logfile, but that file is controlled by the Logfile unit; see the Logfile unit documentation for a description of its format.
There are a number of runtime parameters used to control the output and frequency of IO files. A list of all the runtime parameters and their descriptions for the IO unit can be found online. Additional description is given below for checkpoint parameters, plotfile parameters, particle file parameters, flash.dat parameters, and general IO parameters.
Checkpoint files - Restarting a Simulation
Checkpoint files are used to restart a simulation. In a typical production run, a simulation can be interrupted for a number of reasons: the machine crashes, the present queue window closes, the machine runs out of disk space, or perhaps (gasp) there is a bug in Flash-X. Once the problem is fixed, a simulation can be restarted from the last checkpoint file rather than from the beginning of the run. A checkpoint file contains all the information needed to restart the simulation. The data is stored at the full precision of the code (8-byte reals) and includes all of the variables, species, grid reconstruction data, and scalar values, as well as meta-data about the run.
The API routine for writing a checkpoint file is IO/IO_writeCheckpoint. Users usually will not need to call this routine directly, because the Flash-X IO unit calls IO_writeCheckpoint from the routine IO/IO_output, which checks the runtime parameters to see whether it is appropriate to write a checkpoint file at this time. There are a number of ways to get Flash-X to produce a checkpoint file for restarting. Within the flash.par, runtime parameters can be set to dump output: a checkpoint file can be dumped based on elapsed simulation time, elapsed wall clock time, or the number of timesteps advanced. A checkpoint file is also produced when the simulation ends, i.e., when the maximum simulation time (Driver/tmax), the minimum cosmological redshift, or the total number of steps (Driver/nend) has been reached.

A user can force a checkpoint dump at another time by creating a file named .dump_checkpoint in the output directory of the master processor. This manual action causes Flash-X to write a checkpoint in the next timestep. Checkpoint files will continue to be dumped after every timestep as long as the code finds a .dump_checkpoint file in the output directory, so the user must remember to remove the file once all the desired checkpoint files have been dumped. Creating a file named .dump_restart in the output directory will cause Flash-X to output a checkpoint file and then stop the simulation. This technique is useful for producing one last checkpoint, to save the time evolution since the previous checkpoint, when the machine is going down or a queue window is about to end. These different methods can be combined without problems: each counter (timesteps since the last checkpoint, simulation time since the last checkpoint, change in cosmological redshift, and wall clock time elapsed since the last checkpoint) is independent of the others, and none is influenced by the use of .dump_checkpoint or .dump_restart.
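The per-timestep check for these manual dump files can be pictured as a simple directory poll. The sketch below is illustrative only (the helper name and return convention are made up; the real code performs an equivalent check in Fortran on the master processor):

```python
import os

def check_manual_dump(output_dir):
    """Illustrative poll for the .dump_checkpoint / .dump_restart files.

    Returns (write_checkpoint, stop_run). Hypothetical helper, not
    Flash-X source code.
    """
    want_checkpoint = os.path.exists(os.path.join(output_dir, ".dump_checkpoint"))
    # .dump_restart also forces one final checkpoint before the run stops
    want_stop = os.path.exists(os.path.join(output_dir, ".dump_restart"))
    return (want_checkpoint or want_stop, want_stop)
```

Because the poll only reads the directory and never deletes the file, checkpoints keep being written every timestep until the user removes .dump_checkpoint, exactly as described above.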
Runtime parameters used to control checkpoint file output include:

| Parameter | Type | Default value | Description |
|---|---|---|---|
| checkpointFileNumber | INTEGER | 0 | The number of the initial checkpoint file. This number is appended to the end of the filename and incremented at each subsequent output. When restarting a simulation, this indicates which checkpoint file to use. |
| checkpointFileIntervalStep | INTEGER | 0 | The number of timesteps desired between subsequent checkpoint files. |
| checkpointFileIntervalTime | REAL | 1. | The amount of simulation time desired between subsequent checkpoint files. |
| checkpointFileIntervalZ | REAL | HUGE(1.) | The amount of cosmological redshift change that is desired between subsequent checkpoint files. |
| rolling_checkpoint | INTEGER | 10000 | The number of checkpoint files to keep available at any point in the simulation. If a checkpoint number is greater than rolling_checkpoint, the file number is reset to 0. |
| wall_clock_checkpoint | REAL | 43200. | The maximum amount of wall clock time (seconds) to elapse between checkpoints. When the simulation is started, the current time is stored; if wall_clock_checkpoint seconds elapse over the course of the simulation, a checkpoint file is written. |
| restart | BOOLEAN | .false. | A logical variable indicating whether the simulation is restarting from a checkpoint file (.true.) or starting from scratch (.false.). |
Flash-X is capable of restarting from any of the checkpoint files it produces. The user should make sure that the checkpoint file is valid (e.g., the code did not stop while outputting). To tell Flash-X to restart, set the Driver/restart runtime parameter to .true. in the flash.par. Also, set IO/checkpointFileNumber to the number of the file from which you wish to restart. If plotfiles or particle files are being produced, set IO/plotfileNumber and IO/particleFileNumber to the numbers of the next plotfile and particle file you want Flash-X to output. In Flash-X, plotfile and particle file outputs are forced whenever a checkpoint file is written. Sometimes several plotfiles may be produced after the last valid checkpoint file; resetting plotfileNumber to the first plotfile produced after the checkpoint from which you are restarting will ensure that there are no gaps in the output. See the Plotfiles section below for more details.
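Putting those steps together, a flash.par for such a restart might contain entries like the following (the file numbers are hypothetical examples, not recommended values):

```
restart              = .true.
checkpointFileNumber = 10
plotfileNumber       = 25
particleFileNumber   = 25
```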
Plotfiles
A plotfile contains all the information needed to interpret the grid data maintained by Flash-X. The data in plotfiles, including grid metadata such as coordinates and block sizes, are stored at single precision to save space. This can, however, be overridden by setting the runtime parameters plotfileMetadataDP and/or plotfileGridQuantityDP to true, which causes the grid metadata and/or the quantities stored on the grid (dens, pres, temp, etc.), respectively, to be written in double precision. Users must choose which variables to output with the runtime parameters IO/plot_var_1, IO/plot_var_2, etc., by setting them in the flash.par file. For example:

plot_var_1 = "dens"
plot_var_2 = "pres"

Currently, we support a number of plot variables named plot_var_n, up to the number of UNKVARS in a given simulation. Similarly, scratch variables may be output to plotfiles (see the Output Scratch Variables section below). At this time, the plotting of face-centered quantities is not supported.
Previously, a few variables like density and pressure were output to the plotfiles by default. Because Flash-X supports a wider range of simulations, it makes no assumption that density or pressure variables are even included in the simulation. In Flash-X a user must define plotfile variables in the flash.par file; otherwise the plotfiles will not contain any variables.
The API routine for writing a plotfile is IO/IO_writePlotfile. As with checkpoint files, the user will not need to call this routine directly, because it is invoked indirectly through IO/IO_output when, based on runtime parameters, Flash-X needs to write a plotfile. Flash-X produces plotfiles in much the same manner as checkpoint files: they can be dumped based on elapsed simulation time, on steps since the last plotfile dump, or by hand, by creating a .dump_plotfile file in the output directory. A plotfile is also written at the termination of a simulation, i.e., when nend, zFinal, or tmax is reached. This forced output can be disabled by setting ignoreForcedPlot to true in a simulation's flash.par file. The following runtime parameters pertain to controlling plotfiles:

| Parameter | Type | Default value | Description |
|---|---|---|---|
| plotFileNumber | INTEGER | 0 | The number of the starting (or restarting) plotfile. This number is appended to the filename. |
| plotFileIntervalTime | REAL | 1. | The amount of simulation time desired between subsequent plotfiles. |
| plotFileIntervalStep | INTEGER | 0 | The number of timesteps desired between subsequent plotfiles. |
| plotFileIntervalZ | REAL | HUGE(1.) | The change in cosmological redshift desired between subsequent plotfiles. |
| corners | BOOLEAN | .false. | A logical variable indicating whether to interpolate the data to cell corners before outputting. This option only applies to plotfiles. |
| plot_var_1 ... plot_var_12 | STRING | "none" | Names of the variables to store in a plotfile. Up to 12 variables can be selected for storage, and the standard 4-character variable name is used to select them. |
| ignoreForcedPlot | BOOLEAN | .false. | A logical variable indicating whether or not to denote certain plotfiles as forced. |
| forcedPlotFileNumber | INTEGER | 0 | An integer that sets the starting number for a forced plotfile. |
| plotfileMetadataDP | BOOLEAN | .false. | A logical variable indicating whether or not to output the normally single-precision grid metadata fields as double precision in plotfiles. |
| plotfileGridQuantityDP | BOOLEAN | .false. | A logical variable that sets whether quantities stored on the grid, such as those stored in unk, are output in single or double precision in plotfiles. |
Particle files
When Lagrangian particles are included in a simulation, the ParticleIO subunit controls input and output of the particle information. The particle files are stored in double precision. Particle data are written to the checkpoint file in order to restart the simulation, but are not written to plotfiles; analysis data and metadata about particles are therefore written to the particle files, which are intended for more frequent dumps. The interface for writing the particle file is IO/IO_writeParticles. Again, the user will not usually call this function directly, because the routine IO_output controls particle output based on the runtime parameters controlling particle files. They are controlled in much the same way as plotfiles or checkpoint files and can be dumped based on elapsed simulation time, on steps since the last particle dump, or by hand, by creating a .dump_particle_file file in the output directory. The following runtime parameters pertain to controlling particle files:
| Parameter | Type | Default value | Description |
|---|---|---|---|
| particleFileNumber | INTEGER | 0 | The number of the starting (or restarting) particle file. This number is appended to the end of the filename. |
| particleFileIntervalTime | REAL | 1. | The amount of simulation time desired between subsequent particle file dumps. |
| particleFileIntervalStep | INTEGER | 0 | The number of timesteps desired between subsequent particle file dumps. |
| particleFileIntervalZ | REAL | HUGE(1.) | The change in cosmological redshift desired between subsequent particle file dumps. |
All the code necessary to output particle data is contained in the IO subunit called IOParticles. Whenever the Particles unit is included in a simulation, the correct IOParticles subunit will also be included. For example, the setup line

./setup IsentropicVortex -2d -auto -unit=Particles +ug

will include the IO unit IO/IOMain/hdf5/serial/UG and the correct IOParticles subunit, IO/IOParticles/hdf5/serial/UG. The shortcuts +parallelio, +pnetcdf, and +ug will also cause the setup script to pick up the correct IOParticles subunit, as long as a Particles unit is included in the simulation.
Integrated Grid Quantities – flash.dat
At each simulation time step, values which represent the overall state (e.g., total energy and momentum) are computed by summing over all cells in the computational domain. These integral quantities are written to the ASCII file flash.dat. A default routine, IO/IO_writeIntegralQuantities, is provided to output standard measures for hydrodynamic simulations. The user should copy and modify the routine IO_writeIntegralQuantities in a given simulation directory to store any quantities other than the default values. Two runtime parameters pertaining to the flash.dat file are listed in the table below.
| Parameter | Type | Default value | Description |
|---|---|---|---|
| stats_file | STRING | "flash.dat" | Name of the file to which the integral quantities are written. |
| wr_integrals_freq | INTEGER | 1 | The number of timesteps to elapse between outputs to the scalar/integral data file (flash.dat). |
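The kind of reduction IO_writeIntegralQuantities performs can be sketched as below. This is an illustrative serial stand-in with made-up function names; the real routine sums over blocks, reduces across MPI ranks, and has processor 0 append one row per timestep to flash.dat:

```python
def integrate_quantity(values, cell_volumes):
    """Volume-weighted integral of a cell-centered quantity over all cells."""
    return sum(v * dv for v, dv in zip(values, cell_volumes))

def append_integrals(path, time, quantities):
    """Append one row (time plus integral quantities) to a flash.dat-style file."""
    with open(path, "a") as f:
        f.write("  ".join("%25.18e" % q for q in [time] + list(quantities)) + "\n")
```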
General Runtime Parameters
There are several runtime parameters that pertain to the general IO unit or multiple output files rather than one particular output file. They are listed in the table below.
p1.7inllp2.7in
basenm
& STRING
& "flash_"
& The main part of the
output filenames. The full filename consists of the base name, a
series of three-character abbreviations indicating whether it is
a plotfile, particle file or checkpoint file, the file format,
and a 4-digit file number. See for a description of how Flash-X
output files are named.output_directory
& STRING
& ""
& Output directory
for plotfiles, particle files and checkpoint files. The default
is the directory in which the executable sits.
output_directory
can be an absolute or relative path.memory_stat_freq
& INTEGER
& 100000
& The number of
timesteps to elapse between memory statistic dumps to the log
file (flash.log
).useCollectiveHDF5
&BOOLEAN
&.true.
& When using
the parallel HDF5 implementation of IO, will enable collective
mode for HDF5.summaryOutputOnly
&BOOLEAN
&.false.
& When set
to .true. write an integrated grid quantities file only.
Checkpoint, plot and particle files are not written unless the
user creates a .dump_plotfile, .dump_checkpoint, .dump_restart
or .dump_particle file.Restarts and Runtime Parameters
Flash-X outputs the runtime parameters of a simulation to all checkpoint
files. When a simulation is restarted, these values are known by the
RuntimeParameters
unit while the code is running. On a restart, all
values from the checkpoint used in the restart are stored as previous
values in the lists kept by the RuntimeParameters
unit. All current
values are taken from the defaults used by Flash-X and any simulation
parameter files (e.g., flash.par
). If needed, the previous values
from the checkpoint file can be obtained using the routine RuntimeParameters/RuntimeParameters_getPrev.
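The current/previous bookkeeping on restart can be pictured with a toy model. Dicts stand in for the RuntimeParameters unit's lists, and the names are illustrative, not Flash-X internals:

```python
class RuntimeParams:
    """Toy model of the current/previous runtime-parameter lists on restart."""

    def __init__(self, defaults, checkpoint_values, parfile_values):
        # previous values: whatever was recorded in the checkpoint file
        self.previous = dict(checkpoint_values)
        # current values: code defaults overridden by flash.par entries
        self.current = {**defaults, **parfile_values}

    def get(self, name):
        return self.current[name]

    def get_prev(self, name):
        # analogue of RuntimeParameters_getPrev
        return self.previous[name]
```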
Output Scalars
In Flash-X, each unit has the opportunity to request scalar data to be
output to checkpoint or plotfiles. Because there is no central database,
each unit “owns” different data in the simulation. For example, the
Driver
unit owns the timestep variable dt
, the simulation
variable simTime
, and the simulation step number nStep
. The
Grid
unit owns the sizes of each block, nxb
, nyb
, and
nzb
. The IO
unit owns the variable checkpointFileNumber
.
Each of these quantities is output into checkpoint files. Instead of hard-coding the values into checkpoint routines, Flash-X offers a more flexible interface whereby each unit sends its data to the IO unit. The IO unit then stores these values in a linked list and writes them to the checkpoint file or plotfile. Each unit has a routine called Unit_sendOutputData, e.g., Driver/Driver_sendOutputData and Grid/Grid_sendOutputData. These routines in turn call IO/IO_setScalar. For example, the routine Grid/Grid_sendOutputData calls

IO_setScalar("nxb", NXB)
IO_setScalar("nyb", NYB)
IO_setScalar("nzb", NZB)
To output additional simulation scalars in a checkpoint file, the user should override one of the Unit_sendOutputData routines, e.g., Simulation_sendOutputData.
After restarting a simulation from a checkpoint file, a unit might call
IO/IO_getScalar
to reset a variable value. For example, the
Driver
unit calls IO_getScalar("dt", dr_dt)
to get the value of
the timestep dt
reinitialized from the checkpoint file. A value from
the checkpoint file can be obtained by calling IO/IO_getPrevScalar
.
This call can take an optional argument to find out if an error has
occurred in finding the previous value, most commonly because the value
was not found in the checkpoint file. By using this argument, the user
can then decide what to do if the value is not found. If the scalar
value is not found and the optional argument is not used, then the
subroutine will call Driver/Driver_abortFlash
and terminate the run.
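The scalar hand-off described above can be sketched as follows. A dict stands in for the IO unit's linked list, and the Python names mirror (but are not) the Fortran API:

```python
_scalars = {}  # stand-in for the IO unit's linked list of scalars

def io_set_scalar(name, value):
    """Analogue of IO_setScalar: register one scalar for checkpoint output."""
    _scalars[name] = value

def io_get_scalar(name):
    """Analogue of IO_getScalar: read a scalar back, e.g., after a restart."""
    return _scalars[name]

def grid_send_output_data(nxb, nyb, nzb):
    """Analogue of Grid_sendOutputData handing the block sizes to IO."""
    io_set_scalar("nxb", nxb)
    io_set_scalar("nyb", nyb)
    io_set_scalar("nzb", nzb)
```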
Output User-defined Arrays
Often in a simulation the user needs to output additional information to a checkpoint or plotfile which is not a grid-scope variable. Previously, any such information had to be hard-coded into the simulation. Flash-X provides a general interface, IO/IO_writeUserArray and IO/IO_readUserArray, which allows the user to write and read any generic array that needs to be stored. These two functions have no default implementation; it is up to the user to fill them in with the needed calls to the HDF5 or PnetCDF C routines. We provide implementations for reading and writing integer and double-precision arrays with the helper routines io_h5write_generic_iarr, io_h5write_generic_rarr, io_ncmpi_write_generic_iarr, and io_ncmpi_write_generic_rarr. Data is written out as a 1-dimensional array, but the user can write multidimensional arrays simply by passing a reference to the data and the total number of elements to write. See these routines and the simulation StirTurb for details on their usage.
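Since the generic-array helpers take a flat buffer plus a total element count, a multidimensional array is written by flattening it first. The pure-Python stand-in below illustrates this; the names are hypothetical (the real helpers are C routines called from Fortran):

```python
def flatten(nested):
    """Recursively flatten a nested list into a 1-D list of elements."""
    out = []
    for item in nested:
        if isinstance(item, list):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out

def write_generic_rarr(store, name, data):
    """Stand-in for io_h5write_generic_rarr: keep a flat buffer and its count."""
    flat = flatten(data)
    store[name] = (flat, len(flat))
```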
Output Scratch Variables
In Flash-X a user can allocate space for a scratch or temporary variable with grid scope using one of the Config keywords SCRATCHVAR, SCRATCHCENTERVAR, SCRATCHFACEXVAR, SCRATCHFACEYVAR, or SCRATCHFACEZVAR. To output these scratch variables, the user only needs to set the values of the runtime parameters IO/plot_grid_var_1, IO/plot_grid_var_2, etc., in the flash.par file. For example, to output the magnitude of vorticity declared in a Config file as SCRATCHVAR mvrt:

plot_grid_var_1 = "mvrt"

Note that post-processing routines like fidlr do not display these variables, although they are present in the output file. Future implementations may support this visualization.
Face-Centered Data
Face-centered variables are now output to checkpoint files, when they are declared in a configuration file. Presently, up to nine face-centered variables are supported in checkpoint files. Plotfile output of face-centered data is not yet supported.
Output Filenames
Flash-X constructs the output filenames from the user-supplied basename (runtime parameter basenm) and a file counter that is incremented after each output. Additionally, information about the file type and data storage is included in the filename. The general checkpoint filename is

basename_s0000_[hdf5 | ncmpi]_chk_0000

where hdf5 or ncmpi (the prefix for PnetCDF) is picked depending on the particular IO implementation, the number following the "s" is the split file number (present only if split file IO is in use), and the number at the end of the filename is the current checkpointFileNumber. (The PnetCDF function prefix "ncmpi" derives from the serial NetCDF calls beginning with "nc".)

The general plotfile filename is

basename_s0000_[hdf5 | ncmpi]_plt_[crn | cnt]_0000

where hdf5 or ncmpi is picked depending on the IO implementation used, crn and cnt indicate data stored at the cell corners or centers respectively, the number following "s" is the split file number, if used, and the number at the end of the filename is the current value of plotfileNumber. crn is reserved, even though corner data output is not presently supported by Flash-X's IO.
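The naming scheme can be sketched as follows. This is reconstructed from the description above, not taken from Flash-X source, and the helper names are made up:

```python
def checkpoint_name(basenm, fmt, num, split=None):
    """Build a checkpoint filename: basename + optional split tag + format + counter."""
    split_part = "s%04d_" % split if split is not None else ""
    return "%s%s%s_chk_%04d" % (basenm, split_part, fmt, num)

def plotfile_name(basenm, fmt, num, corners=False, split=None):
    """Build a plotfile filename; 'crn' vs 'cnt' marks corner- vs cell-centered data."""
    split_part = "s%04d_" % split if split is not None else ""
    loc = "crn" if corners else "cnt"
    return "%s%s%s_plt_%s_%04d" % (basenm, split_part, fmt, loc, num)
```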
Output Formats
HDF5 is our most widely used IO library, although Parallel-NetCDF is rapidly gaining acceptance in the high-performance computing community. In Flash-X we also offer a serial direct FORTRAN IO, which is currently only implemented for the uniform grid. This option is intended to provide users a way to output data if they do not have access to HDF5 or PnetCDF. Additionally, if HDF5 or PnetCDF is not performing well on a given platform, the direct IO implementation can be used as a last resort. Our tools, fidlr and sfocu, do not currently support the direct IO implementation, and the output files from this mode are not portable across platforms.
HDF5
HDF5 is supported on a large variety of platforms and offers large file support and parallel IO via MPI-IO. Information about the different versions of HDF can be found at https://support.hdfgroup.org/documentation/. The Flash-X IO implementations require HDF5 1.4.0 or later. Please note that HDF5 1.6.2 requires IDL 1.6 or higher in order to use fidlr3.0 for post-processing.
Implementations of the HDF5
IO
unit use the HDF application
programming interface (API) for organizing data in a database fashion.
In addition to the raw data, information about the data type and byte
ordering (little- or big-endian), rank, and dimensions of the dataset is
stored. This makes the HDF format extremely portable across platforms.
Different packages can query the file for its contents without knowing
the details of the routine that generated the data.
Flash-X provides different HDF5 IO unit implementations – the serial and
parallel versions for each supported grid, Uniform Grid and
PARAMESH
. It is important to remember to match the IO implementation
with the correct grid, although the setup
script generally takes
care of this matching. PARAMESH
2, PARAMESH
4.0, and
PARAMESH
4dev all work with the PARAMESH
(PM) implementation of
IO. Nonfixed blocksize IO has its own implementation in parallel, and is
presently not supported in serial mode. Examples are given below for the
five different HDF5 IO implementations.
./setup Sod -2d -auto -unit=IO/IOMain/hdf5/serial/PM (included by default)
./setup Sod -2d -auto -unit=IO/IOMain/hdf5/parallel/PM
./setup Sod -2d -auto -unit=Grid/GridMain/UG -unit=IO/IOMain/hdf5/serial/UG
./setup Sod -2d -auto -unit=Grid/GridMain/UG -unit=IO/IOMain/hdf5/parallel/UG
./setup Sod -2d -auto -nofbs -unit=Grid/GridMain/UG -unit=IO/IOMain/hdf5/parallel/NoFbs
The default IO implementation is IO/IOMain/hdf5/serial/PM. It can be included simply by adding -unit=IO to the setup line. Alternatively, the user can define setup shortcuts; see the setup documentation for more information about creating shortcuts.
The format of the HDF5 output files produced by these various IO implementations is identical; only the method by which they are written differs. It is possible to create a checkpoint file with the parallel routines and restart Flash-X from that file using the serial routines, or vice versa. (Such a switch requires re-running setup and recompiling to get an executable with the other version of IO.) When outputting with the Uniform Grid, some data is stored that is not explicitly necessary for data analysis or visualization, but is retained to keep the Uniform Grid output format the same as the PARAMESH one. For example, the refinement level in the Uniform Grid case is always equal to 1, as is the nodetype array; a tree structure for the Uniform Grid is 'faked' for visualization purposes. In a similar way, the non-fixedblocksize mode outputs all of the data stored by the grid as though it were one large block. This allows restarting with differing numbers of processors and decomposing the domain in an arbitrary fashion in the Uniform Grid.
Parallel HDF5 mode has two runtime parameters useful for debugging:
IO/chkGuardCellsInput
and IO/chkGuardCellsOutput
. When these
runtime parameters are true, the Flash-X input and output routines read
and/or output the guard cells in addition to the normal interior cells.
Note that the HDF5 files produced are not compatible with the
visualization and analysis tools provided with Flash-X.
Collective Mode
By default, the parallel mode of HDF5 uses an independent access pattern
for writing datasets and performs IO without aggregating the disk access
for writing. Parallel HDF5 can also be run so that the writes to the
file’s datasets are aggregated, allowing the data from multiple
processors to be written to disk in fewer operations. This can greatly
increase the performance of IO on filesystems that support this
behavior. Flash-X can make use of this mode by setting the runtime
parameter useCollectiveHDF5
to true.
Machine Compatibility
The HDF5 modules have been tested successfully on the ASC platforms and on Linux clusters. Performance varies widely across platforms, but the parallel version is usually faster than the serial version. Experience with parallel IO on a Linux cluster using PVFS is reported in Ross et al. (2001). Note that for clusters without a parallel file system, you should not use the parallel HDF5 IO module with an NFS-mounted filesystem; in that case, all of the information still has to pass through the node from which the disk is hanging, resulting in contention. It is recommended that a serial version of the HDF5 unit be used instead.
HDF5 Data Format
The HDF5 data format for Flash-X is unchanged for all grid variables and for the data structures used to recreate the tree and neighbor data, with the exception that bounding box, coordinates, and block size are now sized as mdim (the maximum number of dimensions supported by Flash-X's grids, which is three) rather than ndim. PARAMESH 4.0 and PARAMESH 4dev, however, do require a few additional tree data structures to be output, which are described below. The format of the metadata stored in the HDF5 files has changed to reduce the number of writes required. Additionally, scalar data, like time, dt, nstep, etc., are now stored in a linked list and written all at one time. Any unit can add scalar data to the checkpoint file by calling the routine IO/IO_setScalar; see the Output Scalars section above for more details.
Split File IO
On machines with large numbers of processors, IO may perform better if
all processors write to a limited number of separate files rather than
one single file. This technique can help mitigate IO bottlenecks and
contention issues on these large machines better than even parallel-mode
IO can. In addition this technique has the benefit of keeping the number
of output files much lower than if every processor writes its own file.
Split file IO can be enabled by setting the IO/outputSplitNum
parameter to the number of files desired (e.g., if outputSplitNum
is
set to 4, every checkpoint, plotfile, and particle file will be broken
into 4 files by processor number). This feature is only available with
the HDF5 parallel IO mode and is still experimental; users should use
it at their own risk.
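For instance, to split every output across four files, a flash.par might contain (parameter name as documented above; remember this feature is experimental):

```
# flash.par excerpt: break each checkpoint, plotfile, and particle
# file into 4 files by processor number (HDF5 parallel mode only)
outputSplitNum = 4
```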
Parallel-NetCDF
Another implementation of the IO unit uses the Parallel-NetCDF library, available at http://www.mcs.anl.gov/parallel-netcdf/. At this time, the Flash-X code requires version 1.1.0 or higher. Our testing shows the performance of the PnetCDF library to be very similar to that of the HDF5 library when using collective I/O optimizations in parallel I/O mode.
There are two different PnetCDF IO unit implementations. Both are
parallel implementations, one for each supported grid, the Uniform Grid
and PARAMESH
. It is important to remember to match the IO
implementation with the correct grid. To include PnetCDF IO in a
simulation the user should add -unit=IO/IOMain/pnetcdf.....
to the
setup
line. See examples below for the two different PnetCDF IO
implementations.
./setup Sod -2d -auto -unit=IO/IOMain/pnetcdf/PM
./setup Sod -2d -auto -unit=Grid/GridMain/UG -unit=IO/IOMain/pnetcdf/UG
The paths to these IO implementations can be long and tedious to type, so users are advised to set up shortcuts for the various implementations. See for information about creating shortcuts.
To the end-user, the PnetCDF data format is very similar to the HDF5
format. (Under the hood, the data storage is quite different.) Where HDF5
has datasets and dataspaces, PnetCDF has dimensions and
variables. All the same data is stored in the PnetCDF checkpoint as in
the HDF5 checkpoint file, although there are some differences in how the
data is stored. The grid data is stored in multidimensional arrays, as
it is in HDF5; these are the unknown names, refine level, node type, gid,
coordinates, processor number, block size, and bounding box. The particles
data structure is also stored in the same way. The simulation metadata,
like file format version, file creation time, command line, etc., are
stored as global attributes. The runtime parameters and the output
scalars are also stored as attributes, as are the unk
and particle labels. In PnetCDF, all global quantities
must be consistent across all processors involved in a write to a file,
or else the write will fail. All IO calls are run in collective mode
in PnetCDF.
Direct IO
As mentioned above, the direct IO implementation has been added so that users
can always output data, even if the HDF5 or PnetCDF libraries are
unavailable. The user should examine the two helper routines
io_writeData
and io_readData
, copy the base implementation to a
simulation directory, and modify the routines to write out specifically
what is needed. To include the direct IO implementation, add one of the
following to your setup line:
-unit=IO/IOMain/direct/UG
-unit=IO/IOMain/direct/PM
Output Side Effects
In Flash-X, when plotfiles or checkpoint files are output by
IO/IO_output
, the grid is fully restricted and user variables are
computed prior to writing the file. By default, IO/IO_writeCheckpoint
and
IO/IO_writePlotfile
do not perform this step themselves. The
restriction can be forced for all writes by setting the runtime parameter
IO/alwaysRestrictCheckpoint
to true, and the user variables can
always be computed prior to output by setting
IO/alwaysComputeUserVars
to true.
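Both behaviors can be enabled together from flash.par; a minimal excerpt (parameter names as documented above):

```
# flash.par excerpt: always restrict the grid and compute user
# variables before any checkpoint or plotfile write
alwaysRestrictCheckpoint = .true.
alwaysComputeUserVars    = .true.
```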
Working with Output Files
The checkpoint file output formats offer great flexibility when
visualizing the data. The visualization program does not have to know
the details of how the file was written; rather, it can query the file to
find the number of dimensions, block sizes, variable data, etc., that it
needs to visualize the data. IDL
routines for reading the HDF5 and
PnetCDF formats are provided in tools/fidlr3/
. These can be used
interactively through the IDL
command line (see ). In addition, VisIt
version 10.0 and higher (see ) can natively read Flash-X HDF5 output
files by using the command line option -assume_format flashx
.
Unit Test
The IO
unit test is provided to test IO performance on various
platforms with the different Flash-X IO implementations and parallel
libraries. The unitTest
is set up like any other Flash-X simulation.
It can be run with any IO implementation, as long as the correct Grid
implementation is included. This unitTest
writes a checkpoint file,
a plotfile, and, if particles are included, a particle file. Particles IO
can be tested simply by including particles in the simulation; the variables
needed for particles should be uncommented in the Config
file.
Example setups:
#setup for PARAMESH Grid and serial HDF5 IO
./setup unitTest/IO -auto

#setup for PARAMESH Grid with parallel HDF5 IO (see shortcuts docs for explanation)
./setup unitTest/IO -auto +parallelIO
(same as)
./setup unitTest/IO -auto -unit=IO/IOMain/hdf5/parallel/PM

#setup for Uniform Grid with serial HDF5 IO, 3d problem, increasing default number of zones
./setup unitTest/IO -3d -auto +ug -nxb=16 -nyb=16 -nzb=16
(same as)
./setup unitTest/IO -3d -auto -unit=Grid/GridMain/UG -nxb=16 -nyb=16 -nzb=16

#setup for PM3 and parallel netCDF, with particles
./setup unitTest/IO -auto -unit=Particles +pnetcdf

#setup for UG and parallel netCDF
./setup unitTest/IO -auto +pnetcdf +ug
Run the test like any other Flash-X simulation:
mpirun -np numProcs flashx
There are a few things to keep in mind when working with the IO unit test:
The Config file in unitTest/IO declares some dummy grid scope variables which are stored in the unk array. If the user wants a more intensive IO test, more variables can be added. Variables are initialized to dummy values in Driver_evolveFlash.

Variables will only be output to the plotfile if they are declared in the flash.par (see the example flash.par in the unit test).

The only units besides the simulation unit included in this simulation are Grid, IO, Driver, Timers, Logfile, RuntimeParameters and PhysicalConstants.

If the PARAMESH Grid implementation is being used, it is important to note that the grid will not refine on its own. The user should set lrefine_min to a value > 1 to create more blocks. The user could also set the runtime parameters nblockx, nblocky, nblockz to make a bigger problem.

Just like any other simulation, the user can change the number of zones in a simulation using -nxb=numZones on the setup line.
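Putting the notes above together, a flash.par for a more demanding PARAMESH IO test might include the following (parameter names as documented above; the values are illustrative only):

```
# flash.par excerpt for unitTest/IO with PARAMESH (illustrative values)
lrefine_min = 2    # force refinement; the test grid will not refine on its own
nblockx     = 2    # enlarge the initial block layout
nblocky     = 2
nblockz     = 2
```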
Derived data type I/O
In Flash-X we introduced an alternative I/O implementation for both HDF5 and Parallel-NetCDF which is a slight spin on the standard parallel I/O implementations. In this new implementation we select the data from the mesh data structures directly using HDF5 hyperslabs (HDF5) and MPI derived datatypes (Parallel-NetCDF) and then write the selected data to datasets in the file. This eliminates the need for manually copying data into a Flash-X allocated temporary buffer and then writing the data from the temporary buffer to disk.
You can include derived data type I/O in your Flash-X application by
adding the setup shortcuts +hdf5TypeIO
for HDF5 and +pnetTypeIO
for Parallel-NetCDF to your setup line. If you are using the HDF5
implementation then you need a parallel installation of HDF5. All of the
runtime parameters introduced in this chapter should be compatible with
derived data type I/O.
A nice property of derived data type I/O is that it eliminates a lot of the I/O code duplication that has accumulated in the Flash-X I/O unit over the last decade. The same code is used for UG, NoFBS, and Paramesh Flash-X applications, and we have also shared code between the HDF5 and Parallel-NetCDF implementations. A technical reason for using the new I/O implementation is that we provide more information to the I/O libraries about the exact data we want to read from / write to disk. This allows us to take advantage of recent enhancements to I/O libraries, such as the nonblocking APIs in the Parallel-NetCDF library. We discuss experimentation with this API and other ideas in the paper "A Case Study for Scientific I/O: Improving the FLASH Astrophysics Code" www.mcs.anl.gov/uploads/cels/papers/P1819.pdf
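The contrast between the traditional buffered path and the derived-type path can be sketched in Python (illustrative only; the names, the guard-cell layout, and the `write` callback are hypothetical stand-ins, since the actual implementation uses HDF5 hyperslabs and MPI derived datatypes):

```python
# Illustrative sketch: contrast copying interior cells into a temporary
# buffer (the traditional IO path) with describing the selection and
# letting the IO layer pull the data directly (the derived-type path).
# NGUARD and the 1-d block layout are hypothetical stand-ins.
NGUARD = 1  # guard cells on each side of a 1-d block


def write_via_buffer(block, write):
    # Traditional path: manually copy interior cells to a temporary
    # buffer, then hand the buffer to the IO library.
    buf = [block[i] for i in range(NGUARD, len(block) - NGUARD)]
    write(buf)


def describe_selection(block):
    # Stand-in for an HDF5 hyperslab / MPI derived-datatype description:
    # a (start, count) pair naming the interior cells in place.
    return NGUARD, len(block) - 2 * NGUARD


def write_via_selection(block, write):
    # Derived-type path: the library applies the selection itself,
    # so no intermediate Flash-X-allocated buffer is needed.
    start, count = describe_selection(block)
    write(block[start:start + count])
```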
The new I/O code has been tested in our internal Flash-X regression tests and there are no known issues; however, we will probably only recommend it as the default implementation in a future release. We have made the research ideas from our case study paper usable for all Flash-X applications, but the code still needs cleanup and exhaustive testing with all the Flash-X runtime parameters introduced in this chapter.