communication

moment_kinetics.communication (Module)

Communication functions and setup

Split the grid into 'blocks'. Each block can use shared memory (MPI shared-memory arrays). At the moment this only works with a single 'block' containing the whole grid; eventually more MPI communication functions will be added to communicate between blocks. A block should probably be a 'NUMA region' for maximum efficiency.

Note: charge-exchange collisions loop over neutral species for each ion species. This loop is not currently parallelised (although it could be, by introducing some more loop ranges), as we currently only run with 1 ion species and 1 neutral species.

moment_kinetics.communication.comm_anyv_subblock (Constant)

Communicator for the local velocity-space subset of a shared-memory block in an 'anyv' region

The 'anyv' region is used to parallelise the collision operator. See moment_kinetics.looping.get_best_anyv_split.

A Ref{MPI.Comm} must be used so that a non-const MPI.Comm can be stored. The communicator must actually be assigned to the Ref, not just copied as a pointer into its .val member, because otherwise the MPI.Comm object created by MPI.Comm_split() would be deleted, which probably makes MPI.jl delete the communicator.
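
A minimal sketch of the pattern described above; the names comm_example and setup_example_comm are illustrative and not part of moment_kinetics:

```julia
using MPI

# Hypothetical illustration of storing a communicator in a Ref, as described above.
const comm_example = Ref(MPI.COMM_NULL)

function setup_example_comm(color, key)
    # Assign the new communicator to the Ref itself. Do not copy a pointer into
    # comm_example.val: the MPI.Comm returned by MPI.Comm_split() could then be
    # garbage-collected, and MPI.jl might free the underlying communicator.
    comm_example[] = MPI.Comm_split(MPI.COMM_WORLD, color, key)
    return nothing
end
```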

moment_kinetics.communication.comm_block (Constant)

Communicator connecting a shared-memory region

A Ref{MPI.Comm} must be used so that a non-const MPI.Comm can be stored. The communicator must actually be assigned to the Ref, not just copied as a pointer into its .val member, because otherwise the MPI.Comm object created by MPI.Comm_split() would be deleted, which probably makes MPI.jl delete the communicator.

moment_kinetics.communication.comm_inter_block (Constant)

Communicator connecting the root processes of each shared memory block

A Ref{MPI.Comm} must be used so that a non-const MPI.Comm can be stored. The communicator must actually be assigned to the Ref, not just copied as a pointer into its .val member, because otherwise the MPI.Comm object created by MPI.Comm_split() would be deleted, which probably makes MPI.jl delete the communicator.

moment_kinetics.communication.MPISharedArray (Type)

Type used to declare a shared-memory array. When debugging is not active, MPISharedArray is just an alias for Array; when @debug_shared_array is activated, it is instead defined as an alias for DebugMPISharedArray.

moment_kinetics.communication._anyv_subblock_synchronize (Method)

Call an MPI Barrier for all processors in an 'anyv' sub-block.

The 'anyv' region is used to parallelise the collision operator. See moment_kinetics.looping.get_best_anyv_split.

Used to synchronise processors that are working on the same shared-memory array(s) between operations, to avoid race conditions. Should be even cheaper than _block_synchronize because it only requires communication on a smaller communicator.

Note: _anyv_subblock_synchronize() may be called different numbers of times on different sub-blocks, depending on how the species and spatial dimensions are split up. @debug_detect_redundant_block_synchronize is not implemented (yet?) for _anyv_subblock_synchronize().

moment_kinetics.communication._block_synchronize (Method)

Call an MPI Barrier for all processors in a block.

Used to synchronise processors that are working on the same shared-memory array(s) between operations, to avoid race conditions. Should be (much) cheaper than a global MPI Barrier because it only requires communication within a single node.

Note: some debugging code currently assumes that if _block_synchronize() is called on one block, it is called simultaneously on all blocks. It seems likely that this will always be true, but if it ever changes (i.e. different blocks doing totally different work), the debugging routines need to be updated.
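
A hedged sketch of the pattern this enables, assuming shared_f is a block-shared array (e.g. from allocate_shared) and block_rank[] holds this process's rank within comm_block[]; the names are illustrative, and the real code splits the work with the looping macros rather than a bare rank check:

```julia
# Illustrative synchronisation pattern to avoid a race on a shared array.
if block_rank[] == 0
    shared_f .= 1.0            # one process writes the shared array
end
_block_synchronize()           # barrier over comm_block[]: the write is now visible
local_total = sum(shared_f)    # every process in the block can safely read
```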

moment_kinetics.communication.allocate_shared (Method)

Get a shared-memory array of mk_float (shared by all processes in a 'block')

Create a shared-memory array using MPI.Win_allocate_shared(). The pointer to the allocated memory is wrapped in a Julia array, but the memory is not managed by the Julia array. The MPI.Win reference needs to be freed - this is done by saving the MPI.Win into a Vector in the communication module, all of whose entries are freed by the finalize_comms!() function, which should be called when moment_kinetics has finished running a simulation/test.

Arguments

- dims - mk_int or Tuple{mk_int}: Dimensions of the array to be created. The dimensions passed define the size of the array which is being handled by the 'block' (rather than the global array, or a subset for a single process).
- comm - MPI.Comm, default comm_block[]: MPI communicator containing the processes that share the array.
- maybe_debug - Bool: Can be set to false to force not creating a DebugMPISharedArray when debugging is active. This avoids recursion when including a shared-memory array as a member of a DebugMPISharedArray for debugging purposes.

Returns

Array{mk_float}
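
A hedged usage sketch, assuming the dims argument is passed as a tuple as in the Arguments list above; the sizes and variable names are illustrative and the exact call syntax should be checked against the method signature:

```julia
# Allocate the block-local part of a (vpa, z) sized shared array on comm_block[].
nvpa, nz = 17, 9
f = allocate_shared((nvpa, nz))
if block_rank[] == 0           # assumed rank-within-block variable
    f .= 0.0                   # initialise from a single process
end
_block_synchronize()           # synchronise the block before other processes read f
# ... use f during the run ...
# finalize_comms!() later frees the MPI.Win backing this array.
```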

moment_kinetics.communication.finalize_comms! (Method)

Clean up from communications

Do any needed clean-up for MPI, etc. Does not call MPI.Finalize() - this is called anyway when Julia exits, and we do not want to call it explicitly so that multiple runs can be done in a single Julia session.

Frees any shared-memory arrays.

moment_kinetics.communication.initialize_comms! (Method)

Set up communications

Check that global variables are in the correct state (i.e. caches were emptied correctly if they were used before).

Also does some set up for debugging routines, if they are active.
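
A rough lifecycle sketch showing the intended ordering of these calls around a run; the actual call sites live inside moment_kinetics's setup and cleanup code:

```julia
initialize_comms!()   # check/reset global state, set up debugging helpers
# ... allocate shared-memory arrays with allocate_shared(), run the simulation ...
finalize_comms!()     # free the saved shared-memory windows
# MPI.Finalize() is deliberately not called, so further runs can be done in the
# same Julia session.
```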

moment_kinetics.communication.setup_distributed_memory_MPI (Method)

Function to take information from the user about the r and z grids and the number of processes allocated, and use it to set up the communicators.

Notation definitions:
- block: group of processes that share data with shared memory
- z group: group of processes that need to communicate data for z derivatives
- r group: group of processes that need to communicate data for r derivatives

This routine assumes that the number of processes is selected by the user to match exactly the ratio

nblocks = (r_nelement_global/r_nelement_local)*(z_nelement_global/z_nelement_local)

This guarantees perfect load balancing. Shared memory is used to parallelise the other dimensions within each distributed-memory parallelised rz block.
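
For example, with hypothetical element counts chosen so that the ratios are exact:

```julia
# Hypothetical element counts satisfying the load-balancing constraint above.
r_nelement_global, r_nelement_local = 4, 2
z_nelement_global, z_nelement_local = 8, 4
nblocks = (r_nelement_global ÷ r_nelement_local) *
          (z_nelement_global ÷ z_nelement_local)   # = 2 * 2 = 4
# With 16 MPI processes, each of the 4 r-z blocks then has 4 processes that use
# shared memory to parallelise the remaining dimensions.
```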

moment_kinetics.communication.setup_distributed_memory_MPI_for_weights_precomputation (Method)

Function to take information from the user about the vpa and vperp grids and the number of processes allocated, and use it to set up the communicators for precomputation of the Rosenbluth potential integration weights.

Notation definitions:
- block: group of processes that share data with shared memory
- vpa group: group of processes that need to communicate data for vpa derivatives/integrals
- vperp group: group of processes that need to communicate data for vperp derivatives/integrals

This routine assumes that the number of processes is selected by the user to match or be larger than the ratio

nblocks = (vpa_nelement_global/vpa_nelement_local)*(vperp_nelement_global/vperp_nelement_local)

We also need to know (from user input) the maximum number of cores per shared-memory region. A fraction of the cores will not contribute to the calculation, as we cannot guarantee that the same number of cores is required for the r-z parallelisation as for the vpa-vperp parallelisation.
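
An illustrative count (all numbers hypothetical) showing how some cores can end up idle during the precomputation:

```julia
vpa_nelement_global, vpa_nelement_local = 8, 4
vperp_nelement_global, vperp_nelement_local = 3, 3
nblocks = (vpa_nelement_global ÷ vpa_nelement_local) *
          (vperp_nelement_global ÷ vperp_nelement_local)   # = 2 * 1 = 2
ncores_total = 16           # processes available from the r-z setup
ncores_per_block = 4        # assumed maximum cores per shared-memory region
ncores_used = nblocks * ncores_per_block                    # = 8
ncores_idle = ncores_total - ncores_used                    # = 8 do not contribute
```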
