communication

moment_kinetics.communication (Module)

Communication functions and setup

Split the grid into 'blocks'. Each block can use shared memory (MPI shared-memory arrays). At the moment this only works with a single 'block' containing the whole grid; eventually more MPI communication functions will be added to communicate between blocks. A block should probably be a 'NUMA region' for maximum efficiency.

Note: charge-exchange collisions loop over neutral species for each ion species. This loop is not currently parallelised (although it could be, by introducing some more loop ranges), as we currently only run with 1 ion species and 1 neutral species.

moment_kinetics.communication.comm_anyv_subblock (Constant)

Communicator for the local velocity-space subset of a shared-memory block in an 'anyv' region

The 'anyv' region is used to parallelise the collision operator. See moment_kinetics.looping.get_best_anyv_split.

A Ref{MPI.Comm} must be used so that a non-const MPI.Comm can be stored. The communicator must actually be assigned to the Ref, not just copied as a pointer into its .val member, because otherwise the MPI.Comm object created by MPI.Comm_split() would be deleted, which probably makes MPI.jl delete the communicator.
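
A minimal sketch of the pattern described above; the names comm_example and setup_example_comm are illustrative and not part of moment_kinetics:

```julia
using MPI

# Hypothetical illustration of storing a communicator in a Ref, as described above.
const comm_example = Ref(MPI.COMM_NULL)

function setup_example_comm(color, key)
    # Assign the new communicator to the Ref itself. Do not copy a pointer into
    # comm_example.val: the MPI.Comm returned by MPI.Comm_split() could then be
    # garbage-collected, and MPI.jl might free the underlying communicator.
    comm_example[] = MPI.Comm_split(MPI.COMM_WORLD, color, key)
    return nothing
end
```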

moment_kinetics.communication.comm_block (Constant)

Communicator connecting a shared-memory region

A Ref{MPI.Comm} must be used so that a non-const MPI.Comm can be stored. The communicator must actually be assigned to the Ref, not just copied as a pointer into its .val member, because otherwise the MPI.Comm object created by MPI.Comm_split() would be deleted, which probably makes MPI.jl delete the communicator.

moment_kinetics.communication.comm_inter_block (Constant)

Communicator connecting the root processes of each shared memory block

A Ref{MPI.Comm} must be used so that a non-const MPI.Comm can be stored. The communicator must actually be assigned to the Ref, not just copied as a pointer into its .val member, because otherwise the MPI.Comm object created by MPI.Comm_split() would be deleted, which probably makes MPI.jl delete the communicator.

moment_kinetics.communication.MPISharedArray (Type)

Type used to declare a shared-memory array. When debugging is not active, MPISharedArray is just an alias for Array; when @debug_shared_array is activated, it is instead defined as an alias for DebugMPISharedArray.

moment_kinetics.communication._anyv_subblock_synchronize (Method)

Call an MPI Barrier for all processors in an 'anyv' sub-block.

The 'anyv' region is used to parallelise the collision operator. See moment_kinetics.looping.get_best_anyv_split.

Used to synchronise processors that are working on the same shared-memory array(s) between operations, to avoid race conditions. Should be even cheaper than _block_synchronize because it only requires communication on a smaller communicator.

Note: _anyv_subblock_synchronize() may be called different numbers of times on different sub-blocks, depending on how the species and spatial dimensions are split up. @debug_detect_redundant_block_synchronize is not implemented (yet?) for _anyv_subblock_synchronize().

moment_kinetics.communication._block_synchronize (Method)

Call an MPI Barrier for all processors in a block.

Used to synchronise processors that are working on the same shared-memory array(s) between operations, to avoid race conditions. Should be (much) cheaper than a global MPI Barrier because it only requires communication within a single node.

Note: some debugging code currently assumes that if _block_synchronize() is called on one block, it is called simultaneously on all blocks. It seems likely that this will always be true, but if it ever changes (i.e. different blocks doing totally different work), the debugging routines need to be updated.
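
A hedged sketch of the pattern this enables, assuming shared_f is a block-shared array (e.g. from allocate_shared) and block_rank[] holds this process's rank within comm_block[]; the names are illustrative, and the real code splits the work with the looping macros rather than a bare rank check:

```julia
# Illustrative synchronisation pattern to avoid a race on a shared array.
if block_rank[] == 0
    shared_f .= 1.0            # one process writes the shared array
end
_block_synchronize()           # barrier over comm_block[]: the write is now visible
local_total = sum(shared_f)    # every process in the block can safely read
```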

moment_kinetics.communication.allocate_shared (Method)

Get a shared-memory array of mk_float (shared by all processes in a 'block')

Create a shared-memory array using MPI.Win_allocate_shared(). The pointer to the allocated memory is wrapped in a Julia array, but the memory is not managed by the Julia array. The MPI.Win reference needs to be freed - this is done by saving the MPI.Win into a Vector in the communication module, all of whose entries are freed by the finalize_comms!() function, which should be called when moment_kinetics has finished running a simulation/test.

Arguments

- dims - mk_int or Tuple{mk_int}: Dimensions of the array to be created. The dimensions passed define the size of the array which is being handled by the 'block' (rather than the global array, or a subset for a single process).
- comm - MPI.Comm, default comm_block[]: MPI communicator containing the processes that share the array.
- maybe_debug - Bool: Can be set to false to force not creating a DebugMPISharedArray when debugging is active. This avoids recursion when including a shared-memory array as a member of a DebugMPISharedArray for debugging purposes.

Returns

Array{mk_float}
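
A hedged usage sketch, assuming the dims argument is passed as a tuple as in the Arguments list above; the sizes and variable names are illustrative and the exact call syntax should be checked against the method signature:

```julia
# Allocate the block-local part of a (vpa, z) sized shared array on comm_block[].
nvpa, nz = 17, 9
f = allocate_shared((nvpa, nz))
if block_rank[] == 0           # assumed rank-within-block variable
    f .= 0.0                   # initialise from a single process
end
_block_synchronize()           # synchronise the block before other processes read f
# ... use f during the run ...
# finalize_comms!() later frees the MPI.Win backing this array.
```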

moment_kinetics.communication.finalize_comms! (Method)

Clean up from communications

Do any needed clean-up for MPI, etc. Does not call MPI.Finalize() - this is called anyway when Julia exits, and we do not want to call it explicitly so that multiple runs can be done in a single Julia session.

Frees any shared-memory arrays.

moment_kinetics.communication.initialize_comms! (Method)

Set up communications

Check that global variables are in the correct state (i.e. caches were emptied correctly if they were used before).

Also does some set up for debugging routines, if they are active.
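
A rough lifecycle sketch showing the intended ordering of these calls around a run; the actual call sites live inside moment_kinetics's setup and cleanup code:

```julia
initialize_comms!()   # check/reset global state, set up debugging helpers
# ... allocate shared-memory arrays with allocate_shared(), run the simulation ...
finalize_comms!()     # free the saved shared-memory windows
# MPI.Finalize() is deliberately not called, so further runs can be done in the
# same Julia session.
```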

moment_kinetics.communication.setup_distributed_memory_MPI (Method)

Function to take information from the user about the r and z grids and the number of processes allocated, and use it to set up the communicators.

Notation definitions:
- block: group of processes that share data with shared memory
- z group: group of processes that need to communicate data for z derivatives
- r group: group of processes that need to communicate data for r derivatives

This routine assumes that the number of processes is selected by the user to match exactly the ratio

nblocks = (r_nelement_global/r_nelement_local)*(z_nelement_global/z_nelement_local)

This guarantees perfect load balancing. Shared memory is used to parallelise the other dimensions within each distributed-memory parallelised rz block.
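
For example, with hypothetical element counts chosen so that the ratios are exact:

```julia
# Hypothetical element counts satisfying the load-balancing constraint above.
r_nelement_global, r_nelement_local = 4, 2
z_nelement_global, z_nelement_local = 8, 4
nblocks = (r_nelement_global ÷ r_nelement_local) *
          (z_nelement_global ÷ z_nelement_local)   # = 2 * 2 = 4
# With 16 MPI processes, each of the 4 r-z blocks then has 4 processes that use
# shared memory to parallelise the remaining dimensions.
```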

moment_kinetics.communication.setup_distributed_memory_MPI_for_weights_precomputation (Method)

Function to take information from the user about the vpa and vperp grids and the number of processes allocated, and use it to set up the communicators for precomputation of the Rosenbluth potential integration weights.

Notation definitions:
- block: group of processes that share data with shared memory
- vpa group: group of processes that need to communicate data for vpa derivatives/integrals
- vperp group: group of processes that need to communicate data for vperp derivatives/integrals

This routine assumes that the number of processes is selected by the user to match or be larger than the ratio

nblocks = (vpa_nelement_global/vpa_nelement_local)*(vperp_nelement_global/vperp_nelement_local)

We also need to know (from user input) the maximum number of cores per shared-memory region. A fraction of the cores will not contribute to the calculation, as we cannot guarantee that the same number of cores is required for the r-z parallelisation as for the vpa-vperp parallelisation.
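
An illustrative count (all numbers hypothetical) showing how some cores can end up idle during the precomputation:

```julia
vpa_nelement_global, vpa_nelement_local = 8, 4
vperp_nelement_global, vperp_nelement_local = 3, 3
nblocks = (vpa_nelement_global ÷ vpa_nelement_local) *
          (vperp_nelement_global ÷ vperp_nelement_local)   # = 2 * 1 = 2
ncores_total = 16           # processes available from the r-z setup
ncores_per_block = 4        # assumed maximum cores per shared-memory region
ncores_used = nblocks * ncores_per_block                    # = 8
ncores_idle = ncores_total - ncores_used                    # = 8 do not contribute
```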
