Merged
4 changes: 2 additions & 2 deletions benchmark/README.md
@@ -29,12 +29,12 @@ and `--gz` options define the number of GPUs to run on (should be equivalent to
and Z dimensions of the domain to perform the FFT. `--csvfile` is the name of the file for the benchmark runner script to record results.
The final positional option is the configuration name (from `benchmark_config.yaml`) to use for the run.

To visualize the benchmark results, a [`plot_heatmaps.py`](heatmap_scripts/plot_heatmaps.py) script to plot heatmaps from the data captured in the csv files. Running the script on the a csv file like the following:
To visualize the benchmark results, use the [`plot_heatmaps.py`](heatmap_scripts/plot_heatmaps.py) script to plot heatmaps from the data captured in the CSV files. Running the script on a CSV file like the following:
```
python plot_heatmaps.py --csvfile benchmark_c2c.dgxa100.8gpu.n1024.csv --output_prefix benchmark_c2c.dgxa100.8gpu.n1024
```
will generate one or more image files `benchmark_c2c.dgxa100.8gpu.n1024_*.png` that contain heatmap plots, with each file corresponding to a distinct set of options (e.g. precision, axis-contiguous settings, in-place or out-of-place, etc.), which are listed in the plot title.
Several sample csv files and generated heatmap plots for 2048^3 C2C FFTs on a DGX A100 (80GB) system using NVHPC SDK 22.5, can be found in the [samples](heatmap_scripts/samples) directory.
Several sample CSV files and generated heatmap plots for 2048^3 C2C FFTs on a DGX A100 (80GB) system using NVHPC SDK 22.5 can be found in the [sample](heatmap_scripts/sample) directory.

We can examine one of the sample plots ([`benchmark_c2c.dgxa100.8gpu.n2048_1.png`](heatmap_scripts/sample/benchmark_c2c.dgxa100.8gpu.n2048_1.png)), shown below, to explain the content.
![heatmap_example](heatmap_scripts/sample/benchmark_c2c.dgxa100.8gpu.n2048_1.png?raw=true)
4 changes: 2 additions & 2 deletions docs/api/c_api.rst
@@ -63,7 +63,7 @@ Communication Backends

.. _cudecompTransposeCommBackend_t-ref:

cudecompTranposeCommBackend_t
cudecompTransposeCommBackend_t
_____________________________
.. doxygenenum :: cudecompTransposeCommBackend_t
@@ -233,7 +233,7 @@ _____________________

.. _cudecompTransposeCommBackendToString-ref:

cudecompTranposeCommBackendToString
cudecompTransposeCommBackendToString
___________________________________
.. doxygenfunction:: cudecompTransposeCommBackendToString

24 changes: 12 additions & 12 deletions docs/api/f_api.rst
@@ -113,17 +113,17 @@ Communication Backends

.. _cudecompTransposeCommBackend_t-f-ref:

cudecompTranposeCommBackend
cudecompTransposeCommBackend
_____________________________
See documention for equivalent C enumerator, :ref:`cudecompTransposeCommBackend_t-ref`.
See documentation for equivalent C enumerator, :ref:`cudecompTransposeCommBackend_t-ref`.

------

.. _cudecompHaloCommBackend_t-f-ref:

cudecompHaloCommBackend
_________________________
See documention for equivalent C enumerator, :ref:`cudecompHaloCommBackend_t-ref`.
See documentation for equivalent C enumerator, :ref:`cudecompHaloCommBackend_t-ref`.

------

@@ -134,31 +134,31 @@ Additional Enumerators

cudecompDataType
__________________
See documention for equivalent C enumerator, :ref:`cudecompDataType_t-ref`.
See documentation for equivalent C enumerator, :ref:`cudecompDataType_t-ref`.

------

.. _cudecompAutotuneGridMode_t-f-ref:

cudecompAutotuneGridMode
__________________________
See documention for equivalent C enumerator, :ref:`cudecompAutotuneGridMode_t-ref`.
See documentation for equivalent C enumerator, :ref:`cudecompAutotuneGridMode_t-ref`.

------

.. _cudecompRankOrder_t-f-ref:

cudecompRankOrder
__________________
See documention for equivalent C enumerator, :ref:`cudecompRankOrder_t-ref`.
See documentation for equivalent C enumerator, :ref:`cudecompRankOrder_t-ref`.

------

.. _cudecompResult_t-f-ref:

cudecompResult
________________
See documention for equivalent C enumerator, :ref:`cudecompResult_t-ref`.
See documentation for equivalent C enumerator, :ref:`cudecompResult_t-ref`.

Functions
==========================
@@ -273,7 +273,7 @@ _________________________________
Queries the required transpose workspace size, in elements, for a provided grid descriptor.

This function queries the required workspace size, in elements, for transposition communication using a provided grid descriptor. This workspace is required to faciliate local transposition/packing/unpacking operations, or for use as a staging buffer.
This function queries the required workspace size, in elements, for transposition communication using a provided grid descriptor. This workspace is required to facilitate local transposition/packing/unpacking operations, or for use as a staging buffer.

:p cudecompHandle handle [in]: The initialized cuDecomp library handle
:p cudecompGridDesc grid_desc [in]: A cuDecomp grid descriptor.
@@ -288,9 +288,9 @@ cudecompGetHaloWorkspaceSize
____________________________
.. f:function:: cudecompGetHaloWorkspaceSize(handle, grid_desc, axis, halo_extents, workspace_size)
Queries the required transpose workspace size, in elements, for a provided grid descriptor.
Queries the required halo workspace size, in elements, for a provided grid descriptor.

This function queries the required workspace size, in elements, for transposition communication using a provided grid descriptor. This workspace is required to faciliate local transposition/packing/unpacking operations, or for use as a staging buffer.
This function queries the required workspace size, in elements, for halo communication using a provided grid descriptor. This workspace is required to facilitate local packing operations for halo regions that are not contiguous in memory, or for use as a staging buffer.

:p cudecompHandle handle [in]: The initialized cuDecomp library handle
:p cudecompGridDesc grid_desc [in]: A cuDecomp grid descriptor.
@@ -378,14 +378,14 @@ _____________________

.. _cudecompTransposeCommBackendToString-f-ref:

cudecompTranposeCommBackendToString
cudecompTransposeCommBackendToString
___________________________________

.. f:function:: cudecompTransposeCommBackendToString(comm_backend)
Function to get string name of transpose communication backend.

:p cudecompTransposeCommBackend comm_backend [in]: A cuDecompTranposeCommBackend value.
:p cudecompTransposeCommBackend comm_backend [in]: A cudecompTransposeCommBackend value.
:r character(:) res: A string representation of the transpose communication backend. Will return string “ERROR” if invalid backend value is provided.

------
6 changes: 3 additions & 3 deletions include/cudecomp.h
@@ -322,8 +322,8 @@ cudecompResult_t cudecompGetPencilInfo(cudecompHandle_t handle, cudecompGridDesc
/**
* @brief Queries the required transpose workspace size, in elements, for a provided grid descriptor.
* @details This function queries the required workspace size, in elements, for transposition communication using
* a provided grid descriptor. This workspace is required to faciliate local transposition/packing/unpacking operations,
* or for use as a staging buffer.
* a provided grid descriptor. This workspace is required to facilitate local transposition/packing/unpacking
* operations, or for use as a staging buffer.
* @param[in] handle The initialized cuDecomp library handle
* @param[in] grid_desc A cuDecomp grid descriptor
* @param[out] workspace_size A pointer to a 64-bit integer to write the workspace size
@@ -336,7 +336,7 @@ cudecompResult_t cudecompGetTransposeWorkspaceSize(cudecompHandle_t handle, cude
/**
* @brief Queries the required halo workspace size, in elements, for a provided grid descriptor.
* @details This function queries the required workspace size, in elements, for halo communication using
* a provided grid descriptor. This workspace is required to faciliate local packing operations for halo regions that
* a provided grid descriptor. This workspace is required to facilitate local packing operations for halo regions that
* are not contiguous in memory, or for use as a staging buffer.
* @param[in] handle The initialized cuDecomp library handle
* @param[in] grid_desc A cuDecomp grid descriptor