Reduce the memory usage that is important for ne1024 simulation#4102

Open
sjsprecious wants to merge 6 commits into
ESCOMP:masterfrom
sjsprecious:reduce_init_memory
@sjsprecious commented Jun 25, 2026

Description of changes

This PR introduces some changes in CDEPS that will be used in CTSM later and are critical for reducing the memory usage of a simulation at ne1024 resolution. All the changes were made by Claude under my supervision.

This PR requires a new tag from CDEPS once my PR (ESCOMP/CDEPS#414) is merged.


The goal is to cut CTSM initialization memory (and some init time) at high resolution (like ne1024), where per-rank data replication and duplicate ESMF mesh construction dominate startup cost.

The detailed edits:

  1. New per-node shared-memory helper: clm_shmem_mod.F90

An MPI-3 shared-memory module specialized for CTSM's decomposition setup. The idea is that arrays that would otherwise be allocated identically on every MPI rank instead get one physical copy per shared-memory node, mapped into every rank on that node, which frees ranks_per_node − 1 copies per node.

  • clm_shmem_alloc_i4_1d(ptr, win, n) — allocate a node-shared default-integer rank-1 array (only the node leader requests storage via MPI_Win_allocate_shared; peers map the leader's segment via MPI_Win_shared_query).

  • clm_shmem_leader_allreduce_sum_i4(ptr, win, n) — fence → node leaders sum partials across nodes over a leader-only communicator → fence to publish. Builds a globally-summed array in the shared buffer without every rank holding a global-sized copy.

  • clm_shmem_free / clm_shmem_fence / clm_shmem_is_leader / clm_shmem_leader_comm / clm_shmem_npes_per_node — lifecycle and query helpers; lazily build node-local and node-leader communicators via mpi_comm_split_type(MPI_COMM_TYPE_SHARED).
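To illustrate the fence, leader-sum, fence pattern described for clm_shmem_leader_allreduce_sum_i4, here is a minimal pure-Python model. It is only a sketch, not the Fortran in this PR: no MPI is involved, nested lists stand in for ranks and nodes, and the names naive_allreduce, leader_allreduce, and owned_by_rank are hypothetical. For brevity the model also folds the "each rank fills its own indices" step into the same function.

```python
# Pure-Python model of a node-shared buffer plus a leader-only reduction.
# Assumption: this only mimics the data flow; real code would use MPI windows.

def naive_allreduce(partials):
    """Baseline: every rank holds a full-size array; sum them element-wise."""
    n = len(partials[0])
    return [sum(p[i] for p in partials) for i in range(n)]

def leader_allreduce(owned_by_rank, ranks_per_node, n):
    """owned_by_rank[r] maps global index -> value for the entries rank r owns."""
    num_ranks = len(owned_by_rank)
    node_buffers = []
    for first in range(0, num_ranks, ranks_per_node):
        shared = [0] * n  # the node leader zeroes the single per-node copy
        last = min(first + ranks_per_node, num_ranks)
        for r in range(first, last):
            for idx, val in owned_by_rank[r].items():
                shared[idx] = val  # indices are disjoint, so writes do not collide
        node_buffers.append(shared)
    # Only the node leaders take part in the cross-node sum.
    total = [sum(buf[i] for buf in node_buffers) for i in range(n)]
    # After the closing fence, all ranks on a node read this summed buffer.
    return total, len(node_buffers)

n = 8
mask = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical decomposition: 4 ranks, each owning 2 consecutive indices.
owned = [{i: mask[i] for i in range(2 * r, 2 * r + 2)} for r in range(4)]
full_size_partials = [[owned[r].get(i, 0) for i in range(n)] for r in range(4)]

shared_result, copies = leader_allreduce(owned, ranks_per_node=2, n=n)
reference = naive_allreduce(full_size_partials)
```

In this model the leader-only path reproduces the all-rank sum while holding one full-size buffer per node instead of one per rank.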

  2. lnd_set_decomp_and_domain.F90 — apply the shmem helper to the global land mask

The global land mask lndmask_glob(gsize) was previously allocated on every rank and built with an all-rank ESMF_VMAllReduce into a second global-sized temporary (itemp_glob). Now, in both code paths (lnd_set_lndmask_from_maskmesh and lnd_set_lndmask_from_lndmesh):

  • lndmask_glob is allocated once per node via clm_shmem_alloc_i4_1d, with a new lndmask_win window handle threaded through both subroutine signatures.

  • Leader zeroes it, fence, each rank fills its disjoint local indices, then clm_shmem_leader_allreduce_sum_i4 replaces the ESMF_VMAllReduce + itemp_glob temporary (the temporary is deleted entirely).

  • Cleanup is now branch-aware: the cmeps driver paths free via clm_shmem_free(lndmask_glob, lndmask_win); the lilac path still uses plain deallocate (it uses a plain allocate).

This removes two global-sized integer arrays per rank (the mask copy + the all-reduce temp), replaced by one node-shared copy.
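For a rough sense of scale, the arithmetic behind this saving can be sketched as below. All numbers are assumptions rather than measurements from this PR: gsize is an illustrative value on the order of an ne1024 np4 grid (6*ne^2*9 + 2 points), ranks_per_node assumes one rank per core on a 128-core Derecho node, 4 bytes assumes a default integer, and the "before" figure applies only while both the mask and the all-reduce temporary are live. The helper name bytes_per_node is hypothetical.

```python
# Back-of-the-envelope memory estimate; every input here is an assumption.

def bytes_per_node(gsize, ranks_per_node, copies_per_rank, bytes_per_elem=4):
    """Memory used on one node when each rank holds its own full-size copies."""
    return gsize * bytes_per_elem * ranks_per_node * copies_per_rank

ne = 1024
gsize = 6 * ne * ne * 9 + 2   # assumed grid size, not stated in the PR
ranks_per_node = 128          # assumed: one rank per core

# Before: lndmask_glob plus the itemp_glob temporary on every rank.
before = bytes_per_node(gsize, ranks_per_node, copies_per_rank=2)
# After: a single node-shared copy of lndmask_glob.
after = gsize * 4

print(f"before: {before / 2**30:.1f} GiB per node")
print(f"after:  {after / 2**20:.1f} MiB per node")
```

The ratio between the two is simply 2 × ranks_per_node, so the benefit grows with the number of ranks placed on each node.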

  3. NetCDF file-handle close fixes

These changes close PIO file handles that were previously closed late or never, which frees buffers earlier in init:

  • clm_instMod.F90: moves ncd_pio_closefile(params_ncid) earlier — to right after its last use (bgc_vegetation_inst%Init) instead of at the end of init_accflds.
  • initVerticalMod.F90: moves ncd_pio_closefile(ncid) to right after the last read (STD_ELEV) instead of the end of initVertical.
  • UrbanParamsType.F90: adds a missing ncd_pio_closefile(ncid) on the early-return path (nlevurb == 0) that previously leaked the handle.
  • organicFileMod.F90: adds ncd_pio_closefile(ncid) after reading ORGANIC.
  • surfrdMod.F90: adds two ncd_pio_closefile(ncid) calls after dimension reads complete (after the pft/cft dims, and after nlevurb).
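The early-return leak described for UrbanParamsType.F90 follows a common pattern, sketched here in Python. This is an illustration only: FakeFile, open_handles, and both reader functions are hypothetical stand-ins, not the PIO API or the CTSM code.

```python
# Model of a handle leaked on an early-return path, and the fix.
open_handles = set()

class FakeFile:
    """Hypothetical stand-in for a file handle; tracks whether it is open."""
    def __init__(self, name):
        self.name = name
        open_handles.add(name)

    def close(self):
        open_handles.discard(self.name)

def read_urban_params_leaky(nlevurb):
    f = FakeFile("surfdata")
    if nlevurb == 0:
        return None            # early return: the handle is never closed
    data = "urban parameters"
    f.close()
    return data

def read_urban_params_fixed(nlevurb):
    f = FakeFile("surfdata")
    if nlevurb == 0:
        f.close()              # close added on the early-return path
        return None
    data = "urban parameters"
    f.close()
    return data

read_urban_params_leaky(0)
leaked = set(open_handles)     # snapshot after the leaky call
open_handles.clear()

read_urban_params_fixed(0)
remaining = set(open_handles)  # snapshot after the fixed call
```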

  4. Reuse the already-built model mesh for redist streams

  • PrigentRoughnessStreamType.F90 / UrbanTimeVarType.F90: the changes are now done in CDEPS whenever these streams use stream_mapalgo='redist'.

Specific notes

Contributors other than yourself, if any:

  • (Replace this text and add more list items as needed)

CTSM issues resolved or otherwise addressed, if any:

Resolves #4103

Any user interface changes (namelist or namelist defaults changes)?

No.

Testing planned or performed, if any:

aux_clm

Requirements before merge:

  • The code in this PR branch builds with no errors.
  • The code in this PR branch runs with no errors. Briefly describe tested configuration(s): aux_clm
  • This either (a) does not change answers, (b) it only changes answers at roundoff level, or (c) I have performed a scientific evaluation of the answer changes. (a) does not change answers
  • I have reviewed relevant parts of the CLM documentation Tech Note or User's Guide to determine if anything needs to be changed or added. If it does, describe:
  • This PR either (a) does not create a need to update the documentation or (b) includes required documentation updates (see guidelines for contributing documentation). Which?:

@samsrabin added labels on Jun 26, 2026: blocked: dependency (Wait to work on this until dependency is resolved); next (this should get some attention in the next week or two. Normally each Thursday SE meeting.); performance (idea or PR to improve performance, e.g. throughput, memory)
@samsrabin

Member

Thanks for this, @sjsprecious! A couple of questions:

  1. Do you have a date you need this in by?
  2. Do you expect this to give bit-for-bit identical results to the previous version?

@ekluzek, I'm assigning you for now given your recent work on our task decomposition, but I'm also adding Next so we can discuss in our SE meeting.

@samsrabin requested review from ekluzek and removed request for ekluzek June 26, 2026 15:52
@sjsprecious

Author

Thanks @samsrabin for your quick reply. To answer your questions:

> 1. Do you have a date you need this in by?

We are waiting for a new tag for these CTSM changes so that our collaborators can start their scientific runs soon. Thus I would say no hard date, but the sooner, the better.

> 2. Do you expect this to give bit-for-bit identical results to the previous version?

Yes, these changes should not change the answers for CTSM. I am happy to do some tests on Derecho if you can share the detailed instructions.

Let me know if you or Erik have any comments or suggestions about these code changes.

@samsrabin

Member

If you could run the aux_clm test suite on Derecho to make sure it all works, that'd be awesome. From the top level of your CTSM checkout:

conda run -n ctsm_pylib ./run_sys_tests -s aux_clm --compare ctsm5.4.044 --skip-generate

If you don't have the ctsm_pylib conda environment installed, follow the instructions here.

@samsrabin

Member

Also, please fill out the PR template. I've added it to the bottom of your PR description. Thanks!

@sjsprecious

Author

Thanks @samsrabin for your detailed instructions. The aux_clm test suite is still running on Derecho; most of my tests have finished successfully, but I hit the following error:

ERROR BFAIL baseline directory '/glade/campaign/cgd/tss/ctsm_baselines/ctsm5.4.044/SMS.f45_f45_mg37.I2000Clm60FatesSpRsGs.derecho_nvhpc.clm-FatesColdSatPhen' does not exist

I think this is because I do not have access to the directory /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.4.044. Can you grant me access to that directory, or check whether my output is BFB?

@samsrabin

Member

Ah, sorry, I probably should have expected that. If you send me the path of your test directory, I can check myself. Thanks so much for doing this!

@sjsprecious

Author

Thanks @samsrabin. My test output is at /glade/derecho/scratch/sunjian/tests_0630-094247de; let me know what you find.

I saw that some tests failed at runtime. However, I am not familiar with those setups and am not sure how my changes affect them. Can you provide more details about those failed tests, too?

@samsrabin

Member

It looks like the only test failing RUN was ERP_D_P64x2_Ld3.f10_f10_mg37.I2000Clm50BgcCru.derecho_intel.clm-noFUN_flexCN--clm-matrixcnOn_ignore_warnings. There's a known issue about that (#3817); it might take a few resubmissions (using ./case.submit in the test directory), but it should eventually work. If it doesn't work after two resubmissions, just note that here and don't worry about it.

Looks like all the other tests pass as expected!

@sjsprecious

Author

> It looks like the only test failing RUN was ERP_D_P64x2_Ld3.f10_f10_mg37.I2000Clm50BgcCru.derecho_intel.clm-noFUN_flexCN--clm-matrixcnOn_ignore_warnings. There's a known issue about that (#3817); it might take a few resubmissions (using ./case.submit in the test directory), but it should eventually work. If it doesn't work after two resubmissions, just note that here and don't worry about it.
>
> Looks like all the other tests pass as expected!

Thanks @samsrabin for checking it. Indeed after resubmitting the same test a few times, it finished successfully. The output is available at /glade/derecho/scratch/sunjian/tests_0630-094247de/ERP_D_P64x2_Ld3.f10_f10_mg37.I2000Clm50BgcCru.derecho_intel.clm-noFUN_flexCN--clm-matrixcnOn_ignore_warnings.C.0630-094247de_int/run. Can you please help check whether it is BFB against the baseline?

@samsrabin

Member

Yup, looks good!

@samsrabin

Member

SE meeting:

  • This is blocked by the required CDEPS update.
  • Since this has submodule updates, we will merge this to master just to be safe—even though it should be bit-for-bit.
  • @ekluzek will handle this.

@samsrabin removed the next label (this should get some attention in the next week or two. Normally each Thursday SE meeting.) on Jul 2, 2026
@samsrabin requested a review from ekluzek July 2, 2026 16:41
@sjsprecious

Author

> SE meeting:
>
>   • This is blocked by the required CDEPS update.
>   • Since this has submodule updates, we will merge this to master just to be safe, even though it should be bit-for-bit.
>   • @ekluzek will handle this.

Thanks @samsrabin . My CDEPS PR was merged and I just updated the CDEPS tag here. Let me know if you or @ekluzek have any questions or comments before merging it.

@samsrabin

Member

@briandobbins I might have misremembered. Is there also a CMEPS PR this depends on?

@sjsprecious

Author

My CMEPS PR is independent of my CTSM PR here.

@briandobbins

Contributor

How critical is your CMEPS PR to the memory reduction you need to run, Jian? (On a conversation about it right now - there's some fundamental complexities in CMEPS that this is highlighting, so I would just like your assessment.)

@sjsprecious

Author

> How critical is your CMEPS PR to the memory reduction you need to run, Jian? (On a conversation about it right now - there's some fundamental complexities in CMEPS that this is highlighting, so I would just like your assessment.)

Thanks @briandobbins. We have very tight memory headroom on Derecho, so I need all the changes in my CDEPS, CMEPS, CTSM, and future CAM PRs to get the ne1024 simulation to work with more ranks per node. Bill mentioned some concerns about CMEPS, but I think those are not directly related to my changes.

My CMEPS PR seems to be BFB, so if Bill or you can merge it, I can update the tag here as well.

@briandobbins

Contributor

Unfortunately it's not that simple -- Bill, Bob and Mariana are speaking this afternoon about the CMEPS issues, and I'll know more after that, but I can't commit to the CMEPS PR yet.

@sjsprecious

Author

> Unfortunately it's not that simple -- Bill, Bob and Mariana are speaking this afternoon about the CMEPS issues, and I'll know more after that, but I can't commit to the CMEPS PR yet.

I see. Just want to clarify again: my CTSM PR is independent of my CMEPS PR, so it should not affect the review or merge process here.

@briandobbins

briandobbins commented Jul 2, 2026 via email

Contributor

Labels

  • blocked: dependency (Wait to work on this until dependency is resolved)
  • performance (idea or PR to improve performance, e.g. throughput, memory)

Development

Successfully merging this pull request may close these issues.

Out of memory issue in CTSM at ne1024 resolution

4 participants