Reduce the memory usage that is important for ne1024 simulation #4102
sjsprecious wants to merge 6 commits into
|
Thanks for this, @sjsprecious! A couple of questions:
@ekluzek, I'm assigning you for now given your recent work on our task decomposition, but I'm also adding Next so we can discuss in our SE meeting. |
|
Thanks @samsrabin for your quick reply. To answer your questions:
We are waiting for a new tag for these CTSM changes so that our collaborators can start their scientific runs soon. Thus I would say no hard date, but the sooner, the better.
Yes, these changes should not change the answers for CTSM. I am happy to do some tests on Derecho if you can share the detailed instructions. Let me know if you or Erik has any comments/suggestions about these code changes. |
|
If you could run the
conda run -n ctsm_pylib ./run_sys_tests -s aux_clm --compare ctsm5.4.044 --skip-generate
If you don't have the |
|
Also, please fill out the PR template. I've added it to the bottom of your PR description. Thanks! |
|
Thanks @samsrabin for your detailed instructions. While I was still running the I think this is because I did not have access to the directory |
|
Ah, sorry, I probably should have expected that. If you send me the path of your test directory, I can check myself. Thanks so much for doing this! |
|
Thanks @samsrabin . My test output is at I saw some tests failed at runtime. However, I am not familiar with those setups and not sure how my changes affect them. Can you provide more details about those failed tests, too? |
|
It looks like the only test failing RUN was Looks like all the other tests pass as expected! |
Thanks @samsrabin for checking it. Indeed after resubmitting the same test a few times, it finished successfully. The output is available at |
|
Yup, looks good! |
|
SE meeting:
|
Thanks @samsrabin . My CDEPS PR was merged and I just updated the CDEPS tag here. Let me know if you or @ekluzek have any questions or comments before merging it. |
|
@briandobbins I might have misremembered. Is there also a CMEPS PR this depends on? |
|
My CMEPS PR is independent of my CTSM PR here. |
|
How critical is your CMEPS PR to the memory reduction you need to run, Jian? (On a conversation about it right now - there's some fundamental complexities in CMEPS that this is highlighting, so I would just like your assessment.) |
Thanks @briandobbins . We have a very tight memory headroom on Derecho so I need all the changes in my CDEPS, CMEPS, CTSM, and future CAM PRs to get the ne1024 simulation to work with more ranks per node. Bill mentioned some concerns about CMEPS but I think that is unrelated to my changes directly. My CMEPS PR seems to be BFB so if Bill or you can merge it, I can update the tag here as well. |
|
Unfortunately it's not that simple -- Bill, Bob and Mariana are speaking this afternoon about the CMEPS issues, and I'll know more after that, but I can't commit to the CMEPS PR yet. |
I see. Just want to clarify again: my CTSM PR is independent of my CMEPS PR, so it should not affect the review or merge process here. |
|
Right, we're not worried about the CTSM one, just the CMEPS one -- and I'll
know more about that later.
- Brian
…On Thu, Jul 2, 2026 at 11:41 AM Jian Sun ***@***.***> wrote:
*sjsprecious* left a comment (ESCOMP/CTSM#4102)
Unfortunately it's not that simple -- Bill, Bob and Mariana are speaking
this afternoon about the CMEPS issues, and I'll know more after that, but I
can't commit to the CMEPS PR yet.
I see. Just want to clarify again: my CTSM PR is independent of my CMEPS
PR, so it should not affect the review or merge process here.
|
Description of changes
This PR introduces some changes in CDEPS that will be used in CTSM later and are critical for reducing the memory usage of a simulation at ne1024 resolution. All the changes were made by Claude under my supervision.
This PR requires a new tag from CDEPS once my PR (ESCOMP/CDEPS#414) is merged.
The goal is to cut CTSM initialization memory (and some init time) at high resolution (like ne1024), where per-rank data replication and duplicate ESMF mesh construction dominate startup cost.
The detailed edits:
An MPI-3 shared-memory module specialized for CTSM's decomposition setup. The idea is that arrays that are otherwise allocated identically on every MPI rank instead get one physical copy per shared-memory node, mapped into every rank on that node, freeing ranks_per_node − 1 copies per node.
- clm_shmem_alloc_i4_1d(ptr, win, n): allocates a node-shared default-integer rank-1 array (only the node leader requests storage via MPI_Win_allocate_shared; peers map the leader's segment via MPI_Win_shared_query).
- clm_shmem_leader_allreduce_sum_i4(ptr, win, n): fence, then node leaders sum partials across nodes over a leader-only communicator, then fence again to publish. Builds a globally summed array in the shared buffer without every rank holding a global-sized copy.
- clm_shmem_free / clm_shmem_fence / clm_shmem_is_leader / clm_shmem_leader_comm / clm_shmem_npes_per_node: lifecycle and query helpers; they lazily build node-local and node-leader communicators via mpi_comm_split_type(MPI_COMM_TYPE_SHARED).
The global land mask lndmask_glob(gsize) was previously allocated on every rank and built with an all-rank ESMF_VMAllReduce into a second global-sized temporary (itemp_glob). Now, in both code paths (lnd_set_lndmask_from_maskmesh and lnd_set_lndmask_from_lndmesh):
- lndmask_glob is allocated once per node via clm_shmem_alloc_i4_1d, with a new lndmask_win window handle threaded through both subroutine signatures.
- The node leader zeroes it, all ranks fence, each rank fills its disjoint local indices, and then clm_shmem_leader_allreduce_sum_i4 replaces the ESMF_VMAllReduce + itemp_glob temporary (the temporary is deleted entirely).
- Cleanup is now branch-aware: the cmeps driver paths free via clm_shmem_free(lndmask_glob, lndmask_win); the lilac path still uses a plain deallocate (it uses a plain allocate).
This removes two global-sized integer arrays per rank (the mask copy + the all-reduce temp), replaced by one node-shared copy.
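To make the land-mask build above concrete, here is a minimal pure-Python simulation of the pattern. It is only a sketch under stated assumptions: it does not use MPI, it is not the Fortran code in this PR, and the names build_global_mask and global_copies are hypothetical. A plain list per node stands in for the shared-memory window, and the copy count ignores any temporaries the leader-only all-reduce may use internally.

```python
def build_global_mask(local_entries, ranks_per_node, gsize):
    """Simulate the node-shared mask build (hypothetical helper, no MPI).

    local_entries[r] maps global index -> mask value for rank r.
    Index sets are assumed to be disjoint across ranks.
    Returns one global-sized buffer per node.
    """
    nranks = len(local_entries)
    nnodes = (nranks + ranks_per_node - 1) // ranks_per_node

    # One zeroed buffer per node stands in for the shared window
    # (in the PR, the node leader does the zeroing).
    node_buffers = [[0] * gsize for _ in range(nnodes)]

    # Each rank writes only its own disjoint indices into its node's buffer.
    for rank, entries in enumerate(local_entries):
        buf = node_buffers[rank // ranks_per_node]
        for idx, val in entries.items():
            buf[idx] = val

    # Leader-only all-reduce: element-wise sum of the per-node partials.
    total = [sum(col) for col in zip(*node_buffers)]

    # After the reduce, every node's buffer holds the global result.
    return [list(total) for _ in range(nnodes)]


def global_copies(nranks, ranks_per_node, shared):
    """Count global-sized integer arrays alive during the build."""
    nnodes = (nranks + ranks_per_node - 1) // ranks_per_node
    # Before: mask copy + all-reduce temporary on every rank.
    # After: one node-shared copy per node.
    return nnodes if shared else 2 * nranks


# Toy case: 4 ranks, 2 ranks per node, 8 global grid cells.
entries = [{0: 1, 1: 0}, {2: 1, 3: 1}, {4: 0, 5: 1}, {6: 1, 7: 0}]
masks = build_global_mask(entries, ranks_per_node=2, gsize=8)
print(masks[0])                              # [1, 0, 1, 1, 0, 1, 1, 0]
print(global_copies(4, 2, shared=False))     # 8
print(global_copies(4, 2, shared=True))      # 2
```

Because each rank writes a disjoint set of indices and the buffers start at zero, summing the per-node partials reproduces the full mask on every node, which is why a sum reduction over node leaders is enough here.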
Closing pio file handles that were opened but closed late or never, which frees buffers earlier in init:
- Moves ncd_pio_closefile(params_ncid) earlier, to right after its last use (bgc_vegetation_inst%Init) instead of at the end of init_accflds.
- Moves ncd_pio_closefile(ncid) to right after the last read (STD_ELEV) instead of the end of initVertical.
- Adds ncd_pio_closefile(ncid) on the early-return path (nlevurb == 0) that previously leaked the handle.
- Adds ncd_pio_closefile(ncid) after reading ORGANIC.
- surfrdMod.F90: adds two ncd_pio_closefile(ncid) calls after dimension reads complete (after the pft/cft dims, and after nlevurb).
rediststreamsstream_mapalgo='redist'
Specific notes
Contributors other than yourself, if any:
CTSM issues resolved or otherwise addressed, if any:
Resolves #4103
Any user interface changes (namelist or namelist defaults changes)?
No.
Testing planned or performed, if any:
aux_clm
Requirements before merge: