New compasUtils file by reinhold-willcox · Pull Request #1467 · TeamCOMPAS/COMPAS

reinhold-willcox · 2026-03-18T13:23:10Z

Added my own compasUtils file, which I use for lots of post-processing and testing. It was previously in misc/unsupported_utils, so I am hoping to get it elevated to the level of the other utils.

This contains two major functions that help a lot with inspecting COMPAS data.

printCompasDetails: takes the hdf5 output (optionally with specific seeds or masks) and prints a pandas df of the whole output (including units). This makes the output much easier to inspect visually from a Jupyter notebook.
getEventStrings: takes the hdf5 output and generates a string per seed that summarizes the events in the binary. Events include SMT, CEE, SNe, and mergers, and these are listed chronologically with a funny but simple syntax (described in the function). Combined with np.unique(), this is a great way to quickly see which binary channels dominate and which are rare.
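As a rough sketch of the `np.unique()` census described above: assuming `getEventStrings` returns one string per seed, counting and ranking the channels could look like this. The strings below are made-up placeholders, not the actual event syntax defined in the function.

```python
import numpy as np

# Placeholder event strings, one per seed (NOT the real getEventStrings syntax)
event_strings = np.array([
    "SMT>CEE>SN>SN",
    "SMT>SN>SN",
    "SMT>CEE>SN>SN",
    "CEE>Merger",
    "SMT>CEE>SN>SN",
])

# Count how many seeds follow each distinct event history
channels, counts = np.unique(event_strings, return_counts=True)

# Rank channels from most to least common (ties keep alphabetical order)
order = np.argsort(-counts, kind="stable")
for channel, count in zip(channels[order], counts[order]):
    print(f"{count:6d}  {channel}")
```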

I expect that there will be changes requested, so it is probably not yet ready to be merged in, but happy to hear feedback on how to improve / align these scripts with the other python utilities.

github-actions · 2026-03-18T13:30:58Z

✅ COMPAS Build Successful!

Commit: `98df732`

Detailed Evolution Plot


_Generated by COMPAS CI_

ilyamandel

I leave the review to folks who use python (@SimonStevenson ? @avigna ? ).

Consider including supported python tools in the documentation in https://compas.readthedocs.io/en/latest/pages/User%20guide/Post-processing/post-processing.html

avivajpeyi · 2026-04-03T02:19:32Z

Looks cool! I would suggest adding some more docstrings at the top, and adding some pytests to ensure that this is maintained and doesn't break in the future.

Also, how is this different from h5view?

Can they be merged together?

Finally, if you add it to the setup.py entry points, it's quite easy to get the auto-documentation tools to build docs like these:

https://compas.readthedocs.io/en/latest/pages/User%20guide/Post-processing/hdf5/post-processing-h5view.html

I have some time, so I'm happy to help with this if you'd like :)

nrsegovia · 2026-04-03T07:40:57Z

As I suggested in @ilyamandel's post on Slack, I think it might be useful to add type hints and NumPy-style docstrings to bring the Python code style closer to what we have for the C++ files.

Perhaps we can start here by just adding type hints?
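To illustrate what type hints plus a NumPy-style docstring could look like, here is a small hypothetical helper (not from the PR) that builds the seed mask. Note that an annotation on `*seeds` applies to each individual positional argument; `ArrayLike` comes from `numpy.typing`.

```python
import numpy as np
from numpy.typing import ArrayLike


def seeds_mask(all_seeds: np.ndarray, *seeds: ArrayLike) -> np.ndarray:
    """Return a boolean mask selecting the requested seeds.

    Parameters
    ----------
    all_seeds : np.ndarray
        1-D array of every seed in the HDF5 group.
    *seeds : ArrayLike
        Optional. Individual seeds, or a single list/array of seeds.
        If none are given, every row is selected.

    Returns
    -------
    np.ndarray
        Boolean array with the same shape as `all_seeds`.
    """
    if len(seeds) == 0:
        # No seeds requested: keep every row
        return np.ones(all_seeds.shape, dtype=bool)
    # np.isin flattens `seeds`, so scalars or one list both work
    return np.isin(all_seeds, seeds)
```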

nrsegovia · 2026-04-03T07:59:05Z

+########################################################################
+# ## Function to print the data from a given COMPAS HDF5 group in a readable pandas template
+
+def printCompasDetails(data, *seeds, mask=()):


Do I understand correctly that this function name might be a bit misleading? Outside a Jupyter environment it would just return the DataFrame, not print it.

I also wonder if `*seeds` could be replaced with an input list; that might be more readable for large sets of seeds.

Hey, finally getting around to this. The *seeds input is particularly handy because you can supply a single seed, multiple comma-separated seeds, or a list / array of seeds, so it's very flexible in this format. But I'm happy to make that clearer in the docstrings.
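A short demonstration of the flexibility described in the reply: inside the function, `*seeds` always arrives as a tuple, and `np.isin` flattens its second argument, so the call styles below select the same rows. `select` is a hypothetical stand-in and the seed values are made up. Note also that `select()` with no seeds selects nothing, which is why the PR code needs an explicit default branch. Mixing a scalar and a list in one call (e.g. `select(101, [103])`) produces a ragged tuple, which recent NumPy versions are likely to reject.

```python
import numpy as np

all_seeds = np.array([100, 101, 102, 103, 104])


def select(*seeds):
    # `seeds` is a tuple: (), (101,), (101, 103), or ([101, 103],)
    return np.isin(all_seeds, seeds)


single = select(101)                       # one seed
several = select(101, 103)                 # comma-separated seeds
from_list = select([101, 103])             # a list of seeds
from_array = select(np.array([101, 103]))  # an array of seeds
```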

nrsegovia · 2026-04-03T08:01:17Z

+            mask = np.ones_like(allSeeds).astype(bool)
+        mask &= seedsMask
+
+        df = pd.DataFrame.from_dict({param: data[param][()][mask] for param in list(data.keys())}).set_index(seedVariableName).T


A couple of values are computed repeatedly, e.g., `'SEED' in list_of_keys` and `list(data.keys())`; they could be computed once and reused.

nrsegovia · 2026-04-03T08:03:18Z

+    else: # No seed parameter, so do custom print for Run Details
+
+        # Get just the keys without the -Derivation suffix - those will be a second column
+        keys_not_derivations = []


This can be replaced with a one-line definition without hurting readability, e.g., `keys_not_derivations = [x for x in key_list if "-Derivation" not in x]` or similar.
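A minimal sketch of the suggested one-liner, plus one possible way to pair each key with its `-Derivation` partner for the second column. The key names are hypothetical examples, not necessarily the exact Run Details keys.

```python
# Hypothetical Run Details keys; real COMPAS key names may differ
key_list = ["COMPAS-Version", "random-seed", "random-seed-Derivation",
            "metallicity", "metallicity-Derivation"]

# Keys without the -Derivation suffix
keys_not_derivations = [k for k in key_list if "-Derivation" not in k]

# Pair each key with its derivation key (None if there is none)
key_set = set(key_list)
derivations = {
    k: (k + "-Derivation" if k + "-Derivation" in key_set else None)
    for k in keys_not_derivations
}
```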

ilyamandel · 2026-04-16T20:51:21Z

@reinhold-willcox -- please see the comments above.

reinhold-willcox · 2026-04-17T13:00:29Z

Ya, I'll take care of it (Nicolas has also written to me directly with some pointers).

This is a fairly low priority for me, so I'll try to address this after the semester ends.

ilyamandel · 2026-06-12T07:35:20Z

@reinhold-willcox , any progress on this PR?

reinhold-willcox · 2026-06-15T10:23:39Z

@ilyamandel, sorry, haven't had a chance to dive into this yet, but I should have a window in the next few weeks.

nrsegovia

@reinhold-willcox it all looks great, I just left a couple of comments/suggestions. I'm still not entirely convinced by `*seeds`, but it seems to work just fine, so I won't dwell on that. I do wonder why some sections of the code seem to prefer lists over NumPy arrays, though. And as a final style-related comment, PEP 8 has some recommendations regarding line length:

Limit all lines to a maximum of 79 characters.
For flowing long blocks of text with fewer structural restrictions (docstrings or comments), the line length should be limited to 72 characters.

(I have never cared much about this until I recently started adding a length limit to my own code, and it looks much tidier)
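As an illustration of the 79-character limit, the long `pd.DataFrame.from_dict(...)` line from the diff can be split using implicit continuation inside parentheses without changing its behaviour. In this sketch a plain dict of NumPy arrays stands in for the HDF5 group (indexing an array with `[()]` returns the whole array, as it does for an h5py dataset); the data are made up.

```python
import numpy as np
import pandas as pd

# Stand-in for an HDF5 group: each key maps to a 1-D array
data = {
    "SEED": np.array([1, 2, 3]),
    "Mass(1)": np.array([10.0, 20.0, 30.0]),
}
seed_variable_name = "SEED"
list_of_keys = list(data.keys())
mask = np.array([True, False, True])

# Same result as the single long line, but every line is under 79 chars
df = (
    pd.DataFrame.from_dict(
        {param: data[param][()][mask] for param in list_of_keys}
    )
    .set_index(seed_variable_name)
    .T
)
```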

nrsegovia · 2026-06-29T05:45:21Z

+
+    Examples
+    --------
+    >>> mt_data = h5.File('COMPAS_Output.h5')['BSE_RLOF']) 


I think the `)` at the end should be removed.

nrsegovia · 2026-06-29T05:50:25Z

+    >>> print_compas_details_dataframe(mt_data, mt_seeds[:50], mask=cee_events)
+    [output of all Common Envelope events occuring in the first 50 seeds]
+    """
+    list_of_keys = list(data.keys())


This might be unnecessary. It seems that most of the time you do `x in list_of_keys`, but `x in data` would produce the same behavior, if I remember correctly.
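A quick check of the point above, using a dict as a stand-in for the HDF5 group (h5py groups also support `name in group`). Membership on the mapping itself tests its keys, so building a list first is not needed; for a dict it is also a hash lookup rather than a linear scan of a list.

```python
import numpy as np

# Stand-in for an HDF5 group
data = {"SEED": np.arange(3), "Mass(1)": np.ones(3)}

# Membership on the mapping itself checks its keys...
has_seed = "SEED" in data
# ...and matches the answer obtained from an explicit key list
assert has_seed == ("SEED" in list(data.keys()))
```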

nrsegovia · 2026-06-29T05:55:09Z

+        # If `seeds` or `mask` arguments supplied, create the relevant mask
+        all_seeds = data[seed_variable_name][()]
+        seeds_mask = np.isin(all_seeds, seeds)
+        if len(seeds) == 0: # If `seeds` argument is not supplied, set the default mask


`seeds` does not have a default value, so shouldn't it always be supplied? Otherwise you would get an error.

*seeds works the same as *args, so they are all optional. Not sure what the best way to indicate this is, but it is meant to run fine if it is not supplied.
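On signalling that `seeds` is optional: one common pattern is to document `*seeds` as optional in the docstring and give `mask` a `None` default instead of `()`. The sketch below is a hypothetical helper, not the PR's code. It also copies the mask instead of applying `&=` to it, because when the caller passes a NumPy boolean array, `mask &= seeds_mask` modifies that array in place. Separately, `if mask == ():` is fragile when `mask` is an array: depending on the NumPy version and the mask's length it may warn or raise rather than evaluate to a plain `False`.

```python
import numpy as np


def row_mask(all_seeds, *seeds, mask=None):
    """Hypothetical helper: combine optional seeds and an optional mask.

    *seeds : optional; individual seeds or one list/array of seeds.
    mask   : optional boolean array, same length as `all_seeds`.
    """
    all_seeds = np.asarray(all_seeds)
    # Start from a copy of the caller's mask, or keep every row
    if mask is None:
        combined = np.ones(all_seeds.shape, dtype=bool)
    else:
        combined = np.array(mask, dtype=bool, copy=True)
    # Only restrict by seed when at least one seed was passed
    if len(seeds) > 0:
        combined &= np.isin(all_seeds, seeds)
    return combined
```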

nrsegovia · 2026-06-29T06:02:02Z

+        keys_not_derivations = [key for key in list_of_keys  if '-Derivation' not in key] # Get just the keys without the -Derivation suffix
+
+        # Some parameter values are string types, formatted as np.bytes_, need to convert back
+        def convert_strings(param_array):


I wonder whether this needs to be defined within the function. At some point you (or the user) might need this utility, so exposing it would be better. Also, I'm being picky, but perhaps you could change the `isinstance` check to something like `np.issubdtype(param_array.dtype, np.bytes_)`, though this assumes that `param_array` is a NumPy array (I'm not entirely sure that is the case). This way you would check the array's dtype itself rather than its first element.
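A sketch of `convert_strings` moved to module level with the `np.issubdtype` check suggested above. Only the helper's signature appears in the diff, so the body is a guess at its intent; it assumes the input is, or can be converted to, a NumPy array. One caveat: if h5py returns variable-length strings as an object array of `bytes`, this dtype check would not catch them, whereas a first-element check would.

```python
import numpy as np


def convert_strings(param_array):
    """Decode byte-string arrays (np.bytes_) to str; pass others through."""
    param_array = np.asarray(param_array)
    # Check the dtype of the whole array rather than its first element
    if np.issubdtype(param_array.dtype, np.bytes_):
        return np.char.decode(param_array, "utf-8")
    return param_array
```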

nrsegovia · 2026-06-29T06:16:23Z

+        if mask == ():
+            mask = np.ones_like(all_seeds).astype(bool)
+        mask &= seeds_mask
+        df = pd.DataFrame.from_dict({param: data[param][()][mask] for param in list_of_keys}).set_index(seed_variable_name).T


If the default is to include all seeds, wouldn't this result in a "wide" table? That also affects pandas performance. I guess this is required to add units as a string column, but it might be worth considering returning a separate dictionary of units in case the user wants to check them.

In my experience, it's extremely valuable to have the units right there in the same dataframe. This is meant to be a debugging tool, so having that information right in sight as you look at the other columns has been really revealing. It does return a very wide table if you include all the seeds from some large run, but then the pandas (or possibly jupyter) width limit in practice replaces many of these with an ellipsis.
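A sketch of keeping the units in the same transposed frame, as described in the reply. Where the units come from is an assumption here: a plain dict stands in for however the PR reads them from the HDF5 file, and the data are made up.

```python
import numpy as np
import pandas as pd

# Stand-in data; in the PR these come from an HDF5 group
data = {
    "SEED": np.array([7, 8]),
    "Mass(1)": np.array([12.5, 30.0]),
    "Stellar_Type(1)": np.array([1, 14]),
}
units = {"Mass(1)": "Msol", "Stellar_Type(1)": "-"}  # hypothetical units

# One row per parameter, one column per seed
df = pd.DataFrame(data).set_index("SEED").T

# Put units in the first column so they stay visible next to the values
df.insert(0, "units", [units.get(param, "") for param in df.index])
```

The frame still grows one column per seed, so for large runs passing a subset of seeds keeps it readable.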

nrsegovia · 2026-06-29T06:28:29Z

+###########################################
+# ## Produce strings of the event histories
+
+stellar_type_dict = {


I see that this is a dictionary of simplified stellar type names, so perhaps you want to change the name to something more obvious, like `simplify_types_dict`. Not critical, though.
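With the more descriptive name, a lookup from integer stellar types to short labels might look like the sketch below. The mapping is illustrative only: it assumes the standard Hurley-style integer codes 0 to 15 and is not the dictionary from the PR.

```python
import numpy as np

# Illustrative mapping (NOT the PR's dict), assuming Hurley-style codes
simplify_types_dict = {
    0: "MS", 1: "MS",
    2: "GS", 3: "GS", 4: "GS", 5: "GS", 6: "GS",  # post-MS evolved stars
    7: "HE", 8: "HE", 9: "HE",                     # helium stars
    10: "WD", 11: "WD", 12: "WD",
    13: "NS",
    14: "BH",
    15: "MR",                                      # massless remnant
}

stellar_types = np.array([1, 4, 13, 14, 11])
simple = np.array([simplify_types_dict[int(t)] for t in stellar_types])
```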

nrsegovia · 2026-06-29T06:32:45Z

+
+    Notes
+    -----
+    Exactly one of data or all_events must be included. If neither, nothign is returned.


Typo: `nothign` should be `nothing`.
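Rather than silently returning nothing, one option is to enforce the "exactly one of" rule explicitly. The function name and parameters below are placeholders mirroring the docstring note, not the PR's actual signature.

```python
def get_event_strings(data=None, all_events=None):
    """Hypothetical signature: exactly one of `data` or `all_events`."""
    # `is None` checks avoid evaluating the truthiness of arrays/groups
    if (data is None) == (all_events is None):
        raise ValueError("Provide exactly one of `data` or `all_events`.")
    source = "data" if data is not None else "all_events"
    return source  # placeholder for the real event-string logic
```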

reinhold-willcox · 2026-06-29T10:04:30Z


Thanks @nrsegovia . I've gone through and implemented your changes, except where I've raised discussions above. As for lists vs arrays, this often depends on the use case. It's easier to append to lists when you don't know how long they will be at the outset, while the arrays are nicer for speed. If there's a specific place you think I should swap one for the other, I'm happy to take a look.
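The trade-off described above, sketched in code: append to a Python list when the final length is unknown and convert once at the end; preallocate when the length is known up front (e.g. one entry per seed). Calling `np.append` inside a loop instead would copy the whole array on every iteration. The loop bodies are arbitrary stand-ins.

```python
import numpy as np

seeds = np.arange(5)

# Unknown length: grow a list, convert to an array once at the end
events = []
for seed in seeds:
    if seed % 2 == 0:          # stand-in for "an event happened"
        events.append(seed)
events = np.array(events)

# Known length: preallocate, then fill in place
n_events = np.zeros(len(seeds), dtype=int)
for i, seed in enumerate(seeds):
    n_events[i] = seed % 3     # stand-in per-seed count

# Often neither loop is needed: vectorised forms of the same two steps
assert (events == seeds[seeds % 2 == 0]).all()
assert (n_events == seeds % 3).all()
```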

reinhold-willcox · 2026-06-29T10:16:17Z


h5view is a command-line tool, whereas this one is meant for use in Jupyter notebooks. They are conceptually similar in that both provide a view into the data, but they are very different under the hood, so I'm not sure it makes sense to combine them.

I'm also not sure what to do for the setup.py functionality. I added the file to the setup.py entry points, but all the other files seem to allow for a command line argument parser, which is not relevant for this file. Can you advise on what I should be putting in main()?

added compasUtils file

5a80172

ilyamandel reviewed Mar 21, 2026


ilyamandel requested review from SimonStevenson, avivajpeyi, brcekadam and pauldisberg March 21, 2026 01:34

nrsegovia reviewed Apr 3, 2026


big cleanup of post processing utils

b401059

nrsegovia reviewed Jun 29, 2026


reinhold-willcox added 2 commits June 29, 2026 12:05

addressed nicolas' concerns

11caeb8

starting the process of adding unit testing

ec95efc
