Skip to content

docs: add utility doctest examples#804

Open
EKtheSage wants to merge 8 commits into
casact:mainfrom
EKtheSage:docs/704-utility-examples
Open

docs: add utility doctest examples#804
EKtheSage wants to merge 8 commits into
casact:mainfrom
EKtheSage:docs/704-utility-examples

Conversation

@EKtheSage

@EKtheSage EKtheSage commented May 16, 2026

Copy link
Copy Markdown
Contributor

Summary: Add Sphinx doctest examples for the PatsyFormula utility docs. Split from the larger #792 work and intentionally excludes .github/workflows/sync-main-to-docs.yml. Refs #704


Note

Low Risk
Documentation and test-only changes; utility implementations are unchanged aside from expanded docstrings.

Overview
Adds Sphinx doctest examples to utility API docstrings in utility_functions.py: read_pickle, read_json, concat, minimum, maximum, and PatsyFormula (including TweedieGLM / DevelopmentML workflows). Each block uses .. testsetup::, .. testcode::, and .. testoutput:: so docs can execute and verify the snippets.

Tests in test_utilities.py back related behavior: pickle round-trip for a fitted Development (test_to_pickle_read_pickle), cl.maximum vs NumPy (test_maximum), and Friedland USPP sample keys loading with the expected three value columns (test_load_sample_uspp).

Reviewed by Cursor Bugbot for commit f23fe84. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread chainladder/utils/utility_functions.py Outdated
@codecov

codecov Bot commented May 16, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.91%. Comparing base (b24f209) to head (f23fe84).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #804      +/-   ##
==========================================
+ Coverage   88.79%   88.91%   +0.11%     
==========================================
  Files          89       89              
  Lines        5060     5060              
  Branches      646      646              
==========================================
+ Hits         4493     4499       +6     
+ Misses        423      417       -6     
  Partials      144      144              
Flag Coverage Δ
unittests 88.91% <ø> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@henrydingliu

Copy link
Copy Markdown
Collaborator

please pull main and incorporate recent changes

@EKtheSage EKtheSage force-pushed the docs/704-utility-examples branch from 0a7c2f9 to 9175ae7 Compare May 16, 2026 20:31
Comment thread chainladder/utils/utility_functions.py Outdated

.. testcode::

clrd = cl.load_sample("clrd").groupby("LOB").sum().iloc[:2]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test demonstrates that concatting identical columns doesn't do anything, which doesn't match the example text.

def minimum(x1, x2):
"""Element-wise minimum of two triangles (delegates to ``Triangle.minimum``).

Examples

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need more basic docstring before a doctest. what's x1? what's x2?

Comment thread chainladder/utils/utility_functions.py Outdated

Examples
--------
Cap a triangle cell-by-cell by comparing it with another triangle of limits.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we certain this is true? can x2 be a scalar?

Comment thread chainladder/utils/utility_functions.py
def read_json(json_str, array_backend=None):
"""Deserialize JSON produced by ``to_json`` (triangle, estimator, or pipeline).

Examples

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this example feels empty without seeing the actual json string. please follow the example from pandas

print(round(float(by_dev.ldf_.values[0, 0, 0, 0]), 6))
print(round(float(by_both.ldf_.values[0, 0, 0, 0]), 6))

.. testoutput::

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we be showing all the numbers?

…henrydingliu

- read_pickle: show fitted Development estimator round-trip via pickle, verify transform works after restore
- read_json: show full Pipeline serialization round-trip with step names and params
- concat: show paid+incurred column join enabling MunichAdjustment directly
- minimum: compare volume vs simple CL ultimates, pick element-wise lower for low-side scenario
- maximum: same comparison, pick element-wise higher for high-side scenario
- PatsyFormula: clarify when to use custom DevelopmentML pipeline vs TweedieGLM; show ldf_ output instead of coefficient count
Comment thread chainladder/utils/utility_functions.py Outdated
import chainladder as cl

tri = cl.load_sample("raa")
dev = cl.Development(average="volume").fit(tri)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to demonstrate that to_pickle does something, we should use non-default parameters. something like avg = simple, n = 4.

Comment thread chainladder/utils/utility_functions.py Outdated
dev.to_pickle(p)
restored = cl.read_pickle(p)
os.remove(p)
print(restored.transform(tri).ldf_.values[0, 0, 0, :4].round(4))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we print the full ldf_ from both the original and the restored estimators?

Comment thread chainladder/utils/utility_functions.py Outdated
combined = cl.concat([paid, incurred], axis=1)
adj = cl.MunichAdjustment(paid_to_incurred=("CumPaidLoss", "IncurLoss"))
result = adj.fit_transform(combined)
print(result.ldf_["CumPaidLoss"].values[0, 0, 0, :4].round(4))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good use case for concat. can we focus the test output around concat only?

@kennethshsu

Copy link
Copy Markdown
Collaborator

@EKtheSage are you interested in finishing up this PR?

- read_pickle: use non-default params (average=simple, n_periods=4),
  print ldf_ from both original and restored estimators, and call
  .transform() on restored to prove it is still functional
- read_json: show the full serialized JSON string before round-tripping,
  following pandas docstring style
- concat: remove MunichAdjustment output; focus on concat result only
  by printing combined.columns
- minimum/maximum: add prose descriptions for x1 and x2 parameters,
  confirming x2 can be a scalar
- maximum: trim testoutput to show only high_side result
@EKtheSage

Copy link
Copy Markdown
Contributor Author

@henrydingliu thanks for the detailed review. All comments have been addressed in the latest commit. Summary below:

to_pickle / read_pickle (lines 291, 301, 307)

  • Used a Development transformer with non-default params (average='simple', n_periods=4) to demonstrate pickling does something meaningful
  • Now prints ldf_ from both the original and restored estimators side-by-side to show parameters are preserved
  • Added an explicit restored.transform(tri) call to prove the restored estimator is still functional as a transformer

read_json (line 451)

  • Replaced the Pipeline round-trip with a Development example that prints the full serialized JSON string before reconstructing, following pandas docstring style

concat (lines 678, 696)

  • Removed the MunichAdjustment code and output; the example now focuses on concat itself by printing list(combined.columns) to show the two columns were merged into one triangle

minimum / maximum parameters (lines 793, 795)

  • Added prose descriptions for x1 and x2 in both functions, clarifying that x2 can be a scalar (element-wise comparison against a constant value)

maximum output (line 891)

  • Removed the intermediate ult_vol and ult_sim print lines; testoutput now shows only the high_side result

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 13c3d48. Configure here.

Comment thread chainladder/utils/tests/test_utilities.py Outdated
@henrydingliu

Copy link
Copy Markdown
Collaborator

@EKtheSage I don't think all the edits came through on this one either

…um test

- read_pickle: print the full ldf_ from the original, restored, and
  transformed estimators instead of the first four factors
- PatsyFormula: print the full ldf_ patterns for both GLM formulations
  and for the custom DevelopmentML pipeline
- test_maximum: assert the element-wise maximum directly instead of the
  NaN-tolerant inequalities flagged in review
@EKtheSage

Copy link
Copy Markdown
Contributor Author

@henrydingliu thanks for checking. The earlier commit did land, but two of your comments were implemented incorrectly in it: the read_pickle example printed only the first four factors instead of the full ldf_, and your "should we be showing all the numbers" comment on the PatsyFormula example was misread as a request to trim output when you meant the opposite. The latest commit prints the full ldf_ arrays in the read_pickle example and both PatsyFormula examples, and also strengthens the new test_maximum assertion that bugbot flagged. Doctests pass locally.

assert not missing, f"sdist is missing sample CSVs: {sorted(missing)}"


def test_load_sample_uspp() -> None:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we did a sizable refactor on load_sample a couple of weeks back. can you please review the current load_sample test(s) in here and see if this new test is redundant?

@henrydingliu

Copy link
Copy Markdown
Collaborator

everything looks great. small nitpick on one potentially redundant test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants