ci: run pytest on push and PR by b-erdem · Pull Request #19 · specula-org/SysMoBench

b-erdem · 2026-06-22T19:36:08Z

What this does

Follow-up to #18. Adds a GitHub Actions workflow that runs the test suite on push to main and on pull requests. Until now the only workflow was docker-publish.yml, so nothing ran the tests automatically.

The job installs the package with dev extras, sets up Temurin JDK 17, and fetches tla2tools.jar (pinned to the same v1.8.0 that scripts/setup_tools.py uses), so the TLC-gated behavioral tests in tests/test_languages actually run in CI instead of skipping.

Scope

Runs tests/test_languages. tests/test_models is intentionally left out for now:

- test_model_connection.py makes live provider API calls (reads OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.), so it needs secrets and network access and is not safe to run in CI.
- test_litellm_adapter.py has two failures that already exist on main and are unrelated to this workflow (test_resolve_model_name_for_custom_openai_compatible_api and test_model_factory_reports_litellm_as_recommended_provider). It looks like the model adapter and its tests have drifted apart. Happy to send a separate PR to sort those out, after which this workflow can be widened to the whole tests/ tree.

Notes

- tla2tools.jar is fetched at the pinned version rather than via sysmobench-setup, to keep the CI step independent of the broader environment checks in that script.
- Python is pinned to 3.11. Easy to turn into a matrix later if you want to cover the requires-python >=3.8 range.
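
For readers without the diff at hand, the workflow described above might look roughly like the sketch below. It is not the file from this PR. The workflow name, the action versions, the `.[dev]` extras spelling, the `lib/` destination directory, and the download URL (which follows the usual release-asset layout of the tlaplus/tlaplus repository) are all assumptions. The jar has to land wherever the tests actually look for it, which this sketch does not know.

```yaml
# Hypothetical sketch of a pytest workflow; names and paths are illustrative.
name: tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  pytest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      # TLC needs a JVM; the PR description names Temurin JDK 17.
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "17"

      - name: Install package with dev extras
        run: pip install -e ".[dev]"

      # Assumed URL and destination; the version matches the pin
      # mentioned for scripts/setup_tools.py (v1.8.0).
      - name: Fetch tla2tools.jar (pinned)
        run: |
          mkdir -p lib
          curl -fsSL -o lib/tla2tools.jar \
            https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/tla2tools.jar

      # Scoped to tests/test_languages, as described under Scope.
      - name: Run tests
        run: pytest tests/test_languages
```

Fetching the jar with a plain `curl` step, rather than calling the setup script, mirrors the first note above: the CI step fails only if the download fails, not because of unrelated environment checks.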

Add a GitHub Actions workflow that installs the package, sets up Java, fetches tla2tools.jar (pinned to the same version as scripts/setup_tools.py), and runs the test suite. The TLA+ tests, including the TLC-gated behavioral ones, run in CI rather than skipping. Scoped to tests/test_languages for now. tests/test_models is left out: test_model_connection.py makes live provider API calls that need secrets, and test_litellm_adapter.py has pre-existing failures unrelated to this workflow.

b-erdem · 2026-06-22T19:42:55Z

@Qian-Cheng-nju heads up from setting this up: two tests in tests/test_models/test_litellm_adapter.py fail on current main, unrelated to this PR, which is why I scoped CI to tests/test_languages here.

- test_model_factory_reports_litellm_as_recommended_provider raises KeyError: 'legacy_providers' (the dict from list_available_models() has no such key).
- test_resolve_model_name_for_custom_openai_compatible_api expects openai/deepseek-reasoner but gets deepseek/deepseek-reasoner (saw this on Python 3.14, might be litellm-version sensitive).

Are these known or expected? If not I'm happy to open an issue with a repro, or send a fix once we know whether the tests are stale or the behavior changed. Then the workflow here can widen to the whole tests/ tree.


Qian-Cheng-nju · 2026-06-23T05:29:33Z

Thanks for flagging! @b-erdem Both are just stale tests, already fixed and merged in #20.
I think we can add tests/test_models/test_litellm_adapter.py to the workflow. We can keep test_model_connection.py excluded: it makes live provider calls that need secrets/network, so CI can cover those two files but not the whole tests/ tree.

Qian-Cheng-nju added 2 commits June 23, 2026 04:01

ci: add python matrix, concurrency cancel, and pyproject cache key

b2dd618

ci: narrow matrix to 3.10/3.11 and set requires-python >=3.10 (mcp ne…

d30e9b6

…eds 3.10)
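
The two commits above are known here only by their titles, so the following is a hedged sketch of how a Python matrix, concurrency cancellation, and a cache keyed on pyproject.toml are commonly expressed in GitHub Actions, not the merged file. The concurrency group name, the use of setup-python's built-in pip cache, the `lib/` path, the download URL, and the test selection (taken from the review suggestion to add test_litellm_adapter.py) are assumptions.

```yaml
# Hypothetical sketch; the merged workflow may differ in every detail.
name: tests

on:
  push:
    branches: [main]
  pull_request:

# Cancel an in-flight run when a newer commit arrives on the same ref.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  pytest:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # Quoted so YAML does not read 3.10 as the number 3.1.
        python-version: ["3.10", "3.11"]
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          # pip cache is invalidated whenever pyproject.toml changes.
          cache: pip
          cache-dependency-path: pyproject.toml

      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "17"

      - run: pip install -e ".[dev]"

      - name: Fetch tla2tools.jar (pinned, assumed URL and path)
        run: |
          mkdir -p lib
          curl -fsSL -o lib/tla2tools.jar \
            https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/tla2tools.jar

      # test_model_connection.py stays out: it needs provider secrets.
      - run: pytest tests/test_languages tests/test_models/test_litellm_adapter.py
```

Setting `fail-fast: false` is a design choice rather than something stated in the thread: it lets the 3.10 and 3.11 jobs report independently, which helps when a failure is specific to one interpreter version.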

Qian-Cheng-nju merged commit 5db32f3 into specula-org:main Jun 23, 2026
2 checks passed

Qian-Cheng-nju merged 3 commits into specula-org:main from b-erdem:ci/pytest-workflow

b-erdem commented Jun 22, 2026

Uh oh!

b-erdem commented Jun 22, 2026

Uh oh!

Uh oh!

Qian-Cheng-nju commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

b-erdem commented Jun 22, 2026

What this does

Scope

Notes

Uh oh!

b-erdem commented Jun 22, 2026

Uh oh!

Uh oh!

Qian-Cheng-nju commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants