RFC: Estimated-performance regression tests#454
Conversation
Thread the Result object out of Heuristics.periodic (now a 5-tuple) so that optimize() and optimize_loop() can store SLOTHY's own performance estimate on slothy.last_result (cycles, stalls, codesize, codesize_with_bubbles). Previously the straightline optimize() path computed and then discarded the result.
Add a pytest suite under tests/pytest that pins SLOTHY's estimated cycle/stall counts for a given input snippet. Each case is an assembly file plus a YAML sidecar describing how to invoke SLOTHY and the expected estimate per target; the architecture is derived from the target. Because the estimate is the optimum of a minimization, it is deterministic, so pinning it turns a change in what SLOTHY believes it found into a test failure. Seeded with aarch64_simple0 and aarch64_ldst. The legacy OptimizationRunner suite in tests/naive is unaffected and is still driven by test.py.
| _apply_config(slothy.config, cfg) | ||
|
|
||
| # On Apple M1, x18 is reserved by the platform ABI. | ||
| if "m1" in target_name: |
There was a problem hiding this comment.
It's outside the scope of this PR but do you agree that we should have a mechanism to override arch model defaults from the uArch model? If so, I'd open an issue for that.
Setting this explicitly all the time feels brittle.
| slothy.load_source_from_file(str(source_path)) | ||
|
|
||
| # variable_size lets SLOTHY minimize stalls; a case may override it. | ||
| cfg = {"variable_size": True} |
There was a problem hiding this comment.
Possibly have a default config as yaml as well to make it more transparent?
|
I think the feature itself will be useful and the implementation looks fine. Kind of related to #277 and #226. Could also consider to hook this into tests/examples so we will not have three separate places in which we keep track of "examples". Also, inside examples/tests it would be nice to move the config away from the Python script into yaml files. This would make them much more easily readable and not as convoluted. |
Our tests/naive suite only checks that SLOTHY runs, not that the result is any good - so a regression that silently degrades a schedule would pass CI. This RFC proposes pinning SLOTHY's own performance estimate (predicted cycles/stalls) for a given snippet: since that estimate is the optimum of a minimization, it's deterministic and safe to assert exactly, turning "SLOTHY now thinks it found worse code" into a test failure. To enable it, optimize()/optimize_loop() now expose the result on slothy.last_result.
Concretely, a new pytest suite under tests/pytest/ reads per-case YAML sidecars - an assembly snippet plus the expected estimate per target (arch derived from target) and how to invoke SLOTHY - so adding a case is just dropping two files.
I drafted this for discussion and would like to hear your opinions. Is this going to be useful in finding regressions in SLOTHY/ortools in the future? If so we can easily scale this up to the full instruction set supported.
@dop-amin @hanno-becker