Armv7-M: Allow register overlap in ldm + ldrd #153
mkannwischer
left a comment
Thanks!
Can you please add a new example to test that this works.
A simple
ldrd r0, r1, [r0]
ldm r0, {r0-r3}
should do.
Or simply extend |
mkannwischer
left a comment
Thanks for your changes. Almost there.
obj.pre_index = 0
obj.addr = obj.args_in[0]
obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra
#obj.args_in_out_different = [(0,0)] # Can't have Rd==Ra
Please remove those, not comment them out.
Also we need to test if this affects any other examples in SLOTHY.
For that please make sure you have a clean copy of SLOTHY, and then run
python3 example.py --timeout 60 --only-target=slothy.targets.arm_v7m.cortex_m7
This is going to run for a few hours. Then zip up the output files in examples/opt/armv7m and attach them to this PR.
0e257f4 to 8871294
Previously, ldm and ldrd fusion would break if the same register was used both as the address and as one of the outputs (and it was not the last output). This commit fixes that by changing the fusion to re-order the ldr that overwrites the address to the very end when there is an overlap. Note that this is not needed for stm/strd, as there you cannot have an overlap. Additionally, it removes unnecessary restrictions disallowing Rd=Ra for ldrb/ldrh/ldr.
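For illustration, here is a minimal sketch of the re-ordering idea in Python. It is not the code from this PR: `split_loads` and `run` are hypothetical helpers, and modelling each ldr as a `(dest, base, byte_offset)` tuple is an assumption made only for this example.

```python
def split_loads(base, dests, offset=0):
    """Split a multi-register load into single ldr steps.

    Each step is a (dest, base, byte_offset) tuple standing in for
    `ldr dest, [base, #byte_offset]`. Hypothetical model, not SLOTHY's API.
    """
    steps = [(rd, base, offset + 4 * i) for i, rd in enumerate(dests)]
    # Stable sort on a boolean key: steps with dest != base keep their order,
    # and the step that overwrites the base register moves to the very end.
    steps.sort(key=lambda step: step[0] == base)
    return steps


def run(steps, regs, mem):
    """Execute the ldr steps on a toy register file and a dict-based memory."""
    regs = dict(regs)
    for rd, ra, off in steps:
        regs[rd] = mem[regs[ra] + off]
    return regs


if __name__ == "__main__":
    mem = {0x100: 11, 0x104: 22, 0x108: 33, 0x10C: 44}
    # Models `ldm r0, {r0-r3}`, where r0 is both the address and an output.
    steps = split_loads("r0", ["r0", "r1", "r2", "r3"])
    print(steps)
    print(run(steps, {"r0": 0x100}, mem))
```

Without the sort, the first step would overwrite r0 and every later step would read from the wrong address. With it, the load into r0 runs last and all four registers receive the intended words.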
I cleaned this up, but I still need to run a full test with something like (a shorter timeout does not work for the dilithium ntt).
I re-ran with a larger timeout (300 is still not enough for the dilithium ntt, but 600 seems fine on my machine). Unfortunately, fnt_257_dilithium_m7 fails the selftest: This needs investigation before we can merge this. @dop-amin any ideas?
Fixed the splitting of ldrd and ldm when the address register and output register overlap in ldrd_imm_splitting_cb and ldm_interval_splitting_cb.