S4 model modification, testing, and proper copyright compliance#447
S4 model modification, testing, and proper copyright compliance#447kyoon-mit wants to merge 7 commits into
Conversation
- Refactored `S4DKernel` to handle variable-length inputs. Removed fixed `length` buffer; length `L` is inferred dynamically in `forward(L)`. - Replaced `1j` with `torch.complex(...)` so that it can be compiled with `torch.compile`. - Removed `prenorm` from `S4D` and `S4Model` sinced it was unused. - Switched dropout function from 1-dimensional to N-dimensional, which is what the original S4 paper specifies. `1d` drops channels; `Nd` drops time steps. - Made `d_output` to be a requirement in `S4Model`. The default value of 10 was inherited from a CIFAR-10 demo and had no meaning here. Test file added under the name `test_s4d.py`.
deepchatterjeeligo
left a comment
There was a problem hiding this comment.
Hi @kyoon-mit thanks for starting this. At first glance this looks good, and its also pretty modular. However, I think the right place for this to live is under ml4gw: https://github.com/ML4GW/ml4gw/tree/main/ml4gw/nn
The design is that: anything that is common to multiple projects be refactored into ml4gw as that is common dependency for all of our projects. Based on your presentations and the discussion, this network seems to have promise on multiple fronts. The existing s4 implementation here in aframe may have been due to previous prototyping, and should be refactored. So here is my suggestion:
- let's port over this MR to ml4gw
- once we have that in a release, let's take the
s4.pyfrom aframe out.
There was a problem hiding this comment.
I see that you have attributed credit to the original authors in your code. This license file is not needed. We have similar adaptations in other places, and citing source as comments is sufficient.
There was a problem hiding this comment.
On the license part, understood. For the question of which repo exactly we want to have this state space model live in, I was going to ask you the same question. Since it's Sunday, the progress is slow, but Benedict & I were talking today about pushing more PRs by Tuesday. These would not only include these state space model architectures, but also the specific regression tasks that we've set up using the aframe dataloader. Yes, it's a lot to ask in terms of changes, especially when aframe is designed for a specific purpose.
Since I am not that familiar with ml4gw, would it be straightforward to have a similar aframe-style dataloader with a regression/classification task with state space models?
There was a problem hiding this comment.
It should be straightforward. So I will still suggest to port over this PR to ml4gw. I have created a https://github.com/ML4GW/aframe/tree/regression-datamodules branch, where we will update the ml4gw dependency to be the bleeding edge version.
There was a problem hiding this comment.
I will still suggest to keep this PR, and have it remove s4.py. We will get it merged once s4d.py is in ml4gw.
There was a problem hiding this comment.
PR modified such that it removes s4.py.
|
PR moved to |
|
@kyoon-mit let's make this PR against |
deepchatterjeeligo
left a comment
There was a problem hiding this comment.
@kyoon-mit can you fix the imports in the tests, and make this against regression-datamodules branch?
|
Closing PR as the code changes are incorporated into #503. |
Summary of Changes
s4.pytos4d.pyinlibs/architectures/architectures/networks. Updated__init__.pyaccordingly.s4d.pycode.test_s4d.pyunderlibs/architectures/tests/networks. All 6 tests pass. (Run the commanduv run pytest tests/fromlibs/architectures/)NOTE
This is the first in a series of PRs following the suggestion of Deep during the Thursday meeting to push small, incremental changes.
The context of the discussion was that we, Benedict and I, have been implementing our state space models in the aframe setup in order to make our results reproducible. We are not suggesting a definite integration of our changes to aframe at the moment; this can be a discussion at a later time. All we are doing is sharing our work for reproducibility, and hopefully, cross-checking. Please let us know if we should share our work in another way.