Skip to content

S4 model modification, testing, and proper copyright compliance#447

Closed
kyoon-mit wants to merge 7 commits into
ML4GW:mainfrom
kyoon-mit:ssm-pr1
Closed

S4 model modification, testing, and proper copyright compliance#447
kyoon-mit wants to merge 7 commits into
ML4GW:mainfrom
kyoon-mit:ssm-pr1

Conversation

@kyoon-mit

Copy link
Copy Markdown

Summary of Changes

  1. Renamed s4.py to s4d.py in libs/architectures/architectures/networks. Updated __init__.py accordingly.
  2. Ensured proper attribution to the original s4 repo under the Apache-2.0 license.
  3. Modified s4d.py code.
  4. Added test_s4d.py under libs/architectures/tests/networks. All 6 tests pass. (Run the command uv run pytest tests/ from libs/architectures/)
  5. Ran pre-commit hook.

NOTE

This is the first in a series of PRs following the suggestion of Deep during the Thursday meeting to push small, incremental changes.

The context of the discussion was that we, Benedict and I, have been implementing our state space models in the aframe setup in order to make our results reproducible. We are not suggesting a definite integration of our changes to aframe at the moment; this can be a discussion at a later time. All we are doing is sharing our work for reproducibility, and hopefully, cross-checking. Please let us know if we should share our work in another way.

kyoon-mit added 5 commits May 31, 2026 02:10
- Refactored `S4DKernel` to handle variable-length inputs. Removed fixed `length` buffer; length `L` is inferred dynamically in `forward(L)`.
- Replaced `1j` with `torch.complex(...)` so that it can be compiled with `torch.compile`.
- Removed `prenorm` from `S4D` and `S4Model` sinced it was unused.
- Switched dropout function from 1-dimensional to N-dimensional, which is what the original S4 paper specifies. `1d` drops channels; `Nd` drops time steps.
- Made `d_output` to be a requirement in `S4Model`. The default value of 10 was inherited from a CIFAR-10 demo and had no meaning here.

Test file added under the name `test_s4d.py`.

@deepchatterjeeligo deepchatterjeeligo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kyoon-mit thanks for starting this. At first glance this looks good, and its also pretty modular. However, I think the right place for this to live is under ml4gw: https://github.com/ML4GW/ml4gw/tree/main/ml4gw/nn

The design is that: anything that is common to multiple projects be refactored into ml4gw as that is common dependency for all of our projects. Based on your presentations and the discussion, this network seems to have promise on multiple fronts. The existing s4 implementation here in aframe may have been due to previous prototyping, and should be refactored. So here is my suggestion:

  • let's port over this MR to ml4gw
  • once we have that in a release, let's take the s4.py from aframe out.

Comment thread third_party/s4-LICENSE Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you have attributed credit to the original authors in your code. This license file is not needed. We have similar adaptations in other places, and citing source as comments is sufficient.

@kyoon-mit kyoon-mit Jun 1, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the license part, understood. For the question of which repo exactly we want to have this state space model live in, I was going to ask you the same question. Since it's Sunday, the progress is slow, but Benedict & I were talking today about pushing more PRs by Tuesday. These would not only include these state space model architectures, but also the specific regression tasks that we've set up using the aframe dataloader. Yes, it's a lot to ask in terms of changes, especially when aframe is designed for a specific purpose.

Since I am not that familiar with ml4gw, would it be straightforward to have a similar aframe-style dataloader with a regression/classification task with state space models?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be straightforward. So I will still suggest to port over this PR to ml4gw. I have created a https://github.com/ML4GW/aframe/tree/regression-datamodules branch, where we will update the ml4gw dependency to be the bleeding edge version.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will still suggest to keep this PR, and have it remove s4.py. We will get it merged once s4d.py is in ml4gw.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR modified such that it removes s4.py.

@kyoon-mit

Copy link
Copy Markdown
Author

PR moved to ml4gw:
ML4GW/ml4gw#311

@kyoon-mit kyoon-mit closed this Jun 3, 2026
@kyoon-mit kyoon-mit reopened this Jun 3, 2026
@deepchatterjeeligo

Copy link
Copy Markdown
Contributor

@kyoon-mit let's make this PR against regression-datamodules

@deepchatterjeeligo deepchatterjeeligo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyoon-mit can you fix the imports in the tests, and make this against regression-datamodules branch?

@kyoon-mit

Copy link
Copy Markdown
Author

Closing PR as the code changes are incorporated into #503.

@kyoon-mit kyoon-mit closed this Jun 24, 2026
@kyoon-mit kyoon-mit deleted the ssm-pr1 branch June 24, 2026 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants