2:4 sparsity with to_sparse_semi_structured method from PyTorch results in memory issue

I am trying to reduce the memory footprint of a LLaMA2 model pruned to 2:4 sparsity with SparseGPT, using the to_sparse_semi_structured method from PyTorch. However, when I apply it to change how the sparse parameters are stored, I run out of memory. Note that the original dense model does not run out of memory.
Below is the code I was running, where model_path is the path to the pruned model.

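```
import torch.nn as nn
from torch.sparse import to_sparse_semi_structured, SparseSemiStructuredTensor
from transformers import AutoModelForCausalLM

# model_path: path to the 2:4 pruned checkpoint; device: the target GPU (defined elsewhere)
model = AutoModelForCausalLM.from_pretrained(model_path)
model = model.to(device).half()

# Replace each dense Linear weight with its 2:4 semi-structured sparse representation
for fqn, module in model.named_modules():
    # print(fqn)
    if isinstance(module, nn.Linear):
        module.weight = nn.Parameter(to_sparse_semi_structured(module.weight))
```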
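
For reference, here is a rough back-of-the-envelope estimate of what the conversion should save per layer once it has finished. This is only a sketch under stated assumptions: it assumes the compressed fp16 layout keeps half of the values plus 2 bits of index metadata per kept value (about 9/16 of the dense size), the function names are hypothetical, and the 4096 x 4096 shape is just an illustrative layer size. It does not model any temporary buffers used during the conversion itself.

```python
def dense_bytes(rows: int, cols: int, bytes_per_value: int = 2) -> int:
    """Storage for a dense weight matrix (fp16 by default)."""
    return rows * cols * bytes_per_value


def semi_structured_bytes(rows: int, cols: int, bytes_per_value: int = 2) -> int:
    """Estimated storage for a 2:4 compressed weight matrix.

    Assumption: 2 of every 4 values are kept, and each kept value
    carries 2 bits of metadata recording its position in the group.
    """
    kept_values = rows * cols // 2
    value_bytes = kept_values * bytes_per_value
    metadata_bytes = kept_values * 2 // 8  # 2 bits per kept value
    return value_bytes + metadata_bytes


if __name__ == "__main__":
    rows, cols = 4096, 4096  # illustrative layer shape, not taken from the model
    dense = dense_bytes(rows, cols)
    sparse = semi_structured_bytes(rows, cols)
    print(f"dense:  {dense} bytes")
    print(f"sparse: {sparse} bytes ({sparse / dense:.4f} of dense)")
```

While a layer is being converted, its dense weight and its compressed copy exist at the same time, so the peak during the loop can sit above the dense baseline even if the final footprint is smaller.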



2:4 sparsity with to_sparse_semi_structured method from pytorch results in memory issue #28
