GitHub - ispamm/GRAMformer: Official implementation for GRAMformer: Any-Order Modality Interactions via Volumetric Multimodal Cross-Attention

GRAMformer: Any-Order Modality Interactions via Volumetric Multimodal Cross-Attention

Giordano Cicchetti, Eleonora Grassucci, Danilo Comminiello

⚠️ This repository is under construction. If you encounter any bugs, please report them in the issues!

Example of Volumetric Multimodal Attention (VMA) for three modalities:

import torch


def compute_attention_scores_parallel_gram(query, key_1, key_2, eps=1e-8):
    """
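    Compute 3x3 Gram volumes for all pairs of language_i with (video_j, audio_j).

    query: [B, N_lang, D]   (language tokens)
    key_1: [B, N_vid,  D]   (video tokens)
    key_2: [B, N_vid,  D]   (audio tokens, index-aligned with key_1)

    Returns:
        scores: [B, N_lang, N_vid], the negative Gram volume of each triplet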
    """

    B, N_lang, D = query.shape
    N_vid = key_1.shape[1]

    # Expand for broadcasting
    l = query[:, :, None, :]   # [B, N_lang, 1, D]
    v = key_1[:, None, :, :]      # [B, 1, N_vid, D]
    a = key_2[:, None, :, :]      # [B, 1, N_vid, D]

    # Pairwise dot products
    ll = (l * l).sum(-1).expand(-1, -1, N_vid)      # [B, N_lang, N_vid]
    vv = (v * v).sum(-1).expand(-1, N_lang, -1)
    aa = (a * a).sum(-1).expand(-1, N_lang, -1)

    lv = (l * v).sum(-1)                            # [B, N_lang, N_vid]
    la = (l * a).sum(-1)
    va = (v * a).sum(-1).expand(-1, N_lang, -1)

    # Analytical determinant of Gram matrix
    det = (
        ll * (vv * aa - va * va)
        - lv * (lv * aa - la * va)
        + la * (lv * va - la * vv)
    )

    return -torch.sqrt(torch.clamp(det, min=eps))
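
To make the geometry behind the score concrete, here is a minimal, dependency-free sketch of the same quantity for a single (language, video, audio) triplet. The helper name `gram_volume_score` and the toy vectors are hypothetical and not part of the repository. The sketch only mirrors the determinant expansion and the clamp used in the function above, on plain Python lists instead of batched tensors.

```python
import math


def gram_volume_score(l, v, a, eps=1e-8):
    """Negative Gram volume for one triplet of equal-length vectors (illustrative helper)."""

    def dot(x, y):
        return sum(xi * yi for xi, yi in zip(x, y))

    # Entries of the symmetric 3x3 Gram matrix
    ll, vv, aa = dot(l, l), dot(v, v), dot(a, a)
    lv, la, va = dot(l, v), dot(l, a), dot(v, a)

    # Cofactor expansion of the determinant along the first row
    det = (
        ll * (vv * aa - va * va)
        - lv * (lv * aa - la * va)
        + la * (lv * va - la * vv)
    )

    # Clamp before the square root, mirroring torch.clamp(det, min=eps)
    return -math.sqrt(max(det, eps))


# Mutually orthogonal unit vectors span a unit cube: volume 1, score -1
orthogonal = gram_volume_score([1, 0, 0], [0, 1, 0], [0, 0, 1])

# Identical vectors span no volume: the determinant is 0 and the clamp takes over
aligned = gram_volume_score([1, 0, 0], [1, 0, 0], [1, 0, 0])

print(orthogonal, aligned)
```

Because the volume is negated, triplets whose vectors point in similar directions receive a higher score than triplets that are spread apart. With the default `eps`, the score for perfectly aligned vectors is bounded at roughly `-1e-4` rather than exactly zero.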

Cite

@article{cicchetti2026gramformer,
    title={GRAMformer: Any-Order Modality Interactions via Volumetric Multimodal Cross-Attention},
    author={Cicchetti, Giordano and Grassucci, Eleonora and Comminiello, Danilo},
    year={2026},
    journal={ArXiv preprint},
}

