Why is the momentum model not used for CLIP?

Hi, I noticed that in the `encode_text_m` function in `clip/model.py`, the call to `self.transformer_m(x)` was replaced with `self.transformer(x)`, so the text-modality momentum model is no longer used. What is the intention behind this change, and how does it affect the results?
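For context, a momentum model is usually an exponential moving average (EMA) of the online encoder's weights (as in MoCo and ALBEF), updated as θ_m ← m·θ_m + (1 − m)·θ without gradients. The sketch below is hypothetical and does not reproduce `clip/model.py`. It uses a toy `ToyEncoder` class and plain Python lists in place of `self.transformer` and `self.transformer_m`, and an assumed momentum coefficient, to show that calling the online transformer inside `encode_text_m` makes the "momentum" text features identical to the online ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyEncoder:
    """Hypothetical stand-in for a transformer: one scalar weight per feature."""
    weights: List[float]

    def __call__(self, x: List[float]) -> List[float]:
        # Element-wise scaling stands in for the real forward pass.
        return [w * xi for w, xi in zip(self.weights, x)]


def ema_update(online: ToyEncoder, momentum: ToyEncoder, m: float) -> None:
    # theta_m <- m * theta_m + (1 - m) * theta (the momentum copy gets no gradients).
    momentum.weights = [m * wm + (1.0 - m) * w
                        for wm, w in zip(momentum.weights, online.weights)]


transformer = ToyEncoder([1.0, 2.0])
transformer_m = ToyEncoder([1.0, 2.0])  # momentum copy starts as a clone

# Simulate one optimizer step on the online weights, then an EMA update.
# m = 0.9 is an assumed value chosen so the difference is easy to see.
transformer.weights = [1.5, 1.0]
ema_update(transformer, transformer_m, m=0.9)

x = [1.0, 1.0]
feat_online = transformer(x)
feat_momentum = transformer_m(x)  # encode_text_m using self.transformer_m
feat_replaced = transformer(x)    # encode_text_m after switching to self.transformer

print("online:  ", feat_online)
print("momentum:", feat_momentum)
print("replaced:", feat_replaced)
```

With the replacement, the text branch of `encode_text_m` returns the online features, so any loss term that is meant to compare against slowly-moving momentum text features would instead see the current online text encoder.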