Hi, I noticed that in the encode_text_m function in clip/model.py, the call to self.transformer_m(x) was replaced with self.transformer(x), which means the momentum model for the text modality is no longer used. What is the intention behind this change, and how does it affect the results?