In the current version, the agent only perform a single gradient update for every train_frequency, which means that gradient update is not really done frequently. This may results in slow training.
If we allow multiple gradient updates for every train_frequency, the agent may be able to learn more quickly. So, we should add this option. Especially, since SB3 also has this option.
However, we also cannot do too many gradient updates, as the experience buffer, where the training data are sampled from, keeps changing. Too many gradient updates would cause the agent to overfit to specific buffer versions.
In the current version, the agent only perform a single gradient update for every train_frequency, which means that gradient update is not really done frequently. This may results in slow training.
If we allow multiple gradient updates for every train_frequency, the agent may be able to learn more quickly. So, we should add this option. Especially, since SB3 also has this option.
However, we also cannot do too many gradient updates, as the experience buffer, where the training data are sampled from, keeps changing. Too many gradient updates would cause the agent to overfit to specific buffer versions.