`RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`

The following code raises an error in some of the most recent versions of PyTorch: https://github.com/microsoft/oac-explore/blob/cbc0333cc9b616f6bbca9d6d9cdd37fd29ef55e7/trainer/trainer.py#L146-L159

`RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`

To fix it, these lines need to be moved:

https://github.com/microsoft/oac-explore/blob/cbc0333cc9b616f6bbca9d6d9cdd37fd29ef55e7/trainer/trainer.py#L120-L124

so that they sit between the Q-network gradient steps and the policy-network step, like so:

```py
"""
Update networks
"""
self.qf1_optimizer.zero_grad()
qf1_loss.backward(retain_graph=True)
self.qf1_optimizer.step()

self.qf2_optimizer.zero_grad()
qf2_loss.backward(retain_graph=True)
self.qf2_optimizer.step()

q_new_actions = torch.min(
    self.qf1(obs, new_obs_actions),
    self.qf2(obs, new_obs_actions),
)
policy_loss = (alpha * log_pi - q_new_actions).mean()

self.policy_optimizer.zero_grad()
policy_loss.backward(retain_graph=True)
self.policy_optimizer.step()
```

Be aware that simply downgrading to an older version of PyTorch to avoid this error may not give the behaviour you expect: `policy_loss` was computed from Q-networks whose parameters have since been overwritten by the optimizer steps, i.e. from a network which no longer exists.
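The ordering problem can be reproduced outside the trainer. Below is a minimal sketch, not the repo's code: `policy`, `qf`, `compute_policy_loss` and `try_update` are hypothetical toy stand-ins, with a single Q-network and plain SGD assumed for brevity. It compares computing the policy loss before the Q-network step (the original ordering) with computing it after (the reordering suggested here).

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy stand-ins for the policy and one Q-network (not the repo's classes).
policy = nn.Linear(2, 1)
qf = nn.Sequential(nn.Linear(3, 4), nn.Tanh(), nn.Linear(4, 1))
qf_optimizer = torch.optim.SGD(qf.parameters(), lr=0.1)

obs = torch.randn(8, 2)
q_target = torch.randn(8, 1)


def compute_policy_loss(actions):
    # The policy loss backpropagates through qf, so autograd saves qf's weights.
    return -qf(torch.cat([obs, actions], dim=1)).mean()


def try_update(loss_before_q_step):
    """Run one update; return the error message, or None if backward succeeds."""
    actions = policy(obs)
    q_pred = qf(torch.cat([obs, actions.detach()], dim=1))
    qf_loss = ((q_pred - q_target) ** 2).mean()

    if loss_before_q_step:
        policy_loss = compute_policy_loss(actions)  # original ordering

    qf_optimizer.zero_grad()
    qf_loss.backward()
    qf_optimizer.step()  # in-place update of qf's parameters

    if not loss_before_q_step:
        policy_loss = compute_policy_loss(actions)  # reordered as suggested

    try:
        policy_loss.backward()
    except RuntimeError as err:
        return str(err)
    return None


stale_error = try_update(loss_before_q_step=True)
fresh_error = try_update(loss_before_q_step=False)
print("loss computed before Q step:", stale_error)
print("loss computed after Q step: ", fresh_error)
```

On a recent PyTorch version, I would expect the first call to return the in-place modification message from the title and the second to return `None`. On older versions that do not detect the in-place parameter update, the first call may silently succeed, which is the case the warning above is about.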

	"""
	Update networks
	"""
	self.qf1_optimizer.zero_grad()
	qf1_loss.backward()
	self.qf1_optimizer.step()

	self.qf2_optimizer.zero_grad()
	qf2_loss.backward()
	self.qf2_optimizer.step()

	self.policy_optimizer.zero_grad()
	policy_loss.backward()
	self.policy_optimizer.step()

	q_new_actions = torch.min(
	self.qf1(obs, new_obs_actions),
	self.qf2(obs, new_obs_actions),
	)
	policy_loss = (alpha * log_pi - q_new_actions).mean()
