The IaaS setup makes sense on LDG, where we have at most 8 GPUs per node and none of any consequence in the condor pool. As we look to do inference on other clusters with more GPUs distributed over multiple nodes (OSG, Delta), it's not clear to me that this setup is still advantageous, particularly on OSG, where communication across nodes doesn't seem possible. We should benchmark throughput with the model loaded in-process and compare it against the IaaS setup.
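To make the proposed benchmark concrete, here is a minimal sketch of a throughput timer that could wrap either path. The names `measure_throughput` and `dummy_in_process_model` are hypothetical and not part of our codebase; the dummy model is only a stand-in so the snippet runs without a GPU.

```python
import time
from typing import Any, Callable, Sequence


def measure_throughput(
    infer: Callable[[Any], Any],
    batches: Sequence[Sequence[Any]],
    warmup: int = 1,
) -> float:
    """Return samples processed per second by `infer` over `batches`."""
    # Warm-up calls are excluded from timing (lazy init, caches, etc.).
    for batch in batches[:warmup]:
        infer(batch)

    n_samples = 0
    start = time.perf_counter()
    for batch in batches:
        # On a GPU, `infer` should block until results are ready,
        # since kernels launch asynchronously.
        infer(batch)
        n_samples += len(batch)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return n_samples / max(elapsed, 1e-9)


def dummy_in_process_model(batch: Sequence[Sequence[float]]) -> list:
    # Hypothetical stand-in for a model loaded in-process.
    return [sum(sample) for sample in batch]


if __name__ == "__main__":
    # 10 batches of 4 samples each; real runs would use actual strain data.
    batches = [[[1.0, 2.0]] * 4 for _ in range(10)]
    rate = measure_throughput(dummy_in_process_model, batches)
    print(f"{rate:.1f} samples/s")
```

Passing the in-process model call and the IaaS client call as `infer` over the same batches would give directly comparable samples-per-second numbers on each cluster.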