1)
Questions and Answers : Issue Discussion : GPU Utilization and Resource Requirements
(Message 1082)
Posted 7 Feb 2021 by cartoonman
Yeah, I run ~12 other CPU tasks on my 16-thread CPU alongside the 2 parallel GPU tasks (each allocating 1 CPU thread). I recall seeing a throughput increase of ~1.5x-1.75x when running 2 parallel GPU mlds tasks versus one. The runtime of a single task obviously got longer, but throughput increased because the runtime didn't double with the additional WU. I did experiment with 3 parallel GPU WUs, but saw no benefit. I think the optimum varies with the GPU, the amount of VRAM, and the bandwidth to the GPU.
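The throughput arithmetic can be sketched in Python. Throughput is work units completed per unit time, so n WUs running side by side, each taking t minutes, yield n / t. The runtimes below are hypothetical placeholders, not measurements from this thread, chosen so that two parallel WUs land inside the ~1.5x-1.75x range described and a third adds nothing.

```python
# Hypothetical wall-clock minutes for ONE WU when n WUs share a GPU.
# Illustrative placeholders only, not measured values.
runtime_per_wu = {1: 60.0, 2: 72.0, 3: 110.0}


def throughput(n_parallel: int, minutes_per_wu: float) -> float:
    """Work units completed per minute when n_parallel WUs run concurrently."""
    return n_parallel / minutes_per_wu


def speedup(n_parallel: int, runtimes: dict) -> float:
    """Throughput at n_parallel relative to running a single WU."""
    base = throughput(1, runtimes[1])
    return throughput(n_parallel, runtimes[n_parallel]) / base


def best_concurrency(runtimes: dict) -> int:
    # Highest throughput wins; ties go to the smaller n (less VRAM, fewer CPU threads).
    return max(runtimes, key=lambda n: (throughput(n, runtimes[n]), -n))


if __name__ == "__main__":
    for n in sorted(runtime_per_wu):
        print(f"{n} parallel WU(s): {speedup(n, runtime_per_wu):.2f}x throughput")
    print("best:", best_concurrency(runtime_per_wu))
```

With these numbers, each WU runs 20% longer when paired, yet total throughput rises to about 1.67x; a third WU stretches each runtime enough that throughput falls slightly.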
2)
(Message 1078)
Posted 7 Feb 2021 by cartoonman
My two cents: GPU utilization is pretty low, but I have seen significant gains when running 2 GPU tasks per GPU. With a single task I was seeing GPU utilization of around 40-60%, so adding another task makes use of the remaining headroom (to the extent the kernel sizes allow, of course). My bet is that this has more to do with PyTorch than anything else. Some tuning of the PyTorch configuration might let a single task use more of that headroom, but since training is a purely iterative process, this may simply be the best we'll get.

It may be useful to have GPU WUs consist of batches of multiple HDF5 runs, so that a 'supervisory' app could spawn as many parallel pipelines as is optimal for the given GPU, though that could add complexity for little return. Another option would be to do something similar to Einstein@Home, where the GPU utilization factor can be set in user preferences instead of each of us having to write our own `app_info.xml` file. That offloads resource scheduling to the BOINC client, which is well designed for the job, and for free.
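Short of a project-side preference, the BOINC client also reads an optional per-project `app_config.xml`, whose `<gpu_usage>` element sets the fraction of a GPU each task claims (0.5 lets two tasks share one GPU) and whose `<cpu_usage>` sets the CPU reserved per task. A minimal Python sketch that generates such a file, assuming a hypothetical app name `mlds-gpu`; substitute the app's actual short name as your client reports it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpus_per_task: float = 1.0) -> str:
    """Return an app_config.xml body that runs `tasks_per_gpu` tasks on each GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be >= 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # Fraction of one GPU claimed per task: 1/2 -> two concurrent tasks per GPU.
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    # CPU threads reserved per GPU task (1 thread each in the setup described above).
    ET.SubElement(gpu, "cpu_usage").text = f"{cpus_per_task:g}"
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "mlds-gpu" is a hypothetical placeholder app name.
    print(build_app_config("mlds-gpu", tasks_per_gpu=2))
```

The output would be saved as `app_config.xml` in the project's directory under the BOINC data directory, after which the client needs to re-read its config files (or be restarted) to pick it up.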
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)