GPU Utilization and Resource Requirements

Questions and Answers : Issue Discussion : GPU Utilization and Resource Requirements
crashtech

Joined: 25 Dec 20
Posts: 2
Credit: 10,970,512
RAC: 0
Message 1077 - Posted: 6 Feb 2021, 20:19:34 UTC

After running the GPU app in both Windows and Linux, I have a couple of observations. I don't know if anything can be done about them, but they might be worth mentioning.

First, compared to most other GPU projects, GPU utilization is poor. Even running two tasks concurrently does not seem to help a whole lot; typical utilization ranges from 75-85%. Some performance may be left on the table.

Second, even though the app can't saturate the GPU, it seems to need a great deal of other PC resources to run at its best. Using more than half the CPU cores for anything else while crunching MLC typically results in a drastic performance hit. Some other GPU apps have this tendency as well, but not to this degree; usually leaving two physical cores free is enough to satisfy most GPU apps. Core count doesn't appear to be the actual problem, though; it may be that the MLC WUs occupy a lot of L3 cache or something similar.

It's a bit disappointing that running MLC effectively means the host PC puts out far less work than it would with other combinations of projects, which makes it less attractive as a go-to everyday project.

Just my two cents.
ID: 1077
cartoonman

Joined: 9 Jan 21
Posts: 2
Credit: 10,070,224
RAC: 0
Message 1078 - Posted: 7 Feb 2021, 4:44:11 UTC

My two cents:

GPU utilization is pretty low, but I have seen significant gains when running 2 GPU tasks per GPU. That said, a single task showed GPU utilization of around 40-60%, so adding another task makes use of the extra headroom (as far as the kernel sizes allow, of course).

My bet is that this has more to do with PyTorch than anything else. Perhaps some tuning of the PyTorch configuration could make use of the additional headroom, but since training is a purely iterative process, this may simply be the best we'll get. It might be worth thinking about making each GPU WU a batch of multiple HDF5 runs, with a 'supervisory' app that spawns as many parallel pipelines as is optimal for the given GPU, but that could add a lot of complexity for little return.
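For what it's worth, here is a generic sketch of the kind of PyTorch knobs that sometimes help keep a GPU fed: cuDNN autotuning, worker processes with pinned memory, and non-blocking transfers. This is not taken from the MLDS source; the dataset and model below are made up purely for illustration.

```python
# Illustrative only -- generic PyTorch settings, not the MLDS project's code.
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Let cuDNN benchmark and cache the fastest kernels for fixed-size inputs.
torch.backends.cudnn.benchmark = True

# Stand-in for one sequence dataset (samples, time steps, features); sizes are invented.
data = TensorDataset(torch.randn(4096, 10, 8), torch.randn(4096, 10, 8))

# Worker processes plus pinned memory overlap batch preparation with GPU compute.
loader = DataLoader(data, batch_size=64, num_workers=2, pin_memory=True)

model = torch.nn.LSTM(input_size=8, hidden_size=64, batch_first=True).to(device)

for x, _ in loader:
    # A non-blocking copy can overlap with compute because the source tensor is pinned.
    x = x.to(device, non_blocking=True)
    out, _ = model(x)
```

Whether any of this moves the needle for such small networks is an open question; the per-step kernels may simply be too small to fill a modern GPU.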

An option would be to do something similar to Einstein@Home, where a GPU utilization factor can be set in the user's web preferences instead of everyone having to make their own `app_info.xml` file. That way you offload the problem of resource scheduling to the BOINC client, which is well designed for it, and you get it for free.
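For anyone who wants this today without going the full anonymous-platform route: the stock BOINC client also reads an `app_config.xml` from the project's folder in the BOINC data directory, and that is enough to pack two tasks onto one GPU. A minimal sketch follows; the app name is a guess on my part (the real one is listed in `client_state.xml`).

```xml
<!-- Sketch of an app_config.xml; the app name below is a guess.
     Check client_state.xml for the real one. -->
<app_config>
  <app>
    <name>mlds-gpu</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>  <!-- 0.5 GPU per task = two tasks per GPU -->
      <cpu_usage>1.0</cpu_usage>  <!-- CPU threads budgeted per task -->
    </gpu_versions>
  </app>
</app_config>
```

A server-side web preference would still be nicer, since a file like this has to be copied to every host and re-read via Options -> Read config files in the BOINC Manager.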
ID: 1078
crashtech

Joined: 25 Dec 20
Posts: 2
Credit: 10,970,512
RAC: 0
Message 1079 - Posted: 7 Feb 2021, 16:07:00 UTC - in response to Message 1078.  

Hi cartoonman, thanks for the reply. Do you have any high-core-count CPUs, and if so, do you try to use them fully while also running the MLC GPU app?
ID: 1079
CosminZ

Joined: 26 Nov 20
Posts: 1
Credit: 11,279,629
RAC: 0
Message 1080 - Posted: 7 Feb 2021, 17:10:54 UTC

In Linux the utilization is better than in Windows. Running 2 workunits in Linux doubles the runtime, from 2100 s to 4200 s for the long workunits on an NVIDIA 2070 laptop, with lower GPU utilization and PCIe usage between 40 and 70%. For comparison, 1 workunit (configured with app_config to use 2 CPU cores) gives 87% GPU at 1875 MHz, 6% PCIe, and 90-95 W of power (in Windows it was using 60-70 W, from what I remember).
Opening a program or a folder makes PCIe utilization jump to 60-70% for a few seconds, and using the browser increases runtimes from 35 min to 45 min or more. On other BOINC projects runtimes also increase, but usually by tens of seconds, not hundreds of seconds or up to 50% longer.
The new GPU workunits, rand_automata, show exactly the same behavior and are better suited to low-end GPUs: an NVIDIA 1030 finishes them in 1350 s and a mobile 2070 in 500 s.
ID: 1080
cartoonman

Joined: 9 Jan 21
Posts: 2
Credit: 10,070,224
RAC: 0
Message 1082 - Posted: 7 Feb 2021, 20:17:35 UTC - in response to Message 1079.  
Last modified: 7 Feb 2021, 20:18:23 UTC

Yeah, I run ~12 other CPU tasks on my 16-thread CPU alongside the 2 parallel GPU tasks (each allocating 1 CPU thread).

I recall seeing a throughput increase of ~1.5x-1.75x when running 2 parallel GPU MLDS tasks versus one. The runtime of a single task obviously got longer, but throughput went up because the runtime didn't double with the additional WU. I did experiment with 3 parallel GPU WUs but saw no benefit. I think it varies with the GPU, the amount of VRAM, and the bandwidth to the GPU.
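To put made-up numbers on that reasoning: if one WU takes 35 minutes alone and two in parallel push each to about 42 minutes, you finish 2 WUs per 42 minutes instead of 1 per 35 (2/42 ≈ 0.048 vs 1/35 ≈ 0.029 WU/min), roughly a 1.67x gain. Once per-task runtime fully doubles, as in CosminZ's 2100 s to 4200 s case above, the gain disappears.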
ID: 1082
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 1083 - Posted: 7 Feb 2021, 20:46:32 UTC

I am positive the GPU client could be optimized, but it's a complex issue with a lot of variables (OS, environment, etc.), and unless someone can tackle it quickly, it's a lower priority than releasing the dataset, preparing for DS4, analyzing the DS1/2/3 results, and writing the paper.

I did open https://gitlab.com/mlcathome/mlds/-/issues/12 to track this, though, in case someone wants to tackle it.
ID: 1083
Dataman

Joined: 1 Jul 20
Posts: 32
Credit: 22,436,564
RAC: 0
Message 1096 - Posted: 19 Feb 2021, 15:56:36 UTC - in response to Message 1078.  



cartoonman wrote:
> An option would be to do something similar to Einstein@Home, where a GPU utilization factor can be set in the user's web preferences instead of everyone having to make their own `app_info.xml` file. That way you offload the problem of resource scheduling to the BOINC client, which is well designed for it, and you get it for free.

Agree!!! That is why I do not run GPU work here very often: I do not customize my config files for individual projects. The way Einstein does it makes it very easy to tune the number of WUs per GPU. My 2 cents.
Cheers

ID: 1096
An0ma1y

Joined: 3 Aug 20
Posts: 8
Credit: 7,650,164
RAC: 0
Message 1098 - Posted: 21 Feb 2021, 20:11:36 UTC - in response to Message 1096.  
Last modified: 21 Feb 2021, 20:12:02 UTC



Dataman wrote:
> Agree!!! That is why I do not run GPU work here very often: I do not customize my config files for individual projects. The way Einstein does it makes it very easy to tune the number of WUs per GPU.

Yes, please!
ID: 1098
wolfman1360

Joined: 7 Jul 20
Posts: 23
Credit: 39,708,780
RAC: 358
Message 1103 - Posted: 23 Feb 2021, 8:30:32 UTC

Hello,

I've got 2 GPUs here. From what I've read, Polaris isn't supported yet, so my RX 570 will keep crunching other projects for the time being.
I do, however, have a GTX 1050 Ti in a Windows laptop. What sort of runtimes should I expect versus the CPU? I recall Asteroids GPU tasks ran about equal to 3 CPU cores.
The laptop has an i7-8750H, for reference.
ID: 1103
Werinbert

Joined: 30 Nov 20
Posts: 14
Credit: 7,958,883
RAC: 16
Message 1116 - Posted: 5 Mar 2021, 15:53:56 UTC

So why is my 750 Ti producing 4x the credit per hour of my 1660 Ti? There is a serious problem somewhere in the implementation of the GPU app.
ID: 1116
Pop Piasa

Joined: 10 Mar 21
Posts: 2
Credit: 36,844,062
RAC: 22
Message 1137 - Posted: 7 Apr 2021, 18:24:05 UTC

Hi everybody, I'm camping out here while GPUGRID is between projects. I've noticed the low GPU utilization and have had good results running Folding@home FahCore CUDA WUs simultaneously with MLDS WUs. It seems to slow the FahCore tasks somewhat, but the MLDS tasks hardly at all. I've also had decent results multitasking OpenCL tasks alongside CUDA tasks.
Be aware, though, that F@H credit does not count toward your BOINC stats. Cruel irony that my GPUs earn twice the credit per hour there...

I also run only four of my eight threads on CPU tasks, since I'm running twin 3 GB GTX 1060s. My CPU sits around 90% usage, and WU time vs. CPU time ratios are fair for a Windows machine.

Cheers
ID: 1137


©2023 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)