All my GPU applications have crushed.

Jim1348
Joined: 12 Jul 20
Posts: 44
Credit: 71,449,633
RAC: 668
Message 814 - Posted: 12 Nov 2020, 5:52:14 UTC - in response to Message 810.  

PS - I am supporting the GPU with two free cores of an i7-4771. That should be plenty. But just to check, I am suspending the WCG/OPN work on the other six cores to see if it makes a difference.
I will note back here if it does.
Remarkably, it does make a difference. Going from two to all eight CPU cores reduces the time from 80 to 50 minutes.
That is still not as good as Win10, but it may provide an explanation. Win10 may communicate between CPU and GPU better than Win7.
We will see how Linux does.

zombie67 [MM]
Joined: 1 Jul 20
Posts: 34
Credit: 26,028,988
RAC: 449
Message 816 - Posted: 12 Nov 2020, 6:18:59 UTC

Yeah, after more experimentation, using *any* of my i5-4590 (4 core, no HT) cores for other BOINC CPU tasks causes the GPU utilization to zero out. In other words, if I run only a single GPU task, GPU utilization is ~40% and CPU utilization is ~50%. If I allow even one other CPU task from another project to run, you would think the only thing that would change is that the CPU utilization would increase to ~75%. But that's not what happens. CPU goes up to ~75%, and GPU drops to single digits. Perhaps the chipset bus bandwidth is just maxed out with a single GPU task? This is a very old chipset, after all. I would like to compare this to a more modern chipset, but I can't, because all my new machines just error out (see other threads).
Reno, NV
Team: SETI.USA

bozz4science
Joined: 9 Jul 20
Posts: 142
Credit: 11,533,376
RAC: 487
Message 818 - Posted: 12 Nov 2020, 14:25:56 UTC

Very interesting input from you guys. I experimented further with various settings and, strangely, I saw somewhat similar behaviour. After setting up the app config file to reserve 2 CPU threads for the GPU task, the ø compute load increased from an initial 65% to 85% immediately after reading in the config file in the manager while a WU was running. That sped things up again for me and I am now (strangely again) very close to the runtimes I have observed for GTX 970/980 cards with my 750Ti, at around the 1,000 sec mark. However, and that's what is confusing me, upon the change to reserve 2 CPU threads for the GPU task, 1 CPU WU immediately gets suspended but the CPU utilisation of the GPU task stays at a 1-thread level only (overall CPU utilisation is reduced by 1 thread). So it seems that you were right that the overall # of CPU tasks running in parallel with the GPU task impedes its performance somehow. After all, the GPU task demands around 3 GB of RAM on my system and loads my 2 GB GPU memory at ø ~40% on top of that.

I also saw a speedup depending on what CPU tasks were running. When running 5 out of 6 threads with MLC CPU tasks, I observed ø runtimes of 1,250 sec for the GPU tasks. When changing to other projects such as TNGrid, the runtimes improved to ø 1,100 sec. After continuing to run 2 threads on the GPU task and 4 threads of TNGrid CPU tasks, I continue to see the GPU tasks coming in at ~1,000 sec reliably.

I came up with other variables that might explain some performance discrepancies between different cards:
- RAM speeds
- RAM type (ECC or non-ECC)
- PCIe bus width
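
For anyone who wants to try the same "reserve 2 CPU threads per GPU task" setting, here is a minimal sketch that writes out a BOINC app_config.xml. The element names (app_config, app, name, gpu_versions, gpu_usage, cpu_usage) are the standard BOINC ones; the app name "mlds-gpu" is an assumption based on the queue name used in this thread, so check the real app name in your client before using it. The helper name build_app_config is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return the text of a BOINC app_config.xml for one GPU application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    # Assumption: app_name must match the project's real app name.
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # Fraction of a GPU each task claims; 0.5 would let two tasks share one GPU.
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    # Number of CPU threads BOINC budgets for each GPU task.
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# "mlds-gpu" is a guess at the app name, taken from the queue name in this thread.
config_text = build_app_config("mlds-gpu", gpu_usage=1.0, cpu_usage=2.0)
print(config_text)
```

Note that cpu_usage only changes how many threads the BOINC scheduler sets aside; it does not make the application itself use more threads. That would be consistent with one CPU WU being suspended while the GPU task still shows a single thread of CPU use. The file goes into the project's folder under the BOINC data directory and is picked up when the client re-reads its config files.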

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Jun 20
Posts: 456
Credit: 14,368,944
RAC: 939
Message 823 - Posted: 12 Nov 2020, 21:12:58 UTC

I'm shocked to hear that multiple CPU threads make a difference. There is no (obvious) reason why that should be, since I explicitly disable multiple threads for both GPU and CPU. I'll look into this a bit more tonight....

LCB001
Joined: 1 Jul 20
Posts: 1
Credit: 22,252,083
RAC: 44,511
Message 828 - Posted: 13 Nov 2020, 2:41:56 UTC

Same GPU - Big difference in results;

Both GPUs are the exact same make and model.
Both Rigs are running mainly SiDock wu's on the cpu plus [1] WCG ARP wu.
#1 is my daily driver, #2 is Boinc only.

#1: Intel i7-3930k, Win 7 Ult., 12GB ram, GTX1070 [445.57] ~3800 secs
#2: Ryzen R7-2700, Win 10 Pro, 32GB ram, GTX 1070 [445.75] ~900 secs

#1 Shows ~8->14% GPU Load - Freeing cpu threads has a small effect on load.
#2 Shows ~30->51% GPU load - Freeing cpu threads lets load increase to 68 %+.

#1 GPU clocks show zero boost and remain at stock 1582 MHz.
#2 GPU clocks show boost to 1961 MHz.

Tried running 2x wu's on #1 still showed very little load with almost no rise in temps.

Not sure if it's the OS or something else, but #1 acts like something is blocking the GPU from getting most of its part of the workload, if that makes any sense.

Sid
Joined: 22 Aug 20
Posts: 7
Credit: 18,863,732
RAC: 0
Message 829 - Posted: 13 Nov 2020, 9:12:33 UTC - in response to Message 828.  



Tried running 2x wu's on #1 still showed very little load with almost no rise in temps.

Not sure if it's the OS or something else, but #1 acts like something is blocking the GPU from getting most of its part of the workload, if that makes any sense.


Talking about loads - Task Manager shows 3% for my GTX750-TI.
However GPU-Z shows 84% of "GPU load".
I have much more trust for GPU-Z.

bozz4science
Joined: 9 Jul 20
Posts: 142
Credit: 11,533,376
RAC: 487
Message 830 - Posted: 13 Nov 2020, 11:18:22 UTC - in response to Message 829.  
Last modified: 13 Nov 2020, 11:20:32 UTC

task managers shows 3% for my GTX750-TI. However GPU-Z shows 84% of "GPU load".
Still haven't figured out what exactly the task manager is reporting in Windows. But if you click the GPU monitor, you get different sub-windows showing load values for various GPU features (such as 3D engine, copy engine, memory, etc.), and you can change any sub-window to display a different component. By clicking on it and changing it to "Compute_0", you can let task manager display and monitor the CUDA compute load, which then should be similar to the readings of MSI Afterburner, CPU-Z or the like.

3% load for a GTX 750Ti sounds unreasonable, but I just checked on my system and task manager reports in the "main side bar" also only 5% load but the compute_0 window shows ~85% which seems in line with the readings from the other software.
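
If you want a third reading to compare against Task Manager and GPU-Z, NVIDIA's command-line tool nvidia-smi can report utilization and SM clock directly. Below is a rough sketch of how that could be scripted. It assumes nvidia-smi is installed with the driver and on the PATH; the function names are made up for this example, and the sample line fed to the parser is only an illustrative string, not a measurement.

```python
import subprocess

# Fields requested from nvidia-smi: GPU utilization (%) and SM clock (MHz).
QUERY_FIELDS = ["utilization.gpu", "clocks.sm"]


def parse_sample(csv_line: str) -> dict:
    """Turn one 'noheader,nounits' CSV line into a field -> number mapping."""
    values = [part.strip() for part in csv_line.split(",")]
    if len(values) != len(QUERY_FIELDS):
        raise ValueError(f"expected {len(QUERY_FIELDS)} values, got {len(values)}")
    return dict(zip(QUERY_FIELDS, (float(value) for value in values)))


def query_first_gpu() -> dict:
    """Run nvidia-smi once and parse the line for the first GPU it lists."""
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_sample(result.stdout.splitlines()[0])


# Illustrative input only; call query_first_gpu() on a machine with an NVIDIA GPU.
print(parse_sample("85, 1961"))
```

Sampling this a few times while a WU is running should show whether the card is really idle or whether one of the monitoring tools is just looking at a different engine.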

#1: Intel i7-3930k, Win 7 Ult., 12GB ram, GTX1070 [445.57] ~3800 secs
#2: Ryzen R7-2700, Win 10 Pro, 32GB ram, GTX 1070 [445.75] ~900 secs
I don't know whether the OS is partially causing this discrepancy in runtimes, but again I have several ideas (though I don't know if they make sense):
- RAM speed --> might partly explain the speedup of work on your R2700
- RAM type (ECC/non-ECC) --> might play a role as RAM data is quickly and constantly overwritten, since intermediate weight updates and loss information have a very short half-life (only 1 epoch)
- Total RAM capacity
- Overhead / daily use besides BOINC --> from what you wrote, I guess the main rig is running other stuff than BOINC in the background that could further impede computational speed
- Overclock settings of GPU / temp or performance preference
- PCIe slot width --> guess you are running single GPU setups only and thus all are in a x16 (gen3) slot
- PCIe slot gen
- Bus width --> your i7-3930k supports 5 GT/s but your Ryzen 2700 boasts 8 GT/s as far as I can tell, which might make a difference in memory/CPU transfer and might be relevant as RAM is loaded quite heavily

Jim1348
Joined: 12 Jul 20
Posts: 44
Credit: 71,449,633
RAC: 668
Message 831 - Posted: 13 Nov 2020, 13:21:29 UTC - in response to Message 830.  

task managers shows 3% for my GTX750-TI. However GPU-Z shows 84% of "GPU load".
Still haven't figured out what exactly the task manager is reporting in Windows.
The Task Manager is reporting the load on the CPU core that is supporting the GPU. So it can be quite low. Or it can be high, depending on how CUDA is implemented.
But the 3% number does not say anything about the GPU load. It is GPU-Z that tells you that. The 84% is quite reasonable.

bozz4science
Joined: 9 Jul 20
Posts: 142
Credit: 11,533,376
RAC: 487
Message 832 - Posted: 13 Nov 2020, 13:32:44 UTC - in response to Message 831.  
Last modified: 13 Nov 2020, 13:34:05 UTC

Task Manager is reporting the load on the CPU core that is supporting the GPU
Thanks for the explanation, though I doubt that this is really what it shows. If this were true, it should show the same % of CPU util as the task manager shows for the corresponding process. And this is 8.6%, which is roughly 1/12 of the 6 hyper-threaded cores. If this were the metric shown in the GPU side panel window, it would have to be the same 8.6%, but it is showing 5% instead...

I agree on the ~85% overall compute load on the 750Ti being reasonable. You could also try to run 2 units in tandem to increase load to 100% and efficiency as well.

Jim1348
Joined: 12 Jul 20
Posts: 44
Credit: 71,449,633
RAC: 668
Message 833 - Posted: 13 Nov 2020, 14:10:54 UTC - in response to Message 832.  

Thanks for the explanation, though I doubt that this is really what it shows. If this were true, it should show the same % of CPU util as the task manager shows for the corresponding process. And this is 8.6%, which is roughly 1/12 of the 6 hyper-threaded cores. If this were the metric shown in the GPU side panel window, it would have to be the same 8.6%, but it is showing 5% instead...

I should have explained that differently. Task manager shows the load on the core, but expressed as a percentage of the whole CPU's capacity. So, for example, if the core is fully loaded, and you have 8 cores (as on my i7-4771), then task manager shows 13% (which is 12.5% rounded up). I am not quite sure what you mean by "the corresponding process".
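
To make the arithmetic concrete, here is a tiny sketch of that conversion. The function name is invented for illustration; the thread counts are the ones mentioned in this thread (8 logical cores on the i7-4771, 12 on the 6-core hyper-threaded machine).

```python
def share_of_total_cpu(busy_threads: float, logical_cpus: int) -> float:
    """Percentage of the whole CPU that the given number of fully busy threads represents."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return 100.0 * busy_threads / logical_cpus


# One fully loaded thread on an 8-thread CPU.
print(share_of_total_cpu(1, 8))             # 12.5
# One fully loaded thread on a 12-thread CPU.
print(round(share_of_total_cpu(1, 12), 1))  # 8.3
```

So a per-process figure of about 8% on a 12-thread machine is roughly what one busy thread looks like, which is a CPU-side number and says nothing about how busy the GPU is.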

rbpeake
Joined: 20 Jul 20
Posts: 7
Credit: 30,698,006
RAC: 0
Message 834 - Posted: 13 Nov 2020, 14:11:02 UTC

Perhaps this already has been addressed. Similar to Einstein@home, is it possible to run more than one work unit per GPU?

bozz4science
Joined: 9 Jul 20
Posts: 142
Credit: 11,533,376
RAC: 487
Message 835 - Posted: 13 Nov 2020, 14:22:10 UTC - in response to Message 834.  

In my personal experience yes. You can take a look here (https://www.mlcathome.org/mlcathome/forum_thread.php?id=120) for more detailed runtime and efficiency comparisons. Currently I am running 2 WU in tandem on my 750Ti.

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Jun 20
Posts: 456
Credit: 14,368,944
RAC: 939
Message 843 - Posted: 14 Nov 2020, 20:32:20 UTC

First, we're going to pull the linux cuda client from mlds-gpu, because for some reason, not a single linux/cuda WU has successfully crunched in the mlds-gpu queue since it went live, out of 3000+ tries. While we still see some success in the mldstest queue, something is still very wrong. For reference, we get an 89% success rate on the windows cuda client, and it's climbing.

CUDA is a pain to work with if you're trying to get something that works in multiple environments.

We've also been working these last few days on getting a non-appimage version of the client up and going, which requires some playing with RPATH and linking (which appimage previously did for us). But it should simplify things going forward once it's complete.

Secondly, I'm glad to hear some people who were concerned about lower GPU usage are seeing more load when using different monitoring tools. That seems to align better with reality.

Thanks for sticking with us and being patient. On the bright side, we're seeing a nice boost in getting networks trained from those that are working.

bozz4science
Joined: 9 Jul 20
Posts: 142
Credit: 11,533,376
RAC: 487
Message 849 - Posted: 17 Nov 2020, 1:15:08 UTC - in response to Message 843.  
Last modified: 17 Nov 2020, 1:15:20 UTC

Any update on the potential Windows issue with MKLDNN? I still see the weird behaviour that only 1 out of 2 dedicated threads is used for each GPU WU, but as soon as I scale it back to only 1 per WU, the performance suffers immediately. It would be great if you could talk about this in your next update.

gemini8
Joined: 6 Jul 20
Posts: 7
Credit: 1,509,045
RAC: 959
Message 885 - Posted: 24 Nov 2020, 12:06:44 UTC

After it worked for some time today, I'm now getting errors again:
Di 24 Nov 2020 12:38:37 CET | MLC@Home | Aborting task rand_automata_0084-1605109125-28070-1_1: exceeded disk limit: 3054.19MB > 1953.12MB
Di 24 Nov 2020 12:38:41 CET | MLC@Home | Output file rand_automata_0084-1605109125-28070-1_1_r2028520578_0 for task rand_automata_0084-1605109125-28070-1_1 absent
Di 24 Nov 2020 12:38:41 CET | MLC@Home | Output file rand_automata_0084-1605109125-28070-1_1_r2028520578_1 for task rand_automata_0084-1605109125-28070-1_1 absent

I have plenty of diskspace available. At least that's what Boinc's telling me.
- - - - - - - - - -
Greetings, Jens

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Jun 20
Posts: 456
Credit: 14,368,944
RAC: 939
Message 886 - Posted: 24 Nov 2020, 16:45:01 UTC - in response to Message 885.  

See the update in the other thread. There's a "per WU" limit configured on the server when the WUs are created, and all the pre-existing WUs in the mlds-gpu core were configured about 500MB too small for the extra size of the linux CUDA client. I've updated these WUs in the database, but it may take a little time for the already queued WUs to update.

It's not ideal, I know... we just didn't anticipate the increase in app size when we were creating this batch of WUs.
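
For anyone checking their own event log, the "exceeded disk limit" message compares what the task used against that per-WU limit. A small sketch that pulls both numbers out of such a line and reports the overrun is below. The regular expression and function name are assumptions written for this example, and the log line is the one quoted earlier in this thread.

```python
import re

# Matches the "exceeded disk limit: <used>MB > <limit>MB" part of a BOINC event log line.
DISK_LIMIT_RE = re.compile(
    r"exceeded disk limit: (?P<used>[\d.]+)MB > (?P<limit>[\d.]+)MB"
)


def disk_overrun_mb(log_line: str):
    """Return how many MB the task exceeded its limit by, or None if the line does not match."""
    match = DISK_LIMIT_RE.search(log_line)
    if match is None:
        return None
    return float(match["used"]) - float(match["limit"])


line = (
    "Di 24 Nov 2020 12:38:37 CET | MLC@Home | Aborting task "
    "rand_automata_0084-1605109125-28070-1_1: "
    "exceeded disk limit: 3054.19MB > 1953.12MB"
)
overrun = disk_overrun_mb(line)
print(f"{overrun:.2f} MB over the per-task limit")
```

The point is that this limit belongs to the work unit, not to the BOINC client's overall disk allowance, so having plenty of free disk space does not prevent the abort.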

gemini8
Joined: 6 Jul 20
Posts: 7
Credit: 1,509,045
RAC: 959
Message 887 - Posted: 24 Nov 2020, 17:32:10 UTC

I see.
Thanks for explaining and your work!
- - - - - - - - - -
Greetings, Jens

Werinbert
Joined: 30 Nov 20
Posts: 14
Credit: 6,400,784
RAC: 8,132
Message 913 - Posted: 4 Dec 2020, 0:02:29 UTC

My Linux GTX 750TI machine runs GPU tasks just fine; my Win 7 GTX 1660TI, however, fails to complete tasks without error. I have also upgraded my GPU drivers, which didn't have any effect. I noticed in GPU-Z that the GPU was not being utilized, and reading through this thread it looks like the tasks need many CPU cores in order to process quickly. Has there been any progress in resolving these problems?

bozz4science
Joined: 9 Jul 20
Posts: 142
Credit: 11,533,376
RAC: 487
Message 914 - Posted: 4 Dec 2020, 0:09:49 UTC - in response to Message 913.  

To me it looks like something different is going on. As you can see, the dataset of the faulty tasks was read in successfully and training went on for 100+ epochs before this error code was thrown. I always got the same error message if the card was overclocked too far (mainly memory clock) and/or too many tasks ran in parallel on the same card, resulting in VRAM overload scenarios that could also have caused the illegal memory address error message. Try to tackle those if one of those points applies to you. Otherwise, others might be able to help as well.

bozz4science
Joined: 9 Jul 20
Posts: 142
Credit: 11,533,376
RAC: 487
Message 1061 - Posted: 26 Jan 2021, 18:31:34 UTC - in response to Message 913.  

My Linux GTX 750TI machine runs GPU tasks just fine; my Win 7 GTX 1660TI, however, fails to complete tasks without error. I have also upgraded my GPU drivers, which didn't have any effect. I noticed in GPU-Z that the GPU was not being utilized, and reading through this thread it looks like the tasks need many CPU cores in order to process quickly. Has there been any progress in resolving these problems?


I happen to have the exact same problem! At the beginning of the year, I added a 1660 Super that ran successfully at first, but eventually, after a driver update, failed to complete any WUs. It always started crunching and then just "stopped" computation. The task itself reported that the computation time actually increased, but in GPU-Z I also saw zero utilization.

I didn't have any other problems with BOINC GPU applications but experienced some issues over at Folding. Apparently, having a separate CUDA runtime installed along with the CUDA development toolkit can screw things up. I deleted both and there are no more problems on Folding, just here. The 750Ti is still crunching here whenever BOINC assigns it an MLC WU, but in the meantime I pulled my 1660 Super from MLC and currently run other projects on it.

©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)