PCIe bandwidth: influence on GPU performance

bozz4science

Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 1037 - Posted: 14 Jan 2021, 19:29:08 UTC
Last modified: 14 Jan 2021, 19:53:21 UTC

I have recently been playing with a dual-GPU setup, which forces my primary card (1660 Super) to run in x8 mode (PCIe 3.0) alongside an ancient 750 Ti. From a preliminary runtime comparison, I see roughly a doubling of runtimes on my 1660 Super. Is anyone else running a multi-GPU system who can report or validate my observations? That would help me figure out whether running at half the lanes is solely responsible for the performance hit on the MLDS GPU version, or whether something else might be going on.
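
To put the halved lanes into raw numbers: PCIe 3.0 transfers 8 GT/s per lane with 128b/130b encoding, i.e. roughly 0.985 GB/s per lane per direction, so:
- x16: 16 x 0.985 GB/s ≈ 15.8 GB/s
- x8: 8 x 0.985 GB/s ≈ 7.9 GB/s
The slot bandwidth is therefore exactly halved; whether the MLDS app actually comes close to saturating it is what I am trying to figure out.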

For reference, this is my host: Host 6950

I am running 2 GPU WUs on the 1660 card.
Runtimes were roughly ~1,900-2,000 sec for the 2,080-credit tasks and ~3,850-4,000 sec for the 4,160-credit tasks.

Now I see runtimes roughly 2x those I observed before installing the second card: recently ~7,600-7,900 sec for the same 4,160-credit tasks.

To make this comparison as robust as possible, I briefly reverted to a single-GPU setup before giving the dual-GPU setup a second chance. Interestingly, most of the GPU-Z readings for the 1660 Super didn't change. It is running at slightly higher temps, but with the same fan RPM, voltage, memory and core clocks, memory load, and bus interface load, with 2 tasks running concurrently. Task Manager showed ~100% CUDA compute load both before and now.

What is rather odd are the changes in the following readings: memory controller load, power draw, and GPU load. All three are down considerably in relative terms:
Prior --> Now
- Memory controller: 11% --> 6% [rel. reduction: 45%]
- Power draw: ~ 69W --> 59W [rel. reduction: 14%]
- GPU load: 85-90% --> 66-69% [rel. reduction: 23%]

Another observation: in x16 mode, one GPU WU gave me a Task Manager reading of ~55% CUDA compute load, which is what prompted me to change my app_config.xml settings to compute 2 WUs at a time in the first place. Now, with the card running in x8 mode, running only one WU results in a reading of ~95%. Any reason for this?
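
For reference, the 2-WUs-per-GPU setting comes from an app_config.xml in the project's data folder. A rough sketch of what mine looks like is below; the app name is only a placeholder, the exact name has to be copied from client_state.xml or the project's applications page:

<app_config>
   <app>
      <name>mlds-gpu</name>          <!-- placeholder, substitute the exact MLC app name -->
      <gpu_versions>
         <gpu_usage>0.5</gpu_usage>  <!-- 0.5 GPU per task = 2 tasks share one card -->
         <cpu_usage>1.0</cpu_usage>  <!-- reserve one CPU thread per GPU task -->
      </gpu_versions>
   </app>
</app_config>

After editing it, "Options -> Read config files" in the BOINC Manager (advanced view) applies the change without restarting the client.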

BOINC is still running with the same CPU settings as well: [# physical cores x 2] - 1.
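(E.g. on a 4-core/8-thread CPU that works out to (4 x 2) - 1 = 7 BOINC threads, leaving one logical core free to feed the GPU tasks.)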

I'd appreciate your input on whether you have observed this issue in the past as well, as I am kind of clueless here. Thanks
alex

Joined: 4 Dec 20
Posts: 32
Credit: 47,319,359
RAC: 0
Message 1038 - Posted: 14 Jan 2021, 22:31:36 UTC - in response to Message 1037.  

I don't have the exact same configuration, but a system with both an Nvidia and an AMD GPU.
I tested the numbers with GPU-Z. The summary can be found in this PDF: https://www.dropbox.com/s/ifb83k02fi0i1do/MLC%20answer%201.pdf?dl=0

What I found on a system running 2 WUs on the Nvidia card is that a message pops up relatively often: waiting for memory. This extends the runtime.
The system has 8 GB RAM, and some apps are running alongside BOINC. The CPU load is in the range of 80%, with spikes to nearly 100% when I start additional programs. The system also has an on-chip AMD GPU running Einstein, which requires (shared) RAM.
Have you checked that CPU load or RAM limitations are not holding your system back? A second GPU also requires RAM and CPU time.
I have 2 systems running 2 WUs on the Nvidia card. Runtimes are longer, but far from twice as long, so that alone should not be the problem.
bozz4science

Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 1040 - Posted: 16 Jan 2021, 13:10:48 UTC - in response to Message 1038.  
Last modified: 16 Jan 2021, 13:26:57 UTC

Thanks for testing, alex! Appreciate it.

However, I am kind of lost with my ongoing investigation. In the meantime I've tried running other GPU applications and comparing their performance on the 1660 Super in x16 vs. x8 mode. All the GPU applications I have run in the past, such as Milkyway, Einstein, PrimeGrid, SRBase, and Folding, seem to work just fine; I hardly notice any performance hit at all in these apps. However, after recently updating my driver to version 461.09, I can't seem to get any MLC GPU tasks running on my cards with the settings I used to run with.

Tasks start and finish fine with only 1 task running at a time. If I choose to run 2 tasks concurrently, the tasks' training data is read into VRAM, but they never start computing on the GPU. From 2 faulty tasks like these (Task 3641959), I reckon they only ran on the CPU(?). If I start 1 task on the card and then switch to the 2-task setting, it takes just a few seconds after the second task's training data has been loaded into VRAM before the GPU load suddenly drops to 0%. Changing back to the 1-WU-at-a-time setting and/or the usual suspend/restart remedy doesn't get the tasks running again... I am completely stumped by this.

Edit: I even tried resetting the project and re-downloading all project files, to no avail.
alex

Joined: 4 Dec 20
Posts: 32
Credit: 47,319,359
RAC: 0
Message 1041 - Posted: 16 Jan 2021, 13:58:20 UTC

I see a difference in the driver versions I use.
One system runs 457.51, another one 452.06. Both are Win10 PCs.
bozz4science

Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 1042 - Posted: 16 Jan 2021, 14:05:07 UTC - in response to Message 1041.  

Well, that could be (part of) the problem...

At least with the 460.xx driver version, I could run my 2-tasks-at-a-time setup, even if it was much slower in x8 mode than it should have been compared to x16. Strangely, the performance of the other GPU apps doesn't seem to be affected. Weird...

I have definitely reached the end of my troubleshooting skills at this point. I'll likely revisit this in a few weeks. Thanks


©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)