Questions and Answers : Issue Discussion : WUs fail with error message "out of memory"
| Author | Message |
|---|---|
| Joined: 4 Dec 20<br>Posts: 32<br>Credit: 47,319,359<br>RAC: 0 | This WU https://www.mlcathome.org/mlcathome/result.php?resultid=3201137 failed with:<br>`Unhandled Exception Record - Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x00007FFC6C79D759`<br>Coprocessor: NVIDIA GeForce GTX 1060 6GB (4095MB), driver 457.51.<br>So does this mean GPU memory or main memory? Main memory can be increased; GPU memory cannot. |
| Joined: 30 Jun 20<br>Posts: 462<br>Credit: 21,406,548<br>RAC: 0 | Regarding https://www.mlcathome.org/mlcathome/result.php?resultid=3201137 : the error indicates the system ran out of GPU RAM. Each WU takes on the order of 1.6-1.9 GB of GPU memory while computing, and we developed the CUDA app on a system with a 1650 with only 4 GB of RAM, so your 1060 6GB should have plenty of headroom. Were you running anything else graphics-intensive at the time, maybe a game? Or are you trying to run multiple WUs at the same time on one GPU? If so, you could easily run out of GPU memory in total. Hope that helps, and thanks for crunching! |
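The headroom arithmetic in the reply above can be sketched as a quick estimate. The per-WU range of 1.6-1.9 GB comes from the post; the 0.5 GB display/driver reserve is an assumption for illustration, not a project figure:

```python
# Rough worst-case estimate of how many MLC@Home GPU WUs fit in VRAM at once.
# Per-WU usage of 1.6-1.9 GB is from the project admin's post above; the
# 0.5 GB reserve for the desktop/driver is an assumed, illustrative value.

def max_concurrent_wus(vram_gb, per_wu_gb=1.9, reserve_gb=0.5):
    """Worst-case number of WUs that fit, keeping a reserve for the display."""
    usable = vram_gb - reserve_gb
    return max(0, int(usable // per_wu_gb))

print(max_concurrent_wus(6.0))  # GTX 1060 6GB -> 2
print(max_concurrent_wus(2.0))  # GTX 750 Ti 2GB -> 0 (no worst-case headroom)
```

By this worst-case estimate even one WU is tight on a 2 GB card, which matches the failures reported on the GTX 750 Ti below; at the 1.6 GB low end a single task can still squeeze in.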
| Joined: 4 Dec 20<br>Posts: 32<br>Credit: 47,319,359<br>RAC: 0 | The PC is a live backup, running nothing but BOINC, with no special setup to increase GPU load; CPU load is around 96%. All out of the box. But BOINC CPU WUs are also running, among them Rosetta WUs, which are very, very memory-hungry; sometimes they are suspended with the status "waiting for memory". Knowing this behaviour triggered me to ask. In the meantime I have 3 failed WUs on PC 5172 with the same message, and one on PC 5173 running an NVIDIA GeForce GTX 750 Ti (2048MB), driver 457.51, OpenCL 1.2. https://www.dropbox.com/s/5wehq33maxx12d6/gpu-z1.PNG?dl=0 |
| Joined: 9 Jul 20<br>Posts: 142<br>Credit: 11,536,204<br>RAC: 3 | I happened to see that error very often myself, when my host went a bit rogue one night after the ds1+2 GPU WUs were deployed. My config file specified running 2 WUs simultaneously on my 750 Ti, which triggered an error whenever the host did not start 2 rand WUs at the same time (a pair of rand WUs stays well below the 2 GB of VRAM). Since then I have stayed at 1 task only, with a somewhat lower compute load. As the ds1+2 GPU WUs only load VRAM to about 52-54% on my 750 Ti, the VRAM capacity wasn't exceeded by much, but that still caused already-started rand WUs to crash immediately as soon as the new WUs were read into GPU memory. By the way, I saw similar behaviour when overclocking the GPU's memory clock too aggressively: in the middle of the network training, the tasks threw an error. |
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)