Posts by Sid

1) Questions and Answers : Issue Discussion : Setup to run 2 wu's on one GPU (Message 1018)
Posted 5 Jan 2021 by Sid
Post:
Two WUs just lock each other up on my 750 Ti with 2 GB of video memory. It looks like there is not enough VRAM.
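For reference, this is roughly the app_config.xml I would use to run two WUs per GPU (the <name> below is only a placeholder; the real application name has to be taken from client_state.xml for this project):

<app_config>
  <app>
    <name>mlds-gpu</name>            <!-- placeholder, check client_state.xml for the real name -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>     <!-- each task claims half a GPU, so two run at once -->
      <cpu_usage>1.0</cpu_usage>     <!-- one CPU thread reserved per GPU task -->
    </gpu_versions>
  </app>
</app_config>

With only 2 GB of VRAM, though, two tasks may simply not fit, which would explain the lock-up.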
2) Questions and Answers : Issue Discussion : Parity Modified GPU WUs often fail (Message 867)
Posted 20 Nov 2020 by Sid
Post:

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x00007FFF3BC73B29


..It's a little counter-intuitive, but Dataset 1+2 WUs use significantly more memory on GPU than Dataset 3. This is true on CPU too, but the GPU case exacerbates the issue.


My two cents:
1. I had two 750 Ti video adapters in one box, and they were working just fine. For some reason I replaced both of them with two GTX 770s. After that, all GPU tasks failed after about 15 seconds with this error message. Both the 750 Ti and the 770 have 2 GB of video memory, but the GTX 770 has more CUDA cores. Do you mean that the GTX 770 needs, say, twice as much memory because it has double the number of CUDA cores compared with the 750 Ti?
2. Is it possible to handle the allocation failure conditionally, for example by catching bad_alloc? In that case a clearer message like "This video card does not have enough memory. Put it in your scrap box and go buy another one" could be written to the log.
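Something along these lines (just a sketch of what I mean, not the actual project code; run_task() here only simulates the failing allocation):

#include <iostream>
#include <new>

// Stand-in for the part of the app that allocates GPU buffers for a work unit.
// Here it just simulates the out-of-memory failure; in the real app the
// exception would come from the allocation path itself.
void run_task() {
    throw std::bad_alloc();
}

int main() {
    try {
        run_task();
    } catch (const std::bad_alloc&) {
        // A clean, readable failure in the task log instead of an unhandled exception.
        std::cerr << "This video card does not have enough memory for this work unit.\n";
        return 1;
    }
    return 0;
}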
3) Questions and Answers : Issue Discussion : All my GPU applications have crushed. (Message 829)
Posted 13 Nov 2020 by Sid
Post:


Tried running 2x WUs on #1; it still showed very little load with almost no rise in temps.

Not sure if it's the OS or something else, but #1 acts like something is blocking the GPU from getting most of its share of the workload, if that makes any sense.


Speaking of loads: Task Manager shows 3% for my GTX 750 Ti.
However, GPU-Z shows 84% "GPU load".
I trust GPU-Z much more.
4) Questions and Answers : Windows : Just one windows machine with errors (Message 817)
Posted 12 Nov 2020 by Sid
Post:
One of my Windows boxes is reporting this for GPU tasks:

-1073741515 (0xC0000135) STATUS_DLL_NOT_FOUND

What might it be?
5) Questions and Answers : Issue Discussion : All my GPU applications have crushed. (Message 762)
Posted 5 Nov 2020 by Sid
Post:
Thank you for the update. Looking forward to running it.
6) Questions and Answers : Issue Discussion : All my GPU applications have crushed. (Message 742)
Posted 31 Oct 2020 by Sid
Post:
Minor update on this:

I'm fairly certain the issue isn't in our app code, as I placed an infinite loop before the part that's crashing to attach a debugger, and it still crashed while sitting in the infinite loop. Which makes me suspect the BOINC client is killing it for some reason.

Still working it.


Thank you for the update.
As far as I can see, the error is "process got signal 11", which might mean: "A signal 11 error, commonly known as a segmentation fault, means that the program accessed a memory location that was not assigned to it."
From my point of view it is unlikely that the BOINC client would kill the process in this way.
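One way to check would be something like the sketch below (just my assumption of a debugging aid on Linux/glibc, not the project's code): install a SIGSEGV handler that dumps a backtrace, so a real in-process segfault leaves a trace in stderr, while an external kill would not.

#include <csignal>
#include <execinfo.h>
#include <unistd.h>

// Debug-only handler: dump the call stack to stderr when SIGSEGV is raised.
void segv_handler(int) {
    void* frames[32];
    int n = backtrace(frames, 32);                   // capture the current call stack
    const char msg[] = "Caught signal 11, backtrace:\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);       // async-signal-safe output
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  // symbolized dump of the frames
    _exit(1);                                        // do not return into the faulting code
}

int main() {
    std::signal(SIGSEGV, segv_handler);
    // ... run the workload here ...
    return 0;
}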
7) Questions and Answers : Issue Discussion : All my GPU applications have crushed. (Message 727)
Posted 28 Oct 2020 by Sid
Post:
The error message is the same:

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process got signal 11</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mldstest_9.72_x86_64-pc-linux-gnu__cuda_fermi -a LSTM -w 64 -b 2 -s 32 --lr 0.001 --maxepoch 192 --device 0
nthreads: 1 gpudev: 0
Re-exec()-ing to set number of threads correctly...

</stderr_txt>
]]>

Hardware:
GenuineIntel
Intel(R) Xeon(R) CPU L5640 @ 2.27GHz [Family 6 Model 44 Stepping 2]
(24 processors)

Linux Mint 20.
NVIDIA driver 450.80.02
NVIDIA GTX 750 Ti



