Some errors
| Author | Message |
|---|---|
|
Joined: 11 Jul 20 Posts: 33 Credit: 1,266,237 RAC: 0 |
1127395 1127422 <message> Now the other WUs seem to run correctly. |
|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Thanks for the report. It looks like a null pointer dereference, but it's in the c10 library (a PyTorch lib). It's weird that it only happens occasionally, though. Overall we don't get a lot of compute errors on this project, but Windows errors with similar stack traces make up the majority of the few we do get. I'll look into it more. Note that it may be fixed in the 9.50 Windows app, which I need to roll out again, hopefully later today. That build uses a newer version of the PyTorch libs (v1.6.0 vs. v1.5.1). |
|
Joined: 26 Aug 20 Posts: 1 Credit: 12,615 RAC: 0 |
Hi, I'm hitting the same problem with all WUs on this PC (https://www.mlcathome.org/mlcathome/result.php?resultid=1214706, https://www.mlcathome.org/mlcathome/result.php?resultid=1192679, https://www.mlcathome.org/mlcathome/result.php?resultid=1200493, https://www.mlcathome.org/mlcathome/result.php?resultid=1200481). The log shows: (unknown error) - exit code -1073741819 (0xc0000005)</message> <stderr_txt> Unhandled Exception Detected... - Unhandled Exception Record - The reasons reported across the four results are: ① Reason: Access Violation (0xc0000005) at address 0x00007FFA6B9C0D50 read attempt to address 0x36065860; ② Reason: Access Violation (0xc0000005) at address 0x00007FFA5AE127A7 read attempt to address 0x5BE59378; ③ Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x00007FFA69528B8C; ④ Reason: Access Violation (0xc0000005) at address 0x00007FFA0CAF2D6A read attempt to address 0xFFFFFFFF. OS: Windows 10 2004 (20190.1000). App: Machine Learning Dataset Generator v9.55 windows_x86_64. |
|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Windows issues are so hard to debug. Fortunately the Windows client error rate is in line with the Linux clients, which is good (it was higher for a while). This one at least came with a helpful stack trace that seems to show it running out of memory. MLDS uses about 700-750 MB per WU. If you have a high number of threads, I can see it hitting memory problems if other things are running on the box, even at 32 GB, especially since BOINC, I think, defaults to using only 40-50% of a system's memory. I'll have to look at your specific WU and host later in the day when I'm home. PyTorch is not very good at gracefully dealing with out-of-memory errors, and I'm fairly certain it's not an issue with our client misusing the interface. The other Windows issues I see a lot boil down to a missing DLL or a missing entry point (which is, essentially, a missing DLL), and I don't know why that's the case. |
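
Two numbers in this thread are easy to sanity-check with a short script: the signed exit code -1073741819 versus the hex code 0xc0000005, and the rough memory budget implied by 700-750 MB per WU. The sketch below is illustrative only. The helper names are hypothetical, the BOINC memory fraction is a parameter because the 40-50% figure above is itself a guess, and the estimate ignores whatever else is running on the host.

```python
# Illustrative sketch; helper names are hypothetical, not part of BOINC or MLC@Home.

# Windows exception codes seen in the stderr output quoted in this thread.
KNOWN_CODES = {
    0xC0000005: "Access Violation (STATUS_ACCESS_VIOLATION)",
    0xE06D7363: "C++ exception raised by MSVC-compiled code",
}


def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned value."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code: int) -> str:
    """Format an exit code as hex and attach a description if one is known."""
    code = to_unsigned32(exit_code)
    return f"0x{code:08X}: {KNOWN_CODES.get(code, 'unknown')}"


def max_concurrent_wus(total_ram_gb: float,
                       boinc_fraction: float,
                       mb_per_wu: float = 750.0) -> int:
    """Estimate how many WUs fit in the share of RAM that BOINC may use.

    Assumes every WU needs mb_per_wu (upper end of the 700-750 MB range
    mentioned above) and that nothing else competes for that share.
    """
    budget_mb = total_ram_gb * 1024 * boinc_fraction
    return int(budget_mb // mb_per_wu)


if __name__ == "__main__":
    print(describe_exit_code(-1073741819))
    for fraction in (0.4, 0.5):
        print(fraction, max_concurrent_wus(32, fraction))
```

On a 32 GB host this gives an upper bound of roughly 17 to 21 simultaneous WUs, depending on the fraction assumed, so a machine with many threads could plausibly run out of room once other applications are counted.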
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)