Exit status -1073741515 (0xC0000135) STATUS_DLL_NOT_FOUND

Joined: 5 Jul 20 Posts: 25 Credit: 348,811 RAC: 0
Just an FYI: task #6071628 tried to run on one of my PCs and crashed on startup with exit status -1073741515 (0xC0000135) STATUS_DLL_NOT_FOUND. The task's stderr output was "(unknown error) - exit code 3221225781 (0xc0000135)". This has never happened before on that PC, which runs Windows 8.1 with all the latest patches and updates. If it happens again, I'll post back in this topic.
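The two numbers in that post are the same 32-bit NTSTATUS value read two ways: the exit status is shown as a signed integer, while the stderr line shows it unsigned. A minimal Python sketch of the conversion (the helper names and the small lookup table are illustrative only, not part of BOINC or the MLC@Home client):

```python
# Named NTSTATUS values that come up in this thread.
NTSTATUS_NAMES = {
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as unsigned (two's complement)."""
    return code & 0xFFFFFFFF


def to_signed32(code: int) -> int:
    """Reinterpret an unsigned 32-bit exit code as signed."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code


def describe(code: int) -> str:
    """Show both readings of the code, its hex form, and a name if known."""
    u = to_unsigned32(code)
    name = NTSTATUS_NAMES.get(u, "unknown")
    return f"{to_signed32(u)} = {u} = 0x{u:08X} ({name})"


if __name__ == "__main__":
    # Both print: -1073741515 = 3221225781 = 0xC0000135 (STATUS_DLL_NOT_FOUND)
    print(describe(-1073741515))
    print(describe(3221225781))
```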

Joined: 7 Jul 20 Posts: 23 Credit: 39,708,780 RAC: 358
I'm having something similar on a Windows Server 2008 machine: https://www.mlcathome.org/mlcathome/result.php?resultid=6117764 Do these tasks require any prerequisites on older Windows versions? Thanks.

Joined: 5 Jul 20 Posts: 25 Credit: 348,811 RAC: 0
I have sent a PM to pianoman [MLC@Home Admin], asking for a look at the error message.

Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0
The problem is that Windows' error messages in this case are incredibly unhelpful, so I have no way of debugging it. Some DLL (a dynamically loaded library) on the system is either missing, the wrong version (too new or too old), or conflicts with the one we ship. We don't know which DLL is causing issues or why, just that "there's an issue somewhere".

However, I *am* trying to fix this a different way. The client is currently linked dynamically, and we ship the DLLs it needs in the same directory as the client. It would be better to link it statically (bundle all the needed libraries into a single exe). Until a few months ago PyTorch wouldn't even compile statically, but the current new Linux client in mlds-test, with DS4 support, is now statically compiled and seems to work fine (woohoo!). I've spent the past 2 weeks trying to get the Windows version of PyTorch to compile statically, and just last night I finally got it at least compiling (not linking yet, but compiling at least). Once I incorporate that into the build, hopefully all these DLL problems will go away.

The catch is that this is very time-intensive, as nothing seems to support it out of the box. I have to create the compile commands by hand through trial and error until they work, then convert them back into CMake so the build is repeatable. The good news is that, if I really did get working what I think I did last night, we're only a few days away from getting a new Windows client out the door that doesn't rely on DLLs. But that's a lot of ifs, so it's not a promise at the moment.

As for your specific issue, unless you're a developer who can debug such issues on your own machine and get back to me with more information about which DLL is causing the problem, I'd rather concentrate on getting the static binary out ASAP and hopefully avoiding these issues altogether. Feel free to follow along on Discord.
I'm even bringing another developer on board, at least temporarily, so hopefully things will move a bit faster too. Thanks again for volunteering, and I'm sorry it's not working for you; we'll see what we can do.

Joined: 9 Jul 20 Posts: 142 Credit: 11,536,204 RAC: 3
Your update is much appreciated, as is the effort you pour into MLC! Thanks for keeping us in the loop!

Joined: 5 Jul 20 Posts: 25 Credit: 348,811 RAC: 0
Thanks for the fast reply. If the error happens again, I'll post back.
EDIT: I'm not a programmer by profession, but I'm an avid tinkerer who knows enough programming and debugging to be dangerous. I just ran a quick check with Dependency Walker on the PC that hit the error (Win 8.1) to see what it might tell us. It says the following 5 DLLs were not found:
API-MS-WIN-CORE-KERNEL32-PRIVATE-L1-1-1.DLL
API-MS-WIN-CORE-PRIVATEPROFILE-L1-1-1.DLL
API-MS-WIN-SERVICE-PRIVATE-L1-1-1.DLL
C10.DLL
TORCH_CPU.DLL
I could not find C10.DLL in the project folder, but there are 3 other DLLs there: torch.dll-961, torch_cpu.dll-961 and torch_global_deps.dll-961. All 3 have creation dates of Friday, September 25, 2020, 14:46:09. Hope this is of some help.

Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0
Thanks for this. When BOINC runs a task, it creates a new directory, copies the main exe and all the files that end with "-961" into it, and renames those files to drop the "-961" suffix. Then it runs the exe from that new directory, with the DLLs available. You can simulate this manually if you like. I'll admit, though, that I don't understand what these "api-ms-win-*" libraries are; I assume they're supposed to be part of core Windows.

None of this would be a problem if I could convince Windows to statically link the client, but I have bad news on that front. I spent 2.5 weeks last month working on it and couldn't make it work (or rather, I got Windows to compile it, but it crashed when it tried to run). PyTorch is a big, complicated beast.

That said, we do have an updated Windows CPU client available in the "mldstest" work queue. If you'd like, you can try running that, although I'm not sure it'll be any different. So far, after 400+ WUs returned, I'm seeing an 86% success rate, which tells me there are still some missing DLLs we're not shipping that are present on most, but not all, systems.
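The staging step described above can be reproduced by hand. A minimal Python sketch, assuming the "-961" suffix and the exe name `mlds.exe` seen in this thread; the directory paths are placeholders, and BOINC's own staging may differ in details, so treat this as an approximation rather than what the BOINC client literally does:

```python
import shutil
from pathlib import Path

# Version suffix observed on the project files in this thread (assumption:
# other app versions will use a different number).
SUFFIX = "-961"


def stage_slot(project_dir: Path, slot_dir: Path, exe_name: str) -> list[str]:
    """Copy the main exe plus every '*-961' file from project_dir into
    slot_dir, dropping the '-961' suffix so each DLL gets its runtime name.
    Returns the sorted list of file names placed in the slot."""
    slot_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in project_dir.iterdir():
        if not src.is_file():
            continue
        if src.name == exe_name:
            dest_name = src.name
        elif src.name.endswith(SUFFIX):
            dest_name = src.name[: -len(SUFFIX)]
        else:
            continue  # unrelated project files are left behind
        shutil.copy2(src, slot_dir / dest_name)
        staged.append(dest_name)
    return sorted(staged)
```

Afterwards, try launching the exe from inside the slot directory. With the standard DLL search order, Windows looks in the directory the executable was loaded from before the system directories, which is why the renamed copies sitting next to the exe get picked up.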

Joined: 5 Jul 20 Posts: 25 Credit: 348,811 RAC: 0
Just saw your message about the new test app. I'm running it now on the Windows 8.1 PC that had the crash, and so far 2 tasks have started without any errors. That PC (host 377) had another crash earlier today with the default app. Looking at its task list, since the previous crash it ran 4 default tasks without error and one task that had a validate error. The only thing different today was the monthly "Patch Tuesday" Windows Update and the subsequent manual reboot, followed by running CCleaner to clean all the "built-up crap" out before restarting BOINC.

Joined: 5 Jul 20 Posts: 25 Credit: 348,811 RAC: 0
Update: it appears the test app isn't working 100% of the time on that PC either. 2 tasks (6594296 and 6593714) failed with "195 (0x000000C3) EXIT_CHILD_FAILED", and 2 test-app tasks (6595197 and 6593736) are currently in progress, each running for over 1 hour so far.
Crash details from the Windows Error Reporting log:
Faulting application name: mlds.exe, version: 0.0.0.0, time stamp: 0x61135d4d
Faulting module name: torch_cpu.dll, version: 0.0.0.0, time stamp: 0x60c3de87
Exception code: 0xc0000005
Fault offset: 0x0000000005d00009
Faulting process id: 0x484
Faulting application start time: 0x01d7900d46609a5e
Faulting application path: C:\BOINCData\slots\4\mlds.exe
Faulting module path: C:\BOINCData\slots\4\torch_cpu.dll
Report Id: 87590a84-fc00-11eb-8363-f80f41b4b3e3
Faulting package full name:
Faulting package-relative application ID:

Joined: 5 Jul 20 Posts: 25 Credit: 348,811 RAC: 0
I tried running the Windows test app again. Of the two runs I tried today, the first completed successfully and has validated. The second failed just like the one from a few weeks ago, with exit status 195 (0x000000C3) EXIT_CHILD_FAILED and the error thrown by torch_cpu.dll. Here is the Windows Error Reporting log from the failed task:
Faulting application name: mlds.exe, version: 0.0.0.0, time stamp: 0x61233552
Faulting module name: torch_cpu.dll, version: 0.0.0.0, time stamp: 0x60c3de87
Exception code: 0xc0000005
Fault offset: 0x0000000005d00009
Faulting process id: 0xb14
Faulting application start time: 0x01d799aa9c7386fc
Faulting application path: C:\BOINCData\slots\4\mlds.exe
Faulting module path: C:\BOINCData\slots\4\torch_cpu.dll
Report Id: dfe6bf23-059d-11ec-8365-f80f41b4b3e3
Faulting package full name:
Faulting package-relative application ID:
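The two Windows Error Reporting entries are easier to compare field by field. A small Python sketch: the short field names are labels chosen for this example, the values are copied from the two reports, and per-run fields (process id, start time, report id) are left out because they always differ:

```python
# Selected fields from the two Windows Error Reporting entries in this thread.
CRASH_1 = {
    "application": "mlds.exe",
    "application_timestamp": "0x61135d4d",
    "module": "torch_cpu.dll",
    "module_timestamp": "0x60c3de87",
    "exception_code": "0xc0000005",
    "fault_offset": "0x0000000005d00009",
}
CRASH_2 = {
    "application": "mlds.exe",
    "application_timestamp": "0x61233552",
    "module": "torch_cpu.dll",
    "module_timestamp": "0x60c3de87",
    "exception_code": "0xc0000005",
    "fault_offset": "0x0000000005d00009",
}


def compare_reports(a: dict[str, str], b: dict[str, str]) -> tuple[list[str], list[str]]:
    """Split the keys both reports share into (matching fields, differing fields)."""
    keys = sorted(a.keys() & b.keys())
    same = [k for k in keys if a[k] == b[k]]
    differ = [k for k in keys if a[k] != b[k]]
    return same, differ


if __name__ == "__main__":
    same, differ = compare_reports(CRASH_1, CRASH_2)
    print("identical:", same)
    print("different:", differ)
```

Only the mlds.exe build timestamp differs. The same torch_cpu.dll build faulting with the same exception code (0xc0000005, an access violation) at the same offset under two different client builds is consistent with the crash sitting at one fixed spot inside the shipped PyTorch library, though two samples can't prove that.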

Joined: 11 Jul 20 Posts: 33 Credit: 1,266,237 RAC: 0
[quote]Tried running the Windows test app again. Of the two runs I tried today:[/quote]
That's strange. I'm crunching the new beta app on 2 Win10 PCs without problems.

Joined: 9 Jul 20 Posts: 142 Credit: 11,536,204 RAC: 3
Same here on my Win10 host; it's running just fine. All 20+ WUs ran successfully and validated. I'm slowly draining my CPU test WU queue, but all is looking fine so far.

Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0
This is so frustrating. The error is intermittent even on the same host and appears to be happening in the bowels of the PyTorch library, so it's not (necessarily) anything we've done in the client. And you're sure this system runs reliably with other intense projects? I don't doubt it, but I can't rule out a cache or memory issue endemic to your system either.
That said, there are other scattered reports of rare intermittent crashes on other Windows systems, but it seems to be on >1% of WUs. Hard to know what the issue is. I really wanted to statically link the new client, but that just wasn't happening on Windows. I wish I had a better answer than "I don't know", but that's all I have for you for now. Oddly, the Linux client seems to be a bit more unstable too... maybe this is a PyTorch v1.9 issue?
Can you check the BOINC client logs and see if the program was suspended/resumed just before crashing? We *did* change some code around that. You can simulate this by running a WU, suspending it in the GUI, and then resuming it.
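For anyone who prefers the command line to the GUI, BOINC's `boinccmd` tool can suspend and resume a single task with `--task <project URL> <task name> suspend` (or `resume`). The Python sketch below only wraps those two calls; the project URL is an assumption based on the result link earlier in this thread, and the task name you pass in is up to you (check the exact values with `boinccmd --get_project_status` and `boinccmd --get_tasks`):

```python
import subprocess
import time

# Assumed project master URL; confirm it with `boinccmd --get_project_status`.
PROJECT_URL = "https://www.mlcathome.org/mlcathome/"


def task_command(task_name: str, op: str) -> list[str]:
    """Build the boinccmd argument list for suspending or resuming one task."""
    if op not in ("suspend", "resume"):
        raise ValueError(f"unsupported operation: {op}")
    return ["boinccmd", "--task", PROJECT_URL, task_name, op]


def suspend_then_resume(task_name: str, pause_s: float = 30.0) -> None:
    """Suspend the task, wait pause_s seconds, then resume it.
    Requires boinccmd on PATH and a running BOINC client."""
    subprocess.run(task_command(task_name, "suspend"), check=True)
    time.sleep(pause_s)
    subprocess.run(task_command(task_name, "resume"), check=True)
```

Calling `suspend_then_resume("<task name from --get_tasks>")` and then watching whether the task dies shortly after resuming would give the same signal as the GUI test.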

Joined: 5 Jul 20 Posts: 25 Credit: 348,811 RAC: 0
[quote]This is so frustrating. The error is intermittent even on the same host and appears to be happening in the bowels of the PyTorch library, so it's not (necessarily) anything we've done in the client. And you're sure this system runs reliably with other intense projects? I don't doubt it, but I can't rule out a cache or memory issue endemic to your system either.[/quote]
NO MEMORY PROBLEMS. This is the only project it has any problems with. It runs the most demanding PrimeGrid.com tasks with no errors.
[quote]That said, there are other scattered reports of rare intermittent crashes on other Windows systems, but it seems to be on >1% of WUs. Hard to know what the issue is. I really wanted to statically link the new client, but that just wasn't happening on Windows. I wish I had a better answer than "I don't know", but that's all I have for you for now. Oddly, the Linux client seems to be a bit more unstable too... maybe this is a PyTorch v1.9 issue?[/quote]
Anything is possible. Not your fault. Maybe someone should report it to the PyTorch developers as an FYI.
[quote]Can you check the BOINC client logs and see if the program was suspended/resumed just before crashing? We *did* change some code around that. You can simulate this by running a WU, suspending it in the GUI, and then resuming it.[/quote]
Interesting: the most recent failed task started up seconds after the previous one completed in a different slot. Hmm... maybe a linking error, and it was looking in the old slot for the DLL?

Joined: 27 Sep 20 Posts: 6 Credit: 1,025,663 RAC: 0
[quote]Update: it appears the test app isn't working 100% of the time on that PC either.[/quote]
I just trashed nearly 40 WUs with the same error on Windows 10.
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)