Exit status -1073741515 (0xC0000135) STATUS_DLL_NOT_FOUND

Dr Who Fan
Joined: 5 Jul 20
Posts: 25
Credit: 348,811
RAC: 0
Message 1289 - Posted: 26 Jul 2021, 21:06:32 UTC

Just an FYI: task #6071628 tried to run on one of my PCs and crashed on startup with Exit status -1073741515 (0xC0000135) STATUS_DLL_NOT_FOUND.
The task's STDERR output contained "(unknown error) - exit code 3221225781 (0xc0000135)".

Never had this happen before on that PC. It is running Windows 8.1 with all the latest patches/updates.
If it happens again I'll post back in this topic.
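
For anyone comparing the numbers above: the three values are the same 32-bit code read three ways. The exit status is shown as a signed integer, the STDERR line shows it unsigned, and 0xC0000135 is the hex form of both. A minimal Python sketch of the conversion (the function names are made up for illustration):

```python
def to_unsigned32(exit_status: int) -> int:
    """Reinterpret a (possibly negative) 32-bit exit status as unsigned."""
    return exit_status & 0xFFFFFFFF


def to_signed32(code: int) -> int:
    """Reinterpret an unsigned 32-bit code as a signed 32-bit integer."""
    code &= 0xFFFFFFFF
    return code - 0x1_0000_0000 if code & 0x8000_0000 else code


def as_hex(exit_status: int) -> str:
    """Format the status the way NTSTATUS codes are usually written."""
    return f"0x{to_unsigned32(exit_status):08X}"


if __name__ == "__main__":
    print(to_unsigned32(-1073741515))  # 3221225781
    print(as_hex(-1073741515))         # 0xC0000135
    print(to_signed32(0xC0000135))     # -1073741515
```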

wolfman1360
Joined: 7 Jul 20
Posts: 23
Credit: 39,708,780
RAC: 358
Message 1295 - Posted: 29 Jul 2021, 18:17:50 UTC

I'm having something similar on a Windows Server 2008 machine.
https://www.mlcathome.org/mlcathome/result.php?resultid=6117764
Do these tasks require any prerequisites on older Windows versions?
Thanks.

Dr Who Fan
Joined: 5 Jul 20
Posts: 25
Credit: 348,811
RAC: 0
Message 1296 - Posted: 29 Jul 2021, 20:14:16 UTC

I have sent a PM to pianoman [MLC@Home Admin], asking for a look at the error message.

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 1297 - Posted: 29 Jul 2021, 21:16:29 UTC

The problem is that the Windows error messages in this case are incredibly unhelpful, so I have no way of debugging. Somewhere there is a DLL (a dynamically loaded library on the system) that is either missing, the wrong version (too new or too old), or in conflict with the one we ship. We don't know which DLL is causing issues or why, just that "there's an issue somewhere".

However, I *am* trying to fix this a different way. The client is currently linked dynamically, and we ship the DLLs it needs in the same directory as the client. It would be better to link it statically (bundle all the needed libraries into a single exe). Until a few months ago PyTorch wouldn't even compile statically, but the new Linux client in mlds-test, with DS4 support, is now statically compiled and seems to work fine (woohoo!).

I've spent the past 2 weeks trying to get the Windows version of PyTorch to compile statically, and just last night I finally got it at least compiling (not linking yet, but compiling at least). Once I incorporate that into the build, hopefully all these DLL problems will go away.

The problem is that this is very time-intensive, as nothing seems to support this out of the box. I have to create the compile commands by hand through trial and error until they work, then convert them back into CMake so the build is repeatable.

The good news is that, if I really did get what I thought I got working last night, we're only a few days away from getting a new Windows client out the door that doesn't rely on DLLs. But that's a lot of ifs, so it's not a promise at the moment.

As for your specific issue, unless you're a developer who can debug such issues on your own machine and get back to me with more information about which DLL is causing the problem, I think I'd rather concentrate on getting the static binary out ASAP and hopefully avoiding these issues altogether.

Feel free to follow along on Discord. I'm even bringing another developer on board, at least temporarily, so hopefully things will be moving a bit faster too.

Thanks again for volunteering, and I'm sorry it's not working for you. We'll see what we can do.

bozz4science
Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 1300 - Posted: 29 Jul 2021, 21:27:36 UTC

Your update is much appreciated, as is the effort you pour into MLC! Thanks for keeping us in the loop!

Dr Who Fan
Joined: 5 Jul 20
Posts: 25
Credit: 348,811
RAC: 0
Message 1301 - Posted: 30 Jul 2021, 1:13:24 UTC - in response to Message 1297.  
Last modified: 30 Jul 2021, 1:43:59 UTC

Thanks for the fast reply. If the error happens again, I will post back.

EDIT: I'm not a programmer by profession, but an avid tinkerer who knows enough about programming and debugging to be dangerous.

I just did a quick check with Dependency Walker on the PC that encountered the error (Win 8.1) to see what it might tell us.
It says the following 5 DLLs were not found:
API-MS-WIN-CORE-KERNEL32-PRIVATE-L1-1-1.DLL
API-MS-WIN-CORE-PRIVATEPROFILE-L1-1-1.DLL
API-MS-WIN-SERVICE-PRIVATE-L1-1-1.DLL
C10.DLL
TORCH_CPU.DLL

I could not find C10.DLL in the project folder, but there are 3 other DLLs there: torch.dll-961, torch_cpu.dll-961 and torch_global_deps.dll-961. All 3 have a creation date of Friday, September 25, 2020, 14:46:09.

Hope this is of some help.
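
A note on reading that list, offered as a likely explanation rather than a diagnosis: names starting with API-MS-WIN- are Windows "API set" names, which the loader maps to real system DLLs at run time, and Dependency Walker is known to flag them as missing even on healthy systems. That would leave C10.DLL and TORCH_CPU.DLL as the entries worth chasing. A small Python sketch that splits such a list accordingly (the function name and the prefix list are assumptions made for illustration):

```python
# Prefixes used by Windows API-set names. These are virtual names resolved
# by the loader, so a "not found" report for them is usually a false alarm.
API_SET_PREFIXES = ("api-ms-win-", "ext-ms-")


def split_missing(names):
    """Return (api_sets, real_dlls) from a list of 'missing' DLL names."""
    api_sets, real_dlls = [], []
    for name in names:
        if name.lower().startswith(API_SET_PREFIXES):
            api_sets.append(name)
        else:
            real_dlls.append(name)
    return api_sets, real_dlls


if __name__ == "__main__":
    reported = [
        "API-MS-WIN-CORE-KERNEL32-PRIVATE-L1-1-1.DLL",
        "API-MS-WIN-CORE-PRIVATEPROFILE-L1-1-1.DLL",
        "API-MS-WIN-SERVICE-PRIVATE-L1-1-1.DLL",
        "C10.DLL",
        "TORCH_CPU.DLL",
    ]
    api_sets, real_dlls = split_missing(reported)
    print("probably false positives:", api_sets)
    print("worth investigating:", real_dlls)
```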

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 1312 - Posted: 11 Aug 2021, 14:47:45 UTC - in response to Message 1301.  

Thanks for this.

When BOINC runs, it creates a new directory, copies the main exe and all the files that end with "-961" to that new directory, and renames the files to drop the "-961" extension. Then it runs the exe from that new directory, with the DLLs available. You can simulate this manually if you like. I'll admit, though, that I don't understand what these new "api-ms-win-*" libraries are; I assume they're supposed to be part of core Windows.
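
For anyone who wants to try the manual simulation, the copy-and-rename step can be sketched in Python. This is only an illustration under stated assumptions: `stage_slot`, its parameters, and the placeholder file names in the demo are invented here, and the real BOINC client does considerably more than this.

```python
import shutil
import tempfile
from pathlib import Path


def stage_slot(project_dir, slot_dir, suffix="-961", main_exe=None):
    """Copy suffixed files into slot_dir with the suffix dropped.

    Hypothetical helper: it imitates only the copy-and-rename step.
    main_exe, if given, is the file name of the executable to copy as-is.
    """
    project_dir, slot_dir = Path(project_dir), Path(slot_dir)
    slot_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in sorted(project_dir.iterdir()):
        if not src.is_file():
            continue
        if src.name.endswith(suffix):
            dst = slot_dir / src.name[: -len(suffix)]
        elif src.name == main_exe:
            dst = slot_dir / src.name
        else:
            continue
        shutil.copy2(src, dst)
        staged.append(dst.name)
    return staged


if __name__ == "__main__":
    # Demo with empty placeholder files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "project"
        project.mkdir()
        for name in ("torch.dll-961", "torch_cpu.dll-961", "notes.txt"):
            (project / name).touch()
        print(stage_slot(project, Path(tmp) / "slot"))
```

Running the exe from the staged directory and watching which DLL the loader complains about is then the same test BOINC performs, minus the scheduler.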

None of this would be a problem if I could convince Windows to statically link the client, but I have bad news on that front. I spent 2.5 weeks last month working on it and couldn't make it work (or rather, I got Windows to compile it, but it crashed when it tried to run). PyTorch is a big, complicated beast.

That said, we do have an updated Windows CPU client available in the "mldstest" work queue. If you'd like, you can try running that, although I'm not sure it'll be any different. So far, with 400+ WUs returned, I'm seeing an 86% success rate, which tells me there are still some missing DLLs we're not shipping that are present on most, but not all, systems.

Dr Who Fan
Joined: 5 Jul 20
Posts: 25
Credit: 348,811
RAC: 0
Message 1321 - Posted: 13 Aug 2021, 7:01:40 UTC - in response to Message 1312.  

Just saw your message about the new TEST app. I am running it now on the Windows 8.1 PC that had the crash. So far 2 tasks have started without any errors.

That PC, HOST 377, had another crash earlier today with the default app. Looking at the TASK LIST for the PC, since the previous crash it had run 4 default tasks without error and one task that had a Validate error.
The only thing different today was the monthly "Patch Tuesday" Windows Update and subsequent manual reboot, followed by running CCleaner to clean all the "built up crap" out before restarting BOINC.

Dr Who Fan
Joined: 5 Jul 20
Posts: 25
Credit: 348,811
RAC: 0
Message 1322 - Posted: 13 Aug 2021, 7:18:25 UTC - in response to Message 1321.  

Update: it appears the TEST APP is not working 100% of the time on that PC either.
2 TASKS FAILED with "195 (0x000000C3) EXIT_CHILD_FAILED":
6594296
6593714

... and I have 2 TEST APP tasks currently in progress, each running for over 1 hour so far:
6595197
6593736

CRASH Details from the Windows Error reporting LOG:
Faulting application name: mlds.exe, version: 0.0.0.0, time stamp: 0x61135d4d
Faulting module name: torch_cpu.dll, version: 0.0.0.0, time stamp: 0x60c3de87

Exception code: 0xc0000005
Fault offset: 0x0000000005d00009
Faulting process id: 0x484
Faulting application start time: 0x01d7900d46609a5e
Faulting application path: C:\BOINCData\slots\4\mlds.exe
Faulting module path: C:\BOINCData\slots\4\torch_cpu.dll
Report Id: 87590a84-fc00-11eb-8363-f80f41b4b3e3
Faulting package full name:
Faulting package-relative application ID:
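
For readers decoding a report like this: exception code 0xc0000005 is an access violation, and the hex timestamps can be turned into dates. The sketch below assumes the usual encodings for this kind of Windows report, namely that the "time stamp" fields are build times in seconds since 1970 and that "Faulting application start time" is a FILETIME (100-nanosecond ticks since 1601). The function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# FILETIME values count 100-nanosecond ticks from this instant.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def pe_timestamp_to_utc(hex_stamp: str) -> datetime:
    """Decode a 'time stamp' field: seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


def filetime_to_utc(hex_filetime: str) -> datetime:
    """Decode a 'start time' field: 100 ns ticks since 1601-01-01 UTC."""
    ticks = int(hex_filetime, 16)
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


if __name__ == "__main__":
    print("mlds.exe built:      ", pe_timestamp_to_utc("0x61135d4d"))
    print("torch_cpu.dll built: ", pe_timestamp_to_utc("0x60c3de87"))
    print("process started:     ", filetime_to_utc("0x01d7900d46609a5e"))
```

If those assumptions hold, the build date of mlds.exe shows which client version actually crashed, which is useful when the default app and the test app are both in play.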

Dr Who Fan
Joined: 5 Jul 20
Posts: 25
Credit: 348,811
RAC: 0
Message 1332 - Posted: 25 Aug 2021, 13:13:07 UTC

I tried running the Windows test app again. Of the two runs I tried today:

The first one completed successfully and has validated.
The second one failed just like the one from a few weeks ago, with Exit status 195 (0x000000C3) EXIT_CHILD_FAILED and the error being thrown by the torch_cpu.dll file.

Here is the Windows error report logging from the failed task:
Faulting application name: mlds.exe, version: 0.0.0.0, time stamp: 0x61233552
Faulting module name: torch_cpu.dll, version: 0.0.0.0, time stamp: 0x60c3de87
Exception code: 0xc0000005
Fault offset: 0x0000000005d00009
Faulting process id: 0xb14
Faulting application start time: 0x01d799aa9c7386fc
Faulting application path: C:\BOINCData\slots\4\mlds.exe
Faulting module path: C:\BOINCData\slots\4\torch_cpu.dll
Report Id: dfe6bf23-059d-11ec-8365-f80f41b4b3e3
Faulting package full name:
Faulting package-relative application ID:
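
Comparing this report with the one from 13 Aug: the faulting module, exception code and fault offset are identical, even though mlds.exe itself has a different time stamp. That hints at the same spot inside torch_cpu.dll failing each time, though it does not prove a cause. A small Python sketch of that comparison (the helper names are invented for illustration, and only the relevant lines of each report are included):

```python
KEYS = ("Faulting module name", "Exception code", "Fault offset")


def parse_report(text):
    """Turn the 'Key: value' lines of a crash report into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def same_crash_site(report_a, report_b, keys=KEYS):
    """True if both reports agree on module, exception code and offset."""
    a, b = parse_report(report_a), parse_report(report_b)
    return all(k in a and a.get(k) == b.get(k) for k in keys)


AUG_13 = """Faulting application name: mlds.exe, version: 0.0.0.0, time stamp: 0x61135d4d
Faulting module name: torch_cpu.dll, version: 0.0.0.0, time stamp: 0x60c3de87
Exception code: 0xc0000005
Fault offset: 0x0000000005d00009"""

AUG_25 = """Faulting application name: mlds.exe, version: 0.0.0.0, time stamp: 0x61233552
Faulting module name: torch_cpu.dll, version: 0.0.0.0, time stamp: 0x60c3de87
Exception code: 0xc0000005
Fault offset: 0x0000000005d00009"""

if __name__ == "__main__":
    print(same_crash_site(AUG_13, AUG_25))  # True
```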

[VENETO] boboviz
Joined: 11 Jul 20
Posts: 33
Credit: 1,266,237
RAC: 0
Message 1333 - Posted: 25 Aug 2021, 14:55:19 UTC - in response to Message 1332.  

[quote]Tried running the Windows test app again. Of the two runs I tried today:
The first one completed successfully and has validated.
The second one failed just like the one from a few weeks ago with...[/quote]

That's strange.
I'm crunching the new beta app on 2 Win10 PCs without problems.

bozz4science
Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 1335 - Posted: 25 Aug 2021, 15:13:58 UTC

Same here on my Win10 host. Running just fine. All 20+ WUs ran successfully and validated. I'm slowly draining my queue of CPU test WUs, but all is looking fine so far.

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 1338 - Posted: 25 Aug 2021, 20:10:52 UTC

This is so frustrating. The error is intermittent even on the same host, and appears to be happening in the bowels of the PyTorch library, so it's not (necessarily) anything we've done in the client. And you're sure this system runs reliably with other intense projects? I don't doubt it, but I can't rule out a cache or memory issue endemic to your system either.

That said, there are other scattered reports of rare intermittent crashes on other Windows systems, but it seems to be on >1% of WUs. Hard to know what the issue is.
I really wanted to statically link the new client, but that just wasn't happening on Windows.

I wish I had a better answer than I don't know, but that's all I have for you for now.

Oddly, the Linux client seems to be a bit more unstable too... maybe this is a PyTorch v1.9 issue?

Can you check the BOINC client logs and see if the program was suspended/resumed just before crashing? We *did* change some code around that. You can simulate this by running a WU, suspending it in the GUI, and then resuming it.
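
The suspend/resume test can also be scripted instead of clicked through, using the `boinccmd --task URL task_name suspend|resume` operation. The sketch below is a rough outline, not a tested procedure: the project URL is an assumption (use whatever URL your client lists for the project), the task name must be the real name shown in your task list, and `boinccmd` may need to be run from the BOINC data directory or given a password.

```python
import subprocess
import time

# Assumption: the project's URL as registered in the BOINC client.
PROJECT_URL = "https://www.mlcathome.org/mlcathome/"


def task_command(task_name, op, project_url=PROJECT_URL):
    """Build the boinccmd argument list for a task operation."""
    if op not in ("suspend", "resume"):
        raise ValueError(f"unsupported operation: {op}")
    return ["boinccmd", "--task", project_url, task_name, op]


def suspend_then_resume(task_name, pause_seconds=30):
    """Suspend a running task, wait, then resume it.

    Requires boinccmd on PATH and a task with this name in progress.
    """
    subprocess.run(task_command(task_name, "suspend"), check=True)
    time.sleep(pause_seconds)
    subprocess.run(task_command(task_name, "resume"), check=True)


if __name__ == "__main__":
    # Print the commands only; call suspend_then_resume() to run them.
    for op in ("suspend", "resume"):
        print(" ".join(task_command("EXAMPLE_TASK_NAME", op)))
```

If the crash shows up shortly after the resume, that points at the changed suspend/resume code rather than at PyTorch.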

Dr Who Fan
Joined: 5 Jul 20
Posts: 25
Credit: 348,811
RAC: 0
Message 1339 - Posted: 25 Aug 2021, 21:02:04 UTC - in response to Message 1338.  
Last modified: 25 Aug 2021, 21:03:18 UTC

[quote]This is so frustrating. The error is intermittent even on the same host, and appears to be happening in the bowels of the PyTorch library, so it's not (necessarily) anything we've done in the client. And you're sure this system runs reliably with other intense projects? I don't doubt it, but I can't rule out a cache or memory issue endemic to your system either.[/quote]

NO MEMORY PROBLEMS. This is the only project it has any problems with. It runs the most demanding PrimeGrid.com tasks with no errors.

[quote]That said, there are other scattered reports of rare intermittent crashes on other Windows systems, but it seems to be on >1% of WUs. Hard to know what the issue is.
I really wanted to statically link the new client, but that just wasn't happening on Windows.

I wish I had a better answer than I don't know, but that's all I have for you for now.

Oddly, the Linux client seems to be a bit more unstable too... maybe this is a PyTorch v1.9 issue?[/quote]

Anything could be possible. Not your fault. Maybe someone should report it to the PyTorch developers as an FYI.

[quote]Can you check the BOINC client logs and see if the program was suspended/resumed just before crashing? We *did* change some code around that. You can simulate this by running a WU, suspending it in the GUI, and then resuming it.[/quote]
Interesting: the most recent failed task had started up just seconds after the previous one completed in a different slot. Hmm... maybe a linking error, and it was looking in the old slot for the DLL?

Hal Bregg
Joined: 27 Sep 20
Posts: 6
Credit: 1,025,663
RAC: 0
Message 1345 - Posted: 27 Aug 2021, 15:20:49 UTC - in response to Message 1322.  

[quote]Update: it appears the TEST APP is not working 100% of the time on that PC either.
2 TASKS FAILED with "195 (0x000000C3) EXIT_CHILD_FAILED":
6594296
6593714

... and I have 2 TEST APP tasks currently in progress, each running for over 1 hour so far:
6595197
6593736

CRASH Details from the Windows Error reporting LOG:
Faulting application name: mlds.exe, version: 0.0.0.0, time stamp: 0x61135d4d
Faulting module name: torch_cpu.dll, version: 0.0.0.0, time stamp: 0x60c3de87

Exception code: 0xc0000005
Fault offset: 0x0000000005d00009
Faulting process id: 0x484
Faulting application start time: 0x01d7900d46609a5e
Faulting application path: C:\BOINCData\slots\4\mlds.exe
Faulting module path: C:\BOINCData\slots\4\torch_cpu.dll
Report Id: 87590a84-fc00-11eb-8363-f80f41b4b3e3
Faulting package full name:
Faulting package-relative application ID:[/quote]

I just trashed nearly 40 WUs with the same error on Windows 10.

©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)