Updated CPU client 9.9x release and issues

Message boards : News : Updated CPU client 9.9x release and issues

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 1347 - Posted: 28 Aug 2021, 6:13:16 UTC

Earlier this week, we released the latest v9.90 CPU client after almost 3 weeks
of testing. While it initially seemed to be working fine, a number of errors started
accumulating over the last 24 hours. We've identified a server configuration issue
and believe it is now fixed as of 6AM UTC today. The server was generating
invalid WUs for the MLDS queue. We've cancelled all of the problematic WUs
and are adding new ones to the main queue. The GPU clients and MLDSTEST
queue remained unaffected.

v9.90 is an important release for MLDS, as it adds support for the CNN and dense
feed-forward network types needed for DS4. Highlights include:

    - Statically linked binary for Linux (no more AppImage)
    - DS4 support! (CNN and dense networks)
    - Better NaN handling
    - Updated to LibTorch 1.9
    - Uses a wrapper instead of the BOINC native API

Our apologies for the bumpy rollout. If you had computation errors earlier this
week, please retry at your earliest convenience, and post any remaining issues
to our forums or our Discord server.

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 1348 - Posted: 28 Aug 2021, 6:44:01 UTC

This fix will not solve ALL outstanding issues, but it should help with:

* Computation errors with no output in the logs, which became prevalent in the last 24 hours (the 24-hour failure rate jumped from 1% to 80% over the past two days)
* The memory limit has been set back to 800 MB, as originally intended. It turned out the memory limit was not the issue.

There are still known issues: DLL problems on Windows, at least one report of a crash involving a file that already exists when it shouldn't, and one crash on an ODROID (ARM) system. Please keep reporting these issues and we'll tackle them as we can. I want to reassure anyone experiencing them that we're not ignoring you at all.

Thanks for volunteering your compute time!
bozz4science

Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 1349 - Posted: 28 Aug 2021, 9:20:49 UTC - in response to Message 1347.  

Awesome news! I do know from personal experience how frustrating it can be to suddenly start seeing errors after weeks of coding and testing. I applaud your continued commitment and am especially excited about the DS4 support and the future science we can help inform with new data sets and network types. It's also great to see better NaN handling in version 9.90+, as NaNs often led GPU WUs to crunch through hundreds of epochs, just carrying the NaN along, while the erroneous result only became apparent when inspecting the WU log afterwards.

Thanks for all your work!
UBT - wbiz

Joined: 2 May 21
Posts: 9
Credit: 2,016,461
RAC: 2
Message 1355 - Posted: 31 Aug 2021, 5:45:10 UTC

I'm not getting any WUs for aarch64-unknown-linux-gnu.

I'm not intending to be impatient, just concerned in case you are unaware.
UBT - wbiz

Joined: 2 May 21
Posts: 9
Credit: 2,016,461
RAC: 2
Message 1356 - Posted: 1 Sep 2021, 12:40:53 UTC - in response to Message 1355.  
Last modified: 1 Sep 2021, 12:49:50 UTC

All good now, thanks.

Edit: picking up 32-bit ARM tasks.

©2024 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)