[TMIM Notes] Aug 6 2021

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 1306 - Posted: 6 Aug 2021, 20:48:04 UTC

This Month in MLC@Home
Notes for Aug 6 2021
A monthly summary of news and notes for MLC@Home

Summary
Another month of good progress on MLC! First, this past month saw the completion of DS3! You have trained over 1,000,000 neural networks for DS3, which is a huge accomplishment. We're continuing to bundle and evaluate the dataset, so look for a complete public release shortly.

We also spent some time on the backend preparing for DS4. We've updated the website to show DS4 progress, but haven't sent any DS4 WUs yet. Instead, we spent the bulk of the month trying to get the new client to work under Windows, which hasn't been going well. We spent a good 2.5 weeks trying to get PyTorch and the client to compile (and run) statically on Windows. Even though it now compiles, the client crashes when running. So last week we switched back to linking dynamically, and we want to get an updated Windows client out this weekend. The Linux/CPU version of the new client appears to be performing fantastically, so thanks to everyone who tested WUs from the "mldstest" queue!

DS4 WUs are incompatible with the older client, so we'll only release DS4 WUs as the new (v9.9x) client becomes available for each platform. This means CPUs first. GPUs will continue to work on finishing up DS1 and DS2.

Speaking of DS1/DS2, we're approaching the end of DS1 with only a few more weeks to go, and when we complete those networks we'll switch to DS2 to finish it up as well.

So, lots of movement this month behind the scenes, and great progress on the existing datasets. If *any* Windows developers would like to help us get the new Windows CPU client out the door, please contact us directly; we could use the help.

Other News

  • GPU and CPU queues are stacked with DS1/DS2 WUs until those complete and/or until DS4 is ready. The CPU queue will transition to DS4 first, as we assume GPU builds will be even more of a headache than the CPU ones have been.
  • Because existing WUs are incompatible with the new client, we've been keeping the CPU and GPU queues a little less full than we have in the past: when the new client comes out, we don't want to have to cancel a bunch of existing WUs and replace them with new ones. Unfortunately, since the updated Windows CPU client is taking longer than planned, we've run out of WUs a few times in the past month. We're trying to stay on top of this, and we're automating the process to keep the queues from running completely dry in the future.
  • A new developer has joined the team and is working on more graceful handling of NaN errors, which have been an issue for a long time. If that work is ready before the Windows CPU client is, the fix will be in the next release. You can see more in the #devel channel on Discord, or in the issue on GitLab.
  • Reminder: the MLC client is open source, and has an issues list on GitLab. If you're a programmer or data scientist and want to help, feel free to look over the issues and submit a pull request.



Project status snapshot:
(note these numbers are approximations)

Last month's TMIM Notes: Jul 1 2021

Thanks again to all our volunteers!

-- The MLC@Home Admin(s)
Homepage: https://www.mlcathome.org/
Discord invite: https://discord.gg/BdE4PGpX2y
Twitter: @MLCHome2

bozz4science

Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 1311 - Posted: 10 Aug 2021, 10:17:02 UTC
Last modified: 10 Aug 2021, 10:17:40 UTC

Amazing news! Thanks for keeping us up to date :)

Sorry to hear that the client development for Windows is still troubling you, but glad that you are joining forces with another developer. Is he/she a volunteer or affiliated with UMBC?

Do you have any insights into how runtimes will compare across DS4 vs. DS1/2 CPU WUs so far?

We'll help you push DS1 + 2 over the finish line soon :)
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 1314 - Posted: 11 Aug 2021, 15:07:30 UTC - in response to Message 1311.  

The developer is actually Delta, who does the BOINC Radio podcast.

DS4 WUs will likely take a little less time than DS1/DS2 initially, but we'll be tweaking those (and balancing credit). Runs on my test machine show DS4 WUs using the CNN complete an epoch in 11 seconds, versus DS1's RNNs, which take 22s/epoch. Then the question becomes scaling the number of epochs we need to run to get good results, which I think will be more than what we do for DS1, but probably not twice as many. It's a balancing act that the data will drive, so expect a little bit of flux at the beginning. Also, a "DS4 WU" might not be uniform. Those numbers are using a simple CNN on MNIST-like datasets (black-and-white images), but we'd also like to do CIFAR (24-bit color), which will be more complex and drive up runtimes (and credits).
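
To make the epoch trade-off above concrete, here is a minimal sketch of the arithmetic. It uses only the two per-epoch timings quoted above (11 s for the DS4 CNN, 22 s for the DS1 RNN on one test machine). The `epoch_multiplier` parameter and both function names are hypothetical, introduced only for illustration; actual epoch counts have not been decided and real WUs have other overheads this ignores.

```python
# Per-epoch timings quoted in the post (one test machine, seconds per epoch).
DS1_SECONDS_PER_EPOCH = 22.0  # DS1 RNN
DS4_SECONDS_PER_EPOCH = 11.0  # DS4 simple CNN on MNIST-like data


def relative_runtime(epoch_multiplier: float) -> float:
    """Return DS4 WU runtime as a fraction of DS1 WU runtime.

    epoch_multiplier is a hypothetical knob: how many times more epochs
    a DS4 WU runs compared with a DS1 WU.
    """
    if epoch_multiplier <= 0:
        raise ValueError("epoch_multiplier must be positive")
    return epoch_multiplier * DS4_SECONDS_PER_EPOCH / DS1_SECONDS_PER_EPOCH


def break_even_multiplier() -> float:
    """Epoch multiplier at which a DS4 WU takes as long as a DS1 WU."""
    return DS1_SECONDS_PER_EPOCH / DS4_SECONDS_PER_EPOCH


if __name__ == "__main__":
    for m in (1.0, 1.5, 2.0):
        print(f"{m:.1f}x epochs -> {relative_runtime(m):.2f}x DS1 runtime")
```

With these timings, a DS4 WU stays shorter than a DS1 WU as long as it runs fewer than twice as many epochs, which matches the "more than DS1, but probably not twice as many" estimate.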

On the bright side, we released the new Windows CPU client in mldstest last night, and at least one user reports seeing a nice speedup with DS1 WUs.


©2024 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)