[TWIM Notes] Aug 23 2020

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 199
Credit: 2,735,829
RAC: 38,126
Message 395 - Posted: 24 Aug 2020, 3:20:27 UTC

This Week in MLC@Home
Notes for Aug 23 2020
A weekly summary of news and notes for MLC@Home

This week saw the rollout of the updated v9.5x series of clients in preparation for Dataset 3 WUs. Unfortunately, the rollout wasn't as smooth as we had planned, so most of this week was spent on engineering issues with the client and not on the science. That should change this week, however!

News:

  • MLDS v9.5x clients released (see announcement for changelog).
  • There were initial issues on the server side and with the updated Windows client, so it was released later than the Linux clients. Multiple issues then surfaced with the new build process, which did not include all needed DLLs and targeted only Windows 10. While some hosts are still reporting failures, most Windows issues should be cleared up by the newly released v9.55 app. If you have further issues, please post to the discussion thread.
  • New Linux clients also experienced some issues. While some users are reporting a nice speedup over v9.20, others have WUs exiting with "signal 4" (illegal instruction) with the new build. This is related to the old version of OpenBLAS bundled with the binary mis-detecting CPU features on some hosts. We're working on a fix, but in the meantime, a workaround is available as detailed in this discussion.
  • We're hoping to get Dataset 3 WUs moving this week as these issues with the new client app are worked out.
  • Progress on Datasets 1 and 2 continues unabated.
  • No updates on the new server
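
For readers wondering what "signal 4" means in the Linux item above: on POSIX systems, a process killed by a signal is reported by Python's `subprocess` module with a negative return code, and signal 4 is SIGILL (illegal instruction). The helper below is a minimal sketch for decoding such codes; `describe_exit` is a hypothetical name used for illustration, and how the BOINC client itself reports the failure is not shown here.

```python
import signal


def describe_exit(returncode):
    """Explain a subprocess-style return code.

    On POSIX, subprocess reports a child killed by signal N as -N.
    """
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        number = -returncode
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {number} ({name})"
    return f"exited with error code {returncode}"


if __name__ == "__main__":
    # -4 is what a child killed by an illegal instruction would look like.
    print(describe_exit(-4))
```

A CPU-feature mis-detection of the kind described above typically shows up this way, because the binary executes an instruction the host CPU does not implement.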



Project status snapshot:

Tasks
  Tasks ready to send             27775
  Tasks in progress               22336
Users
  With credit                     543
  Registered in past 24 hours     11
Hosts
  With recent credit              1657
  Registered in past 24 hours     20
Current GigaFLOPS                 23517.82

Dataset 1 and 2 progress:

SingleDirectMachine       9682/10004
EightBitMachine           9635/10006
SingleInvertMachine       9686/10003
SimpleXORMachine          9648/10002
ParityMachine              417/10005
ParityModified              54/10005
EightBitModified          2302/10006
SimpleXORModified         6749/10005
SingleDirectModified      6717/10004
SingleInvertModified      6797/10002 
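
The counts above are easier to compare as percentages. The following is a small sketch that parses the table as posted and computes completion per network type; the function names are illustrative and not part of any MLC@Home tooling.

```python
SNAPSHOT = """
SingleDirectMachine       9682/10004
EightBitMachine           9635/10006
SingleInvertMachine       9686/10003
SimpleXORMachine          9648/10002
ParityMachine              417/10005
ParityModified              54/10005
EightBitModified          2302/10006
SimpleXORModified         6749/10005
SingleDirectModified      6717/10004
SingleInvertModified      6797/10002
"""


def parse_progress(text):
    """Parse lines of the form 'Name  done/total' into {name: (done, total)}."""
    progress = {}
    for line in text.strip().splitlines():
        name, counts = line.split()
        done, total = counts.split("/")
        progress[name] = (int(done), int(total))
    return progress


def percent_complete(progress):
    """Map each name to its completion percentage."""
    return {name: 100.0 * done / total for name, (done, total) in progress.items()}


if __name__ == "__main__":
    percentages = percent_complete(parse_progress(SNAPSHOT))
    # Print the least complete network types first.
    for name, pct in sorted(percentages.items(), key=lambda item: item[1]):
        print(f"{name:<24}{pct:6.2f}%")
```

Sorting by percentage puts the two Parity sets at the top of the output, since they are the furthest from completion in this snapshot.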


Last week's TWIM Notes: Aug 16 2020

Thanks again to all our volunteers!

-- The MLC@Home Admins
bozz4science

Joined: 9 Jul 20
Posts: 30
Credit: 146,640
RAC: 1,513
Message 398 - Posted: 25 Aug 2020, 8:07:56 UTC

Thanks for the weekly update and thanks for fixing the bugs with the new version. Still need to test the new win app to verify the speedup over 9.20.

Just wanted to ask about potential plans to provide a GPU version/support for your application. That slipped under my radar, but it just caught my eye on the main homepage https://www.mlcathome.org, where you included it in the section "planned support". Would the GPU version mainly be targeted at the more complex RNN/CNN training sets? OpenCL or CUDA? Thanks
Dataman
Joined: 1 Jul 20
Posts: 8
Credit: 6,768,770
RAC: 249,778
Message 399 - Posted: 25 Aug 2020, 15:26:16 UTC

These updates are great. Kudos! Can you teach a class to Administrators of other projects to show them how to communicate with their volunteers? LOL

Cheers.

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 199
Credit: 2,735,829
RAC: 38,126
Message 400 - Posted: 25 Aug 2020, 15:35:46 UTC - in response to Message 398.  

Thanks for the weekly update and thanks for fixing the bugs with the new version. Still need to test the new win app to verify the speedup over 9.20.

Just wanted to ask about potential plans to provide a GPU version/support for your application. That slipped under my radar, but it just caught my eye on the main homepage https://www.mlcathome.org, where you included it in the section "planned support". Would the GPU version mainly be targeted at the more complex RNN/CNN training sets? OpenCL or CUDA? Thanks


The GPU version would be targeted at more complex CNN tasks and would be limited to what PyTorch supports, which is CUDA (NVIDIA, v9 and up) on Windows and Linux, and ROCm (AMD, big Polaris and Vega only) on Linux. But there will always be the option to run on CPU, even for the bigger workunits. GPU support is just an optimization for those who have it.
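
As an illustration of the "GPU as an optimization, CPU always available" approach described above, here is a minimal sketch of device selection with a CPU fallback. It is not the MLC@Home client code; `pick_device` is a hypothetical helper, and it assumes PyTorch may or may not be installed. ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` API, so one check covers both backends.

```python
def pick_device():
    """Return "cuda" when a usable GPU backend is present, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is optional; fall back to CPU without it
    except ImportError:
        return "cpu"
    # torch.cuda.is_available() is also True on ROCm builds with a supported AMD GPU.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"training would run on: {pick_device()}")
```

With this pattern the same workunit code path runs everywhere, and hosts without a supported GPU simply take the CPU branch.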

However, given the rocky release of the new client, I'm not looking forward to rolling out such a big change, especially when I don't have an NVIDIA box to test on. Perhaps we should set up an mlds-betatest project (or "app") separate from the main one. Hmm.
Jim1348

Joined: 12 Jul 20
Posts: 13
Credit: 3,583,335
RAC: 49,022
Message 401 - Posted: 25 Aug 2020, 15:42:13 UTC - in response to Message 400.  

Perhaps we should set up an mlds-betatest project (or "app") separate from the main one. Hmm.

I can always supply at least one NVIDIA GPU (GTX 1060 or above) under Ubuntu 18.04 for testing and use. I need CUDA projects.
bozz4science

Joined: 9 Jul 20
Posts: 30
Credit: 146,640
RAC: 1,513
Message 404 - Posted: 25 Aug 2020, 16:37:12 UTC

Lovely to hear that! I had already imagined it would only make sense for the more complex training use cases. It will definitely be interesting to see whether we could benefit from a GPU application down the road for dataset 4, 5.... ;) For my part, all I could do to help development would be to offer a GPU for testing as well. I could go down as low as a GTX 750Ti if that would be of interest to you at a later stage.

IMO setting up a separate app for testing beta applications sounds like the rational way to avoid repeating the "rocky" rollout experience you mentioned. Every volunteer could then opt in or out in his/her computing preferences, and those willing to help the development process would, I guess, also accept higher error rates in testing.

Just my 2 cents:
I believe that would be an improvement over the current testing process, in which a "live" application is already deployed on all volunteers' machines: you end up juggling everyone rather than a handful of willing volunteers, which makes for a stressful development environment. Additionally, I would expect the forum to a) not be overwhelmed by messages, b) be easier to manage/moderate with only one thread/subforum for beta testing, and c) not lose the important takeaways in the resulting forum chaos, so that messages do not have to be repeated elsewhere to divert attention back to the main issue. Beta testers could then more easily raise their issues in that forum and report errors back to you. And in the end you could still reward beta testers for their support with a beta-test participant badge :)

Maybe there are some worthwhile ideas hidden herein. Thanks anyway for your consideration! Great work so far!


©2020 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)