[TWIM Notes] Aug 31 2020

Message boards : News : [TWIM Notes] Aug 31 2020
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 248
Credit: 3,363,401
RAC: 19,487
Message 442 - Posted: 1 Sep 2020, 2:09:13 UTC

This Week in MLC@Home
Notes for Aug 31 2020
A weekly summary of news and notes for MLC@Home

The major news this week is that we tweaked the priority a little bit, which allowed a lot of progress on the simpler workunits, and as you'll notice, 6 of the 10 network types in Datasets 1 and 2 have reached their goal of 10,000 samples! This is a big milestone for the project, and it gets us closer to releasing the first official dataset from this effort.

News:

  • A slightly misconfigured priority setting was causing the server to favor sending out EightBit* and Parity* WUs over other WUs. This led to some users experiencing a backlog of "validation pending" tasks. That issue was identified and fixed this week, which led to a nice bump in granted (vs pending) credit and a nice jump in the completion numbers for the other machines. Note that the Parity and EightBit work is still needed; it was just going to take longer to reflect in the stats.
  • First Dataset 3 WUs rolling out later tonight or tomorrow. It's taken a bit longer than planned due to time constraints outside of this project.
  • MLDS v9.5x clients have stabilized, especially on Windows. Lots of time this week was spent investigating why older AMD and Intel machines had issues on Linux, which turned out to be related to PyTorch v1.6 using MKL-DNN (now oneAPI) and hardcoding SSE4 as a minimum requirement. Updated clients that fix this problem are due within a day or so.
  • We'll do an official release of a preliminary dataset once we have 1000 examples of each machine type, and we're getting closer!
  • New server is still in the works.



Project status snapshot:

Tasks
Tasks ready to send 19149
Tasks in progress 20125
Users
With credit 596
Registered in past 24 hours 30
Hosts
With recent credit 1806
Registered in past 24 hours 58
Current GigaFLOPS 28292.26

Dataset 1 and 2 progress:

SingleDirectMachine      10002/10004
EightBitMachine           9801/10006
SingleInvertMachine      10000/10003
SimpleXORMachine         10000/10002
ParityMachine              484/10005
ParityModified              70/10005
EightBitModified          2971/10006
SimpleXORModified        10005/10005
SingleDirectModified     10004/10004
SingleInvertModified     10002/10002 


Last week's TWIM Notes: Aug 23 2020

Thanks again to all our volunteers!

-- The MLC@Home Admins
ID: 442
[VENETO] boboviz

Joined: 11 Jul 20
Posts: 18
Credit: 274,224
RAC: 1,544
Message 443 - Posted: 1 Sep 2020, 13:23:01 UTC - in response to Message 442.  
Last modified: 1 Sep 2020, 13:23:27 UTC

Lots of time this week was spent investigating why older AMD and Intel machines had issues on Linux, which turned out to be related to PyTorch v1.6 using MKL-DNN (now oneAPI) and hardcoding SSE4 as a minimum requirement.

Do you plan, sooner or later, to introduce an optimized CPU app (with SSE2, for example)?
ID: 443
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 248
Credit: 3,363,401
RAC: 19,487
Message 444 - Posted: 1 Sep 2020, 14:16:03 UTC - in response to Message 443.  
Last modified: 1 Sep 2020, 15:21:29 UTC

The app already uses OpenBLAS and FBGEMM as the core of its algorithms, both of which dynamically check CPU capabilities at runtime and use the code paths that are fastest on that CPU. So if you have AVX512, it uses AVX512; if you have AVX2, it uses that; if you have SSE4, it uses that; and so on. There's no need for a specific "sse2" or "avx2" version of the app; you're already using the best capabilities of your CPU.
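The runtime-dispatch idea described above can be sketched in a few lines. This is a hypothetical illustration, not the app's actual code: on Linux, the kernel exposes CPU feature flags in /proc/cpuinfo, and a program can pick the fastest code path it finds there, much as OpenBLAS and FBGEMM do internally in native code.

```python
# Hypothetical sketch of runtime CPU-feature dispatch (Linux-only),
# illustrating what OpenBLAS/FBGEMM do internally in native code.

def cpu_flags():
    """Return the set of CPU feature flags reported by the kernel."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

def pick_kernel(flags):
    """Choose the fastest supported code path, falling back gracefully."""
    for feature, kernel in [("avx512f", "avx512"), ("avx2", "avx2"),
                            ("sse4_2", "sse4"), ("sse2", "sse2")]:
        if feature in flags:
            return kernel
    return "generic"

best = pick_kernel(cpu_flags())
```

The key point is the fallback chain: every CPU gets a working path, and newer CPUs automatically get the faster one, so no separate "sse2 build" of the app is needed.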

By default, the newest release of PyTorch, which v9.5x uses, *also* uses MKL-DNN, a library from Intel made specifically to speed up neural networks on Intel CPUs. However, this library isn't as good at dynamically checking and adapting, and (by default) it is compiled to require SSE4. First, we tried changing the compiler settings to only require SSE2, but that didn't work. There is little to no performance penalty for running without MKL-DNN (Intel now calls this library oneAPI, after changing the name 3 times in the past year) for our current WUs (and likely future networks), so we've disabled MKL-DNN at this time. And if we did eventually have WUs big enough to benefit from MKL-DNN, we'd benefit even more from rolling out GPU support.
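The post doesn't say exactly how MKL-DNN was disabled in the client build, but for readers experimenting on their own, PyTorch exposes a runtime switch for the same backend. A minimal sketch, guarded so it also runs where torch isn't installed:

```python
# Sketch: turning off PyTorch's MKL-DNN backend at runtime.
# This is the user-facing knob; how the MLC client itself disables
# MKL-DNN (runtime vs. build time) is not stated in the post.
try:
    import torch
    torch.backends.mkldnn.enabled = False  # fall back to non-MKL-DNN code paths
    mkldnn_off = not torch.backends.mkldnn.enabled
except ImportError:
    mkldnn_off = None  # torch unavailable in this environment
```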

Besides, my personal bias is against using Intel-specific libraries. Intel has a history of optimizing its libraries (including MKL and MKL-DNN) only for Intel processors, to the detriment of AMD processors and even older Intel processors, both of which make up a significant portion of our contributor base.
ID: 444
[VENETO] boboviz

Joined: 11 Jul 20
Posts: 18
Credit: 274,224
RAC: 1,544
Message 445 - Posted: 1 Sep 2020, 15:35:04 UTC - in response to Message 444.  

Thanks for the answer, it's very interesting!
ID: 445
bozz4science

Joined: 9 Jul 20
Posts: 58
Credit: 194,935
RAC: 2,144
Message 446 - Posted: 1 Sep 2020, 15:40:53 UTC - in response to Message 444.  

Thanks for the input and clarification! Personally, I think this is a great summary of the ongoing optimization/app issues. Maybe in the future you can just point people towards this post. :)

Looking forward to the rollout of Dataset 3 and the exciting possibility of more complex use cases that might even require a GPU-supported app version.

Wishing you a great week
ID: 446


©2020 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)