[TWIM Notes] Oct 27 2020

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 446
Credit: 14,122,724
RAC: 36,468
Message 723 - Posted: 28 Oct 2020, 4:38:01 UTC

This Week in MLC@Home
Notes for Oct 27 2020
A weekly summary of news and notes for MLC@Home

Summary
This week we crossed the 1000 users-with-credit threshold! Thanks again to all our volunteers!

GPU week, part 1. This week was consumed with developing CUDA clients and releasing them to testing. The good news: when they work, they provide a nice performance boost. The bad news: there are still a lot of kinks to work out. So far we've released a Windows/CUDA and a Linux/CUDA binary. The Windows binary works if the user has the right environment (CUDA 10.2 is known to work). The Linux/CUDA binary was released just 48 hours ago and is currently broken in new and surprising ways not seen in our internal testing. Both apps will need some server-side changes so that only hosts meeting the minimum requirements receive them. Luckily, that's why we have the "mldstest" application: to find these issues before we release to the main channel!

Also, the GPU apps are much larger than our current CPU app and have wildly different resource requirements, so much so that we will likely release them as a separate app, with their own WUs, to run alongside the CPU apps. That way we can isolate GPU WUs from CPU WUs and keep everyone happy and crunching.

If you're interested in testing, please make sure you have "Run test applications?" and "Use NVIDIA GPU?" checked in your project preferences, and follow/post your experience in the forum.

News:

  • We expect to have most CUDA issues ironed out and have them in general (non-beta) use by next week. ROCm support would be next, but it is a lower priority.
  • Datasets 1, 2, and 3 continue crunching away. GREAT progress so far!
  • Tweaks to the internal flops specifications for WUs are leading the client to overestimate how long a WU will take to complete (some show estimates of days, despite being the same WUs as before, which should take 4-10 hours). We believe the estimates should even out over time, but if this continues to be an issue we'll bring that flops estimate back down on new WUs.
  • With the development/release of the GPU client, not much progress was made last week on other parts of the project, such as preparing a DS3 100x100 for release, a related paper for arXiv, dataset 4, etc. Those of course remain important, but there is a finite amount of developer time, and we've chosen to prioritize getting the GPU apps ready over the next week or so to speed up completion ahead of the next round of paper deadlines around mid-December.
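
The inflated estimates mentioned in the flops bullet above come down to simple arithmetic. Here is a rough sketch, assuming the client estimates runtime as a WU's estimated floating-point operation count divided by the host's effective speed. The function name and all numbers are made up for illustration; they are not actual MLC@Home values.

```python
def estimated_runtime_hours(fpops_est: float, host_flops: float) -> float:
    """Estimated wall-clock hours: operations divided by operations per second."""
    if host_flops <= 0:
        raise ValueError("host_flops must be positive")
    return fpops_est / host_flops / 3600.0


# Hypothetical host sustaining 5 GFLOPS on this app.
HOST_FLOPS = 5e9

# Hypothetical per-WU operation estimates; the new one is 6x the old one.
OLD_FPOPS_EST = 1.08e14
NEW_FPOPS_EST = 6.48e14

old_hours = estimated_runtime_hours(OLD_FPOPS_EST, HOST_FLOPS)
new_hours = estimated_runtime_hours(NEW_FPOPS_EST, HOST_FLOPS)

print(f"old estimate: {old_hours:.1f} h")
print(f"new estimate: {new_hours:.1f} h ({new_hours / 24:.1f} days)")
```

With these made-up figures, the same WU on the same host goes from an estimate of a few hours to more than a day purely because the operation count was raised, even though the actual work is unchanged.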



Project status snapshot:
(note these numbers are approximations)

Tasks
Tasks ready to send 17159
Tasks in progress 22491
Users
With credit 1020
Registered in past 24 hours 66
Hosts
With recent credit 2051
Registered in past 24 hours 53
Current GigaFLOPS 30532.46

Dataset 1 and 2 progress:

SingleDirectMachine      10002/10004
EightBitMachine          10001/10006
SingleInvertMachine      10001/10003
SimpleXORMachine         10000/10002
ParityMachine              884/10005

ParityModified             275/10005
EightBitModified          6492/10006
SimpleXORModified        10005/10005
SingleDirectModified     10004/10004
SingleInvertModified     10002/10002 

Dataset 3 progress:
Overall (so far): 37600/40425
Milestone 1, 100x100:  10000/10000
Milestone 2, 100x1000: 37600/100000
Milestone 3, 100x10000: 37600/1000000
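
For anyone who prefers percentages, the done/total counts convert directly. A small sketch using the Dataset 3 milestone figures from this post (the helper name `percent_complete` is hypothetical):

```python
def percent_complete(done: int, total: int) -> float:
    """Return completion as a percentage of the target count."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * done / total


# Counts copied from the Dataset 3 progress lines in this post.
milestones = {
    "Milestone 1, 100x100": (10000, 10000),
    "Milestone 2, 100x1000": (37600, 100000),
    "Milestone 3, 100x10000": (37600, 1000000),
}

for name, (done, total) in milestones.items():
    print(f"{name}: {percent_complete(done, total):.2f}%")
```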


Last week's TWIM Notes: Oct 19 2020

Thanks again to all our volunteers!

-- The MLC@Home Admins
Albert Argilaga, Ph.D.
Joined: 4 Oct 20
Posts: 6
Credit: 366,035
RAC: 1,264
Message 726 - Posted: 28 Oct 2020, 12:53:27 UTC - in response to Message 723.  

Great news!!! OpenCL support anytime soon?

Congratulations on the achievements!
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 446
Credit: 14,122,724
RAC: 36,468
Message 729 - Posted: 28 Oct 2020, 14:13:16 UTC

We're limited by what PyTorch (our underlying framework) supports well, which is CUDA and ROCm.
That covers NVIDIA GPUs on Linux and Windows, and some AMD GPUs (discrete POLARIS and VEGA)
on Linux.
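
To see which of these backends a given PyTorch install was built for, something like the following sketch may help. It is not the project's client code. It relies on `torch.cuda.is_available()`, `torch.version.cuda`, and `torch.version.hip` (the last read via `getattr` in case a build lacks it), and the helper names `describe_backend` and `detect_backend` are hypothetical.

```python
from typing import Optional


def describe_backend(cuda_available: bool,
                     cuda_version: Optional[str],
                     hip_version: Optional[str]) -> str:
    """Classify a PyTorch build from values it reports about itself."""
    if not cuda_available:
        return "cpu"
    # ROCm builds expose the GPU through the torch.cuda API and set a HIP version.
    if hip_version is not None:
        return f"rocm {hip_version}"
    if cuda_version is not None:
        return f"cuda {cuda_version}"
    return "gpu (unknown backend)"


def detect_backend() -> str:
    """Query the installed PyTorch, if any; report 'cpu' when it is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return describe_backend(
        torch.cuda.is_available(),
        torch.version.cuda,
        getattr(torch.version, "hip", None),
    )


print(detect_backend())
```

Keeping the classification in a separate pure function means it can be checked without a GPU or even without PyTorch installed.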

We would love to support OpenCL (or Vulkan Compute!), but PyTorch doesn't at the moment. Intel
is doing their own thing (oneAPI/ideep/mkldnn/whatever they're calling it this week), which will likely
support their newer GPUs (both discrete and integrated), and they are heavy contributors to PyTorch, so
I suspect those GPUs will be supported at some point.

Relying on a framework is a win overall, in that we're not writing all the math/algorithms ourselves
and are instead using tested/proven code, and we get GPU support "built in". But it does mean we're
stuck with its shortcomings (limited to what it currently supports for GPUs, and no static linking).
Speedy51

Joined: 20 Sep 20
Posts: 1
Credit: 171,735
RAC: 32
Message 736 - Posted: 29 Oct 2020, 3:38:48 UTC

Thanks for the news. Exciting! I am looking forward to seeing how this works on my RTX 2070.
Bill F
Joined: 2 Jul 20
Posts: 7
Credit: 1,276,351
RAC: 4,559
Message 746 - Posted: 1 Nov 2020, 2:24:58 UTC - in response to Message 736.  

Asking out of curiosity: how much volume will you be creating for the NVIDIA test work? I have adjusted my settings, but there do not seem to be any test tasks being generated.

Thanks
Bill F
In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 446
Credit: 14,122,724
RAC: 36,468
Message 752 - Posted: 1 Nov 2020, 18:49:44 UTC - in response to Message 746.  

At the moment, GPU testing WUs are paused while we try to figure out a bizarre issue with the Linux CUDA app, which is crashing from an external signal. We're not going to release any more test units until that's fixed, and it's a real head-scratcher, as it only crashes when run from BOINC, never when run standalone. We're in touch with other project admins and BOINC developers who have CUDA clients to get some help debugging.

When that's resolved, we'll send out some more testing WUs and create a BOINC "app plan" that sets minimum system requirements a host must meet before it gets GPU WUs. We'll also post requirements prominently on the web site and forums for all our existing and future clients, as that's something we could do a better job of publishing. Going forward, we expect to have about an even split between GPU WUs and CPU WUs.
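
As a rough illustration of that kind of gating, here is a minimal sketch of a minimum-requirements check. It is not BOINC's actual plan-class mechanism or API: the `Host` and `Requirements` types, the function name, and the GPU memory threshold are all hypothetical, and the CUDA 10.2 minimum is only borrowed from the version reported as working in the notes above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Host:
    """Hypothetical description of a volunteer host."""
    cuda_version: Optional[Tuple[int, int]]  # None if no usable CUDA runtime
    gpu_ram_mb: int


@dataclass(frozen=True)
class Requirements:
    """Hypothetical minimum requirements for receiving GPU WUs."""
    min_cuda_version: Tuple[int, int]
    min_gpu_ram_mb: int


def eligible_for_gpu_work(host: Host, reqs: Requirements) -> bool:
    """Return True only if the host meets every minimum requirement."""
    if host.cuda_version is None:
        return False
    # Tuples compare element by element, so (10, 2) >= (10, 1) and (11, 0) >= (10, 2).
    return (host.cuda_version >= reqs.min_cuda_version
            and host.gpu_ram_mb >= reqs.min_gpu_ram_mb)


# Illustrative thresholds only; the real requirements are not stated in this thread.
reqs = Requirements(min_cuda_version=(10, 2), min_gpu_ram_mb=2048)

print(eligible_for_gpu_work(Host((10, 2), 8192), reqs))
print(eligible_for_gpu_work(Host((9, 2), 8192), reqs))
print(eligible_for_gpu_work(Host(None, 0), reqs))
```

Hosts that fail the check would simply keep receiving CPU WUs.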

©2021 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)