[TWIM Notes] Nov 23 2020

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 881 - Posted: 24 Nov 2020, 7:55:08 UTC
Last modified: 24 Nov 2020, 8:03:48 UTC

This Week in MLC@Home
Notes for Nov 23 2020
A weekly summary of news and notes for MLC@Home

Summary
BoincNetwork did a podcast introducing MLC@Home! You can listen to the episode, including some comments from your MLC Admins, at https://www.boinc.network/episode/mlchome. Thanks to all those at BoincNetwork for taking an interest in us.

Last week we pivoted towards analysis and writing. However, we did take one last crack at Linux/CUDA support, and surprisingly that seems to have done the trick. We've released Linux/CUDA support to the main mlds-gpu queue. There are some trade-offs with the new client, though, especially with regard to the disk space required.

We've also released Linux/ROCm AMD GPU support to mldstest, but it is currently only enabled for VEGA 56/64 GPUs, and requires a Linux 5.0.0 kernel or greater. Please see the link in the News section below for more details.

We also spent some time overhauling the MLDS Dataset page on the main website. It's still not done, but it's a little less outdated than it was, and it prepares for releasing the first rounds of datasets. With that and some more analysis, we're working towards our first dataset paper. More news below:

News:

  • More details on the updated Linux GPU support, both CUDA and ROCm, are here: https://www.mlcathome.org/mlcathome/forum_thread.php?id=127.
  • The latest GPU clients drop the AppImage requirement. CPU clients still use it, but it may make sense to drop it there too. See the above thread for the associated trade-offs.
  • DS1+DS2 continue to march towards completion. Only Parity and EightBit workunits left! Keep it up!
  • DS3 progress is approaching milestone 2 (100x1000): nearly 100,000 DS3 networks trained. That will definitely help with our ongoing analysis. It's nice to see the scoreboard at the bottom of https://www.mlcathome.org/ turning from yellow to green!
  • The updated MLDS Dataset page is here: https://www.mlcathome.org/mlds.html, where you can see how we're currently planning to split up the datasets for release.
  • We started writing a dataset paper this week, and hope to get back to preparing dataset 4 soon.



Project status snapshot:
(note these numbers are approximations)

Tasks
  Ready to send: 28,803
  In progress: 19,629
Users
  With credit: 1,251
  Registered in past 24 hours: 57
Hosts
  With recent credit: 2,156
  Registered in past 24 hours: 19
Current GigaFLOPS: 36,126.42

Dataset 1 and 2 progress:

SingleDirectMachine      10002/10004
EightBitMachine          10001/10006
SingleInvertMachine      10001/10003
SimpleXORMachine         10000/10002
ParityMachine             1031/10005

ParityModified             348/10005
EightBitModified          6910/10006
SimpleXORModified        10005/10005
SingleDirectModified     10004/10004
SingleInvertModified     10002/10002 

Dataset 3 progress:
Overall (so far): 75935/84376
Milestone 1, 100x100:  10000/10000
Milestone 2, 100x1000: 75935/100000
Milestone 3, 100x10000: 75935/1000000
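The milestone targets above can be read as products of architectures times networks per architecture; a quick sketch of the progress math, assuming that interpretation of the "100xN" labels (the trained count is taken from the snapshot above):

```python
# DS3 milestone targets, interpreted as (architectures x networks-per-arch).
milestones = {1: 100 * 100, 2: 100 * 1000, 3: 100 * 10000}
trained = 75935  # DS3 networks trained so far, from the snapshot above

for m, target in milestones.items():
    done = min(trained, target)  # a milestone caps at its own target
    print(f"Milestone {m}: {done}/{target} ({100 * done / target:.1f}%)")
```

This reproduces the table: milestone 1 is complete, milestone 2 is at roughly 76%, and milestone 3 at under 8%.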


Last week's TWIM Notes: Nov 17 2020

Thanks again to all our volunteers!

-- The MLC@Home Admin(s)
Homepage: https://www.mlcathome.org/
Twitter: @MLCHome2
Jan Vaclavik

Joined: 10 Aug 20
Posts: 2
Credit: 122,971
RAC: 6
Message 932 - Posted: 9 Dec 2020, 16:15:22 UTC

Now that the NVIDIA apps seem to be up and running, crushing the CPUs in terms of TFLOPS, with the AMD app to follow soon, is there a point in running the project on CPUs? Or would it be a good time to let the GPUs do their magic and move the CPUs to some CPU-only project?
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 933 - Posted: 9 Dec 2020, 16:36:40 UTC - in response to Message 932.  

Please consider continuing to CPU crunch. First, GPU performance is nice, but the bulk of the work we're doing is still on the CPU side. Second, now that we have both, we can tailor workunits to each one to help the overall goal. Third, one of the interesting comparisons we hope to make as part of this research is whether you can tell the difference between a neural network trained on a GPU and one trained on a CPU, which in and of itself would be an interesting result. Fourth, in a project like MLDS, where the goal is to train as many individual networks as possible, there are a *lot* more CPU cores out there than GPUs. While a single GPU is faster than a single CPU thread, in a world where 16 or even 32 threads in a machine is not unheard of, running several MLDS WUs in parallel on a CPU, mixed in with other projects, can deliver more throughput than running one WU at a time on a GPU (which is also probably competing for a timeslice with other GPU projects), even though each WU is slower individually. Finally, GPU performance is a bit inconsistent at the moment (it seems to depend heavily on how loaded the CPU is otherwise), while CPU performance is very reliable.
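The throughput argument above can be sketched with back-of-the-envelope numbers. All timings here are illustrative assumptions, not measured MLDS figures:

```python
# Hypothetical throughput comparison: many slow CPU workunits in parallel
# vs. one faster GPU workunit at a time. Numbers are assumptions only.

CPU_THREADS = 16          # threads on a typical modern host
CPU_HOURS_PER_WU = 4.0    # assumed time for one MLDS WU on one CPU thread
GPU_HOURS_PER_WU = 0.5    # assumed time for one MLDS WU on a GPU

# WUs completed per day if every CPU thread crunches continuously,
# vs. the GPU running one WU at a time.
cpu_wus_per_day = CPU_THREADS * (24 / CPU_HOURS_PER_WU)
gpu_wus_per_day = 24 / GPU_HOURS_PER_WU

print(f"CPU host: {cpu_wus_per_day:.0f} WUs/day")  # 96 WUs/day
print(f"GPU:      {gpu_wus_per_day:.0f} WUs/day")  # 48 WUs/day
```

Under these assumed timings, the GPU is 8x faster per workunit, yet the 16-thread CPU host finishes twice as many networks per day, which is what matters when the goal is training as many networks as possible.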

Bottom line: we're going to continue feeding both queues and hope our volunteers continue to run both to help support the project. Of course, it's your hardware and your donation, so please allocate your resources as you see fit, and thanks for contributing!
Jan Vaclavik

Joined: 10 Aug 20
Posts: 2
Credit: 122,971
RAC: 6
Message 934 - Posted: 9 Dec 2020, 18:03:30 UTC
Last modified: 9 Dec 2020, 18:04:18 UTC

Thank you for the quick reply. Many GPU-enabled (sub)projects are, IMO, a waste of CPU time, but since you point out that CPUs contribute to this project beyond raw performance, I guess it only makes sense to continue using them. Not that I am a large contributor myself.


©2024 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)