[TWIM Notes] Nov 17 2020

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 854 - Posted: 18 Nov 2020, 4:29:36 UTC

This Week in MLC@Home
Notes for Nov 17 2020
A weekly summary of news and notes for MLC@Home

Summary
GPU week(s), part 4: The saga continues.

This coming week we're starting to pivot back to writing and analysis of existing data. There's still a lot to discuss related to GPUs, but we also need to take time to write a bit. Frankly, the issues with the Linux/CUDA client have us stumped for the moment, so taking a week or two to focus on some of the science that's been piling up should help us clear our heads and come back to it with a fresh perspective.

This past week we enabled the release track ("mlds-gpu" application) of the GPU client for Windows and Linux, and the good news is that at least the Windows client is working fairly well*, and it's chewing through WUs wonderfully, complementing the CPU crunching that continues unabated. Having a separate app allows us to capture the wildly different RAM requirements for CUDA WUs without penalizing CPU crunchers. The GPU line allows us to send out longer Dataset 1+2 WUs, which has led to a nice boost in the number of completed Parity* WUs, meaning we finally have over 1000 examples of each network type for Dataset 1, and we're on our way to that for Dataset 2. It would be nice to wrap up those two sooner rather than later. We'll continue to release WUs in parallel to both the CPU and GPU queues to keep both sets of users fed.

Not all is well in GPU land, though. We released the Linux/CUDA client, but after several days not a single WU had completed without error, so we've pulled it back from production and will try again. This is incredibly frustrating, as it works on our test machine, but on volunteer machines it fails with CUDA errors indicating userspace<->driver incompatibilities. Clearly we're not bundling it up correctly. In addition, there have been some strange results regarding CPU utilization with the Windows CUDA client. Users have reported better performance and utilization if they assign two CPUs to the WU instead of one, even though one core remains idle the entire time. There's some speculation in the linked thread, but we should track that down soon as well.
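
For readers wondering what a userspace<->driver incompatibility can look like, here is a minimal sketch of one common cause: the CUDA runtime bundled with an app is newer than the newest CUDA version the installed driver supports. This is a simplified rule of thumb and not a diagnosis of the project's actual bug. The function names and the version numbers in the usage lines are hypothetical, and real CUDA compatibility has further nuances that this check ignores.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Turn a version string such as '10.2' into (10, 2) so it compares numerically."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def runtime_fits_driver(runtime_cuda: str, driver_max_cuda: str) -> bool:
    """Simplified check: the bundled runtime must not be newer than what the driver supports."""
    return parse_version(runtime_cuda) <= parse_version(driver_max_cuda)


# Hypothetical values: an app bundling CUDA 10.2 on two different hosts.
print(runtime_fits_driver("10.2", "11.1"))  # host driver supports up to 11.1
print(runtime_fits_driver("10.2", "10.1"))  # host driver supports only up to 10.1
```

Comparing parsed tuples rather than raw strings matters here, since a plain string comparison would rank "9.0" above "10.2".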

All that's to say we're really excited that GPU support is at least partially live and giving us a nice performance boost, but it's also been more of a drain on resources than anticipated, and we need to turn our focus back to the science for a bit before tackling Linux/CUDA again. If any experienced Linux/CUDA devs would like to offer help deploying our PyTorch/CUDA app combination, we'd love for you to contact us and help us troubleshoot.

More specific news below; some of it is even non-GPU related!

News:

  • You can follow GPU client progress on several forum threads like this one: https://www.mlcathome.org/mlcathome/forum_thread.php?id=111
  • We fired up our ARM-based test systems that had fallen off the network to make sure the current ARM app continues to run. We were able to verify that all three of our arm32/arm64 test systems running Debian 10 are crunching fine with the latest client. This includes an RPi3 (32-bit), an RPi4 (64-bit), and a CuBox-i4 (32-bit).
  • The Dataset 1+2 WUs we release in the GPU queue have a larger epoch limit than those in the CPU queue, and have a proportional increase in credit awarded. We may make a similar change in the CPU queue, but it would mean much longer runtimes, so for now we're seeing how it goes in the GPU queue and will make a determination in the future.
  • We spent some time this week researching how to drop the AppImage (FUSE) requirement on Linux. It's definitely possible, but we're loath to roll out that change, even to the test queue, as overall AppImage hasn't caused too many issues and we don't want to do anything unnecessary at the moment. We thought it might help with the Linux/CUDA issues, but we no longer think that's true.
  • Datasets 1, 2, and 3 continue crunching away. GREAT progress so far!
  • We know some of the web pages are out of date, and we hope to address that this week. Updates queued include: a complete update/redo of the MLDS Dataset page, and an update to the "system requirements" section of the main page to better list minimum software requirements.
  • If we divide each of the three datasets into 3 releases based on the number of examples in each release (100, 1000, 10000), then we're ready to package up Dataset 1 (100, 1000), Dataset 2 (100), and Dataset 3 (100).
  • If you aren't aware of the BOINC Network Podcast, the MLC@Home devs lurk there and sometimes contribute. Be sure to check it out if you're interested: https://www.boinc.network/.
  • We hope to get back to preparing Dataset 4, and writing a tech report/paper to go along with the Dataset releases this week.



Project status snapshot:
(note these numbers are approximations)

Tasks
Tasks ready to send 48470
Tasks in progress 24464
Users
With credit 1190
Registered in past 24 hours 47
Hosts
With recent credit 2129
Registered in past 24 hours 25
Current GigaFLOPS 33798.72

Dataset 1 and 2 progress:

SingleDirectMachine      10002/10004
EightBitMachine          10001/10006
SingleInvertMachine      10001/10003
SimpleXORMachine         10000/10002
ParityMachine             1005/10005

ParityModified             336/10005
EightBitModified          6803/10006
SimpleXORModified        10005/10005
SingleDirectModified     10004/10004
SingleInvertModified     10002/10002 
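
The release split described in the news items (100, 1000, 10000 examples per release) can be checked against the counts above with a short sketch. It assumes a release is ready once every network type in a dataset has at least that many completed examples, and that the first block of counts is Dataset 1 and the "Modified" block is Dataset 2. Both are assumptions, and `ready_releases` is a hypothetical helper rather than project code.

```python
RELEASE_SIZES = (100, 1000, 10000)

# Completed counts copied from the progress snapshot above.
dataset1 = {
    "SingleDirectMachine": 10002,
    "EightBitMachine": 10001,
    "SingleInvertMachine": 10001,
    "SimpleXORMachine": 10000,
    "ParityMachine": 1005,
}
dataset2 = {
    "ParityModified": 336,
    "EightBitModified": 6803,
    "SimpleXORModified": 10005,
    "SingleDirectModified": 10004,
    "SingleInvertModified": 10002,
}


def ready_releases(completed):
    """Return the release sizes that the slowest network type has already reached."""
    fewest = min(completed.values())
    return [size for size in RELEASE_SIZES if fewest >= size]


print("Dataset 1:", ready_releases(dataset1))  # [100, 1000]
print("Dataset 2:", ready_releases(dataset2))  # [100]
```

Under those assumptions the Parity networks are the bottleneck in both datasets, which lines up with Dataset 1 (100, 1000) and Dataset 2 (100) being listed as ready to package.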

Dataset 3 progress:
Overall (so far): 64502/84376
Milestone 1, 100x100:  10000/10000
Milestone 2, 100x1000: 64502/100000
Milestone 3, 100x10000: 64502/1000000


Last week's TWIM Notes: Nov 9 2020

Thanks again to all our volunteers!

-- The MLC@Home Admins
Homepage: https://www.mlcathome.org/
Twitter: @MLCHome2
[VENETO] boboviz

Joined: 11 Jul 20
Posts: 33
Credit: 1,266,237
RAC: 0
Message 856 - Posted: 19 Nov 2020, 8:28:45 UTC - in response to Message 854.  
Last modified: 19 Nov 2020, 8:28:58 UTC

You're doing GREAT work!!

P.S.
At last, ROCm (ROCm 4.0) is also officially arriving on consumer GPUs!!
And the performance of the RX 68xx family seems very good in OpenCL.
Jim1348

Joined: 12 Jul 20
Posts: 48
Credit: 73,492,193
RAC: 0
Message 858 - Posted: 19 Nov 2020, 14:51:13 UTC - in response to Message 856.  

Don't the recent drivers (latest is 20.45) for the RX 570 support OpenCL 2.0?
Would that work?
Jim1348

Joined: 12 Jul 20
Posts: 48
Credit: 73,492,193
RAC: 0
Message 859 - Posted: 19 Nov 2020, 19:46:00 UTC - in response to Message 858.  
Last modified: 19 Nov 2020, 19:46:14 UTC

To answer my own question, I can't get the 20.45 driver installed on Ubuntu 20.04, even though it was released for that.
It is some sort of 32-bit dependency problem, and Ubuntu 20.04 doesn't support the 32-bit architecture.
Or something like that. I went back to a GTX 1060, and the drivers installed right away.
bozz4science

Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 861 - Posted: 19 Nov 2020, 20:50:47 UTC

Thanks for going into great detail in your weekly update.
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 863 - Posted: 20 Nov 2020, 2:00:31 UTC

OpenCL would be ideal, but PyTorch doesn't accelerate using OpenCL or Vulkan. Or rather, it has had some preliminary support for small speedups for *inference* on OpenCL/Vulkan, but nothing ready for production, and nothing for training.

If you want acceleration, it's CPU, CUDA, or ROCm. And maybe Metal on OSX? Not sure.
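
For anyone curious which of those backends their own PyTorch install would use, a rough sketch follows. It relies on ROCm builds of PyTorch exposing their devices through the same `torch.cuda` API and setting `torch.version.hip`. The function name `describe_backend` is hypothetical and this is not the project's client code.

```python
import importlib.util


def describe_backend() -> str:
    """Report which PyTorch compute backend is usable on this machine."""
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch

    if torch.cuda.is_available():
        # ROCm builds reuse the torch.cuda namespace; torch.version.hip is set on those builds.
        if getattr(torch.version, "hip", None):
            return "rocm"
        return "cuda"
    return "cpu"


print(describe_backend())
```

There is no OpenCL or Vulkan branch because, as noted above, PyTorch offers no training acceleration on those backends.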


©2024 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)