[TWIM Notes] Nov 17 2020
Author | Message |
---|---|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
This Week in MLC@Home
Notes for Nov 17 2020
A weekly summary of news and notes for MLC@Home

Summary

GPU week(s), part 4: The saga continues.

This coming week we're starting to pivot back to writing and analysis of existing data. There's still a lot to discuss related to GPUs, but we also need to take time to write a bit; and frankly, the issues with the Linux/CUDA client have us stumped for the moment, so taking a week or two to focus on some of the science that's been piling up should help us clear our heads and come back to it with a fresh perspective.

This past week we enabled the release track ("mlds-gpu" application) of the GPU client for Windows and Linux, and the good news is that at least the Windows client is working fairly well*, and it's chewing through WUs wonderfully, complementing the CPU crunching that continues unabated. Having a separate app allows us to capture the wildly different RAM requirements for CUDA WUs without penalizing CPU crunchers. The GPU line allows us to send out longer dataset 1+2 WUs, which has led to a nice boost to the number of completed Parity* WUs, meaning we finally have over 1000 examples of each network type for Dataset 1, and we're on our way to that in Dataset 2. It would be nice to wrap up those two sooner rather than later. We'll continue to release WUs in parallel to both the CPU and GPU queues to keep both sets of users fed.

Not all is well in GPU land though. We released the Linux/CUDA client, but after several days, not a single WU completed without error, so we've pulled that back from production and will try again. This is incredibly frustrating, as it works on our test machine, but on volunteer machines it fails with CUDA errors indicating userspace<->driver incompatibilities. Clearly we're not bundling it up correctly.

In addition, there have been some strange results with CPU utilization and the Windows CUDA client.
Users have reported better performance and utilization if they assign two CPUs to the WU instead of one, even though one core remains idle the entire time. There's some speculation in the linked thread, but we should track that down soon as well.

All that's to say we're really excited that GPU support is at least partially live and giving us a nice performance boost, but it's also been more of a drain on resources than anticipated, and we need to turn our focus back a bit before tackling Linux/CUDA again. If any experienced Linux/CUDA devs would like to offer help deploying our PyTorch/CUDA app combination, we'd love for you to contact us and help us troubleshoot.

More specific news below; some of it is even non-GPU related!

News:
SingleDirectMachine 10002/10004
EightBitMachine 10001/10006
SingleInvertMachine 10001/10003
SimpleXORMachine 10000/10002
ParityMachine 1005/10005
ParityModified 336/10005
EightBitModified 6803/10006
SimpleXORModified 10005/10005
SingleDirectModified 10004/10004
SingleInvertModified 10002/10002

Dataset 3 progress:
Overall (so far): 64502/84376
Milestone 1, 100x100: 10000/10000
Milestone 2, 100x1000: 64502/100000
Milestone 3, 100x10000: 64502/1000000

Last week's TWIM Notes: Nov 9 2020

Thanks again to all our volunteers!

-- The MLC@Home Admins
Homepage: https://www.mlcathome.org/
Twitter: @MLCHome2 |
Joined: 11 Jul 20 Posts: 33 Credit: 1,266,237 RAC: 0 |
You're doing GREAT work!! P.S. At last, ROCm (ROCm 4.0) is officially arriving on consumer GPUs!! And the performance of the RX 68xx family seems very good in OpenCL. |
Joined: 12 Jul 20 Posts: 48 Credit: 73,492,193 RAC: 0 |
Don't the recent drivers (latest is 20.45) for the RX 570 support OpenCL 2.0? Would that work? |
Joined: 12 Jul 20 Posts: 48 Credit: 73,492,193 RAC: 0 |
To answer my own question, I can't get the 20.45 driver installed on Ubuntu 20.04, even though it was released for that. It is some sort of 32-bit dependency problem, and Ubuntu 20.04 doesn't support the 32-bit architecture. Or something like that. I went back to a GTX 1060, and the drivers installed right away. |
Joined: 9 Jul 20 Posts: 142 Credit: 11,536,204 RAC: 3 |
Thanks for going into great detail in your weekly update. |
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
OpenCL would be ideal, but PyTorch doesn't accelerate using OpenCL or Vulkan. Or rather, it had some preliminary support for some small speedups for *inference* on OpenCL/Vulkan, but nothing ready for production, and nothing for training. If you want acceleration, it's CPU, CUDA, or ROCm. And maybe Metal on OSX? Not sure. |
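For anyone wondering what the CPU / CUDA / ROCm choice from the last reply looks like in practice, here is a minimal Python sketch. It is not taken from the MLC@Home client; the function names (`classify_backend`, `detect_backend`, `device_string`) are made up for illustration. It assumes only that ROCm builds of PyTorch expose the GPU through the same `torch.cuda` API and report a HIP version in `torch.version.hip`, and it falls back to CPU if PyTorch is not installed at all.

```python
"""Sketch: pick a PyTorch training backend (CUDA, ROCm, or CPU fallback)."""

try:
    import torch  # optional: the sketch degrades to CPU if PyTorch is absent
except ImportError:
    torch = None


def classify_backend(gpu_available, hip_version):
    """Name the backend from what the installed PyTorch build reports.

    ROCm builds reuse the torch.cuda API, so the HIP version string is
    what tells the two GPU backends apart.
    """
    if not gpu_available:
        return "cpu"
    return "rocm" if hip_version else "cuda"


def detect_backend():
    """Query the installed PyTorch build, if there is one."""
    if torch is None:
        return "cpu"
    return classify_backend(
        torch.cuda.is_available(),
        getattr(torch.version, "hip", None),
    )


def device_string(backend):
    """Map a backend name to the string you would pass to torch.device()."""
    # ROCm devices are also addressed as "cuda" inside PyTorch.
    return "cpu" if backend == "cpu" else "cuda"


if __name__ == "__main__":
    backend = detect_backend()
    print(f"backend={backend} device={device_string(backend)}")
```

Note that there is no OpenCL or Vulkan branch here, which matches the point above: for training, the realistic options are the three this sketch distinguishes.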
©2023 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)