GPU update
Author | Message |
---|---|
Joined: 1 Jul 20 Posts: 30 Credit: 22,436,564 RAC: 170,465 |
Just to let you know there is progress. This is with a rand_automata/DS3 datafile. Want to get it into testing tonight or tomorrow. "Use GPU's" should not default to "Yes". The option should default to opt-in not opt-out. |
Joined: 30 Jun 20 Posts: 356 Credit: 8,169,868 RAC: 77,753 |
Will look at fixing the GPU option. The option showed up as soon as we uploaded the mldstest Windows client (which is *huge*, 1GB in size). So it's whatever the BOINC server defaults to now. |
Joined: 1 Jul 20 Posts: 30 Credit: 22,436,564 RAC: 170,465 |
"Will look at fixing the GPU option." Fixed. Thanks! |
Joined: 12 Aug 20 Posts: 20 Credit: 20,874,053 RAC: 20,072 |
Will there also be a CUDA app for Linux soon? //Gunnar |
Joined: 30 Jun 20 Posts: 356 Credit: 8,169,868 RAC: 77,753 |
Yes. The only reason it isn't out yet is that I have only one NVIDIA card, and I started with Windows thinking it would be the hardest. Once I'm able to verify the GPU client works with BOINC on Windows (I know I can run it by hand just fine), I'll reboot into Linux and compile a Linux CUDA client. I still have the ROCm version (Linux) too, but that will require a custom app plan, which I still haven't learned how to do. Please note, there are going to be issues (which is why it's in mldstest). Here's a rundown of things I expect to be a problem:
|
Joined: 30 Jun 20 Posts: 356 Credit: 8,169,868 RAC: 77,753 |
I've got good news and bad news. The good news is, I have (finally) received the first test GPU WU served up by the project and it's crunching now. The bad news is that it uses up to 2GB of RAM and 2GB of disk space for DS3 WUs, which is well beyond the limits CPU crunching requires, so if anyone else gets a GPU WU, it will immediately error out with an "exceeded disk space" error. This leads to a complicated problem on my end. Do I a) up the limits on all existing WUs so they can be run on a GPU or CPU, meaning pure CPU crunchers will suffer because the client will refuse to run unless it has 2GB/WU free (despite actually using only ~300MB when not using a GPU), and making things like an RPi3 with 1GB of RAM impossible to schedule, or b) somehow create two sets of WUs, one for GPUs, one for CPUs, and figure out how to tell the BOINC server which is which (hint, that ain't easy). Option b) is really the only answer, but will require a bit more thought. Perhaps it would be best to create a separate "mlds-gpu" application with a separate WU pool. Hmm. |
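Option b) above amounts to giving each WU pool its own resource bounds. The following is a minimal sketch of that idea, not the project's code. It assumes the figures quoted in the post (about 2GB of RAM and disk for GPU WUs, roughly 300MB of RAM for CPU-only WUs). The `WorkunitLimits` type, the `fits` helper, the `mlds` pool name, the CPU disk figure, and the host's free disk space are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

GB = 1024 ** 3
MB = 1024 ** 2


@dataclass(frozen=True)
class WorkunitLimits:
    """Resource bounds a host must satisfy before a work unit is sent to it."""
    app_name: str
    memory_bound: int  # bytes of RAM
    disk_bound: int    # bytes of disk


# Hypothetical pools, one per resource type, each with its own bounds.
# The GPU figures and the CPU memory figure come from the post; the CPU
# disk figure is an assumption made for illustration.
POOLS = {
    "cpu": WorkunitLimits("mlds", memory_bound=300 * MB, disk_bound=300 * MB),
    "gpu": WorkunitLimits("mlds-gpu", memory_bound=2 * GB, disk_bound=2 * GB),
}


def fits(limits: WorkunitLimits, free_ram: int, free_disk: int) -> bool:
    """Return True if a host with the given free resources can run the work unit."""
    return free_ram >= limits.memory_bound and free_disk >= limits.disk_bound


if __name__ == "__main__":
    # A host with 1GB of RAM, as in the RPi3 example; free disk is assumed.
    host_ram, host_disk = 1 * GB, 8 * GB
    for kind, limits in POOLS.items():
        print(kind, limits.app_name, fits(limits, host_ram, host_disk))
```

With a single shared pool at the GPU bounds, the 1GB host would be excluded from all work; with separate pools it is only excluded from the GPU one.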
Joined: 30 Jun 20 Posts: 356 Credit: 8,169,868 RAC: 77,753 |
Linux CUDA client rolled out, but not much success by the looks of the early returns: lots of library/driver mismatches. Will let it run overnight and assess the damage tomorrow. |
Joined: 1 Jul 20 Posts: 33 Credit: 15,808,999 RAC: 139,091 |
This is what I am getting with my Linux machines. It doesn't seem like a memory/storage issue. But what do I know. :) <core_client_version>7.9.3</core_client_version> I haven't received a task for my Windows machines yet. But if the RAM/storage requirements are significantly different, then yes, perhaps a different app is required, so that people can select one and/or the other. Reno, NV Team: SETI.USA |
Joined: 1 Jul 20 Posts: 33 Credit: 15,808,999 RAC: 139,091 |
Also, if you send out tasks that allow any resource to use them, then they get consumed by CPUs. And you won't get testing just on GPUs, assuming that is your goal for this batch. Reno, NV Team: SETI.USA |
Joined: 1 Jul 20 Posts: 33 Credit: 15,808,999 RAC: 139,091 |
This task worked on a Windows machine. It's a single-GPU machine. I wonder if that is why my dual-GPU Windows machine gives errors. https://www.mlcathome.org/mlcathome/result.php?resultid=2592308 Reno, NV Team: SETI.USA |
Joined: 30 Jun 20 Posts: 356 Credit: 8,169,868 RAC: 77,753 |
First, thanks again for testing. According to this thread on the NVIDIA developer forums, it might be related to the specific driver version you have installed (450?). I note the one that worked uses a later driver (452): https://forums.developer.nvidia.com/t/forward-compatibility-runtime-error-after-installing-cuda-11-0/128503/4. Proprietary drivers are fun! Also, I'm fairly certain this app currently ignores multiple GPUs and only uses GPU 0. I haven't implemented multi-GPU support yet. |
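The driver-version suspicion above can be checked mechanically. This is a minimal sketch that assumes a threshold of 452, taken only from the observation that a result on a 452 driver worked. The function names, the threshold, and the sample version strings are illustrative and are not a documented requirement of the app or of CUDA.

```python
from typing import Tuple


def parse_driver_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted driver string such as '450.66' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_at_least(installed: str, minimum: str) -> bool:
    """Compare two dotted versions numerically, component by component."""
    return parse_driver_version(installed) >= parse_driver_version(minimum)


# Assumed threshold, based only on the one working result mentioned in the post.
ASSUMED_MINIMUM = "452"

if __name__ == "__main__":
    # Sample version strings, made up for illustration.
    for version in ("450.66", "452.06"):
        print(version, driver_at_least(version, ASSUMED_MINIMUM))
```

Comparing tuples of integers rather than raw strings avoids the trap where "99.1" sorts after "450.66" in a plain string comparison.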
Joined: 20 Jul 20 Posts: 23 Credit: 774,499 RAC: 964 |
"I got good news and bad news." Option b. Most of my MLC units run on Atom boards that only have 2GB (1.5GB) shared between all 4 cores, and 8GB of remaining disk space. CUDA also shouldn't be this big. I was hoping you'd start with Intel, as most Intel GPUs are unused (or doing Collatz). If GPU acceleration is slow, it would make no sense to have big, heavy GPUs do the job; it would be better to focus on the smaller ones. On the other hand, if you can improve performance on big GPUs, making them 90-100% utilized, then a separate pool is necessary, as these GPU systems usually do meet the RAM requirements. |
Joined: 23 Sep 20 Posts: 6 Credit: 1,741,269 RAC: 117 |
"I was hoping you'd start with Intel, as most Intel GPUs are unused (or doing Collatz)." I would love to use my onboard Intel GPU. |
Joined: 12 Aug 20 Posts: 20 Credit: 20,874,053 RAC: 20,072 |
Hi! Got my first CUDA task now (task number 2629681), but it ended in a SIGSEGV and a computation error (signal 11). OS: Ubuntu 18.04; NVIDIA driver: 390.116 ?? (got it via the "nvidia-smi" command); GPU: GTX750Ti. I'm standing by for more tests! :-) Good luck with the CUDA app!!! //Gunnar |
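The driver and GPU details in a report like the one above can be collected in one step. This is a minimal sketch that assumes `nvidia-smi` is on the PATH and accepts its `--query-gpu` and `--format=csv,noheader` options. The helper names `parse_gpu_csv` and `list_gpus` are hypothetical.

```python
import shutil
import subprocess
from typing import Dict, List

# Ask nvidia-smi for one CSV line per GPU: index, name, driver version.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,driver_version",
    "--format=csv,noheader",
]


def parse_gpu_csv(output: str) -> List[Dict[str, str]]:
    """Parse lines such as '0, GeForce GTX 750 Ti, 390.116' into records."""
    gpus = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split the first and last fields off so a comma in the name survives.
        index, rest = line.split(",", 1)
        name, driver = rest.rsplit(",", 1)
        gpus.append(
            {"index": index.strip(), "name": name.strip(), "driver": driver.strip()}
        )
    return gpus


def list_gpus() -> List[Dict[str, str]]:
    """Return GPU records, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in list_gpus():
        print(gpu)
```

Keeping the parsing separate from the subprocess call means the parser can be exercised on pasted output from a machine that reported an error, without needing an NVIDIA card locally.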
Joined: 30 Jun 20 Posts: 356 Credit: 8,169,868 RAC: 77,753 |
Honestly, it's very bizarre. The segv isn't coming from the new CUDA code; it's coming from the training-data loading code, which is unmodified and running without issue on every other client. It also only happens when run by the BOINC client, not when run by hand on the exact same system with the same input. I'm reproducing it here, but it's a real head-scratcher. I won't be sending out more tests until I solve it. |
Joined: 1 Jul 20 Posts: 33 Credit: 15,808,999 RAC: 139,091 |
CUDA on Linux has never worked for me to date. But with 9.8, it is working now. Nice! Reno, NV Team: SETI.USA |
Joined: 30 Jun 20 Posts: 356 Credit: 8,169,868 RAC: 77,753 |
"CUDA on Linux has never worked for me to date. But with 9.8, it is working now. Nice!" Woohoo! I'm going to post an update in a new thread to capture current status. |
Joined: 9 Jul 20 Posts: 105 Credit: 2,165,446 RAC: 13,355 |
Looking forward to testing on my system and verifying. Great accomplishment! |
©2021 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)