|
1)
Questions and Answers :
Issue Discussion :
25% 'Error while computing' after upgrade to v9.90
(Message 1341)
Posted 27 Aug 2021 by Gunnar Hjern Post: Hi! I've just noticed that about one in four of the new v9.90 tasks end up in 'Error while computing'! Since yesterday my computers have processed 166 tasks and 42 of them ended up in errors. I currently have four different computers running on this project and none of them seems more affected than the others. In the 'Stderr output' for the tasks I find one line particularly interesting:
...
terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
what(): filesystem error: cannot copy: File exists [model-best.pt] [model-final.pt]
...
I've also processed hundreds of the v9.95 and v9.96 test tasks and none of these had errors. Hope it can be fixed soon as it affects such a high percentage of the tasks. Kindest regards, Gunnar |
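The "cannot copy: File exists" message reads like a copy step that refuses to overwrite a destination that is already there. Below is a small Python sketch of that failure mode and one possible way around it. It is only an analogy, not the project's C++ code; the helper `copy_model` and its `overwrite` flag are made up for illustration.

```python
import os
import shutil
import tempfile


def copy_model(src: str, dst: str, overwrite: bool = False) -> None:
    """Copy src to dst; refuse to clobber dst unless overwrite is True.

    Hypothetical helper, used only to illustrate the reported error.
    """
    if os.path.exists(dst) and not overwrite:
        # Same situation as in the stderr excerpt: destination already present.
        raise FileExistsError(f"cannot copy: File exists [{src}] [{dst}]")
    shutil.copyfile(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        best = os.path.join(workdir, "model-best.pt")
        final = os.path.join(workdir, "model-final.pt")
        # Create both files so that the destination exists before copying.
        for path, payload in ((best, b"best"), (final, b"stale")):
            with open(path, "wb") as handle:
                handle.write(payload)

        try:
            copy_model(best, final)  # fails: model-final.pt is already there
        except FileExistsError as err:
            print("error:", err)

        copy_model(best, final, overwrite=True)  # replaces the stale file
        with open(final, "rb") as handle:
            print(handle.read())
```

Whether overwriting is actually the right fix depends on why model-final.pt already exists when the task reaches that step, which the stderr line alone does not tell.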
|
2)
Questions and Answers :
Unix/Linux :
GPU update
(Message 734)
Posted 29 Oct 2020 by Gunnar Hjern Post: Hi! Got my first CUDA task now (tasknr 2629681), but it ended in SIGSEGV and a computation error (signal 11).
OS: Ubuntu 18.04
Nvidia driver: 390.116 ?? (got it via the "nvidia-smi" command)
GPU: GTX750Ti
I'm standing by for more tests! :-) Good luck with the CUDA app!!! //Gunnar |
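For anyone unsure how to read the driver version off the nvidia-smi screen, the tool can also print just the fields of interest via `--query-gpu`. Here is a minimal Python sketch around that call, assuming nvidia-smi is on the PATH; the function names are made up, and the exact GPU name string depends on the driver.

```python
import shutil
import subprocess
from typing import List, Tuple


def parse_gpu_query(text: str) -> List[Tuple[str, str]]:
    """Parse 'driver_version, name' CSV rows into (driver, name) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        driver, _, name = line.partition(",")
        rows.append((driver.strip(), name.strip()))
    return rows


def query_gpus() -> List[Tuple[str, str]]:
    """Ask nvidia-smi for driver version and GPU name; [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_query(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for driver, name in query_gpus():
        print(f"driver {driver} on {name}")
```

On a machine without an Nvidia driver the script simply prints nothing.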
|
3)
Questions and Answers :
Issue Discussion :
All my GPU applications have crushed.
(Message 733)
Posted 29 Oct 2020 by Gunnar Hjern Post: Same here!
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message> process got signal 11</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mldstest_9.72_x86_64-pc-linux-gnu__cuda_fermi -c -a LSTM --lr 0.001 -w 64 -b 2 -s 32 --maxepoch 192 --device 0
nthreads: 1 gpudev: 0
Re-exec()-ing to set number of threads correctly...
</stderr_txt>
]]>
(signal 11 == segfault)
I have run the nvidia-smi command and from this I draw the conclusion that my Nvidia driver is 390.116
OS: Xubuntu 18.04
GPU: GTX750Ti
//Gunnar |
|
4)
Questions and Answers :
Unix/Linux :
GPU update
(Message 698)
Posted 25 Oct 2020 by Gunnar Hjern Post: Will there also be a CUDA app for Linux soon? //Gunnar |
|
5)
Questions and Answers :
Unix/Linux :
GPU update
(Message 575)
Posted 4 Oct 2020 by Gunnar Hjern Post: Thanks for the thorough and understandable explanation!! Now I'm beginning to understand it a bit better. :-) We will wait eagerly. Just issue a new test app if you want to get help with testing the CUDA version! Nice crunchy weekend!!! //Gunnar |
|
6)
Questions and Answers :
Unix/Linux :
GPU update
(Message 568)
Posted 3 Oct 2020 by Gunnar Hjern Post: Thanks for the updates! I'm not quite sure about the AMD Vega56 GPU, but I had expected a speedup of at least an order of magnitude rather than the 40% you mentioned. Maybe PyTorch doesn't make as good use of AMD ROCm as it does of CUDA? It would be very interesting to hear from you when you have a CUDA test app ready. I have two machines with usable GPUs (GTX960 and GTX750Ti) and I will gladly help with the testing. I'll set them up on MLC as soon as there is a CUDA app! :-) Nice weekend!!! //Gunnar |
|
7)
Message boards :
News :
[TWIM Notes] Sep 14 2020
(Message 484)
Posted 17 Sep 2020 by Gunnar Hjern Post: I read someone mentioning 2080 or higher as a possibility. but i have a 2070 super. Most of us don't have anything even close to a 20xx!!! The two usable GPUs that I have are a GTX 960 and a GTX 750 Ti. Those are of course not the most modern, but they still keep me going fairly strong in projects like Einstein@home and GPUGRID. I would reckon that at least the GTX 960 is about average among the GPUs actually running out there, and the 750Ti is a real classic. Under Linux, they support both CUDA and OpenCL apps, and I have also tested them successfully on projects like Asteroids@home, Milkyway, and Collatz. I would immediately start running MLC with both of them if a GPU app that supports them comes along! :-) //Gunnar |
|
8)
Message boards :
Cafe :
Add project to GRCPool whitelist?
(Message 475)
Posted 14 Sep 2020 by Gunnar Hjern Post: Ok, thank you. This project is awesome! +1 !!! |
|
9)
Questions and Answers :
Issue Discussion :
GDPR setting missing
(Message 458)
Posted 8 Sep 2020 by Gunnar Hjern Post: Hi! Look at the "MLC@Home preferences" page under "preferences" on your home page: One of the checkboxes (the last) has the text: "Do you consent to exporting your data to BOINC statistics aggregation Web sites?" I think that's the setting you're looking for. Happy crunching!!! //Gunnar |
|
10)
Questions and Answers :
Issue Discussion :
"No tasks sent"
(Message 453)
Posted 5 Sep 2020 by Gunnar Hjern Post: Hi! I'm not really a black-belt Linux expert, but it looks to me as if you are running a 32-bit Linux variant. That's really a show-stopper, as all applications in MLC are compiled for 64-bit Linux on Intel processors. Btw., what is the name of the Linux distribution you're running? //Gunnar |
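A quick way to check what a machine reports is sketched below in Python. It is only a rough check: `platform.machine()` gives the architecture the kernel reports, so a 32-bit userland on a 64-bit kernel would still need a closer look. The helper name `is_64bit_x86` is made up.

```python
import platform
import struct


def is_64bit_x86(machine: str) -> bool:
    """True if the reported machine string names 64-bit x86."""
    return machine.lower() in {"x86_64", "amd64"}


if __name__ == "__main__":
    machine = platform.machine()
    print("machine reported by the kernel:", machine)
    # Pointer size of the running Python interpreter, in bits.
    print("interpreter pointer size:", struct.calcsize("P") * 8, "bits")
    print("kernel reports 64-bit x86:", is_64bit_x86(machine))
```

On a 32-bit x86 install the machine string is typically something like i686, and the last line prints False.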
|
11)
Questions and Answers :
Unix/Linux :
Workaround for "signal 4" problems with 9.50 linux client
(Message 449)
Posted 2 Sep 2020 by Gunnar Hjern Post: Hi! YES!!! Now it works on all my machines, even those with the old glibc 2.19! :-) I've put all my old cans back onto MLC again - let's do some crunching! Thank you, and have a really nice week! //Gunnar |
|
12)
Questions and Answers :
Unix/Linux :
Workaround for "signal 4" problems with 9.50 linux client
(Message 433)
Posted 30 Aug 2020 by Gunnar Hjern Post: Hi! Tested this on three of my T7500, T7300, and T5870 machines, with (x)ubuntu 16 and 18, and found that the first application "mlds-test-no-sse4.appimage" always executes well (see example output below), but the second "mlds-test-no-sse4-mkldnn.appimage" always crashes with:
Illegal instruction (core dumped)
Nice weekend, and good luck with the work!!! //Gunnar
Output from one of the computers:
gunnar@gunnar-hp6910p2:~/testMLC$ ./mlds-test-no-sse4.appimage -m 2
Detected Intel Family 6 processor, switching OpenBLAS to generic
Re-exec()-ing to set number of threads correctly...
Machine Learning Dataset Generator v9.55 (Linux/x86_64) (libTorch: release/1.6)
[2020-08-30 14:05:11 main:399] : INFO : Set logging level to 1
[2020-08-30 14:05:11 main:407] : INFO : Running in BOINC Standalone mode
[2020-08-30 14:05:11 main:412] : INFO : Resolving all filenames
[2020-08-30 14:05:11 main:420] : INFO : Resolved: dataset.hdf5 => dataset.hdf5 (exists = 1)
[2020-08-30 14:05:11 main:420] : INFO : Resolved: model.cfg => model.cfg (exists = 0)
[2020-08-30 14:05:11 main:420] : INFO : Resolved: model-final.pt => model-final.pt (exists = 0)
[2020-08-30 14:05:11 main:420] : INFO : Resolved: model-input.pt => model-input.pt (exists = 0)
[2020-08-30 14:05:11 main:420] : INFO : Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2020-08-30 14:05:11 main:434] : INFO : Dataset filename: dataset.hdf5
[2020-08-30 14:05:11 main:436] : INFO : Configuration:
[2020-08-30 14:05:11 main:437] : INFO : Validation Loss Threshold: 0.0001
[2020-08-30 14:05:11 main:438] : INFO : Max Epochs: 2
[2020-08-30 14:05:11 main:439] : INFO : Batch Size: 128
[2020-08-30 14:05:11 main:440] : INFO : Patience: 10
[2020-08-30 14:05:11 main:441] : INFO : Hidden Width: 12
[2020-08-30 14:05:11 main:442] : INFO : # Recurrent Layers: 4
[2020-08-30 14:05:11 main:443] : INFO : # Backend Layers: 4
[2020-08-30 14:05:11 main:445] : INFO : Preparing Dataset
[2020-08-30 14:05:11 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Xt from dataset.hdf5 into memory
[2020-08-30 14:05:12 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Yt from dataset.hdf5 into memory
[2020-08-30 14:05:13 load:103] : INFO : Successfully loaded dataset of 2048 examples into memory.
[2020-08-30 14:05:13 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Xv from dataset.hdf5 into memory
[2020-08-30 14:05:13 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Yv from dataset.hdf5 into memory
[2020-08-30 14:05:13 load:103] : INFO : Successfully loaded dataset of 512 examples into memory.
[2020-08-30 14:05:13 main:451] : INFO : Creating Model
[2020-08-30 14:05:13 main:456] : INFO : Preparing config file
[2020-08-30 14:05:13 main:468] : INFO : Creating new config file
[2020-08-30 14:05:13 main:499] : INFO : Loading DataLoader into Memory
[2020-08-30 14:05:13 main:502] : INFO : Starting Training
[2020-08-30 14:07:01 main:514] : INFO : Epoch 1 | loss: 0.037587 | val_loss: 0.0315968 | Time: 107342 ms
[2020-08-30 14:08:45 main:514] : INFO : Epoch 2 | loss: 0.0312114 | val_loss: 0.030173 | Time: 104527 ms
[2020-08-30 14:08:45 main:533] : INFO : Saving trained model to model-final.pt, val_loss 0.030173
[2020-08-30 14:08:45 main:538] : INFO : Saving end state to config to file
[2020-08-30 14:08:45 main:543] : INFO : Success, exiting..
gunnar@gunnar-hp6910p2:~/testMLC$ rm -f model.cfg model-final.pt snapshot.pt
gunnar@gunnar-hp6910p2:~/testMLC$ ./mlds-test-no-sse4-mkldnn.appimage -m 2
Illegal instruction (core dumped)
gunnar@gunnar-hp6910p2:~/testMLC$ |
|
13)
Questions and Answers :
Unix/Linux :
OS/Distribution support question?
(Message 420)
Posted 27 Aug 2020 by Gunnar Hjern Post: Hi! Thanks for a fast response! I'll be patient, and wait for the new version. Have a nice week, and again: Thanks for the excellent project administration!!! //Gunnar |
|
14)
Questions and Answers :
Unix/Linux :
OS/Distribution support question?
(Message 418)
Posted 27 Aug 2020 by Gunnar Hjern Post: Hi! I'm running 26 computers, see the list below. I made a rundown of the OSes and the libc versions.
OSes: 9 Ubuntu 14.04, 2 Ubuntu 16.04, 14 Ubuntu 18.04, 1 Linux Mint 18.3
libc versions: 9 libc 2.19, 3 libc 2.23, 14 libc 2.27
I have noticed that none of the Ubuntu 14 machines can run any 9.55 tasks; they all end in computation error because the tasks require libc 2.23 to be present! :-( //Gunnar
List:
ID: 1975, CPU: Intel(R) Core(TM) i5-3470S CPU @ 2.90GHz [Family 6 Model 58 Stepping 9] (4 cpu), OS: Linux Ubuntu 18.04.2 LTS [5.3.0-61-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 1988, CPU: Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz [Family 6 Model 15 Stepping 11] (2 cpu), OS: Linux Ubuntu 18.04.2 LTS [4.18.0-25-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 1996, CPU: Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz [Family 6 Model 58 Stepping 9] (8 cpu), OS: Linux Ubuntu 14.04 LTS, 4.4.0-148-generic, libc 2.19
ID: 2004, CPU: Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz [Family 6 Model 58 Stepping 9] (8 cpu), OS: Linux Ubuntu 18.04.2 LTS [5.0.0-23-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 2005, CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz [Family 6 Model 58 Stepping 9] (4 cpu), OS: Linux Ubuntu 18.04.2 LTS [5.3.0-61-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 2006, CPU: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz [Family 6 Model 60 Stepping 3] (4 cpu), OS: Linux Ubuntu 18.04.2 LTS [5.3.0-61-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 2007, CPU: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz [Family 6 Model 58 Stepping 9] (4 cpu), OS: Linux Ubuntu 18.04.2 LTS [5.3.0-61-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 2008, CPU: Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz [Family 6 Model 42 Stepping 7] (4 cpu), OS: Linux Ubuntu 14.04 LTS, 4.4.0-184-generic, libc 2.23
ID: 2009, CPU: Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz [Family 6 Model 42 Stepping 7] (4 cpu), OS: Linux Ubuntu 18.04.2 LTS [5.3.0-61-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 2010, CPU: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz [Family 6 Model 60 Stepping 3] (4 cpu), OS: Linux Ubuntu 18.04.2 LTS [4.18.0-15-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 2011, CPU: Intel(R) Core(TM) i5-3470S CPU @ 2.90GHz [Family 6 Model 58 Stepping 9] (4 cpu), OS: Linux Ubuntu 18.04.2 LTS [4.18.0-15-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 2020, CPU: Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz [Family 6 Model 23 Stepping 10] (2 cpu), OS: Linux Ubuntu 14.04 LTS, 4.4.0-62-generic, libc 2.19
ID: 2051, CPU: Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz [Family 6 Model 23 Stepping 10] (2 cpu), OS: Linux Ubuntu 18.04.4 LTS [5.3.0-51-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 2062, CPU: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz [Family 6 Model 37 Stepping 2] (4 cpu), OS: Linux Mint, 4.10.0-38-generic, libc 2.23
ID: 2063, CPU: Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz [Family 6 Model 23 Stepping 10] (2 cpu), OS: Linux Ubuntu 18.04.2 LTS [5.0.0-23-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 2064, CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz [Family 6 Model 23 Stepping 10] (2 cpu), OS: Linux Ubuntu 14.04 LTS, 3.13.0-67-generic, libc 2.19
ID: 2065, CPU: Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz [Family 6 Model 23 Stepping 10] (2 cpu), OS: Linux Ubuntu 14.04 LTS, 3.13.0-67-generic, libc 2.19
ID: 2066, CPU: Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz [Family 6 Model 15 Stepping 11] (2 cpu), OS: Linux Ubuntu 14.04 LTS, 4.4.0-31-generic, libc 2.19
ID: 2067, CPU: Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz [Family 6 Model 23 Stepping 10] (2 cpu), OS: Linux Ubuntu 14.04 LTS, 4.4.0-79-generic, libc 2.19
ID: 2118, CPU: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz [Family 6 Model 37 Stepping 5] (4 cpu), OS: Linux Ubuntu 18.04.2 LTS [5.0.0-23-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 2119, CPU: Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz [Family 6 Model 37 Stepping 2] (4 cpu), OS: Linux Ubuntu 14.04 LTS, 4.4.0-148-generic, libc 2.19
ID: 2123, CPU: Intel(R) Core(TM)2 Duo CPU T6570 @ 2.10GHz [Family 6 Model 23 Stepping 10] (2 cpu), OS: Linux Ubuntu 14.04 LTS, 3.13.0-170-generic, libc 2.19
ID: 2137, CPU: Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz [Family 6 Model 23 Stepping 6] (2 cpu), OS: Linux Ubuntu 18.04.2 LTS [4.18.0-15-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
ID: 2214, CPU: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz [Family 6 Model 15 Stepping 10] (2 cpu), OS: Linux Ubuntu 16.04 LTS, 4.4.0-157-generic, libc 2.23
ID: 2228, CPU: Intel(R) Core(TM) i7 CPU L 640 @ 2.13GHz [Family 6 Model 37 Stepping 5] (4 cpu), OS: Linux Ubuntu 14.04 LTS, 4.4.0-31-generic, libc 2.19
ID: 2230, CPU: Intel(R) Core(TM)2 Duo CPU T5870 @ 2.00GHz [Family 6 Model 15 Stepping 13] (2 cpu), OS: Linux Ubuntu, 18.04.3 LTS [5.4.0-42-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1.2)] |
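A rundown like the one in this post can be scripted. The Python sketch below pulls the "libc X.Y" field out of host lines, tallies the versions, and flags hosts below a required version (2.23 here, as in the post). The three sample lines are taken from the list with the CPU field left out for brevity, and all function names are made up.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

LIBC_RE = re.compile(r"libc (\d+)\.(\d+)")


def libc_version(host_line: str) -> Tuple[int, int]:
    """Extract the (major, minor) libc version from one host line."""
    match = LIBC_RE.search(host_line)
    if match is None:
        raise ValueError(f"no libc version in: {host_line!r}")
    return int(match.group(1)), int(match.group(2))


def tally(host_lines: Iterable[str]) -> Counter:
    """Count how many hosts run each libc version."""
    return Counter(libc_version(line) for line in host_lines)


def too_old(host_lines: Iterable[str], required: Tuple[int, int] = (2, 23)) -> List[str]:
    """Return the host lines whose libc is older than the required version."""
    return [line for line in host_lines if libc_version(line) < required]


if __name__ == "__main__":
    # Sample host lines from the list, CPU field omitted.
    hosts = [
        "ID: 1996, OS: Linux Ubuntu 14.04 LTS, 4.4.0-148-generic, libc 2.19",
        "ID: 2214, OS: Linux Ubuntu 16.04 LTS, 4.4.0-157-generic, libc 2.23",
        "ID: 1975, OS: Linux Ubuntu 18.04.2 LTS [5.3.0-61-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]",
    ]
    for version, count in sorted(tally(hosts).items()):
        print(f"{count} x libc {version[0]}.{version[1]}")
    for line in too_old(hosts):
        print("below libc 2.23:", line)
```

Comparing the versions as integer tuples avoids the trap of comparing "2.9" and "2.23" as text or as decimals.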
|
15)
Questions and Answers :
Unix/Linux :
Workaround for "signal 4" problems with 9.50 linux client
(Message 414)
Posted 26 Aug 2020 by Gunnar Hjern Post: Hi! Still problems for T-type CPUs. Tested the new version on three of my old cans with T5870 and T7500, but they still got the SIGILL signal, (Xubuntu 18.04) and the computer with the old Xubuntu 14.04 OS couldn't find GLIBC_2.23. (Also tried with resetting the project.) Here are some of the errors, and links to tasks and computers:
Xubuntu 18.04:
host: https://www.mlcathome.org/mlcathome/show_host_detail.php?hostid=2230
task: https://www.mlcathome.org/mlcathome/result.php?resultid=1226047
<message>process got signal 4</message>
Xubuntu 18.04:
host: https://www.mlcathome.org/mlcathome/show_host_detail.php?hostid=1988
task: https://www.mlcathome.org/mlcathome/result.php?resultid=1225407
<message>process got signal 4</message>
Xubuntu 14.04:
host: https://www.mlcathome.org/mlcathome/show_host_detail.php?hostid=2066
task: https://www.mlcathome.org/mlcathome/result.php?resultid=1192553
<message> process exited with code 1 (0x1, -255) </message>
<stderr_txt>
../../projects/www.mlcathome.org_mlcathome/mlds_9.55_x86_64-pc-linux-gnu: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.23' not found (required by /tmp/.mount_mlds_9ptg2Tv/usr/bin/../lib/libtorch_cpu.so)
../../projects/www.mlcathome.org_mlcathome/mlds_9.55_x86_64-pc-linux-gnu: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.23' not found (required by /tmp/.mount_mlds_9ptg2Tv/usr/bin/../lib/libquadmath.so.0)
</stderr_txt>
Good luck with the future versions! //Gunnar |
|
16)
Questions and Answers :
Issue Discussion :
"No tasks sent"
(Message 409)
Posted 25 Aug 2020 by Gunnar Hjern Post: Hi! Now I'm getting tasks again, for all my computers! :-) Thanks for a VERY fast response and problem fix!!! I've run 35 different Boinc projects but never seen such fast and professional project admin before! I salute you all at MLC!!! //Gunnar |
|
17)
Questions and Answers :
Issue Discussion :
"No tasks sent"
(Message 405)
Posted 25 Aug 2020 by Gunnar Hjern Post: Hi! Same for me! None of my computers are getting any new tasks (only once here and there, and very sporadically) and at least one machine is already starving! :-( I cannot understand it as the status page says there are 32000++ tasks ready to send??? Hope admins will address this issue ASAP!! //Gunnar |
|
18)
Questions and Answers :
Unix/Linux :
Workaround for "signal 4" problems with 9.50 linux client
(Message 396)
Posted 24 Aug 2020 by Gunnar Hjern Post: Hi! Thanks for the advice! I tried both ways but I ran totally out of luck there, I'm afraid! :-( The old cans I have that are affected by this bug are not big or important ones, so I guess I'll have to wait for a newer version instead. I'll keep on crunching until then. Good luck with all your project work, and a nice new week to come!!! //Gunnar |
|
19)
Questions and Answers :
Unix/Linux :
Workaround for "signal 4" problems with 9.50 linux client
(Message 393)
Posted 24 Aug 2020 by Gunnar Hjern Post: Hi! The few times I have to (re)start the boinc client I would normally use the command
$ sudo service boinc-client start
and that doesn't seem to allow setting any environment variables, or does it? I had a look around on my computers but I couldn't find any .bashrc or bash_profile files in the home folder of boinc. Do you know what command to give or what file to edit in order to set that environment variable? //Gunnar |
|
20)
Questions and Answers :
Unix/Linux :
Sorry for the delay
(Message 356)
Posted 21 Aug 2020 by Gunnar Hjern Post: Hi! Most of my machines seem to be doing fine with the new version, and for example the HP Elitedesk 8300 USDT with the CPU i5-3470s seems to run ~15% more efficiently! :-) However, I have a problem with old Intel processors, family 6, model 15, ... (and earlier?): These all seem to produce the SIGILL (signal 4) error. I have four such old cans in my "Boinc-farm", and none of them can run any 9.50 tasks. I'm not really a hardware black-belter, so I don't know exactly which CPU instructions are illegal for those CPUs, but many of my older computers can run the project although they do not have the AVX extensions. A good example could be my two old faithful HPs with Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz [Family 6 Model 23 Stepping 10]. They certainly do not have any AVX, although they do feature the SSE instruction set, all the way up to SSE4.1. Could it be the lack of the SSE4 instruction set in the CPU that causes the problem? Those of my cans that caused trouble all featured some Intel mobile CPU of model CPU T7500 @ 2.20GHz [Family 6 Model 15 Stepping 11], and they are known to lack the SSE4 set. Good luck with fixing the problem, and a nice weekend!!! //Gunnar |
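To see which of these instruction sets a given Linux box advertises, the "flags" line in /proc/cpuinfo can be checked. A small Python sketch of that follows; the function names are made up. It only shows what the CPU reports and cannot tell which instruction the app actually trips over.

```python
from typing import Dict, Iterable, Set

# Flag names as they appear in /proc/cpuinfo on x86 Linux.
WANTED = ("sse4_1", "sse4_2", "avx")


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def report(flags: Iterable[str]) -> Dict[str, bool]:
    """Map each instruction set of interest to whether the CPU advertises it."""
    present = set(flags)
    return {name: name in present for name in WANTED}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not Linux, or /proc is not available
    for name, present in report(cpu_flags(text)).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

A CPU like the E8500 would be expected to show sse4_1 but neither sse4_2 nor avx, while the T7500 class should show none of the three.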
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)