Posts by bozz4science

1) Questions and Answers : Issue Discussion : GPU tasks are freezing up (Message 1414)
Posted 25 days ago by bozz4science
I’d say that’s definitely a better choice than Win11 at the moment.
2) Questions and Answers : Windows : 3 times longer runtimes under Win 11? (Message 1401)
Posted 24 Oct 2021 by bozz4science
As far as I can tell, the slowdown in your runtimes is not (only) connected to your Win11 upgrade.

Just compare the stderr output of your slower task with that of the faster ones from before your upgrade:

Prior to upgrade (Quick task)

[2021-10-23 18:12:53	                main:574]	:	INFO	:	Epoch 1556 | loss: 0.0311986 | val_loss: 0.0312002 | Time: 1904.55 ms
[2021-10-23 18:12:55	                main:574]	:	INFO	:	Epoch 1557 | loss: 0.0311993 | val_loss: 0.0312003 | Time: 1928.85 ms
[2021-10-23 18:12:57	                main:574]	:	INFO	:	Epoch 1558 | loss: 0.0311996 | val_loss: 0.0312058 | Time: 1899.59 ms
[2021-10-23 18:12:59	                main:574]	:	INFO	:	Epoch 1559 | loss: 0.0312017 | val_loss: 0.0312017 | Time: 1945.45 ms
[2021-10-23 18:13:01	                main:574]	:	INFO	:	Epoch 1560 | loss: 0.0311992 | val_loss: 0.0312004 | Time: 1947.02 ms
[2021-10-23 18:13:03	                main:574]	:	INFO	:	Epoch 1561 | loss: 0.0312008 | val_loss: 0.0312009 | Time: 1929.72 ms

Runtime: 4,057 sec & CPU time: 3,881 sec

After upgrade (Slow task)
[2021-10-24 15:38:04	                main:574]	:	INFO	:	Epoch 1546 | loss: 0.0310213 | val_loss: 0.0312575 | Time: 8615.13 ms
[2021-10-24 15:38:12	                main:574]	:	INFO	:	Epoch 1547 | loss: 0.0310251 | val_loss: 0.0312677 | Time: 7332.4 ms
[2021-10-24 15:38:22	                main:574]	:	INFO	:	Epoch 1548 | loss: 0.0310204 | val_loss: 0.0312522 | Time: 9820.51 ms
[2021-10-24 15:38:30	                main:574]	:	INFO	:	Epoch 1549 | loss: 0.0310245 | val_loss: 0.0312602 | Time: 8587.51 ms
[2021-10-24 15:38:37	                main:574]	:	INFO	:	Epoch 1550 | loss: 0.0310288 | val_loss: 0.031248 | Time: 6888.06 ms

Runtime: 14,101 sec & CPU time: 10,746 sec

The computing time per epoch should vary only marginally, but in the slow task it varies considerably, with ∆(deviation) approaching a maximum of ~4 sec, or almost 58%. That, combined with the larger gap between runtime and CPU time, goes to show that some other system process is interfering with the computation. It seems that your CPU is overcommitted, maybe even by background processes related to the Win11 upgrade. And while you're at it, it wouldn't hurt to upgrade your GPU driver. You currently have version 456.71 installed. According to NVIDIA's driver page, the latest version is 496.13. :)
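
For anyone who wants to repeat this comparison on their own logs, here is a minimal Python sketch. It assumes the stderr lines keep the "Epoch N | ... | Time: X ms" layout shown above. The function names (epoch_times, spread, cpu_share) are made up for illustration, and it only looks at the excerpted lines, so its figures describe those lines rather than the whole log.

```python
import re

# Assumption: every epoch line ends with "Time: <number> ms", as in the excerpts.
EPOCH_TIME = re.compile(r"Epoch\s+(\d+)\s+\|.*?Time:\s+([\d.]+)\s+ms")

QUICK_LOG = """
Epoch 1556 | loss: 0.0311986 | val_loss: 0.0312002 | Time: 1904.55 ms
Epoch 1557 | loss: 0.0311993 | val_loss: 0.0312003 | Time: 1928.85 ms
Epoch 1558 | loss: 0.0311996 | val_loss: 0.0312058 | Time: 1899.59 ms
Epoch 1559 | loss: 0.0312017 | val_loss: 0.0312017 | Time: 1945.45 ms
Epoch 1560 | loss: 0.0311992 | val_loss: 0.0312004 | Time: 1947.02 ms
Epoch 1561 | loss: 0.0312008 | val_loss: 0.0312009 | Time: 1929.72 ms
"""

SLOW_LOG = """
Epoch 1546 | loss: 0.0310213 | val_loss: 0.0312575 | Time: 8615.13 ms
Epoch 1547 | loss: 0.0310251 | val_loss: 0.0312677 | Time: 7332.4 ms
Epoch 1548 | loss: 0.0310204 | val_loss: 0.0312522 | Time: 9820.51 ms
Epoch 1549 | loss: 0.0310245 | val_loss: 0.0312602 | Time: 8587.51 ms
Epoch 1550 | loss: 0.0310288 | val_loss: 0.031248 | Time: 6888.06 ms
"""


def epoch_times(log_text):
    """Return the per-epoch times (in ms) found in a stderr excerpt."""
    return [float(m.group(2)) for m in EPOCH_TIME.finditer(log_text)]


def spread(times):
    """Return (min, max, relative spread), with spread = (max - min) / min."""
    lo, hi = min(times), max(times)
    return lo, hi, (hi - lo) / lo


def cpu_share(runtime_s, cpu_time_s):
    """Fraction of the wall-clock runtime the task actually spent on the CPU."""
    return cpu_time_s / runtime_s


if __name__ == "__main__":
    # Runtime and CPU time are the values reported for the two tasks above.
    for label, log, runtime, cpu in (
        ("quick", QUICK_LOG, 4057, 3881),
        ("slow", SLOW_LOG, 14101, 10746),
    ):
        lo, hi, rel = spread(epoch_times(log))
        print(f"{label}: {lo:.0f}-{hi:.0f} ms per epoch, "
              f"spread {rel:.1%}, CPU share {cpu_share(runtime, cpu):.1%}")
```

A low CPU share together with a wide per-epoch spread is the pattern to look for; either one alone is a weaker hint that something else is competing for the CPU.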

Hope that helps!
3) Questions and Answers : Issue Discussion : Rogue batch ? (Message 1377)
Posted 1 Oct 2021 by bozz4science
Thanks for looking into it!
4) Questions and Answers : Windows : Long run times on second GPU (Message 1354)
Posted 29 Aug 2021 by bozz4science
A while back I experienced the same issue when I added a second card to my host and both PCIe slots then ran in a dual x8 configuration instead of x16. PCIe bandwidth seemingly has a large influence on GPU WUs as far as I can tell. All of the cards I tested (750Ti / 970 / 1660S) were faster in a x16 slot config than in x8 mode (this can be easily tweaked in most modern BIOS/UEFI settings). I eventually settled on a single-GPU setup with my 1660S running in x16 mode. I vaguely remember that the cards had about a 50 to 100% performance penalty when running in x8 mode because the GPU WUs rely heavily on CPU support.
5) Message boards : News : Updated CPU client 9.9x release and issues (Message 1349)
Posted 28 Aug 2021 by bozz4science
Awesome news! I know from personal experience how frustrating it can be to suddenly start seeing errors after weeks of coding and testing. I applaud your continued commitment and am especially excited for the DS4 support and the future science we can help to inform with new data sets and network types. It is also great to see better NaN handling incorporated into version 9.90+, as this often resulted in GPU WUs crunching through hundreds of epochs, just carrying the NaN through, with the erroneous result only becoming apparent when inspecting the WU log afterwards.

Thanks for all your work!
6) Questions and Answers : Issue Discussion : Memory requirement for CPU WUs (Message 1336)
Posted 25 Aug 2021 by bozz4science
That's great news. Thanks!
7) Questions and Answers : Windows : Exit status -1073741515 (0xC0000135) STATUS_DLL_NOT_FOUND (Message 1335)
Posted 25 Aug 2021 by bozz4science
Same here on my Win10 host. Running just fine. All 20+ WUs ran successfully and validated. Slowly draining my CPU test WUs queue but all is looking fine so far.
8) Questions and Answers : Issue Discussion : Out of work for CPU (Message 1329)
Posted 25 Aug 2021 by bozz4science
Thanks for the quick update. I was already wondering why I couldn't receive any CPU work on my Win machine. I'll start running my test WUs :)
9) Message boards : Cafe : ML & more in the news (Message 1326)
Posted 19 Aug 2021 by bozz4science
Now in 3D - Deep learning techniques help visualize X-ray data in three dimensions (ScienceDaily): Processing of 2D images is easy and can be done on a smartphone nowadays. Now, scientists might have found a new way of handling 2D data to interpret it as a fully modelled 3D representation using AI. This may be the key to turning X-ray data into visible, understandable shapes at a much faster rate. A breakthrough in this area could have implications for astronomy, electron microscopy and other areas of science dependent on large amounts of 3D data.

Here's the corresponding paper: Rapid 3D nanoscale coherent imaging via physics-aware deep learning
10) Questions and Answers : Windows : New Nvidia 3080 graphic card showing 0% GPU use (Message 1323)
Posted 16 Aug 2021 by bozz4science
Another idea is to open a third-party tool such as GPU-Z and look at GPU load (in %), GPU clock (MHz), bus interface load (%), memory load, and power draw (in W) in the Sensors tab. If these numbers are close to the maximum values reported in the tech spec sheets for your GPU model on sites such as "techpowerup", you should be good.

Alternatively, open the GPU view in the Windows Task Manager and switch one of the graphs to "CUDA". A high number indicates a large compute load on the card.

Finally, the app is not optimised for the RTX 3000 series as far as I know. To boost the overall load, you could bump up the number of tasks that run concurrently on your card via an app_config.xml file that you put in your project directory/subfolder. Keep in mind to reserve one CPU core for each GPU task that you run so that you don't overcommit your CPU, which would leave your GPU always waiting on the CPU.
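
As a rough illustration of such a file, here is a small Python sketch that writes out the XML. The gpu_usage and cpu_usage elements are the standard BOINC app_config.xml settings, but the helper build_app_config and the app name "example_app" are placeholders of mine: the real app name has to be taken from your own client (for example from client_state.xml), and two tasks per GPU is only an example value.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpus_per_task=1.0):
    """Build an app_config.xml string for one GPU app.

    gpu_usage is the fraction of a GPU each task claims, so 0.5 lets two
    tasks share one card. cpu_usage reserves CPU cores per GPU task.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(1.0 / tasks_per_gpu)
    ET.SubElement(gpu, "cpu_usage").text = str(cpus_per_task)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "example_app" is a placeholder, not the project's real app name.
    print(build_app_config("example_app", tasks_per_gpu=2))
```

After saving the output as app_config.xml in the project folder, the client has to re-read its config files (or be restarted) before the change takes effect.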

Hope that helps. If not, just report back on the numbers that these tools show.
11) Questions and Answers : Windows : New Windows CPU client in mldstest (9.90) (Message 1320)
Posted 12 Aug 2021 by bozz4science
Greatly appreciate your effort! Waiting for the first test WUs to arrive on my Windows host.
12) Message boards : News : [TMIM Notes] Aug 6 2021 (Message 1311)
Posted 10 Aug 2021 by bozz4science
Amazing news! Thanks for keeping us up to date :)

Sorry to hear that the client development for Windows is still troubling you, but glad that you are joining forces with another developer. Is he/she a volunteer or affiliated with UMBC?

Do you have any insights into how runtimes will compare across DS4 vs. DS1/2 CPU WUs so far?

We'll help you push DS1 + 2 over the finish line soon :)
13) Message boards : Cafe : ML & more in the news (Message 1310)
Posted 10 Aug 2021 by bozz4science
Artificial Intelligence may diagnose dementia in a day (BBC):
The algorithm can identify patterns in the scans even expert neurologists cannot see and match them to patient outcomes in its database. [...] In pre-clinical tests, it has been able to diagnose dementia, years before symptoms develop, even when there is no obvious signs of damage on the brain scan.

Racial profiling/classification from x-rays sounds a bit scary if you ask me :)
14) Message boards : Science : A quick update on Dataset 3 (Message 1302)
Posted 31 Jul 2021 by bozz4science
Thank you so much for taking the time to get back to me. The whole development process sounds tiresome, especially having to build compiler link lines by hand! I will follow the development process on Discord, but I'm not really sure if I can be of much help. You could launch a DS4 assault on the Linux machines and release these DS4 WUs into the wild before you get the Windows client updated and ready to go. I am also way more excited for the launch of the DS4 experiment than I was at the launch of DS2/3. Implications for industry applications and research could be far more profound with these kinds of networks.

Do you have an estimate for the expected runtime of these networks? I guess that the input files are rather large, requiring more RAM and VRAM on the GPU, and that the runtimes will be longer...

Appreciate your responsiveness as always and keep up the awesome work!
15) Questions and Answers : Windows : Exit status -1073741515 (0xC0000135) STATUS_DLL_NOT_FOUND (Message 1300)
Posted 29 Jul 2021 by bozz4science
Your update is much appreciated, as is the effort you pour into MLC! Thx for keeping us in the loop!
16) Message boards : Science : Artificial intelligence speeds forecasts to control fusion experiments (Message 1287)
Posted 26 Jul 2021 by bozz4science
Main page

MLC@Home's initial project, the Machine Learning Dataset Generator (MLDS), will generate a large dataset of simple networks trained with both clean and adversarial data. To our knowledge, this is the first dataset of its kind.

All that you mention is yet to come (as pointed out in the last main update and the video you mention).
17) Message boards : Science : Artificial intelligence speeds forecasts to control fusion experiments (Message 1285)
Posted 26 Jul 2021 by bozz4science
But that is exactly what we have supported on MLC so far. The MLDS application distributing WUs across DS 1-3 was meant to generate this one-of-a-kind data set. It’s all laid out on the main page. I’ll leave it at that.

Still, I have to say that I am an avid supporter of this project and will continue to be one, as I believe that the results so far have far-reaching implications, and I anticipate many more interesting experiments here on MLC.
18) Message boards : Science : Artificial intelligence speeds forecasts to control fusion experiments (Message 1282)
Posted 26 Jul 2021 by bozz4science
MLC will help by ensuring the reliability of the machines.
How? I think that the research we have contributed to so far with MLDS (Machine Learning Data Set Generation) will help many practitioners understand that models can be flawed/manipulated if the training data is malicious, and that networks trained this way can be detected through weight space clustering analysis. But I don't see how this applies to sophisticated models with far larger data sets and far more complex parameters. So far, we don't train models with any practical use in the real world. The training data has no inherent meaning (toy data), so any derived implications illuminate urgent questions in basic research but do not inform any modelling and/or model tuning decisions. Nor do they provide the domain knowledge that is at the heart of any complex model. And while this helps to separate "good" networks from "bad" ones according to the data set used for training, data integrity checks themselves have to be informed largely by expert/domain knowledge.

In my understanding, future applications/experiments here on MLC might do exactly what you suggest, but I don't see how the current DS1-3 runs and their results would help fusion tech run any more safely.
19) Message boards : Science : A quick update on Dataset 3 (Message 1280)
Posted 26 Jul 2021 by bozz4science
I do see DS4 listed on the main page and that means that you are working hard on the launch of DS4 behind the scenes. Can you give an estimate for when you expect DS4 to launch on MLC? Thx
20) Message boards : Science : Artificial intelligence speeds forecasts to control fusion experiments (Message 1279)
Posted 26 Jul 2021 by bozz4science
Love the enthusiasm here, but how do you see MLC helping to advance fusion tech? So far, all we have done is generate a first-of-its-kind data set of neural networks that were trained with both clean and adversarial data, and then show that the resulting weight space exhibits clustering that allows us to disentangle these networks and tell them apart. Amazing results with profound implications, especially if they were to replicate as we dive into the CNN space and image classification with DS4, but so far that's all there is to it. While this will hopefully advance our understanding of ML in general, it will not help to fine-tune or develop an AI/ML model that will help with plasma localisation forecasting in a fusion reactor, at least not as of now, and at least not directly. And ML models/AI systems this important will most likely only be developed in-house and not on a public grid. At the forefront, I reckon, are DOE's Oak Ridge National Laboratory's Summit supercomputer, Argonne's Aurora, as well as NERSC's Cori and upcoming Perlmutter HPC systems. It is more likely that this will be classified as a national security concern.
Just my 2 cents.

©2021 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)