|
21)
Message boards :
Science :
Artificial intelligence speeds forecasts to control fusion experiments
(Message 1279)
Posted 26 Jul 2021 by bozz4science Post: Love the enthusiasm here, but how do you see MLC helping to advance fusion tech? So far, all we have done is generate a first-of-its-kind data set of neural networks trained on both clean and adversarial data, and show that the resulting weight space exhibits clustering that allows us to disentangle and tell these networks apart. Those are amazing results with profound implications, especially if they replicate as we dive into the CNN space and image classification with DS4, but so far that's all there is to it. While this will hopefully advance our understanding of ML in general, it will not help fine-tune or develop an AI/ML model for plasma localisation forecasting in a fusion reactor - at least not as of now, and at least not directly. And ML models/AI systems this important will most likely be developed in-house rather than on a public grid; at the forefront there, I reckon, are DOE's Oak Ridge National Laboratory's Summit, Argonne's Aurora, and NERSC's Cori and upcoming Perlmutter HPC systems. More likely, this kind of work will be treated as a national-security concern. Just my 2 cents. |
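To make the weight-space clustering idea concrete, here is a rough sketch of what such an analysis can look like (my own illustration, assuming scikit-learn; flatten_weights and cluster_networks are made-up helper names and state_dicts is assumed to be a list of layer-name -> weight-array dicts, one per trained network - this is not MLC's actual pipeline):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    def flatten_weights(state_dicts):
        """Turn each trained network (a dict of layer name -> weight array) into one flat vector."""
        return np.stack([
            np.concatenate([w.ravel() for w in sd.values()]) for sd in state_dicts
        ])

    def cluster_networks(state_dicts, n_clusters=2):
        """Cluster the networks in weight space."""
        X = flatten_weights(state_dicts)                 # one row per trained network
        X_low = PCA(n_components=10).fit_transform(X)    # reduce dimensionality before clustering
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_low)

If clean- and adversarially-trained networks really occupy separable regions of weight space, the resulting cluster labels should line up with the known training condition of each network.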
|
22)
Message boards :
Cafe :
ML & more in the news
(Message 1276)
Posted 22 Jul 2021 by bozz4science Post: I'll make a start with this news article: "Rensselaer Team Aims to Pave Way for Robust AI in Medical Imaging" (source: HPCwire). By the way, Rensselaer Polytechnic Institute is the very same university that runs the BOINC project Milkyway@Home. Extract: |
|
|
23)
Message boards :
Cafe :
ML & more in the news
(Message 1275)
Posted 22 Jul 2021 by bozz4science Post: This thread is meant as the main place for us to exchange exciting news about these technologies: ML (machine learning), DL (deep learning), AI, quantum computing, HPC, and more. |
|
24)
Message boards :
News :
[TMIM Notes] July 1 2021 --- Celebrating 1 year of MLC@Home!
(Message 1263)
Posted 15 Jul 2021 by bozz4science Post: Wow, that is a lot of great information and it took a while to fully work through! First of all, thanks for giving my questions so much thought and for your outstanding responsiveness overall. I'll just go through your answers one by one.
1. Makes sense. Torrents will likely be the (only) way to go for the larger data sets.
2. Great priorities. DS4 support is great, as this will obviously extend the future data sets and thus the prospective research questions that can be addressed later on. Hope you'll find some talented devs to support you on this quest.
3. Same thoughts here. Varying more parameters than just the weights (one at a time or several) would definitely be an interesting setting for future research. I guess that clustering would become apparent earlier on due to the larger variation in the output (trained networks).
4. Awesome - fingers crossed!
5. That was my impression too, and what motivated this question. I'll keep that in mind.
6. I saw that your personal machine computing for MLC got upgraded to a 5950X 16c/32t monster :) Guess the 3600 just got retired for that purpose? :) Evidently this setup works great, but it might become unmanageable (heat/electricity cost/noise) if/when the project's volunteer base scales up.
7. Kudos to you. Running a BOINC project is one thing; setting one up all by yourself and custom-building the client software is a whole other level.
8. Great!
9. Naive of me to totally forget about the privacy issues here. But it's certainly true that most cutting-edge research with practical applications is conducted at private corporations and that the resulting IP is proprietary and protected. My primary interest in MLC has always been fundamental research with a much broader impact than the training/assessment of specific ML models. So many interesting questions still have to be researched before we can claim any real comprehension of AI and ML, and only that will make sure these technologies are safe to use, benefit the many, and are not put to malicious use.
10. Fair. I am still navigating the easy, volunteer side of BOINC after more than a year :)
11. Would a donation system be sensible/helpful in that case? I would gladly donate if that helped the project thrive and got you the support you need to lessen the one-man-show workload you currently shoulder.
As always, I highly appreciate you taking the time to curate and care for the MLC community here as well as on other channels. All the best to you and for the next six months of the project! |
|
25)
Questions and Answers :
Unix/Linux :
New client 9.91 in test
(Message 1258)
Posted 14 Jul 2021 by bozz4science Post: Thanks for the short update! Looking forward to testing :) |
|
26)
Message boards :
News :
[TMIM Notes] July 1 2021 --- Celebrating 1 year of MLC@Home!
(Message 1250)
Posted 13 Jul 2021 by bozz4science Post: Thanks as always for the thorough updates! Highly appreciated :) Glad to finally call myself a long-term supporter of MLC@H, and I hope to add many years to that! (Another idea: badges based on years of participation, as in MW@H.) And what a year it has been! First of all, congrats on building your first BOINC project and shedding a lot of sweat while developing and maintaining the application code. Congrats as well on the first paper generated from this project's data. And kudos for sustaining the engagement and responsiveness with us volunteers that you promised from the very start. I am very impressed with the architecture you have built here on MLC@H, supporting so many operating systems and even several apps that allow GPU computing! I am extremely pleased with the overall progress, in particular ...
- Progress on the larger data sets
- Introduction of a beta app as the main test channel
- Badges :)
- Development progress on a graphics application showing the training progress (thx to tank busters)
- Academic publication using our results
- First results on weight-space clustering of the trained networks that validate the initial theory/assumption
- Future pipeline of additional sub-projects + thoughts on BOINC-wide promotion
- Open access to results and open collaboration on app development via GitLab
2. There is a long list of issues on GitLab that I would love to contribute to myself, but my lack of programming skills currently limits how much I can help, so I would be interested in a prioritization of that list. What is the most important aspect you are currently working on? Onboarding of DS4? App development? GPU app optimisation? Validation technique?
3. Do you plan to extend the data set sizes even further beyond their current dimensions? Is this sensible?
4. How can we help to extend the reach of this project?
5. Discord vs. message boards: should we use Discord rather than the message boards here for prolonged discussions?
6. What is your current backend server architecture, and how long do you think it will be sufficient (assuming sudden growth of our volunteer base within the next 6 months)? You could, for example, list it on the server status page.
7. Who will decide on future research collaborations? Is there a board/committee within your faculty/university that will vote on, discuss, and organize future collabs? Or will it only be you?
8. Assuming future sub-projects launch that aren't mlds-based, will you roll out new applications that allow volunteers to opt in/out?
9. Would you consider hosting research on MLC@H that targets ML models on real-world data (e.g. diagnostic models for cancer on health-related data, accuracy estimation of autopilots/traffic-sign detection, etc.), or would you rather limit future research collabs to fundamental ML research?
10. Have you already conceptualized an onboarding plan and T&Cs for potential future projects? (Scientists might be unfamiliar with BOINC-related peculiarities and requirements.)
11. Do you plan on proactively reaching out to other universities/faculties that conduct research in ML/DL/AI, or do you plan on advertising MLC@H and waiting for research teams to approach you?
|
|
27)
Questions and Answers :
Issue Discussion :
Out of work for CPU
(Message 1236)
Posted 16 Jun 2021 by bozz4science Post: What about the rand WUs? Are these networks from DS3? Anyway, hopefully we'll see the launch of DS4 with the new client software soon. |
|
28)
Questions and Answers :
Issue Discussion :
Out of work for CPU
(Message 1233)
Posted 15 Jun 2021 by bozz4science Post: Yeah, I know that. But the number of trained network samples obtained is not the same as the number of work units, AFAIK. That said, the admin's latest news post stated the intention of moving forward with the DS1/2 experiments to reach the next defined milestone of 10k samples. That's all. I am just describing what might happen in the near future, while you were referring to the present as represented by the server stats. Guess I wasn't clear about that. Cheers |
|
29)
Questions and Answers :
Issue Discussion :
Out of work for CPU
(Message 1231)
Posted 15 Jun 2021 by bozz4science Post: The latest news post (TMIM Notes, June 8 2021) did say that "DS1/DS2 continues along as a slow pace. It will continue in the background until we have 10,000 samples of each." As we are now nearing the 5,000-sample mark, I guess we'll soon see the jump back to 50% completion as we move on to train the second half of the networks to get to 10,000 samples. This website is also a great place to occasionally check on the progress of the various experiments. I could be wrong though... |
|
30)
Message boards :
Cafe :
Badges
(Message 1228)
Posted 12 Jun 2021 by bozz4science Post: I'd like to bump the question of "what about a 100M / 250M badge?" to keep the wait between the 1M and 500M badges down a bit. |
|
31)
Questions and Answers :
Windows :
My GTX 760 Wont compute with tasks
(Message 1227)
Posted 12 Jun 2021 by bozz4science Post: Same story with my old 750 Ti card. It did crunch some tasks successfully, but occasionally, when VRAM usage spiked even briefly above the 2 GB limit, the task immediately failed. I retired the card from this project as a result. Hope that helps. |
|
32)
Questions and Answers :
Windows :
My GTX 760 Wont compute with tasks
(Message 1224)
Posted 11 Jun 2021 by bozz4science Post: Have you set the <use_all_gpus> flag in your cc_config.xml file to 1? Otherwise BOINC will only use the most capable GPU by default. |
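For reference, the flag goes in the <options> section of cc_config.xml in the BOINC data directory; a minimal example looks like this (re-read config files or restart the client afterwards):

    <cc_config>
      <options>
        <use_all_gpus>1</use_all_gpus>   <!-- 1 = use every GPU, 0 = only the most capable one -->
      </options>
    </cc_config>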
|
33)
Questions and Answers :
Windows :
Validation error on GPU task done on a 1650
(Message 1223)
Posted 11 Jun 2021 by bozz4science Post: Your validation rate already looks better now. A certain share of invalids is to be expected, due to the stochastic nature of neural-network training. The admin, John Clemens, is already aware of this issue; it is caused mainly by NaN errors in the midst of the training process, and he will roll out a fix that periodically checks the computed network weights, catches any NaN values, and reverts/reinitializes the weights to avoid some of the otherwise invalid results. |
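Out of curiosity, here is a rough sketch of what such a NaN guard can look like in PyTorch (my own illustration with a made-up helper name step_with_nan_guard, not the project's actual fix):

    import copy
    import torch

    def step_with_nan_guard(model, optimizer, loss_fn, batch, targets, last_good):
        """One training step; roll back to the last known-good weights if NaNs appear."""
        optimizer.zero_grad()
        loss = loss_fn(model(batch), targets)
        loss.backward()
        optimizer.step()

        if any(torch.isnan(p).any() for p in model.parameters()):
            model.load_state_dict(last_good)               # revert the poisoned update
        else:
            last_good = copy.deepcopy(model.state_dict())  # remember the new good state
        return last_good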
|
34)
Message boards :
News :
[TWIM Notes] Apr 8 2021
(Message 1158)
Posted 20 Apr 2021 by bozz4science Post: I am looking forward to seeing the recording, as I unfortunately missed the live session on 04/14. Hope that it'll attract new volunteers! Meanwhile, congrats on the paper and on reaching the most recent milestones! |
|
35)
Questions and Answers :
Windows :
long runtimes
(Message 1157)
Posted 20 Apr 2021 by bozz4science Post: Strangely enough, my "upgrade" from a GTX 750 Ti to a 970 also led to a massive slowdown in throughput. However, the card is used on another motherboard and with another chipset. I'll run various tests to investigate why that is... |
|
36)
Message boards :
Cafe :
How does MLC verify results without running multiple tasks per work unit (redundancy)?
(Message 1139)
Posted 8 Apr 2021 by bozz4science Post: Isn't each network-training task a stochastic process, in the sense that no two tasks will ever (or only with vanishing probability) produce the same result? |
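To illustrate the point, a tiny PyTorch sketch (my own toy example, not MLC code): two trainings that differ only in the random seed end up with clearly different weights, so bitwise-identical redundancy checks could never validate.

    import torch
    import torch.nn as nn

    torch.manual_seed(42)                      # fixed data set, shared by both runs
    data = torch.randn(64, 4)
    target = data.sum(dim=1, keepdim=True)

    def train_once(init_seed: int) -> torch.Tensor:
        torch.manual_seed(init_seed)           # only the weight initialization differs
        net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
        opt = torch.optim.SGD(net.parameters(), lr=0.1)
        for _ in range(200):
            opt.zero_grad()
            loss = ((net(data) - target) ** 2).mean()
            loss.backward()
            opt.step()
        return torch.cat([p.detach().flatten() for p in net.parameters()])

    # Same data, same architecture, same training loop -- different seeds:
    print(torch.allclose(train_once(0), train_once(1)))   # almost certainly False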
|
37)
Questions and Answers :
Issue Discussion :
All my GPU applications have crushed.
(Message 1061)
Posted 26 Jan 2021 by bozz4science Post: "My Linux GTX 750 Ti machine runs GPU tasks just fine; my Win 7 GTX 1660 Ti however fails to complete tasks without error. I have also upgraded my GPU drivers, which didn't have any effect. I noticed on GPU-Z that the GPU was not being utilized, and reading through this thread it looks like the tasks need many CPU cores in order to process quickly. Has there been any progress in resolving these problems?" I happen to have the exact same problem! At the beginning of the year I added a 1660 Super, which did run successfully at first but eventually, after a driver update, failed to produce any WUs. It always started crunching and then just "stopped" computing. The tasks themselves reported that the computation time was actually still increasing, but in GPU-Z I also saw zero utilization. I didn't have any other problems with BOINC GPU applications, but I experienced some issues over at Folding. Apparently, having a separate CUDA runtime installed along with the CUDA development toolkit can screw things up. I deleted both and had no more problems on Folding, just here. The 750 Ti is still crunching here whenever BOINC assigns it an MLC WU, but in the meantime I have pulled my 1660 Super from MLC and currently run other projects on it. |
|
38)
Message boards :
Cafe :
PCIe bandwidth: influence on GPU performance
(Message 1042)
Posted 16 Jan 2021 by bozz4science Post: Well, that could be (part of) the problem... At least with the 460.xx driver version it let me run my 2-tasks-at-a-time setup, even if it was much slower than it should have been in x8 vs. x16 mode. Strangely, the performance of the other GPU apps doesn't seem to be affected. Weird... I have definitely reached the end of my troubleshooting skills at this point. I'll likely revisit this in a few weeks. Thanks |
|
39)
Message boards :
Cafe :
PCIe bandwidth: influence on GPU performance
(Message 1040)
Posted 16 Jan 2021 by bozz4science Post: Thanks for testing, alex! Appreciate it. However, I am kind of lost with my ongoing investigation. In the meantime I've tried running other GPU applications and comparing their performance on the 1660 Super card in x16 vs. x8 mode. All the GPU applications I have run in the past, such as Milkyway, Einstein, PrimeGrid, SRBase and Folding, seem to be working just fine; I hardly notice any performance hit at all in those apps. However, after recently updating my driver to version 461.09, I can't seem to get any MLC GPU tasks running on my cards with the settings I used to run. Tasks start fine and finish as long as only 1 task runs at a time. If I choose to run 2 tasks concurrently, the tasks' training data is read into VRAM, but they never start computing on the GPU. From 2 faulty tasks like these (Task 3641959), I reckon that they only ran on the CPU(?). If I start 1 task on the card and then switch to the 2-task setting, it takes only a few seconds until, once the second task's training data has been loaded into VRAM as well, the GPU load suddenly drops to 0%. Changing back to the 1-WU-at-a-time setting and/or the usual suspend/restart remedy doesn't get the tasks running again... I am completely stumped by this. Edit: I even tried resetting the project and re-downloading all project files, to no avail. |
|
40)
Message boards :
Cafe :
PCIe bandwidth: influence on GPU performance
(Message 1037)
Posted 14 Jan 2021 by bozz4science Post: I have recently been playing with a dual-GPU setup, causing my primary card (1660 Super) to run in x8 mode (PCIe 3.0) alongside an ancient 750 Ti. From a preliminary runtime comparison, I see roughly a doubling of runtimes on my 1660 Super. Is anyone else running a multi-GPU system who can report or validate my observations? That would help me figure out whether running at half the lanes is solely responsible for the performance hit on the MLDS GPU version, or whether something else might be going on. For reference, this is my host: Host 6950. I am running 2 GPU WUs at a time on the 1660 card. Runtimes were roughly ~1,900-2,000 sec for the 2,080-credit tasks and ~3,850-4,000 sec for the 4,160-credit tasks. Now I see runtimes roughly 2x those I observed prior to installing the second card: recently ~7,600-7,900 sec for the same 4,160-credit tasks. To make this comparison as robust as possible, I briefly reverted to a single-GPU setup before giving the dual-GPU setup a second chance. Interestingly, the GPU-Z readings for the 1660 Super didn't change in most categories: it is running at slightly higher temps, but with the same fan RPM, voltage, memory + core clock, memory load and bus-interface load, with 2 tasks running concurrently. Task Manager showed ~100% CUDA compute load before and still does now. Rather weird are the changes in the following readings, which are all down considerably in relative terms (prior --> now):
- Memory controller load: 11% --> 6% [rel. reduction: ~45%]
- Power draw: ~69 W --> 59 W [rel. reduction: ~14%]
- GPU load: 85-90% --> 66-69% [rel. reduction: ~23%]
Another observation: in x16 mode, one GPU WU gave me a Task Manager reading of ~55% CUDA compute load, which is what prompted me in the first place to change my app_config.xml settings to compute 2 WUs at the same time (see the sketch below). Now, with the card running in x8 mode, running only one WU results in a reading of ~95%. Any reason for this? BOINC is still running with the same CPU settings as well: [# physical cores x 2] - 1. Would appreciate your input on whether you have observed this issue in the past as well, as I am kind of clueless here. Thx |
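For anyone wanting to reproduce the 2-WUs-per-GPU setting: this is done via an app_config.xml in the project's data directory, roughly along these lines. The app name shown is only a placeholder, not confirmed from the project; check client_state.xml or the project docs for the exact application name on your host.

    <app_config>
      <app>
        <name>mlds-gpu</name>                <!-- placeholder; verify the exact app name locally -->
        <gpu_versions>
          <gpu_usage>0.5</gpu_usage>         <!-- 0.5 GPU per task = two tasks share one GPU -->
          <cpu_usage>1.0</cpu_usage>         <!-- one CPU core reserved per GPU task -->
        </gpu_versions>
      </app>
    </app_config>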
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)