|
41)
Message boards :
Science :
A quick update on Dataset 3
(Message 1298)
Posted 29 Jul 2021 by pianoman [MLC@Home Admin] Post: Currently working on getting the updated Windows CPU client out the door to match the updated Linux client. Just last night, after weeks of trying, I finally coaxed pytorch to build statically on Windows. Now I need to convince the client to link against it. As I mentioned in another thread, this involves a lot of trial and error, building compiler link lines by hand until you get one that works, and then converting it back to cmake syntax for readability. But DS4 itself is pretty much ready to go on the Linux side, I just need to push the test WUs. Maybe I'll get to some of those tonight. For DS4, we'll be training both a dense network and some CNN networks with image classification datasets: 3 MNIST-like ones, and CIFAR-10 and CIFAR-100... both of which are about 160MB. These are industry standard benchmarks. On the web page you see "Dense" and "LeNet5", but I also plan on having a few others as well, some subset of Imagenet, Alexnet, and/or resnet to round out the data set. Feel free to follow along with development on discord. |
|
42)
Questions and Answers :
Windows :
Exit status -1073741515 (0xC0000135) STATUS_DLL_NOT_FOUND
(Message 1297)
Posted 29 Jul 2021 by pianoman [MLC@Home Admin] Post: The problem is that Windows help messages in this case are incredibly unhelpful. I have no way of debugging. The problem is that a DLL somewhere (a dynamically loaded library on the system) is either not there, is the wrong version (either too new or too old), or conflicts with the one we ship. We don't know which DLL is causing issues or why, just "there's an issue somewhere". However, I *am* trying to fix this a different way. The client is currently linked dynamically, and we ship the DLLs needed for it to run with the client in the same directory. It would be better to link it statically (bundle all the needed libraries into a single exe). Until a few months ago pytorch wouldn't even compile statically, but the current new Linux client in mlds-test, with DS4 support, is now statically compiled and seems to work fine (woohoo!). I've spent the past 2 weeks trying to get the Windows version of pytorch to compile statically, and just last night I finally got it at least compiling (not linking yet, but compiling at least). Once I incorporate that into the build, hopefully all these DLL problems will go away. The problem is this is very time intensive, as nothing seems to support this out of the box. I have to create the compile commands by hand through trial and error until they work, then convert them back into cmake to be repeatable. The good news is that, if I really did get what I thought I did working last night, we're only a few days away from getting a new Windows client out the door that doesn't rely on DLLs. But that's a lot of ifs, so that's not a promise at the moment. As for your specific issue, unless you're a developer who can debug such issues on your own machine and get back to me with more information about what DLL is causing the issues, I think I'd rather concentrate on getting the static binary out ASAP and hopefully avoiding these issues altogether.
Feel free to follow along on discord. I'm even bringing another developer on board at least temporarily, so hopefully things will be moving a bit faster too. Thanks again for volunteering, and I'm sorry it's not working for you, we'll see what we can do. |
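For readers wondering how the two numbers in this thread's title relate: the negative exit status and the hex code are the same 32-bit value, once read as signed and once as unsigned. A minimal Python sketch of that conversion follows; the helper name `to_ntstatus` is made up for illustration and is not part of BOINC or the MLC client.

```python
def to_ntstatus(exit_status: int) -> str:
    """Reinterpret a signed 32-bit exit status as an unsigned hex code.

    Hypothetical helper for illustration only.
    """
    # Masking with 0xFFFFFFFF keeps the low 32 bits, which turns the
    # negative (two's complement) value into its unsigned equivalent.
    return f"0x{exit_status & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073741515))  # prints 0xC0000135
```

This only decodes the number; it does not tell you which DLL is missing.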
|
43)
Questions and Answers :
Issue Discussion :
MLDG (test) doesn't respect suspend signal
(Message 1271)
Posted 17 Jul 2021 by pianoman [MLC@Home Admin] Post: OK, *NOW* I think it's finally working. At least I was able to suspend a process with the latest test build on my machine. Releasing new test WUs now, so please test at your leisure. |
|
44)
Questions and Answers :
Issue Discussion :
MLDG (test) doesn't respect suspend signal
(Message 1269)
Posted 16 Jul 2021 by pianoman [MLC@Home Admin] Post: Hmmm... what I thought was the problem didn't actually fix it. That's disheartening. Will take another crack at it tomorrow. |
|
45)
Questions and Answers :
Issue Discussion :
MLDG (test) doesn't respect suspend signal
(Message 1267)
Posted 15 Jul 2021 by pianoman [MLC@Home Admin] Post: OK, I think I see what's going on and I think I know how to fix it. Expect an update later tonight. Thanks again for reporting. |
|
46)
Questions and Answers :
Issue Discussion :
MLDG (test) doesn't respect suspend signal
(Message 1266)
Posted 15 Jul 2021 by pianoman [MLC@Home Admin] Post: Yes, I've reproduced it here. Definitely a bug, but didn't think to try it until you mentioned it here, so thank you. It's definitely new behavior with the wrapper app. I'll look into it. |
|
47)
Message boards :
News :
[TMIM Notes] July 1 2021 --- Celebrating 1 year of MLC@Home!
(Message 1262)
Posted 15 Jul 2021 by pianoman [MLC@Home Admin] Post: Tons of great questions as usual. Thanks for sticking around! I've been working very hard on wrapping up what I need for my thesis, and this is part of it, but I also need to do a lot of writing as well, so I've been splitting my free time a bit more lately. Note, I see this project continuing beyond completing my thesis, so don't worry, we're not going anywhere. To answer your questions:
1. For DS3 I'm planning to do a torrent, or if researchers approach me directly, I can give them one-time access to download directly from my home systems. DS1/DS2 will likely remain small enough to download directly.
2. At the moment the priority is the updated client and DS4, in the order I mentioned in the other forum post (CPU first, then GPU). As for other features, the highest priority is getting NaN recovery working. That requires someone to change the way the training class is coded in C++ to both swap out the current model and create a new optimizer object, which isn't possible from the main loop the way I coded it up originally. It's not rocket science, but would take some careful thinking. DS4 support is already in the new client, so all that is left there is to update the WU generation/validation/assimilation scripts to handle it, which shouldn't take long. That's not a complete list, but it's a start.
3. I don't think we need more examples of the same things we have. I think DS4 with CNNs will be a big help. For DS5, I'm thinking of maybe varying the shape and size parameters for DS1/2/3/4. Right now, we only vary the weights. This made the analysis easy, and showed the good results we already have. It would be nice if we could show the same clustering even if we vary the shape of the network (different numbers of hidden nodes, different number of layers, etc.). DS5 may do that, but I haven't decided yet.
4. I've decided I'm going to swing for the fences, and beef up the paper to submit to one of the big ML conferences, AAAI. It's a stretch and hyper competitive, but even if it's only a poster, that should get some people's interest. The poster would be more about MLC@Home itself as a research platform. The paper mentions it too.
5. As for Discord, I find I'm personally on Discord a lot, and checking this forum a lot less. I've hooked the forum into the RSS feed on Discord, but even then I seem to work better on Discord. Still, for conversations that need to exist long term, like longer term discussions, the forum is probably better. But for tech support and actually getting my attention faster, Discord is a better bet.
6. The server is fine computation-wise. Since the upgrade to a 6c/12t processor (thanks Ryzen!) it's barely breaking a sweat. Disk space is more of a concern, but I can mitigate that more easily by moving the DS3 archive off of that onto a torrent. Now, network bandwidth hasn't been a huge issue, but I am still running the system off my home network, as my university is just now allowing people back onto campus post pandemic. That means I can move the server onto their network. However, it also means if I need to access it, it'll be 30 minutes away by car instead of under my desk. So, I think we're good for now.
7. At the moment, it's only me. Once it grows beyond me, I'd like to set up a small governance committee, but for now all you have is my word on that. I will say that I will keep to the tenets laid out on our homepage, that the resulting data needs to be made publicly available as soon as possible.
8. Of course, any new applications will be separate BOINC application queues, so you can opt in/out as you wish.
9. It's difficult, but I'm open to anything. I've learned early on if you want something you create to thrive, you have to be open to others' ideas on how to use it. In general, I feel ML on the BOINC platform is more suited to trying out a lot of small things in parallel, rather than trying to build one bigger network to target a single problem. That said, one can look at trying a whole bunch of parameters on a network at once to see which ones perform better, etc. Another problem with real-world data is privacy. First, dealing with medical data is a potential whole can of worms in the US, as the data could be personally identifying, so hosting the data on something like MLC@Home could potentially open us up to some liability there. On the other end of the spectrum, there are so many businesses looking to make money on ML that they tend to jealously guard their data. So getting real world data is often a problem. Keeping things at the fundamental ML level tends to avoid all those problems and I think has a broader impact on the field as a whole. But like I said, we'd absolutely listen to any honest researcher with an idea!
10. Heck, I'm still learning BOINC peculiarities, so while I've had some informal discussions with other researchers and mentioned some of the caveats, I don't have anything formal.
11. So the funny thing at the moment is that MLC@Home is completely unfunded. The new server was purchased with some grant money, but that's it. I'm able to work on it because I'm essentially self-funded and work another full-time job (I'm a part-time student). The main issue I've had talking with other researchers is that they need funding to continue as a grad student, and working on MLC won't get them any. I've been looking at potential funding opportunities and collaborations, but one particular one I had high hopes for didn't pan out. So at the moment it's advertise and hope to get noticed. Attending conferences like AAAI might help.
|
|
48)
Message boards :
Science :
Why is there 1027643 workunits in dataset 3 instead of 1000000?
(Message 1260)
Posted 14 Jul 2021 by pianoman [MLC@Home Admin] Post: Well, it's always good to have a few extra as sometimes things go wrong. I also got a little impatient trying to finish DS3, so I queued up a few extra so that fast, active machines would get them and complete them, instead of waiting for WUs held by stragglers/disappeared machines to time out (1 week) before they'd get resubmitted. The progress scoreboard on the wiki caps the colored boxes at 10k, but the actual count is likely higher for each one. When I cut the full DS3 release (I've been busy with the client) it'll have 10k of each, taken from the 10k+extra for each network type. |
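To make the "10k of each from the 10k+extra" idea concrete, here is a rough Python sketch of capping a release at a fixed count per network type. How the real DS3 release picks which results to keep is not stated above; keeping the first ones seen is an assumption, and `cut_release` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def cut_release(results, per_type=10_000):
    """Keep at most `per_type` result IDs for each network type.

    `results` is an iterable of (network_type, result_id) pairs.
    Assumption: the first `per_type` results seen for a type are kept;
    the actual DS3 selection rule is not described in the post.
    """
    kept = defaultdict(list)
    for network_type, result_id in results:
        if len(kept[network_type]) < per_type:
            kept[network_type].append(result_id)
    return dict(kept)


# Tiny usage example with a cap of 2 instead of 10k.
sample = [("A", 1), ("A", 2), ("A", 3), ("B", 4)]
print(cut_release(sample, per_type=2))
```

With the real numbers, any network type that finished more than 10k WUs would simply have its surplus left out of the release.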
|
49)
Questions and Answers :
Windows :
Computation errors on 2080 Ti
(Message 1259)
Posted 14 Jul 2021 by pianoman [MLC@Home Admin] Post: Well, I'm about to turn my attention back to the windows build anyway for the new client, so I'll take a look within the next week. I'm sorry lots of you are having issues! |
|
50)
Questions and Answers :
Issue Discussion :
Work done total and average work done not being updated.
(Message 1257)
Posted 14 Jul 2021 by pianoman [MLC@Home Admin] Post: I'm sorry, I don't understand.. what is the issue? According to here: https://www.mlcathome.org/mlcathome/results.php?hostid=13483 this machine seems to be completing and getting credit for tasks... |
|
51)
Questions and Answers :
Unix/Linux :
New client 9.91 in test
(Message 1256)
Posted 14 Jul 2021 by pianoman [MLC@Home Admin] Post: Linux/amd64 only at the moment... even the arm builds are grumpy at me. Right now the priority for getting the new client to work is Linux/amd64 > Linux/arm > Windows/CPU > Linux/CUDA > Windows/CUDA > Linux/ROCm. If you're running Linux, please test! |
|
52)
Questions and Answers :
Unix/Linux :
New client 9.91 in test
(Message 1255)
Posted 14 Jul 2021 by pianoman [MLC@Home Admin] Post: I'm testing the new client on mldstest. It seems faster! On my fast test machine (Ryzen 5950x) it'll complete a Parity DS1 WU in about 23 minutes (with no other WUs running). This client is a major change from the current production one, which is why it's taking so long. The changes include:
* DS4 support
* Uses the BOINC wrapper instead of the native BOINC API (requires extensive backend changes)
* Jump to PyTorch v1.9 (up from v1.7)
* Statically compiled! (no more appimage or fuse requirement, and should support more Linux distros now, like CentOS 7)
I'm concerned that all these changes may introduce errors on hardware that I can't readily test. At the moment, only Linux/CPU is supported: amd64 is out, the ARM builds are... progressing... (Raspberry Pis are slow to build big complicated things like pytorch from scratch). And I'm still testing validation and the backend changes to work with the wrapper, but it's now open for testing and I encourage everyone on Linux to try it out... just expect some growing pains! I'll also be able to pump out some DS4 test units too, hopefully. |
|
53)
Questions and Answers :
Unix/Linux :
GPU support update 11/23
(Message 1254)
Posted 14 Jul 2021 by pianoman [MLC@Home Admin] Post: That's an interesting question! The short answer is probably not, since we're using pytorch and that requires cuda 10, which requires a minimum driver version of 440(?) or higher. But I'll look into it. |
|
54)
Message boards :
News :
[TMIM Notes] July 1 2021 --- Celebrating 1 year of MLC@Home!
(Message 1244)
Posted 2 Jul 2021 by pianoman [MLC@Home Admin] Post: This Month in MLC@Home Notes for July 1 2021 A monthly summary of news and notes for MLC@Home Summary Happy first birthday to MLC@Home! This project went live on July 1, 2020, and caught on pretty quickly in the BOINC community. We've remained focused on our goal, which is breaking open the black box of neural networks to explain why they make the choices they do. This is so important as machine learning permeates more and more of our everyday life; from autonomous cars, to banking decisions, and medical diagnoses. We need research to understand how to keep bias out of these systems. We are also the first, and to date only, public machine learning focused BOINC project. This means that while we could leverage the BOINC framework for job management, we have to build most of the ML client infrastructure from the ground up. This hasn't always been smooth, but we've accomplished so much in the past year regardless. In the past year, we have:
|
|
55)
Message boards :
Cafe :
GitHub and OpenAI launch a new AI tool that generates its own code
(Message 1243)
Posted 2 Jul 2021 by pianoman [MLC@Home Admin] Post: It's pretty darn cool work, not going to lie. It's funny, when I started my undergraduate career (err... 1996), my uncle, who had worked in computers for a while, pulled me aside and asked me if I was going to do hardware or software. I was signed up for the computer engineering degree (hardware), so I said so. He told me "good, someday they're going to write a compiler or AI that can write all the software it needs itself, from just what someone says... but they'll always need hardware to run on". It's not necessarily true (I wound up going into low level software... OS, firmware, hypervisors, which are still pretty safe from this stuff too), but I'll always remember that remark when something like this comes up. |
|
56)
Message boards :
Cafe :
I noticed that there is no rocm app now in apps.php
(Message 1241)
Posted 1 Jul 2021 by pianoman [MLC@Home Admin] Post: Not cancelled, just broken and on the back burner to be fixed :/. |
|
57)
Questions and Answers :
Issue Discussion :
Out of work for CPU
(Message 1237)
Posted 16 Jun 2021 by pianoman [MLC@Home Admin] Post: Yep, we've run out of initial DS3 WUs in the CPU queue. This is great news! Unfortunately, DS4 isn't quite ready yet, although I think we're only a day or two away. In the meantime, later today, I'll send out some DS1/DS2 WUs from the GPU queue to the CPU one. Expect more to start flowing within the next 24 hours. Sorry for the gap, this is completely on me. |
|
58)
Questions and Answers :
Windows :
My GTX 760 Wont compute with tasks
(Message 1226)
Posted 12 Jun 2021 by pianoman [MLC@Home Admin] Post: How much VRAM is on your 760 system? Sadly, you need a GPU with a minimum of 3 GB of VRAM to crunch MLC. |
|
59)
Message boards :
News :
[TMIM Notes] June 8 2021
(Message 1216)
Posted 9 Jun 2021 by pianoman [MLC@Home Admin] Post: Yes and no. That's more for projects that use well-known applications... like folding projects tend to use a small handful of common executables (some closed source), but do different things with them. So it makes sense, instead of each project bundling the executable separately, that there be a common set of known executables for projects to just point to. The ML world is a little different (less mature?) at the moment. We have common libraries (pytorch, tensorflow, mxnet, xgboost, etc.), and everyone wraps their own custom executables around these libraries. So I'm not sure how helpful it'll be for us. The issue with the failed app is that the native BOINC API uses SIGALRM (without documenting that it does), and so does PyTorch (barely documenting that it does). The app ran fine when run standalone, but once it was launched by the boinc client it crashed once pytorch overwrote the SIGALRM handler and boinc tried to call it. The fix is to re-write the app using the boinc wrapper, which means undoing a lot of the work I put in to use the native API in the first place (which annoys me), and means that WUs created for the old app won't be compatible with those from the new app. And it's just turned out to take a while longer than planned. |
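The handler clash described above comes down to a process having only one handler slot per signal: whoever installs a handler last wins. Below is a small Unix-only Python illustration of that general mechanism. It is not the BOINC or PyTorch code (both are native libraries), the two handler names are made up, and in Python the overwrite is silent rather than a crash.

```python
import signal

calls = []


def boinc_like_handler(signum, frame):
    # Stand-in for a handler installed first by one library (hypothetical).
    calls.append("boinc-like")


def library_handler(signum, frame):
    # Stand-in for a second library installing its own handler (hypothetical).
    calls.append("library")


# First installation; remember whatever was there so it can be restored.
original = signal.signal(signal.SIGALRM, boinc_like_handler)

# Second installation replaces the first; signal.signal returns the old one.
previous = signal.signal(signal.SIGALRM, library_handler)

# Deliver SIGALRM to this process: only the last-installed handler runs.
signal.raise_signal(signal.SIGALRM)

# Put the original disposition back.
signal.signal(signal.SIGALRM, original)

print(previous is boinc_like_handler, calls)
```

Moving to the BOINC wrapper sidesteps the clash because the wrapper runs the science app as a separate process, so each process keeps its own SIGALRM handler.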
|
60)
Message boards :
News :
[TMIM Notes] Jun 8 2021 posted
(Message 1214)
Posted 9 Jun 2021 by pianoman [MLC@Home Admin] Post: MLC@Home has posted the Jun 8 2021 edition of its monthly "This Month In MLC@Home" newsletter! A monthly update including the new client with DS4 support, a note on disk space on the server, and a move to monthly updates instead of weekly ones moving forward. Read the update and join the discussion here. |
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)