Too many WUs
| Author | Message |
|---|---|
|
Joined: 27 Jul 20 Posts: 8 Credit: 1,153,620 RAC: 0 |
My BOINC queue is set to 1.0/0.1 days, but MLC sends me far more WUs than I can possibly finish in a day, or before the 2-day deadline. This causes a couple of problems. A small one: if you want to transition from, say, WCG to MLC, the excess MLC WUs cause all MLC WUs to "Run as High Priority." BOINC then does not timeslice back to WCG to finish those WUs off, so I have to babysit until the WCG WUs are cleared out. The bigger issue is that you need fast turnaround time for your WUs. Sending too many guarantees you slow yourself down, since many WUs will come back unstarted. |
|
Joined: 27 Jul 20 Posts: 8 Credit: 1,153,620 RAC: 0 |
After MLC becomes the only project running on a computer it seems to behave better. Maybe the machine learned ;-)
|
|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Honestly, we're still figuring out the black magic of the BOINC scheduler. It is... complex. As you'll see on other threads here. |
|
Joined: 27 Jul 20 Posts: 8 Credit: 1,153,620 RAC: 0 |
One option is to emulate what WCG did on their Device Profiles page. Now all my MLC rigs are overloaded. I'll try 0.1/0.1 and see if that helps. |
|
Joined: 12 Jul 20 Posts: 48 Credit: 73,492,193 RAC: 0 |
> Honestly we're still figuring out the black magic of the BOINC scheduler. It is... complex. As you'll see on other threads here.

I have had this problem (too many work units) more or less at random on various projects, starting with BOINC 7.16.3. They changed the scheduler significantly at that time, though since nobody understood it to begin with, no one can say what the difference is. I posted on it a few months ago on WCG, which should be the best-behaved project, but did not learn much. https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,41860 I think the problem may just go away with training, though whether on the server or the client end I have no idea. I still see it occasionally, mainly on new projects. Hang in there. |
|
Joined: 24 Jul 20 Posts: 30 Credit: 3,485,605 RAC: 0 |
I think one problem is your idea of how BOINC should work. BOINC is designed to work on its own and make its own decisions. You seem to wish to be always in control, in detail. But babysitting, micromanagement and sudden changes just make things more difficult for the system. Set up the basic guidelines, avoid restrictions and leave it alone, that's how BOINC works best. It may be boring and it takes patience but it works fine. My cache settings are the same as yours, 1.0+0.1 days, and I added MLC with 10% resource share a few days ago. No interference since and all is well. The client doesn't fetch more tasks than it can handle and does work at will. Turnaround times are approaching one day and will hopefully settle there. After that I may re-evaluate. Down the thread you ask for more controls to be made available. IMO that's not the way to go. I wouldn't mind if those controls existed but personally I wouldn't use them. I think what Jim1348 mentioned is a different issue. I've seen it occasionally but not often enough to develop a theory. It seems to happen with new projects or new applications. BOINC then fetches the same amount of work again and again, as fast as possible and apparently without end. No New Tasks will bring it to its senses but by then you'll have far more than you can handle. Surely you would notice if that happened. |
|
Joined: 7 Jul 20 Posts: 23 Credit: 39,708,780 RAC: 358 |
> I think one problem is your idea of how BOINC should work. BOINC is designed to work on its own and make its own decisions. You seem to wish to be always in control, in detail. But babysitting, micromanagement and sudden changes just make things more difficult for the system. Set up the basic guidelines, avoid restrictions and leave it alone, that's how BOINC works best. It may be boring and it takes patience but it works fine.

I have far more issues in the long term when I try to babysit than when I let BOINC decide what is best. The more projects you have, the more frustrating it is going to be for you, the user, in the end. I'm still trying to figure out exactly how resource share works. Is it just less time contacting servers? Fewer workunits per project? I know it won't be an instant process and will likely take a solid 3 weeks or so to sort itself out. I do have a small cache of work here too. That always seems to work best. |
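On the resource share question: as far as the BOINC documentation describes it, resource share is a relative weight, and each project's long-run fraction of processing is its share divided by the sum of all shares. It is not a cap on tasks or on server contacts. A minimal sketch of that arithmetic follows; the function name and the share values are made up for illustration, chosen so that MLC ends up at the 10% mentioned earlier in the thread.

```python
def share_fractions(shares):
    """Normalize per-project resource shares into long-run fractions."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: value / total for name, value in shares.items()}


# Hypothetical share values; only the ratio between them matters.
shares = {"WCG": 90, "MLC": 10}
fractions = share_fractions(shares)

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%} of processing over the long run")
```

This only describes the target the client works towards over time. On any given day the actual split can look quite different, which is why the setting takes a while to "sort itself out."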
|
Joined: 12 Jul 20 Posts: 48 Credit: 73,492,193 RAC: 0 |
> I know it won't be an instant process and will likely take a solid 3 weeks or so to sort itself out. I do have a small cache of work here too. That always seems to work best.

To speed it up, you can use a cc_config.xml file:

    <cc_config>
      <options>
        <rec_half_life_days>1.000000</rec_half_life_days>
      </options>
    </cc_config>

That should get it to converge within a couple of days. I use a short cache too. It gives BOINC less of an opportunity to go wrong. |
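A rough way to see why a shorter half-life converges faster. The sketch assumes the client's scheduling history decays exponentially with the configured half-life and that the default half-life is 10 days. Both are a reading of the client configuration docs, not something stated in this thread. The function names are made up for illustration.

```python
import math


def remaining_weight(days_elapsed, half_life_days):
    """Fraction of old scheduling history still counted after days_elapsed."""
    return 0.5 ** (days_elapsed / half_life_days)


def days_until_weight(target_weight, half_life_days):
    """Days needed for old history to decay to target_weight (0 < target <= 1)."""
    return -math.log2(target_weight) * half_life_days


ASSUMED_DEFAULT_HALF_LIFE = 10.0  # assumption: default rec_half_life_days
SHORT_HALF_LIFE = 1.0             # value from the cc_config.xml in the post above

for half_life in (ASSUMED_DEFAULT_HALF_LIFE, SHORT_HALF_LIFE):
    days = days_until_weight(0.25, half_life)
    print(f"half-life {half_life:g} days: old history drops to 25% after {days:g} days")
```

Under these assumptions, old history fades to a quarter of its weight after two half-lives: about 20 days with a 10-day half-life versus 2 days with a 1-day half-life. That is in line with the "3 weeks" and "couple of days" estimates given in the thread.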
|
Joined: 20 Jul 20 Posts: 23 Credit: 1,958,714 RAC: 0 |
BOINC is one of the worst programs in terms of running independently. It often runs WUs inefficiently: it downloads too many WUs, runs WUs on fewer than the optimal number of CPU cores because of a RAM shortage, or runs misconfigured multicore WUs. I see those all the time, one WU hogging multiple CPU cores, and these are public, non-beta WUs! There have also been numerous complaints about BOINC's WU priority algorithm (the algorithm that decides which WUs get processed first); specifically, some WUs sit pending at nearly 98% done while new WUs are started. Another thing BOINC does extremely poorly is giving the end user control. BOINC depends on a whole bunch of algorithms and dependencies that are meant to make life easier but actually do the opposite. You can, for instance, set certain configurations in BOINC Manager. However, these configurations may only work for one system. BOINC Manager offers different profiles, but the number of differently configured systems is limited to the number of profiles you have. If you want a different setting for each client, you can't use the BAM account manager. I mean, you could, but if you have more than, say, 6 systems that differ in configuration, BAM is not the way to go. If you do use it, and one system has a few CPU cores and 2 Nvidia GPUs while another has many CPU cores and only an Intel IGP, setting up for these differences is a real pain in BAM. Full control isn't given to the client. You can, for instance, set something in the client, only to have BAM revert it to a global setting. And it's infuriating to see how BAM constantly reverts things; for instance, projects set to GPU only can be reset by BAM to GPU + CPU when no GPU work is available.
Yes, you can alter the cc_config and app_config files, but that job is tedious, and it has to be redone whenever a project brings out new WUs or anything in an existing project changes (e.g. RAM usage of WUs, or WU intensity such as atom or thread count). I could go on and on about the really damn annoying issues BOINC has, but the fact of the matter is that it's a really lousy program. They could learn a lot from client software developed in the late 1990s, like Napster or Azureus/Vuze, which at least had the processing algorithms for downloads right, and from FAH, which can tailor certain WUs to fit certain hardware. If the FAH client tells the server it has a high-end RTX GPU (like a 2080), the server will assign and send more of the higher-intensity WUs (high atom count, high parallel processing). If the client tells the server it has a low-end GPU, or only a CPU available, the server will assign more of the lower-intensity WUs. In BOINC, the end user has to adjust for that, yet there's no standard. For instance, if a CPU runs a WU in X amount of time and a GPU runs the WU in Y amount of time, make that a reference for all WUs from all projects. Based on that reference, it would be so much easier for end users to say, "Oh, my GPU is 3x as fast as the reference, so I'd like to tune my system's WUs for that performance." Instead, BOINC has no standard, doesn't care what your hardware is, and often runs WUs at a less than optimal setting. In some cases, GPU WUs run at 1/3rd of my 2080 Ti's capabilities (even when tripling them). Sorry for the rant, but no, BOINC definitely does NOT work best when you just leave it. Its AI is virtually zero! As for the number of WUs, I was forced to abort around 200 WUs today because they would never have finished on time. And this is with me being nice, giving two days solely to MLC and putting other projects on hold. |
|
Joined: 24 Jul 20 Posts: 30 Credit: 3,485,605 RAC: 0 |
Come on, you can't seriously compare FAH to BOINC. The functionality of the FAH client is very limited. That's not negative, it does what it's designed to do. But what is that really? Get a task, finish it, return it, get another one. No decisions to make. And you take that as an example of being smart? BOINC is much more complex than that. Different projects, different applications, work cache, deadlines, resource shares. People use all that to their liking. And then they're annoyed when a machine can't do it on its own or makes decisions they don't approve of. Set up BOINC as simple as FAH. Eliminate the cache, run a single project, and leave it alone. Then you are a big step closer to what FAH does and BOINC will do it well enough. You can carefully add more complexity but remember YOU are responsible for providing the right conditions. |
|
Joined: 20 Jul 20 Posts: 23 Credit: 1,958,714 RAC: 0 |
> Come on, you can't seriously compare FAH to BOINC. The functionality of the FAH client is very limited. That's not negative, it does what it's designed to do. But what is that really? Get a task, finish it, return it, get another one. No decisions to make. And you take that as an example of being smart? BOINC is much more complex than that. Different projects, different applications, work cache, deadlines, resource shares. People use all that to their liking. And then they're annoyed when a machine can't do it on its own or makes decisions they don't approve of. Set up BOINC as simple as FAH. Eliminate the cache, run a single project, and leave it alone. Then you are a big step closer to what FAH does and BOINC will do it well enough. You can carefully add more complexity but remember YOU are responsible for providing the right conditions.

Yes, and that is exactly the reason why BOINC isn't working very well: the scientists need some sort of standard to work towards, like with FAH. BOINC Manager won't be able to learn anything from FAH in terms of WU queuing, but it can learn from how well FAH's WUs are optimized for the hardware they run on. In my opinion, scientists need a standardized condition, and from that condition each client can individually adjust. The way things currently are, one project aims for very small WUs that don't work well on fast hardware, others run the GPU at 70 or 80% load, and still others use the full 100% load. It's a topic more suited to the BOINC forums, but the whole thing (BOINC statistics, BOINC setup, BAM) is a spaghetti of all kinds of systems working against one another... Instead of making users edit the cc_config and app_config files, they should at least offer some sort of adjustment for those things within the GUI. Like: use only 1 GPU; don't use the CPU; or set CPU and/or GPU values to, for instance, 0.5 or 0.33. |
|
Joined: 7 Jul 20 Posts: 23 Credit: 39,708,780 RAC: 358 |
> Come on, you can't seriously compare FAH to BOINC. The functionality of the FAH client is very limited. That's not negative, it does what it's designed to do. But what is that really? Get a task, finish it, return it, get another one. No decisions to make. And you take that as an example of being smart? BOINC is much more complex than that. Different projects, different applications, work cache, deadlines, resource shares. People use all that to their liking. And then they're annoyed when a machine can't do it on its own or makes decisions they don't approve of. Set up BOINC as simple as FAH. Eliminate the cache, run a single project, and leave it alone. Then you are a big step closer to what FAH does and BOINC will do it well enough. You can carefully add more complexity but remember YOU are responsible for providing the right conditions.

Not sure where FAH even came into this. FAH is its own completely separate program and its own project, independent of anything else. BOINC has dozens of projects that use its capabilities, and I'm not sure how one can expect biological, numbers, and machine learning projects to act the same on every CPU or GPU. PS: I have yet to see FAH offer the ability to use only half a GPU, so... really confused there too. Every project is coded slightly differently, and it is up to the end user to connect to a project *AFTER* all other projects have run dry. This has been said time and time again. Sure, FAH works great. It also doesn't download a day or more of cached tasks or have different varieties of projects, so obviously it works better. Playing with BOINC and trying to micromanage it only leads to more frustration: let projects run dry before adding other projects, and read around on the forums to figure out what works best. This is where BOINC is different. Not every project is optimized for every type of CPU or GPU.
I feel like your comparison is similar to Apple vs Windows: sure, the Apple operating system runs great on Apple hardware, whereas Windows has a variety of variables that cause it to run better or worse, depending on the machine in question. Even with the 2 day deadline, I never had BOINC download too much work for this project, even on an older Core 2 Duo. People like to change the work cache setting, and this is usually where things start deteriorating. Just my thoughts, of course. |
|
Joined: 8 Jul 20 Posts: 1 Credit: 456,245 RAC: 0 |
Could the deadline be set to at least 6 days? That would help a bit! I have a few "Too late to validate" results; it takes approx. 25k sec to complete a task on that computer! |
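To put those numbers in perspective, here is a back-of-the-envelope sketch of how many tasks fit inside a deadline at roughly 25,000 seconds each. It assumes the machine crunches around the clock with one core dedicated to MLC and nothing else competing, which is optimistic. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def tasks_before_deadline(task_seconds, deadline_days, cores=1):
    """Upper bound on tasks finishable before the deadline, running nonstop."""
    if task_seconds <= 0:
        raise ValueError("task_seconds must be positive")
    per_core = int(deadline_days * SECONDS_PER_DAY // task_seconds)
    return per_core * cores


TASK_SECONDS = 25_000  # approximate runtime quoted in the post above

for deadline_days in (2, 6):
    count = tasks_before_deadline(TASK_SECONDS, deadline_days)
    print(f"{deadline_days}-day deadline: at most {count} tasks per core")
```

Any task queued beyond that bound cannot finish in time no matter how the client orders the work, so on a slow machine a small cache matters more than it does on a fast one.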
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)