Questions and Answers : Issue Discussion : Validation errors
---

Joined: 1 Jul 20 | Posts: 11 | Credit: 286,780 | RAC: 0

I have been crunching the project for three weeks. In that time, I have had 12 validation errors. That is more errors of that type than I have had at projects I've run for a decade or more. I checked; it is not a Linux / Windows issue. The rate of validation errors is seriously making me reconsider staying with the project: those 12 hours could have crunched a Rosetta work unit. Explanation required.
---

Joined: 22 Jul 20 | Posts: 3 | Credit: 7,138,698 | RAC: 0

Same problem.
---

Joined: 7 Jul 20 | Posts: 6 | Credit: 6,264,619 | RAC: 0

I've been crunching for 3 days and I have 8 invalids. They are spread across all my machines (Intel and AMD alike).
---

Joined: 1 Jul 20 | Posts: 11 | Credit: 286,780 | RAC: 0

Clearly, there is something wrong here. I've set No New Tasks for now. Admin comment AND action required.
---

Joined: 12 Jul 20 | Posts: 48 | Credit: 73,492,193 | RAC: 0

It is a minuscule invalid rate, around 1%. I hope to increase my contribution this weekend when a machine becomes free.
---

Joined: 1 Jul 20 | Posts: 11 | Credit: 286,780 | RAC: 0

That minuscule failure rate, for me, amounts to 1.5 Rosetta work units. Do you believe 1.5 Rosettas are worthless?
---

Joined: 9 Jul 20 | Posts: 142 | Credit: 11,536,204 | RAC: 3

All right, we all got it. Hope it gets resolved soon. The admin is by now aware of the issue, I guess; more clutter won't help. Rather, describe your problem in more detail. Maybe we can move this post to the Issue Discussion forum. No need to be that harsh... and no one is making decisions for you here. If you think Rosetta is more worthy of your resources (time + money), feel free to dedicate yourself to it.
---

Joined: 12 Jul 20 | Posts: 48 | Credit: 73,492,193 | RAC: 0

Apparently you are not aware that not all projects yield deterministic results; there is some randomness. That is one of the reasons they have validity checks. Why do you think they send each work unit out to two computers for comparison? By that logic, you might as well complain that they are wasting half their processing power. Go where you want.
---

Joined: 30 Jun 20 | Posts: 462 | Credit: 21,406,548 | RAC: 0

Sorry for the delay in response; Wednesdays and Thursdays tend to be my busy days.

The validation error rate is a bit too high for my comfort level, even though it is less than 0.8% project-wide. This is a new project, and I'm still tweaking the validation algorithm to find the right balance between granting credit, scientific merit, and making the system hard to game. The types of WUs being prioritized right now (the ones starting with EightBit and Parity) are also particularly susceptible to triggering the existing negative validation criteria, so the past week has seen a bit of an uptick (project-wide) that wasn't there before.

Machine learning is stochastic, which makes validation (in the BOINC sense) a bit tricky. However, that randomness is also exactly what we're trying to capture and categorize in this project! It just makes my job hard.

I adjusted the validation criteria again a few hours ago to address this issue. That won't make validation errors go away completely (they never will; that's the nature of randomness), but things should be much better now.
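To make the stochastic-validation point concrete, here is a minimal sketch of a tolerance-based comparison between two replicas of the same WU. It is purely illustrative: MLC@Home's real validator runs server-side (BOINC validators are typically C++ against the server's validator API), and the array layout, field meaning, and tolerance below are assumptions, not the project's actual criteria.

```python
import numpy as np

# Illustrative sketch only -- NOT the project's actual validator.
# Assumes each replica reports an array of per-epoch loss values.
REL_TOL = 0.05  # hypothetical: how far two honest stochastic runs may diverge

def results_agree(loss_a: np.ndarray, loss_b: np.ndarray) -> bool:
    """True if two replicas of the same WU are plausibly honest work."""
    if loss_a.shape != loss_b.shape:
        return False
    # Reject numerically unstable results outright.
    if not (np.all(np.isfinite(loss_a)) and np.all(np.isfinite(loss_b))):
        return False
    # Stochastic training never matches bit-for-bit, so compare within a
    # relative tolerance instead of demanding exact equality.
    denom = np.maximum(np.abs(loss_a), np.abs(loss_b)) + 1e-12
    return bool(np.max(np.abs(loss_a - loss_b) / denom) <= REL_TOL)
```

The balance the admin describes lives in that tolerance: too tight and honest stochastic runs are rejected, too loose and fabricated results slip through for credit.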
---

Joined: 1 Jul 20 | Posts: 11 | Credit: 286,780 | RAC: 0

> If you think Rosetta is more worthy of your resources

It is not that I consider Rosetta more worthy per se; I crunch lots of projects, and I simply used Rosetta as an example. I have, however, upped the priority on several projects, Rosetta included, which are working on the current coronavirus situation, and I feel totally justified in doing that. It was the simple waste of resources that worried me.

I have deleted the project from my systems. Others are, of course, completely free to do what they wish with their research portfolios. I wish the admin all the best; as a retired software engineer, I have a lot of interest in activities such as his.
---

Joined: 12 Jul 20 | Posts: 48 | Credit: 73,492,193 | RAC: 0

Rosetta is very worthy. But it won't take much perusal of their forums to find all the complaints about their "inefficient" code. Efficiency is a tricky concept, though: what are you comparing it to, and what are you trying to achieve? If this project allows some artificial-intelligence machine to discover a cure for COVID-19, then the validity errors won't matter.
---

Joined: 24 Jul 20 | Posts: 30 | Credit: 3,485,605 | RAC: 0

> I'm still tweaking the validation algorithm to find the right balance

> Machine learning is stochastic, which makes validation (in the BOINC sense) a bit tricky

No doubt correct and fair validation is difficult, but here I dare guess the results were not rated invalid; instead, validation was never attempted. I browsed through people's tasks, and the majority of those with validate errors were replacement tasks whose result was returned after the delayed original result. They were invalidated for purely formal reasons.

As far as I know, it's up to the project to decide under what circumstances results are considered for validation, but the first thing I'd suggest is to increase your quite short deadlines, if possible. If you give people some more time to return their results, you'll have fewer unnecessary extra tasks in the first place, and if you allow people to cache some work, that buys you time to abort unneeded tasks before they are processed.

The reason I care about this is that it just happened to me, and I joined only today. I can tell you, if the first feedback you get is a validate error, that's not exactly motivating.

On the other hand, I watched this project for a while before I decided to join. I try to choose my projects carefully, and this one looks promising. The topic is interesting, and the way you run the project is a welcome change. There are so many projects running mostly behind the scenes, where staff don't have the time or don't care to keep their volunteers involved. I hope this one stays different. Keep up the good work.
---

Joined: 30 Jun 20 | Posts: 462 | Credit: 21,406,548 | RAC: 0

There is at least one other, non-stochastic source of validation errors I'm seeing in the logs. I'm tracking it down now. Hang tight.
---

Joined: 30 Jun 20 | Posts: 462 | Credit: 21,406,548 | RAC: 0

OK, there are two issues I've uncovered. In the past 24 hours, there have been 157 validation errors out of 23,563 results, or 0.6%.

The first is a small race condition between the BOINC result-validation process and my custom extra layer/DB that keeps track of results when generating new WUs. It didn't cause an issue with the science, but it did cause some extra valid results to be marked as invalid. I counted 57 such invalid results. This didn't show up in my internal tests, but it did show up when testing at scale. I've since fixed it.

The second issue isn't going to make people happy. In going through some of the results, I was surprised to see a few returning 'NaN'. Those in the ML community know this kind of numerical instability sometimes happens in training, but I hadn't experienced it with these simple test cases and wasn't expecting it; apparently it does happen at the scale we're running at. Even worse, these results were marked as valid until last night and were generating follow-on WUs, which will also eventually generate NaNs. Machines that generate follow-on WUs are particularly susceptible to this type of numerical instability. I counted an even 100 of these results since last night.

I'm trying to find the affected WUs and cancel them, but it's going to take some time and I'm sure I'll miss some, so expect a slight uptick in validation errors over the next few days as these get flushed/filtered from the system. Fortunately, our tight deadline means that after a few days these shouldn't be an issue anymore, and also fortunately, these NaN results are still only a small number of WUs. So even if there is an uptick, it should still be well below 1% overall.

Like I said, there are going to be some growing pains. I'm addressing these problems as I uncover them. Thanks for reporting these issues, and thanks for your patience.
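As an aside for anyone running a similar pipeline, below is a minimal sketch of the kind of post-return sanity check that catches NaN results before they are marked valid or used to seed follow-on WUs. The JSON layout and the "losses" field name are assumptions for illustration, not MLC@Home's actual result format.

```python
import json
import math

# Sketch only: assumes a returned result file is JSON with a "losses" list.
def result_is_finite(path: str) -> bool:
    """Reject a returned result if any reported metric is NaN or Inf."""
    with open(path) as f:
        result = json.load(f)
    # math.isfinite is False for NaN and +/-Inf alike.
    return all(math.isfinite(v) for v in result.get("losses", []))
```

Gating both credit and follow-on WU generation on a check like this keeps one unstable run from seeding a whole chain of NaN work.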
---

Joined: 30 Jun 20 | Posts: 462 | Credit: 21,406,548 | RAC: 0

Moved to the Issue Discussion forum. Also, it looks like I caught the NaN issue before it got too big: it affected a total of ~190 WUs, and all of them are now cancelled. Future NaN results will be caught properly as validation errors and will not generate new WUs.
---

Joined: 24 Jul 20 | Posts: 30 | Credit: 3,485,605 | RAC: 0

> did cause some extra valid results to be marked as invalid. I counted 57 such invalid results

Is that what I experienced? If it is, 57 in a day seems low; I had 4 within hours, 3 of them caused by the same wingman, who is notoriously late. I still do not wish to do work that in the end turns out not to be needed, even if it is validated and credit is granted, as it blocks valuable resources. So I have now set up a cron job that aborts any replacement task I receive. I understand this goes somewhat beyond a solution to my own problem, and if someone has a better idea they're welcome, but I think something needs to be done about this. Between the lines I read that stretching the deadline is not acceptable, and I wouldn't know how to make people return their results faster, but many have more errors on their account than actual results: not computation errors, but timed-out or aborted tasks. That's such an overhead, it hardly looks acceptable to me.

> these results were marked as valid until last night, and were generating follow-on WUs, which will also eventually generate NaNs

Okay, so those results are not suitable for generating new work units. But even if the results are not as expected or as desired, if they are correct results of the tasks as you assigned them, agreed upon by independent hosts, wouldn't that mean they're valid? Not necessarily useful, but valid? Given the low number of such results this question is not really important, but it is still interesting.
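For the curious, a sketch of what such a cron-driven abort script could look like. `boinccmd --get_tasks` and `boinccmd --task <URL> <name> abort` are standard BOINC client commands, but the task-name convention assumed here (with an initial replication of 2, suffixes _0 and _1 are originals and _2+ are resends) and the exact project URL should be verified before running anything like this.

```python
#!/usr/bin/env python3
"""Abort resent ("replacement") tasks for one project via boinccmd.

Sketch only: assumes instance suffixes _0/_1 are originals and _2+
are resends of timed-out or errored tasks. Verify before use.
"""
import re
import subprocess

PROJECT_URL = "https://www.mlcathome.org/mlcathome/"  # assumed URL

out = subprocess.run(["boinccmd", "--get_tasks"],
                     capture_output=True, text=True, check=True).stdout

name = None
for line in out.splitlines():
    line = line.strip()
    if line.startswith("name:"):
        name = line.split(":", 1)[1].strip()
    elif line.startswith("project URL:") and name:
        url = line.split(":", 1)[1].strip()
        m = re.search(r"_(\d+)$", name)
        if url.rstrip("/") == PROJECT_URL.rstrip("/") and m and int(m.group(1)) >= 2:
            # This task is a resend of another host's instance; drop it.
            subprocess.run(["boinccmd", "--task", url, name, "abort"], check=True)
        name = None
```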
---

Joined: 30 Jun 20 | Posts: 462 | Credit: 21,406,548 | RAC: 0

> Is that what I experienced? If it is, 57 in a day seems low; I had 4 within hours, 3 of them caused by the same wingman, who is notoriously late. I still do not wish to do work that in the end turns out not to be needed, even if it is validated and credit is granted, as it blocks valuable resources. So I have now set up a cron job that aborts any replacement task I receive. I understand this goes somewhat beyond a solution to my own problem, and if someone has a better idea they're welcome, but I think something needs to be done about this. Between the lines I read that stretching the deadline is not acceptable, and I wouldn't know how to make people return their results faster, but many have more errors on their account than actual results: not computation errors, but timed-out or aborted tasks. That's such an overhead, it hardly looks acceptable to me.

Raising the timeout is always an option, but it can cause uneven follow-on work generation; it's a balance. When I started 3 weeks ago, the timeout was the default 7 days, and we wound up with a lot of work sitting queued on users' machines while other users complained there was no work for them. My understanding is that, over time, the BOINC client and server keep per-project metrics where they learn what work a host can accomplish within a time period and only request that much work. But if a host has just joined a project, it needs time to learn that.

> Okay, so those results are not suitable for generating new work units. But even if the results are not as expected or as desired, if they are correct results of the tasks as you assigned them, agreed upon by independent hosts, wouldn't that mean they're valid? Not necessarily useful, but valid?

It affects a tiny number of results, and it gets into thorny issues of identifying users trying to game the system for credit.
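A rough illustration of the "learning" mentioned above: the BOINC client has historically kept a per-project duration correction factor that scales the server's runtime estimates based on completed tasks. The update rule below is a simplified model of that idea under assumed constants, not BOINC's exact algorithm.

```python
# Simplified model (an assumption, not BOINC's actual code) of a
# per-project runtime correction factor learned from task history.
def update_correction(factor: float, actual_s: float, estimated_s: float) -> float:
    """Future runtime estimates get multiplied by `factor` (starts at 1.0)."""
    ratio = actual_s / estimated_s
    if ratio > factor:
        return ratio                         # adapt upward immediately (pessimistic)
    return factor + 0.1 * (ratio - factor)   # relax downward slowly

# A freshly attached host has no history, so its early estimates (and
# hence how much work it requests) can be badly off until tasks complete.
```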
---

Joined: 27 Jul 20 | Posts: 8 | Credit: 1,153,620 | RAC: 0

> If this project allows some artificial intelligence machine to discover a cure for COVID-19, then the validity errors won't matter.

Is this a goal of this project? This is the first I've seen of it, but it's only my second day here and I'm still reading.
---

Joined: 9 Jul 20 | Posts: 142 | Credit: 11,536,204 | RAC: 3

Hey Aurum, simply put: no, it's not. It's not tied to any specific use case, at least so far. The project is concerned with a rather simple research question (understanding machine learning / neural network training) that could, however, have much further-reaching implications beyond any specific use case. You can take a look here: https://www.mlcathome.org/mlcathome/forum_thread.php?id=30. I think you'll find it an interesting read that addresses your question. Cheers
---

Joined: 24 Jul 20 | Posts: 30 | Credit: 3,485,605 | RAC: 0

> My understanding is that, over time, the BOINC client and server keep per-project metrics where they learn what work a host can accomplish within a time period

That's right, and it works quite well in simple cases. For more complex scenarios the estimates are less reliable and can change at any time; BOINC then needs manoeuvring room to react effectively.

> and only request that much work

That does not work exactly the way you seem to think it does. While the client has a somewhat vague idea of how much work can be done in a time period, the time period to fetch work for is set by the user alone. Inadequate settings are a frequent cause of trouble; unfortunately, many users don't notice that, or don't care, or just don't understand the problem. Hence my suggestion to increase the deadlines somewhat, if possible, just to make it less likely that people over-fetch when they don't adjust their settings to a new project with characteristics different from what they're used to. Or get them to adjust those settings, but I wouldn't know how, given the conditions above.

> It affects a tiny number of results, and it gets into thorny issues of identifying users trying to game the system for credit.

I hadn't thought of cheating, and I trust you know more about that than I do. It's your decision, after all.

---
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)