Validation errors.

Questions and Answers : Issue Discussion : Validation errors.


adrianxw

Joined: 1 Jul 20
Posts: 11
Credit: 286,780
RAC: 0
Message 209 - Posted: 22 Jul 2020, 12:35:39 UTC

I have been crunching this project for three weeks. In that time, I have had 12 validation errors. That is more errors of that type than I have had at projects I've run for a decade or more. I checked; it is not a Linux/Windows issue. The rate of validation errors is seriously making me reconsider staying with the project. Those 12 hours could have crunched a Rosetta work unit. Explanation required.
seronegativo

Joined: 22 Jul 20
Posts: 3
Credit: 7,138,698
RAC: 0
Message 212 - Posted: 22 Jul 2020, 16:34:49 UTC - in response to Message 209.  

Same problem.
entity

Joined: 7 Jul 20
Posts: 6
Credit: 6,264,619
RAC: 0
Message 213 - Posted: 22 Jul 2020, 17:22:57 UTC

I've been crunching for 3 days and I have 8 invalids. They are spread across all machines (Intel and AMD).
adrianxw

Joined: 1 Jul 20
Posts: 11
Credit: 286,780
RAC: 0
Message 215 - Posted: 22 Jul 2020, 18:44:46 UTC

Clearly, there is something wrong here. I have set "No new tasks" for now. Admin comment AND action required.
Jim1348

Joined: 12 Jul 20
Posts: 48
Credit: 73,492,193
RAC: 0
Message 216 - Posted: 22 Jul 2020, 19:00:06 UTC - in response to Message 215.  

It is a minuscule invalid rate, around 1%.
I hope to increase my contribution this weekend when a machine becomes free.
adrianxw

Joined: 1 Jul 20
Posts: 11
Credit: 286,780
RAC: 0
Message 219 - Posted: 22 Jul 2020, 20:41:54 UTC

The "minuscule" failure rate for me amounts to 1.5 Rosetta work units. Do you believe 1.5 Rosettas are worthless?
bozz4science

Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 220 - Posted: 22 Jul 2020, 20:53:27 UTC

All right, we all got it. Hope it gets resolved soon. The admin is presumably aware of the issue by now; more clutter won't help. Better to describe your problem in more detail. Maybe we can move this thread to the General Issue forum.

No need to be that harsh... and no one is making decisions for you here. If you think Rosetta is more worthy of your resources (time + money), feel free to dedicate yourself to it.
Jim1348

Joined: 12 Jul 20
Posts: 48
Credit: 73,492,193
RAC: 0
Message 223 - Posted: 23 Jul 2020, 0:56:28 UTC - in response to Message 219.  

Apparently you are not aware that not all projects yield deterministic results; that is, there is some randomness.
That is one of the reasons they have validity checks. Why do you think they send each work unit out to two computers for comparison?
You might as well complain that they are wasting half their processing power, if you like. Go where you want.
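The replication idea described above can be sketched in a few lines. This is a hypothetical, minimal validator, not MLC@Home's actual code: each work unit goes to two hosts, and the returned numbers are accepted only if they agree within a tolerance, since exact equality is too strict for floating-point, stochastic workloads.

```python
def results_agree(a, b, rel_tol=1e-3):
    """Accept two hosts' returned metrics (e.g. final losses) if they
    agree within a relative tolerance. Illustrative sketch only; the
    function name and tolerance are assumptions, not the project's."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= rel_tol * max(abs(x), abs(y), 1.0)
               for x, y in zip(a, b))

print(results_agree([0.512, 0.303], [0.5121, 0.3029]))  # True: within tolerance
print(results_agree([0.512, 0.303], [0.9000, 0.3030]))  # False: hosts disagree
```

The quorum-of-two scheme doubles the computation per work unit, which is exactly the "wasting half their processing power" trade-off mentioned above: redundancy buys trust in the results.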
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 227 - Posted: 23 Jul 2020, 3:22:24 UTC

Sorry for the delay in response, Wednesdays and Thursdays tend to be my busy days.

The validation error rate is a bit too high for my comfort level, even though it is less than 0.8% project-wide.

This is a new project, and I'm still tweaking the validation algorithm to find the right balance between granting credit, scientific merit, and making it so the system can't be gamed. The types of WUs being prioritized right now (the ones starting with EightBit and Parity) are also particularly susceptible to triggering the existing negative validation criteria, so the past week has seen a bit of an uptick that wasn't there before (project-wide).

Machine learning is stochastic, which makes validation (in the BOINC sense) a bit tricky. However, that randomness is also exactly what we're trying to capture and categorize in this project! It just makes my job hard. I adjusted the validation criteria again a few hours ago to address this issue. That won't make validation errors go away completely (they never will; that's the nature of randomness), but things should be much better now.
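A toy illustration of why stochastic workloads resist exact-match validation (using a made-up quantity, not the project's training code): two runs of the same task that differ only in their random seed produce results that are close, yet never bit-identical, so a validator must accept a band of answers rather than one number.

```python
import random

def train_loss(seed, steps=1000):
    """Toy stand-in for a stochastic training run: a Monte Carlo
    estimate of E[x^2] for x ~ Uniform(0,1) (true value is 1/3)."""
    rng = random.Random(seed)
    return sum(rng.random() ** 2 for _ in range(steps)) / steps

a, b = train_loss(seed=1), train_loss(seed=2)
print(abs(a - 1 / 3) < 0.05, abs(b - 1 / 3) < 0.05)  # True True
print(a == b)                                        # False: same task, different randomness
```

Both runs are "correct", which is why the validation criteria have to balance tolerance for randomness against resistance to gaming, as described above.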
adrianxw

Joined: 1 Jul 20
Posts: 11
Credit: 286,780
RAC: 0
Message 233 - Posted: 24 Jul 2020, 7:39:10 UTC

>>> If you think Rosetta is more worthy of your resources

It is not that I consider Rosetta more worthy per se; I crunch lots of projects and simply used Rosetta as an example. I have, however, raised the priority of several projects, Rosetta included, that are working on the current coronavirus situation, and I feel totally justified in doing that. It was the simple waste of resources that worried me. I have deleted the project from my systems. Others are, of course, completely free to do what they wish with their research portfolios.

I wish him all the best, as a retired software engineer, I have a lot of interest in activities such as his.
Jim1348

Joined: 12 Jul 20
Posts: 48
Credit: 73,492,193
RAC: 0
Message 235 - Posted: 24 Jul 2020, 15:15:15 UTC - in response to Message 233.  

Rosetta is very worthy. But it won't take much perusal of their forums to find all the complaints about their "inefficient" code.

But efficiency is a tricky concept. What are you comparing it to, and what are you trying to achieve?
If this project allows some artificial intelligence machine to discover a cure for COVID-19, then the validity errors won't matter.
floyd

Joined: 24 Jul 20
Posts: 30
Credit: 3,485,605
RAC: 0
Message 238 - Posted: 25 Jul 2020, 2:44:29 UTC - in response to Message 227.  

I'm still tweaking the validation algorithm to find the right balance

Machine learning is stochastic, which makes validation (in the BOINC sense) a bit tricky

No doubt correct and fair validation is difficult, but here I dare guess the results were not rated invalid; rather, validation was never attempted. I browsed through people's tasks, and the majority of those with validate errors were replacement tasks whose result was returned after the delayed original result. They were invalidated for purely formal reasons. As far as I know, it's up to the project to decide under what circumstances results are considered for validation, but the first thing I'd suggest is to increase your quite short deadlines, if possible. If you give people some more time to return their results, you'll have fewer unnecessary extra tasks in the first place, and if you allow people to cache some work, that buys you time to abort unneeded tasks before they are processed.
The reason I care about this is that it just happened to me, after I joined only today. I can tell you, if the first feedback you get is a validate error, that's not exactly motivating. On the other hand, I watched this project for a while before I decided to join. I try to choose my projects carefully, and this one looks promising. The topic is interesting, and the way you run the project is a welcome change. There are so many projects running mostly behind the scenes, where staff don't have the time, or don't care, to keep their volunteers involved. I hope this one stays different. Keep up the good work.
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 242 - Posted: 25 Jul 2020, 17:02:48 UTC

There is at least one other non-stochastic source of validation errors I'm seeing in the logs. I'm tracking it down now. Hang tight.
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 243 - Posted: 25 Jul 2020, 17:59:35 UTC - in response to Message 242.  

OK, there are two issues I've uncovered.

In the past 24 hours, there have been 157 validation errors out of 23563 results, or 0.6%.

One is a small race condition between the BOINC result validation process and my custom extra layer/DB that keeps track of results when generating new WUs. It didn't cause an issue with the science, but it did cause some extra valid results to be marked as invalid. I counted 57 such invalid results. This didn't show up in my internal tests, but it did show up in testing at scale. I've since fixed it.
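As an illustration of the kind of race described here (the actual MLC@Home schema and code are not shown in this thread, so every name below is hypothetical): if a follow-on generator reads validated results in one step and marks them consumed in a separate step, a result validated in between can be picked up twice or missed; doing both inside a single transaction avoids that.

```python
import sqlite3

# Hypothetical results table: the validator sets state='valid', and the
# WU generator must consume each valid result exactly once.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, state TEXT)")
db.executemany("INSERT INTO results(state) VALUES (?)",
               [("valid",), ("valid",), ("pending",)])

def claim_results_for_followon(conn):
    """Select and mark valid results in ONE transaction, so a concurrent
    validator or a second generator pass cannot double-consume them."""
    with conn:  # the context manager commits the whole block atomically
        rows = conn.execute(
            "SELECT id FROM results WHERE state = 'valid'").fetchall()
        conn.execute(
            "UPDATE results SET state = 'generated' WHERE state = 'valid'")
    return [r[0] for r in rows]

print(claim_results_for_followon(db))  # [1, 2]
print(claim_results_for_followon(db))  # []  (nothing consumed twice)
```

The same select-then-update pattern done without a transaction is exactly where a second process can slip in between the two statements.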

The second issue isn't going to make people happy. In going through some of the results, I was surprised to see a few returning 'NaN'. Those in the ML community know this kind of numerical instability sometimes happens in training, but I hadn't experienced it with these simple test cases and wasn't expecting it; apparently it does happen at the scale we're running at. Even worse, these results were marked as valid until last night and were generating follow-on WUs, which will also eventually generate NaNs. Machines that generate follow-on WUs are particularly susceptible to this type of numerical instability. I counted an even 100 of these results since last night. I'm trying to find the affected WUs and cancel them, but it's going to take some time and I'm sure I'll miss some, so expect a slight uptick in validation errors over the next few days as these get flushed/filtered from the system. Fortunately, our tight deadline means that after a few days these shouldn't be an issue anymore, and these extra NaN results are still only a small number of WUs. So even if there is an uptick, it should remain well below 1% overall.
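For readers outside the ML community, here is a minimal sketch of this failure mode on a toy objective (not the project's networks): an oversized gradient step makes the parameter blow up, and a guard flags the run as soon as a non-finite value appears instead of letting it be reported as a valid result.

```python
import math

def sgd_step(w, grad, lr=10.0):
    # Deliberately huge learning rate so the toy run diverges quickly.
    return w - lr * grad

# Toy objective f(w) = w^4, gradient 4*w^3. The oversized step makes |w|
# explode until it overflows to inf; the guard flags the run rather than
# propagating the non-finite value into follow-on work.
w, status = 1.0, "ok"
for step in range(50):
    grad = 4.0 * w * w * w   # multiplication overflows to inf rather than raising
    w = sgd_step(w, grad)
    if not math.isfinite(w):
        status = "invalid: non-finite value at step %d" % step
        break

print(status)  # invalid: non-finite value at step 5
```

In real training the trigger is usually subtler (exploding gradients, log of zero, fp16 overflow), but the remedy sketched here is the same: check finiteness before trusting or forwarding a result.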

Like I said, there are going to be some growing pains. I'm addressing these problems as I uncover them. Thanks for reporting these issues and your patience.
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 244 - Posted: 25 Jul 2020, 18:53:51 UTC

Moved to Issue Discussion forum.

Also, it looks like I caught the NaN issue before it got too big. It affected a total of ~190 WUs, and all of them are now cancelled. Future NaN results will be caught properly as validation errors and will not generate new WUs.
floyd

Joined: 24 Jul 20
Posts: 30
Credit: 3,485,605
RAC: 0
Message 253 - Posted: 26 Jul 2020, 3:45:53 UTC - in response to Message 243.  

did cause some extra valid results to be marked as invalid. I counted 57 such invalid results
Is that what I experienced? If it is, 57 in a day seems low; I had 4 within hours, 3 of them caused by the same wingman, who is notoriously late. I still do not wish to do work that in the end turns out not to be needed, even if it is validated and credit is granted, because it blocks valuable resources. So I have now set up a cron job that aborts any replacement task I receive. I understand this goes somewhat beyond a solution to my own problem, and if someone has a better idea they're welcome, but I think something needs to be done about this. Between the lines I read that stretching the deadline is not acceptable, and I wouldn't know how to make people return their results faster, but many have more errors on their account than actual results. Not computation errors, but timed-out or aborted tasks. That's such an overhead, it hardly looks acceptable to me.
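For anyone curious how a script like the cron job mentioned above could recognize replacement tasks: BOINC names the replicas of a work unit by appending an index (`wu_0`, `wu_1`, ...), so one heuristic is to treat an index at or above the initial replication count as a resend. The function below is an illustrative sketch (the quorum value and task names are assumptions); a real script would feed it names parsed from `boinccmd --get_tasks` output and abort the matches.

```python
def is_replacement(task_name, quorum=2):
    """Guess from a BOINC-style task name whether it is a resend.

    Heuristic sketch: replica indices 0..quorum-1 are the initial
    tasks; an index >= quorum usually means the work unit was
    re-issued after a timeout or error.
    """
    stem, _, idx = task_name.rpartition("_")
    return bool(stem) and idx.isdigit() and int(idx) >= quorum

print(is_replacement("parity10-123_0"))  # False: initial replica
print(is_replacement("parity10-123_2"))  # True:  likely a resend
```

Note this only identifies likely resends; whether aborting them is good citizenship is exactly the debate in this thread, since an aborted resend triggers yet another resend for someone else.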

these results were marked as valid until last night, and were generating follow-on WUs, which will also eventually generate NaNs
Okay, so those results are not suitable for generating new work units. But even if the results are not as expected or as desired, if they are correct results of your tasks as you assigned them, agreed upon by independent hosts, wouldn't that mean they're valid? Not necessarily useful but valid? Of course given the low number of such results this question is not really important, but still interesting.
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 255 - Posted: 26 Jul 2020, 15:55:19 UTC - in response to Message 253.  
Last modified: 26 Jul 2020, 15:56:39 UTC

did cause some extra valid results to be marked as invalid. I counted 57 such invalid results
Is that what I experienced? If it is, 57 in a day seems low; I had 4 within hours, 3 of them caused by the same wingman, who is notoriously late. I still do not wish to do work that in the end turns out not to be needed, even if it is validated and credit is granted, because it blocks valuable resources. So I have now set up a cron job that aborts any replacement task I receive. I understand this goes somewhat beyond a solution to my own problem, and if someone has a better idea they're welcome, but I think something needs to be done about this. Between the lines I read that stretching the deadline is not acceptable, and I wouldn't know how to make people return their results faster, but many have more errors on their account than actual results. Not computation errors, but timed-out or aborted tasks. That's such an overhead, it hardly looks acceptable to me.


Raising the timeout is always an option, but it can cause uneven follow-on work generation; it's a balance. When I started three weeks ago, the timeout was the default 7 days, and we wound up with a lot of work sitting queued on users' machines while other users complained there was no work for them. My understanding is that, over time, the BOINC client and server keep per-project metrics, learning how much work a host can accomplish within a time period and requesting only that much. But if you've just joined a project, it needs time to learn that.

these results were marked as valid until last night, and were generating follow-on WUs, which will also eventually generate NaNs
Okay, so those results are not suitable for generating new work units. But even if the results are not as expected or as desired, if they are correct results of your tasks as you assigned them, agreed upon by independent hosts, wouldn't that mean they're valid? Not necessarily useful but valid? Of course given the low number of such results this question is not really important, but still interesting.


It affects a tiny number of results, and it gets into the thorny issue of identifying users trying to game the system for credit.
Aurum

Joined: 27 Jul 20
Posts: 8
Credit: 1,153,620
RAC: 0
Message 256 - Posted: 27 Jul 2020, 13:51:09 UTC - in response to Message 235.  
Last modified: 27 Jul 2020, 13:51:36 UTC

If this project allows some artificial intelligence machine to discover a cure for COVID-19, then the validity errors won't matter.
Is this a goal of the project? It's the first I've seen of it, but it's only my second day here and I'm still reading.
bozz4science

Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 260 - Posted: 27 Jul 2020, 16:13:29 UTC - in response to Message 256.  

Hey Aurum,

Simply put: no, it's not. It's not tied to any specific use case, at least so far. The project addresses a rather simple research question (understanding machine learning / neural network training) that could, however, have much further-reaching implications beyond any specific use case.

You can take a look here: https://www.mlcathome.org/mlcathome/forum_thread.php?id=30
Think you'll find it an interesting read addressing your question.

Cheers
floyd

Joined: 24 Jul 20
Posts: 30
Credit: 3,485,605
RAC: 0
Message 262 - Posted: 27 Jul 2020, 19:02:11 UTC - in response to Message 255.  

My understanding is that, over time, the boinc client and server keep per-project metrics where they learn what work a host can accomplish within a time period
That's right and it works quite well in simple cases. For more complex scenarios the estimates are less reliable and can change any time. BOINC needs manoeuvring room to react effectively then.

and only request that much work
That does not work exactly the way you seem to think it does. While the client has a somewhat vague idea of how much work can be done in a time period, the time period to fetch work for is set by the user alone. Inadequate settings are a frequent cause of trouble; unfortunately, many users don't notice that, don't care, or just don't understand the problem. Hence my suggestion to increase the deadlines somewhat, if possible, just to make it less likely that people over-fetch when they haven't adjusted their settings to a new project with characteristics different from what they're used to. Or make them adjust their settings, but I wouldn't know how, given the conditions above.

It affects a tiny number of results, and it gets into the thorny issue of identifying users trying to game the system for credit.
I hadn't thought of cheating and I trust you know more about that than I do. It's your decision after all.


©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)