Test tasks not sent?

zombie67 [MM]
Joined: 1 Jul 20
Posts: 34
Credit: 26,118,410
RAC: 0
Message 656 - Posted: 17 Oct 2020, 6:07:33 UTC

I see that there are two apps, regular and test. The server status page shows that the test app has 3 tasks in progress and 17 unsent. Why unsent? I have 2 of the 3 in-progress tasks running on my Raspberry Pi machines, so I know some are being sent. I also have Windows and amd64 Linux machines attached and asking for those tasks, without success. What is holding up the rest of the test tasks from being issued?
Reno, NV
Team: SETI.USA
Sergey Kovalchuk

Joined: 1 Jul 20
Posts: 31
Credit: 123,959
RAC: 0
Message 657 - Posted: 17 Oct 2020, 8:56:33 UTC

Maybe the problem will be solved by increasing the size of the shared memory.
This error is common for many projects running with default settings.
It appears after adding a second application when there is a large difference in the volume of tasks between the two.
It is also advisable to tweak the parameters of the scheduler queue so that there is guaranteed space for each application.
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 658 - Posted: 18 Oct 2020, 0:06:04 UTC

I'm aware, and it's certainly possible Sergey is right. But there's more going on too. The WUs are going out and getting worked on, returned, validated, and assimilated as they should be, but for some reason my validation log isn't capturing the validation output... or rather, it captures it sometimes, but not others. I'll look into it more as GPU testing gets under way. For now it's just testing DS3 WUs with more accurate flops accounting to make sure there are no surprises there, and moving from shorter to longer DS1+2 units on the fly (again, to make sure there are no surprises).

Thanks for the pointers on what to look for. I'll poke at it later in the weekend.
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 671 - Posted: 21 Oct 2020, 19:48:16 UTC

For the record, I changed the shmem size buffer to a (much) larger size. Orders of magnitude larger. Let's see if that improves the situation.
zombie67 [MM]
Joined: 1 Jul 20
Posts: 34
Credit: 26,118,410
RAC: 0
Message 673 - Posted: 23 Oct 2020, 2:03:53 UTC - in response to Message 671.  

For the record, I changed the shmem size buffer to a (much) larger size. Orders of magnitude larger. Let's see if that improves the situation.

Hmm. It doesn't seem to be working.

Are they pre-assigned to a particular CPU/OS, and there aren't candidates of those flavors asking for tasks? Just guessing.
Reno, NV
Team: SETI.USA
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 676 - Posted: 23 Oct 2020, 19:18:20 UTC

No, nothing special. I strongly suspect this is my error somewhere and the lack of logging is not helping me track it down. I'm still poking at it.
Sergey Kovalchuk

Joined: 1 Jul 20
Posts: 31
Credit: 123,959
RAC: 0
Message 677 - Posted: 23 Oct 2020, 21:34:17 UTC - in response to Message 676.  

Is there any prioritization of tasks, such as re-released ones?
A shortened deadline? "Reliable" hosts for replication?
That is, any options that could push the rare test tasks out of the queue?

It is worth trying to reissue the test tasks, i.e. re-adding them to the queue.
zombie67 [MM]
Joined: 1 Jul 20
Posts: 34
Credit: 26,118,410
RAC: 0
Message 692 - Posted: 25 Oct 2020, 15:04:58 UTC

It looks like you fixed the issue. Great!

Out of curiosity, what was the problem?
Reno, NV
Team: SETI.USA
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 695 - Posted: 25 Oct 2020, 16:17:12 UTC - in response to Message 692.  

Yeah, pretty big facepalm on my part.

First, I cancelled the old/stuck WUs in the queue. But the real problem was that I hadn't updated the feeder command line to include "--allapps" as an argument (see https://boinc.berkeley.edu/trac/wiki/BackendPrograms). That tells the feeder to interleave WUs from all apps when scheduling, instead of treating them as one pool, where the test WUs were getting lost in the noise.
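To illustrate why interleaving matters, here is a toy Python model of a fixed-size feeder cache. It is only a sketch and not BOINC's actual feeder code: the function names, the 1000/17 job counts, and the plain round-robin policy are assumptions made for the example (the real feeder may weight apps differently).

```python
from collections import deque

def fill_single_pool(jobs, n_slots):
    """No interleaving: take the first n_slots jobs in queue order,
    regardless of which app they belong to."""
    return jobs[:n_slots]

def fill_interleaved(jobs, n_slots):
    """Round-robin across apps, so every app with pending work gets slots."""
    queues = {}
    for app, job_id in jobs:
        queues.setdefault(app, deque()).append((app, job_id))
    slots = []
    while len(slots) < n_slots and any(queues.values()):
        for q in queues.values():
            if q and len(slots) < n_slots:
                slots.append(q.popleft())
    return slots

def count_app(slots, app):
    """Number of cache slots holding jobs of the given app."""
    return sum(1 for a, _ in slots if a == app)

# Hypothetical backlog: many regular jobs queued ahead of a few test jobs.
jobs = [("regular", i) for i in range(1000)] + [("test", i) for i in range(17)]

single = fill_single_pool(jobs, 100)
mixed = fill_interleaved(jobs, 100)

print("test jobs in cache, single pool:", count_app(single, "test"))
print("test jobs in cache, interleaved:", count_app(mixed, "test"))
```

In this model the single-pool cache holds no test jobs at all, because the regular backlog fills every slot first, while the interleaved cache holds all 17.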

I also increased the queue used by the feeder to 8192, up from the default of 100 (along with the corresponding, unrelatedly-named config option, so that the feeder actually fills that queue with more than 200 WUs at a time). Running software built for computers circa 2000 in 2020 allows you to... ahem... expand a few defaults.
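The post does not name the two options. As an assumption, they most likely correspond to `<shmem_work_items>` (feeder cache size, default 100) and `<feeder_query_size>` (jobs enumerated per query, default 200) in the project's config.xml. A minimal Python sketch of patching those values follows; the sample config is hypothetical and heavily trimmed, and the `feeder_query_size` value is purely illustrative since the thread does not state it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed config.xml; a real project file has many more entries.
CONFIG = """<boinc>
  <config>
    <shmem_work_items>100</shmem_work_items>
    <feeder_query_size>200</feeder_query_size>
  </config>
</boinc>"""

def set_option(root, name, value):
    """Set <name>value</name> under <config>, creating the element if missing."""
    config = root.find("config")
    node = config.find(name)
    if node is None:
        node = ET.SubElement(config, name)
    node.text = str(value)

root = ET.fromstring(CONFIG)
set_option(root, "shmem_work_items", 8192)    # value given in the post
set_option(root, "feeder_query_size", 16384)  # illustrative value, not from the post

print(ET.tostring(root, encoding="unicode"))
```

Check the option names against the project options documentation for your server version before relying on them.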


©2024 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)