WU Runtime for CPU WUs

entity

Joined: 7 Jul 20
Posts: 6
Credit: 6,264,619
RAC: 0
Message 1307 - Posted: 8 Aug 2021, 18:11:58 UTC

I currently have a few (about four or five) CPU WUs that have been running for at least 12 hours. Is this normal? Most units finish in about 2 hours. These WUs are progressing and will probably finish, and nothing else looks abnormal about them. Other WUs on the same machine are finishing in the normal 2 hours or so.
entity

Joined: 7 Jul 20
Posts: 6
Credit: 6,264,619
RAC: 0
Message 1309 - Posted: 9 Aug 2021, 2:47:50 UTC - in response to Message 1307.  

UPDATE: Rebooted the machine to clear any possible cache pollution and to resolve any possible mismatch between the firmware and the OS kernel. After the reboot, almost all WUs have returned to the normal 2-hour execution time. Two are still acting strangely: they show no CPU time in BoincTasks but are progressing at 0.001% per second, and they have been running for about 7 hours.

Is anyone else running more than 64 WUs simultaneously and seeing some of them behave abnormally? This is app 9.61 running DS1 Parity Machine work. With the new app around the corner, it might not be worth trying to diagnose this.
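
For scale, the progress rate quoted above can be turned into a projected runtime. This is only a rough sketch: it assumes the rate stays constant for the whole run, which the post does not claim, and the function name is made up for illustration.

```python
def hours_to_complete(percent_per_second: float, percent_done: float = 0.0) -> float:
    """Hours still needed at a constant progress rate (assumed, not measured)."""
    if percent_per_second <= 0:
        raise ValueError("progress rate must be positive")
    remaining_percent = 100.0 - percent_done
    return remaining_percent / percent_per_second / 3600.0


# 0.001% per second is the rate reported for the two odd WUs.
print(round(hours_to_complete(0.001), 1))  # prints 27.8
```

At that rate a WU would need roughly 28 hours from start to finish, compared with the usual 2 hours.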
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 1315 - Posted: 11 Aug 2021, 15:09:38 UTC - in response to Message 1309.  

How much RAM do you have? Each WU takes at least half a gigabyte, more if they're GPU WUs. What you're describing sounds almost as if some of the WUs were pushed out to swap.
entity

Joined: 7 Jul 20
Posts: 6
Credit: 6,264,619
RAC: 0
Message 1316 - Posted: 11 Aug 2021, 15:52:56 UTC - in response to Message 1315.  

The machine has 256GB. What I have noticed is that as I reduce the number of concurrently running WUs, they start behaving a little better. At 100 concurrent WUs, the runtimes gradually increase to over 4 hours per unit, and 1 to 4 of them act strangely as described above. Dropping to 75 concurrent WUs lets all units finish in under 2 hours with no anomalies. I then increased to 85 concurrent WUs, and the runtimes rose to right around 2 hours (maybe a little more for some), still with no anomalies. No swapping is happening at all. This looks more like a cache issue than an external memory issue. Each socket has 4 memory channels and 8 DIMM slots, all populated equally.
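
The memory figures in this exchange can be checked with quick arithmetic. The sketch below uses only numbers from the thread: the "at least half a gig" per WU is a lower bound, so the real footprint may be higher, and the function name is hypothetical.

```python
GB_PER_WU = 0.5      # lower bound per CPU WU, from the admin's reply
TOTAL_RAM_GB = 256   # memory reported for this machine


def min_ram_needed_gb(concurrent_wus: int, gb_per_wu: float = GB_PER_WU) -> float:
    """Lower-bound memory demand for a given number of concurrent WUs."""
    return concurrent_wus * gb_per_wu


# The three concurrency levels tried in the post above.
for n in (75, 85, 100):
    need = min_ram_needed_gb(n)
    print(f"{n} WUs: at least {need:.1f} GB ({need / TOTAL_RAM_GB:.0%} of RAM)")
```

Even at 100 concurrent WUs the lower bound is 50 GB, about a fifth of the installed memory, which fits the observation that no swapping occurs.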
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 1317 - Posted: 11 Aug 2021, 16:33:28 UTC - in response to Message 1316.  

Hyperthreading/SMT... maybe threads are fighting over shared FPU resources? That might have just as high an impact as the cache issues you brought up.
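
One way to picture the SMT idea is to count how many tasks must share a physical core once the task count exceeds the core count. This is a minimal sketch assuming 2-way SMT and a scheduler that spreads tasks across cores before doubling up. The thread never states the machine's core count, so the 64 used below is purely a placeholder, and the function name is hypothetical.

```python
def tasks_sharing_a_core(concurrent_tasks: int, physical_cores: int) -> int:
    """Minimum number of tasks that run alongside a sibling thread (2-way SMT assumed)."""
    if concurrent_tasks > 2 * physical_cores:
        raise ValueError("more tasks than hardware threads")
    # Every task beyond the physical core count lands on an already busy core,
    # so both that task and the one it joins are now sharing execution units.
    doubled_up_cores = max(0, concurrent_tasks - physical_cores)
    return 2 * doubled_up_cores


ASSUMED_PHYSICAL_CORES = 64  # placeholder only, not taken from the thread

for n in (75, 85, 100):
    shared = tasks_sharing_a_core(n, ASSUMED_PHYSICAL_CORES)
    print(f"{n} tasks: at least {shared} share a core")
```

Under these assumptions the number of tasks competing for shared units grows twice as fast as the task count past the core count, which would be consistent with runtimes degrading as concurrency rises, though it does not distinguish FPU contention from cache pressure.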

©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)