WU Runtime for CPU WUs
| Author | Message |
|---|---|
|
Joined: 7 Jul 20 Posts: 6 Credit: 6,264,619 RAC: 0 |
I currently have a few (about four or five) CPU WUs that have been running for at least 12 hours. Is this normal? Most units finish within about 2 hours. These WUs are still progressing and will probably finish, and nothing else looks abnormal about them. Others on the same machine are finishing in about the normal 2 hours. |
|
Joined: 7 Jul 20 Posts: 6 Credit: 6,264,619 RAC: 0 |
UPDATE: I rebooted the machine to clear any possible cache pollution and to resync any possible mismatch between the firmware and the OS kernel. After the reboot, almost all WUs returned to the normal 2-hour execution time. Two are still acting strangely: they show no CPU time in BoincTasks but are progressing at 0.001% per second, and they have been running for about 7 hours. Is anyone else running more than 64 WUs simultaneously and seeing some of them behave abnormally? This is app 9.61 running DS1 Parity Machine work. With the new app around the corner, it might not be worth trying to diagnose this. |
|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
How much RAM do you have? Each WU takes at least half a gig... more if they're GPU WUs. What you're describing almost sounds like some of the WUs were pushed out to swap. |
|
Joined: 7 Jul 20 Posts: 6 Credit: 6,264,619 RAC: 0 |
The machine has 256GB. What I have noticed is that as I reduce the number of concurrently running WUs, they start behaving a little better. At 100 concurrent WUs, the runtimes gradually increase to over 4 hours per unit, and 1 to 4 of them act strangely, as described above. Dropping to 75 concurrent WUs causes the units to all finish in under 2 hours with no anomalies. I then increased the count to 85 and the runtimes rose to right around 2 hours (maybe a little more for some), still with no anomalies. No swapping is happening at all. This looks more like a CPU cache issue than a main memory issue. Each socket has 4 memory channels and 8 DIMM slots, all populated equally. |
|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Hyperthreading/SMT... maybe threads are fighting over shared FPU resources? That might have just as high an impact as the cache issues you brought up. |
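
For anyone who wants to sanity-check the numbers quoted in this thread, here is a rough sketch in Python. It only uses the figures posted above (256GB of RAM, "at least half a gig" per CPU WU, and the approximate runtimes at 75, 85, and 100 concurrent WUs). The runtimes are rounded to 2 and 4 hours as an assumption, and the function names are made up for illustration, so treat the output as a ballpark and not a measurement.

```python
# Figures quoted in the thread. Runtimes are approximate:
# "under 2 hours" at 75, "around 2 hours" at 85, "over 4 hours" at 100.
RAM_GB = 256
MIN_GB_PER_WU = 0.5  # "at least half a gig" per CPU WU, so this is a lower bound

# concurrent WUs -> assumed hours per WU (rounded from the posts above)
observed = {75: 2.0, 85: 2.0, 100: 4.0}


def min_memory_gb(concurrent, gb_per_wu=MIN_GB_PER_WU):
    """Lower bound on the RAM needed to hold `concurrent` WUs at once."""
    return concurrent * gb_per_wu


def throughput_per_hour(concurrent, hours_per_wu):
    """Steady-state number of WUs completed per hour."""
    return concurrent / hours_per_wu


for n, hours in observed.items():
    print(f"{n:>3} concurrent: >= {min_memory_gb(n):.1f} GB RAM, "
          f"~{throughput_per_hour(n, hours):.1f} WUs/hour")
```

With these rounded inputs, 100 concurrent WUs need at least 50 GB, which is well under 256GB and fits the report that nothing is swapping. The throughput side is more telling: roughly 25 WUs/hour at 100 concurrent versus roughly 42.5 WUs/hour at 85, so on this machine running fewer WUs at once appears to get more work done overall.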
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)