MLC@home WUs using 2 CPUs

ProDigit

Joined: 20 Jul 20
Posts: 23
Credit: 1,958,714
RAC: 0
Message 199 - Posted: 21 Jul 2020, 15:15:38 UTC

Stock settings, just added the project, and it's taking 2 CPUs per WU, but only doing 1 CPU's worth of data crunching.
I have a dual-core, 4-thread Intel x86 CPU and am running Linux.
Perhaps '1 thread per WU' is misconfigured as '1 CPU core'?
ID: 199 · Rating: 0
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 200 - Posted: 21 Jul 2020, 16:08:18 UTC - in response to Message 199.  

The system should be using one CPU thread per WU. Is BOINC telling you it is using 2 CPUs, or are you looking and seeing that an extra thread is spawned but generally dormant while running?

There may be a brief period in the beginning where a second thread is spawned to load the dataset into memory, but then that thread lies dormant for the rest of the WU while the crunching happens. I have PyTorch configured not to do this, but sometimes it thinks it knows better :/ .
ID: 200 · Rating: 0
ProDigit

Joined: 20 Jul 20
Posts: 23
Credit: 1,958,714
RAC: 0
Message 201 - Posted: 21 Jul 2020, 17:25:48 UTC

OK, I ran into the 60-minute timeout for editing the previous post.
The real issue is that I'm getting low memory errors.
It's hard to see the exact reason in boinctui, but I found out it's memory related.

I'm running on a 2-core, 4-thread machine with 2 GB of RAM. The OS uses about 150 MB, so there's about 1.85 GB of RAM left, which is taken up by 2 MLC WUs.
2 CPU threads are waiting for memory.

What are my settings and options?

<app_config>
<app>
<name>mldg</name>
<gpu_versions>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>
</app_config>
ID: 201 · Rating: 0
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 202 - Posted: 21 Jul 2020, 17:33:55 UTC - in response to Message 201.  

Oh, then that is correct. Each WU takes ~700 MB of memory, so you'll only be able to fit 2 WUs running at once on a machine with 2 GB of memory.
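
The arithmetic behind that limit can be sketched roughly as follows. This is a simplified illustration, not the BOINC client's actual scheduling logic; the helper name `max_concurrent_wus` is made up for the example, and the figures (2 GB total, about 150 MB for the OS, ~700 MB per WU) are the ones quoted in this thread.

```python
def max_concurrent_wus(total_mb: int, reserved_mb: int, per_wu_mb: int) -> int:
    """Return how many work units fit in memory at once (hypothetical helper)."""
    if per_wu_mb <= 0:
        raise ValueError("per_wu_mb must be positive")
    # Memory left for work units after the OS has taken its share.
    available_mb = max(total_mb - reserved_mb, 0)
    # Only whole work units count, so round down.
    return available_mb // per_wu_mb


# Figures from this thread: 2 GB machine, ~150 MB used by the OS, ~700 MB per WU.
fits = max_concurrent_wus(total_mb=2048, reserved_mb=150, per_wu_mb=700)
print(fits)  # 2
```

At 500 MB per WU, the same calculation would give 3.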
ID: 202 · Rating: 0
ProDigit

Joined: 20 Jul 20
Posts: 23
Credit: 1,958,714
RAC: 0
Message 204 - Posted: 22 Jul 2020, 3:01:32 UTC
Last modified: 22 Jul 2020, 3:10:55 UTC

Since MLC shares these machines with other projects, I think I found the correct app_config.xml settings for me:

<app_config>
<app>
<name>mlds</name>
<max_concurrent>1</max_concurrent>
</app>
</app_config>


The name is 'mlds'.

Running 1 unit, I can still give the remaining 2 to 3 threads to other projects (depending on memory availability).

Is there a possibility the WUs can be trimmed to use more like 500 MB?
It would work out better with my servers. Even 50-100 MB less memory is something I'd appreciate.
I have to reconfigure 20 units to accommodate MLC, and later an additional 20 servers.
It would be nice if there were some sort of setting in each person's account on the webpage (https://www.mlcathome.org/mlcathome/prefs.php?subset=project) to set the number of threads.
I presume it's using either Docker or a VM, to get this high RAM usage?

Also, is the RAM data compressible? If so, I'm thinking about installing zram on these units.
They don't have much eMMC space either.

*Edit: MLC is also the first project that completely crashes my units if 3 or 4 MLC WUs are loaded right away. As soon as they load, the unit crashes, so I have to be quick to pause BOINC before it starts, and configure it correctly before resuming BOINC.
It's only a one-time config change though...
I'm not sure if there's something that can be done about this from your end?
It appears some units load, give a 'mem error' when there isn't enough memory, and wait. MLC on my units doesn't seem to do that.
ID: 204 · Rating: 0
Sergey Kovalchuk

Joined: 1 Jul 20
Posts: 31
Credit: 123,959
RAC: 0
Message 205 - Posted: 22 Jul 2020, 3:46:52 UTC - in response to Message 204.  

I'm not sure if there's something that can be done about this from your end?

On the server side, you can enable the extended user preferences:

Max # jobs - 1..8, No limit
Max # CPUs - 1..8, No limit
ID: 205 · Rating: 0
ProDigit

Joined: 20 Jul 20
Posts: 23
Credit: 1,958,714
RAC: 0
Message 225 - Posted: 23 Jul 2020, 2:27:48 UTC

On other projects, this is found on the 'preferences' page.
But I couldn't find it on https://www.mlcathome.org/mlcathome/prefs.php?subset=project

Where should I look?
ID: 225 · Rating: 0
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 230 - Posted: 23 Jul 2020, 4:13:30 UTC - in response to Message 225.  

Re: RAM usage. We're not using a VM or anything.. the RAM usage is because the training dataset is loaded into memory. It really is 700MB for datasets 1 and 2. I could run without loading the full dataset into memory, but it would have a significant performance penalty and hammer the disk, which is much worse for a lot of users.

Can you help me understand the problem? You have a machine with 2GB of memory. The WUs are labelled (correctly) as needing 700MB of memory. The client did the math, worked out that you can only run 2 WUs at a time, and did that. But you're trying to force it to run more, and are running out of memory.. or at the very least causing a huge swap storm that makes your entire system seem to.. crash? And now you want me to add extra preferences.. to enable you to run more than 2?

I'm clearly missing something..
ID: 230 · Rating: 0
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 231 - Posted: 23 Jul 2020, 4:14:53 UTC - in response to Message 230.  

Oh, you want me to add preferences to limit mlds to using only one processor instead of 2. THAT I can look into.
ID: 231 · Rating: 0
Sergey Kovalchuk

Joined: 1 Jul 20
Posts: 31
Credit: 123,959
RAC: 0
Message 248 - Posted: 25 Jul 2020, 20:57:36 UTC - in response to Message 230.  
Last modified: 25 Jul 2020, 21:11:56 UTC

I'm clearly missing something..

Max # jobs - 1..8, No limit - the host receives no more than # tasks, regardless of the queue size

It would simply let users avoid receiving a disproportionate number of tasks at the expense of other projects, with the control in the user settings, in addition to the hard server limits. It helps:

- when the task duration is estimated incorrectly (bad benchmark)
- when several projects share a common queue, so that it does not overflow on the first request
- for example, with only one task on a host with 4 CPUs and 2 GB of memory, there is no need "to be quick" to stop the host before the crash
ID: 248 · Rating: 0
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 252 - Posted: 26 Jul 2020, 2:47:59 UTC

You may also want to set <project_max_concurrent>1</project_max_concurrent> in the app_config.xml in addition to <max_concurrent>1</max_concurrent>.

https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration

It is possible to set some project-specific preferences that will act as defaults for the client, but it will take a bit of setup, and it is low on my priority list right now since users can configure this manually on the client side if they want that much control.
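
For anyone configuring many hosts by hand, here is a minimal sketch of generating such a file with Python's standard library. The helper `build_app_config` is a made-up name for this example; the app name 'mlds' and the two limits are the ones discussed in this thread. It assumes, per the BOINC documentation linked above, that <project_max_concurrent> sits directly under <app_config> rather than inside <app>.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int,
                     project_max_concurrent: int) -> str:
    """Build app_config.xml text with a per-app and a project-wide limit."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # The project-wide cap is a sibling of <app>, not a child of it.
    ET.SubElement(root, "project_max_concurrent").text = str(project_max_concurrent)
    ET.indent(root, space="  ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


config_text = build_app_config("mlds", max_concurrent=1, project_max_concurrent=1)
print(config_text)
```

The printed text would be saved as app_config.xml in the project's directory on each host, after which the client needs to re-read its config files.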
ID: 252 · Rating: 0
ProDigit

Joined: 20 Jul 20
Posts: 23
Credit: 1,958,714
RAC: 0
Message 277 - Posted: 29 Jul 2020, 19:17:13 UTC - in response to Message 252.  

It depends.
If MLC gets new WUs that require the same or more RAM, setting project_max_concurrent to 1 will be better than setting max_concurrent.
However, if MLC gets new WUs that require substantially less RAM, setting only max_concurrent to 1 will still allow more than 1 WU from MLC to be loaded.
I don't mind loading 4 MLC instances per unit, as long as the WUs fit in the available RAM.
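
The difference between the two limits can be illustrated with a toy model. It is only a sketch of the reasoning above, not how the BOINC client actually schedules work, and it ignores memory entirely. The function `count_running` and the second app name `mlds_small` are hypothetical, since the thread does not name any lower-memory app.

```python
from collections import Counter
from typing import Dict, List, Optional


def count_running(queued_apps: List[str],
                  max_concurrent: Dict[str, int],
                  project_max_concurrent: Optional[int] = None) -> int:
    """Count how many queued tasks may run under the given limits (toy model)."""
    running: Counter = Counter()
    for app in queued_apps:
        total = sum(running.values())
        if project_max_concurrent is not None and total >= project_max_concurrent:
            break  # project-wide cap reached, nothing else may start
        if running[app] >= max_concurrent.get(app, len(queued_apps)):
            continue  # per-app cap reached, but other apps may still start
        running[app] += 1
    return sum(running.values())


# "mlds_small" is a hypothetical lower-memory app, used only for illustration.
queue = ["mlds", "mlds", "mlds_small", "mlds_small"]
print(count_running(queue, {"mlds": 1}))                            # 3
print(count_running(queue, {"mlds": 1}, project_max_concurrent=1))  # 1
```

With only the per-app limit, the second app's tasks still start; the project-wide limit caps everything from the project at once.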
ID: 277 · Rating: 0

©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)