MLC@home WUs using 2 CPUs
| Author | Message |
|---|---|
|
Joined: 20 Jul 20 Posts: 23 Credit: 1,958,714 RAC: 0 |
Stock settings, just added the project, and it's taking 2 CPUs per WU but doing only 1 CPU's worth of data crunching. I have a dual-core, 4-thread Intel x86 CPU and am running Linux. Perhaps '1 thread per WU' is misconfigured as '1 CPU core'? |
|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
The system should be using one CPU thread per WU. Is BOINC telling you it is using 2 CPUs, or are you looking and seeing that an extra thread is spawned but generally dormant while running? There may be a brief period in the beginning where a second thread is spawned to load the dataset into memory, but then that thread lies dormant for the rest of the WU while the crunching happens. I have PyTorch configured not to do this, but sometimes it thinks it knows better :/ . |
|
Joined: 20 Jul 20 Posts: 23 Credit: 1,958,714 RAC: 0 |
OK, I ran into the 60-minute timeout for editing the previous post. The real issue is that I'm getting low-memory errors. It's hard to see the exact reason in Boinctui, but I found out it's memory related. I'm running on a 2-core, 4-thread machine with 2GB of RAM. The OS uses about 150MB, so there's about 1.86GB of RAM left, which is taken up by 2 MLC WUs. 2 CPU threads are waiting for memory. What are my settings and options? <app_config> <app> <name>mldg</name> <gpu_versions> <cpu_usage>1</cpu_usage> </gpu_versions> </app> </app_config> |
|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Oh, then that is correct. Each WU takes ~700MB of memory, so you'll only be able to fit 2 WUs running at once on a machine with 2GB of memory. |
|
Joined: 20 Jul 20 Posts: 23 Credit: 1,958,714 RAC: 0 |
Since this project shares the machine with other projects, I think I found the correct app_config.xml settings for me: <app_config> <app> <name>mlds</name> <max_concurrent>1</max_concurrent> </app> </app_config> The name is 'mlds'. Running 1 WU, I can still give the remaining 2 to 3 threads to other projects (depending on memory availability). Is there a possibility the WUs can be trimmed to use more like 500MB? It would work out better with my servers. Even a 50-100MB reduction in memory is something I'd appreciate. I have to reconfigure 20 units to accommodate MLC, and later an additional 20 servers. It would be nice if there were a setting in each person's account on the website (https://www.mlcathome.org/mlcathome/prefs.php?subset=project) to set the number of threads. I presume it's using either Docker or a VM to get this high RAM usage? Also, is the RAM data compressible? If so, I'm thinking about installing zram on these units. They don't have much eMMC space either. *Edit: MLC is also the first project that completely crashes my units if 3 or 4 MLC WUs are loaded right away. As soon as they load, the unit crashes, so I have to be quick to pause BOINC before it starts, and configure it correctly before resuming BOINC. It's only a one-time config change, though... I'm not sure if there's something that can be done about this from your end? It appears that some WUs load, give a 'mem error' when there's not enough memory, and wait. MLC WUs on my units don't seem to do that. |
|
Joined: 1 Jul 20 Posts: 31 Credit: 123,959 RAC: 0 |
"I'm not sure if there's something that can be done about this from your end?" The server side can allow extended user preferences: Max # jobs (1..8, No limit) and Max # CPUs (1..8, No limit). |
|
Joined: 20 Jul 20 Posts: 23 Credit: 1,958,714 RAC: 0 |
On other projects, this is found on the 'preferences' page, but I couldn't find it on https://www.mlcathome.org/mlcathome/prefs.php?subset=project. Where should I look? |
|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Re: RAM usage. We're not using a VM or anything... the RAM usage is because the training dataset is loaded into memory. It really is 700MB for datasets 1 and 2. I could run without loading the full dataset into memory, but it would have a significant performance penalty and hammer the disk, which is much worse for a lot of users. Can you help me understand the problem? You have a machine with 2GB of memory. The WUs are labelled (correctly) as needing 700MB of memory. The client did the math that you can only run 2 WUs at a time, and it did that. But you're trying to force it to run more, and are running out of memory... or at the very least causing a huge swap storm that makes your entire system seem to... crash? And now you want me to add extra preferences... to enable you to run more than 2? I'm clearly missing something... |
|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Oh, you want me to add preferences to limit mlds to using only one processor instead of 2. THAT I can look into. |
|
Joined: 1 Jul 20 Posts: 31 Credit: 123,959 RAC: 0 |
"I'm clearly missing something.." Max # jobs (1..8, No limit) means the host receives no more than # tasks, regardless of the queue size. It simply lets you avoid receiving an irregular number of tasks to the detriment of the other projects: it gives control in the user settings, in addition to the hard server limits; it helps when the task duration is incorrectly determined (bad benchmark); and it allows a common queue of several projects without overflow on the first request. For example, with only one task on a host with 4 CPUs and 2GB of memory, there is no need "to be quick" to stop the host before the crash. |
|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
You may also want to set <project_max_concurrent>1</project_max_concurrent> in the app_config.xml in addition to <max_concurrent>1</max_concurrent>. https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration It is possible to set some project-specific preferences that will act as defaults for the client, but it will take a bit of setup, and it is low on my priority list right now, since users can configure it manually on the client side if they want that much control. |
|
Joined: 20 Jul 20 Posts: 23 Credit: 1,958,714 RAC: 0 |
"You may also want to set <project_max_concurrent>1</project_max_concurrent> in the app_config.xml in addition to <max_concurrent>1</max_concurrent>." It depends. If MLC gets new WUs requiring the same or more RAM, setting project_max_concurrent to 1 will be better than setting only max_concurrent. However, if MLC gets new WUs that require substantially less RAM, setting only max_concurrent to 1 will still allow more than 1 WU from MLC to be loaded. I don't mind loading 4 MLC instances per unit, as long as the WUs fit in the available RAM. |
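
The two client-side limits discussed in this thread can be combined in one app_config.xml. The following is a minimal sketch, not an official project recommendation. It assumes the app name is mlds, as reported above, and that the file is saved in the MLC@Home project folder inside the BOINC data directory. The values of 1 are the ones suggested in the thread for a 2GB host and should be adjusted to the available RAM.

```xml
<app_config>
  <!-- Per-application limit: at most 1 task of the app named "mlds" runs at a time. -->
  <app>
    <name>mlds</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <!-- Project-wide limit: at most 1 task from this project runs at a time,
       whichever application it belongs to. Remove this element to limit
       only the mlds app. -->
  <project_max_concurrent>1</project_max_concurrent>
</app_config>
```

As far as I know, the client reads this file at startup and again when "Read config files" is selected in the BOINC Manager, so creating it before the first tasks start avoids having to pause BOINC in a hurry.
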
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)