[TMIM Notes] June 8 2021

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 446
Credit: 13,889,244
RAC: 22,388
Message 1213 - Posted: 9 Jun 2021, 4:16:08 UTC

This Month in MLC@Home
Notes for June 8 2021
A monthly summary of news and notes for MLC@Home

Summary
Updates have come slowly these past few months, since the presentation at the BOINC workshop and the release of our initial paper, as we're personally adjusting (fortunately!) to the beginnings of a post-pandemic life. Work, family life, and everything else are changing for many of us, and we're still trying to figure out the new normal. Because of this, these updates will be monthly going forward: they take quite a bit of time to put together, and we've been failing to get them out weekly for a while now anyway. Here's hoping our volunteers all over the world are in an area where they too can start to move beyond the worst of the pandemic.

But that doesn't mean the project has been dormant!

DS1/DS2/DS3 are all nearing completion, especially DS3, which is sitting at 97%. We've been talking about DS4 for months, and the code is ready for larger testing. Unfortunately, we rolled out a test client a few weeks ago that failed miserably because of an incompatibility between PyTorch and the native BOINC API. There's a way around this, but it requires more development and a change to how WUs are specified, and we've been working on it ever since. We should be ready any day now, but it's been more involved than we thought, so we're not prepared to commit to a date. We do know we need it soon, though, as DS3 WUs are running out.

Another benefit of the new client is that it's statically linked, which vastly simplifies deployment. The extra development time has also given us a chance to make the client more robust to NaNs, which should cut down on the number of validation errors on the system.
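For readers wondering what "more robust to NaNs" can look like in practice, here is a minimal sketch. It is not the MLC@Home client code, and the post does not say how the real change works. The names `guarded_training` and `toy_step` are hypothetical, and the idea shown (discard any training step whose loss is not finite and keep the last good state) is only one common approach.

```python
import math


def guarded_training(step_fn, initial_state, num_steps, max_skips=3):
    """Run step_fn repeatedly, discarding any step whose loss is not finite.

    Hypothetical helper for illustration only; not part of the MLC client.
    """
    state = initial_state
    skips = 0
    for step in range(num_steps):
        candidate, loss = step_fn(state, step)
        if not math.isfinite(loss):
            # NaN or infinity: keep the last good state instead of the candidate.
            skips += 1
            if skips > max_skips:
                raise RuntimeError("too many non-finite losses, giving up")
            continue
        state = candidate
    return state, skips


def toy_step(weight, step):
    """Toy update: gradient descent on (w - 2)^2, with a NaN injected at step 3."""
    if step == 3:
        return float("nan"), float("nan")
    grad = 2.0 * (weight - 2.0)
    new_weight = weight - 0.25 * grad
    return new_weight, (new_weight - 2.0) ** 2


final_weight, skipped = guarded_training(toy_step, 0.0, num_steps=10)
print("final weight:", final_weight, "skipped steps:", skipped)
```

In this toy run the injected NaN is dropped and training still converges toward 2. A result that never contains NaNs is also less likely to be rejected at validation time.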

Another new issue is that the data partition on the server is running out of space: DS3 is taking over 4TB! Thanks to all of our volunteers! We've moved some things around to make a little space, so everything is still working for now. We received some new storage today and will need some downtime to get it installed. It shouldn't take more than a few minutes, so we'll just do it sometime within the next week.

So stay tuned: the next month's going to be interesting for MLC@Home as we move into DS4 and the next phase of this research.

Other News

  • DS1/DS2 continue along at a slow pace. They will continue in the background until we have 10,000 samples of each.
  • We're working on keeping the paper up to date and fleshing it out some more.
  • We're also looking at a slightly new set of work beyond dataset generation, so hopefully MLDS won't be the only project in the future.
  • Reminder: the MLC client is open source and has an issues list on GitLab. If you're a programmer or data scientist and want to help, feel free to look over the issues and submit a pull request.



Project status snapshot:
(note these numbers are approximations)






Last month's TMIM Notes: May 1 2021

Thanks again to all our volunteers!

-- The MLC@Home Admin(s)
Homepage: https://www.mlcathome.org/
Discord invite: https://discord.gg/BdE4PGpX2y
Twitter: @MLCHome2

[VENETO] boboviz

Joined: 11 Jul 20
Posts: 30
Credit: 1,230,257
RAC: 111
Message 1215 - Posted: 9 Jun 2021, 10:50:58 UTC - in response to Message 1213.  

We've been talking about DS4 for months, and the code is ready for larger testing. Unfortunately, we rolled out a test client a few weeks ago that failed miserably because of an incompatibility between PyTorch and the native BOINC API. There's a way around this, but it requires more development and a change to how WUs are specified, and we've been working on it ever since.


It seems that, after a long time, something is moving in this direction: Implement a BOINC app library
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 446
Credit: 13,889,244
RAC: 22,388
Message 1216 - Posted: 9 Jun 2021, 13:43:40 UTC - in response to Message 1215.  

Yes and no. That's more for projects that use well-known applications. Folding projects, for example, tend to use a small handful of common executables (some closed source) but do different things with them. So instead of each project bundling the executable separately, it makes sense to have a common set of known executables for projects to just point to. The ML world is a little different (less mature?) at the moment. We have common libraries (PyTorch, TensorFlow, MXNet, XGBoost, etc.), and everyone wraps their own custom executables around these libraries. So I'm not sure how helpful it'll be for us.

The issue with the failed app is that the native BOINC API uses SIGALRM (without documenting that it does), and so does PyTorch (barely documenting that it does). The app ran fine standalone, but when launched by the BOINC client it crashed once PyTorch overwrote the SIGALRM handler and BOINC tried to call it. The fix is to rewrite the app using the BOINC wrapper, which means undoing a lot of the work I put in to use the native API in the first place (which annoys me), and it means that WUs created for the old app won't be compatible with those from the new app. And it's just turned out to take a while longer than planned.
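To illustrate the kind of clash described above, here is a minimal Python sketch (Unix only, run from the main thread). It is not BOINC or PyTorch code: `runtime_handler` and `library_handler` are hypothetical stand-ins for two components that each register a SIGALRM handler in the same process. A process has only one handler slot per signal, so the second registration silently replaces the first.

```python
import signal


def runtime_handler(signum, frame):
    """Hypothetical stand-in for a handler installed by a host runtime."""


def library_handler(signum, frame):
    """Hypothetical stand-in for a handler installed later by a library."""


def demonstrate_clobber():
    """Install two SIGALRM handlers in turn and report which one survives."""
    original = signal.getsignal(signal.SIGALRM)
    try:
        signal.signal(signal.SIGALRM, runtime_handler)
        # signal.signal returns the handler it replaced; code that ignores
        # this return value silently discards the earlier handler.
        replaced = signal.signal(signal.SIGALRM, library_handler)
        active = signal.getsignal(signal.SIGALRM)
        return replaced, active
    finally:
        # Restore whatever was installed before the demonstration
        # (getsignal returns None for handlers not installed from Python).
        if original is not None:
            signal.signal(signal.SIGALRM, original)


replaced, active = demonstrate_clobber()
print("replaced:", replaced.__name__)
print("active:  ", active.__name__)
```

Signal dispositions are per process, which is presumably why running the app as a separate process under a wrapper sidesteps the conflict: each process then owns its own SIGALRM handler.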
[VENETO] boboviz

Joined: 11 Jul 20
Posts: 30
Credit: 1,230,257
RAC: 111
Message 1218 - Posted: 10 Jun 2021, 7:18:26 UTC - in response to Message 1216.  

The issue with the failed app is that the native BOINC API uses SIGALRM (without documenting that it does), and so does PyTorch (barely documenting that it does). The app ran fine standalone, but when launched by the BOINC client it crashed once PyTorch overwrote the SIGALRM handler and BOINC tried to call it. The fix is to rewrite the app using the BOINC wrapper, which means undoing a lot of the work I put in to use the native API in the first place (which annoys me), and it means that WUs created for the old app won't be compatible with those from the new app. And it's just turned out to take a while longer than planned.


VERY interesting explanation!!!
Thank you!
W8n4Singularity
Joined: 30 Aug 20
Posts: 25
Credit: 39,695,746
RAC: 44,787
Message 1219 - Posted: 11 Jun 2021, 2:33:56 UTC

Fantastic job! I am staying 'til teal! I am already looking forward to learning more about DS4 and the next project (personally I hope it targets reproducibility).

©2021 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)