What is MLC@Home?

MLC@Home is a distributed computing project dedicated to understanding and interpreting complex machine learning models, with an emphasis on neural networks. It uses the BOINC distributed computing platform. You can find more information on our main website here: https://www.mlcathome.org.

Neural networks have fuelled a machine learning revolution over the past decade that has led to machines accomplishing amazingly complex tasks. However, these models are largely black boxes: we know they work, but they are so complex (up to hundreds of millions of parameters!) that we struggle to understand the limits of such systems. Yet understanding networks becomes extremely important as they are deployed in safety-critical fields, like medicine and autonomous vehicles.

MLC@Home provides an open, collaborative platform for researchers studying machine learning comprehension. It allows us to train thousands of networks in parallel, with tightly controlled inputs, hyperparameters, and network structures. We use this to gain insights into these complex models.

We ask for volunteers to donate some of their background computing time to help us continue our research. We use the time-tested BOINC distributed computing infrastructure, the same infrastructure that powers SETI@home's search for alien life and Rosetta@home's search for effective medications. BOINC is fun: you get credit for each bit of compute that you do, with leaderboards and milestones, all while helping further open research. Please follow the link below to join, and happy crunching!

Join MLC@Home

News

Testing updates to backend services again
All,

We're going to be testing some new backend server updates again this weekend. Last time this led to some instability, but we've learned quite a bit from that and have taken steps to make sure it doesn't happen again, with an easy and quick revert path if necessary. There may be some small interruptions, but nothing serious. There is also nothing you need to do on the client side; this is all on the backend.

Wish us luck, and we'll be watching the results like a hawk for any new issues.
14 Nov 2021, 2:40:52 UTC · Discuss


Lab network maintenance Nov 9
Overnight Nov 8 to 9 we'll be performing some brief preventative network maintenance. The server and site will be inaccessible for an hour or two, but the outage shouldn't last longer than that. No action is required by the end user, and we'll be back soon!
9 Nov 2021, 4:20:39 UTC · Discuss


[TMIM Notes] Oct 23 2021 posted
MLC@Home has posted the Oct 23 2021 edition of its monthly "This Month In MLC@Home" newsletter!
A long overdue update including DS2 slowly working through its backlog, backend updates for maintainability that went a little awry, DS4 backend work, and DS3 analysis.

Read the update and join the discussion here.
24 Oct 2021, 3:44:39 UTC · Discuss


[TMIM Notes] Oct 23 2021
This Month in MLC@Home
Notes for Oct 23 2021
A monthly(-ish) summary of news and notes for MLC@Home

Summary
It's been a while since the last update, but there's been a lot going on: DS2 slowly working through its backlog, backend updates for maintainability that went a little awry, DS4 backend work, and DS3 analysis.

First, two weeks ago we had a mishap with the WU generation, and "continuation" WUs were sent with the wrong parameters leading to computation failures. It took us a few days to fix and clear up, but no data was lost. We've been updating and modernizing our backend scripts to consolidate them and make them less fragile (this is a good thing for maintainability!), and one of our updates went awry. Thank you for your patience while we worked it out. We've had a pretty good track record until now, so I hope you'll continue to support us in the future despite this setback. We're looking for new ways to test these further to avoid similar issues in the future.

The majority of the work over the past few months has been analyzing DS3 data. We've been updating the existing paper with the full DS3 analysis. It is disk, bandwidth, and memory intensive on our backend, and sadly isn't quite as easy to break up into WUs to distribute over BOINC. In fact, just tar/gzip-ing the entire DS3 dataset (2.6 TB) takes over 24 hours, since it's over 4 million small files. We will be making all of DS3 available as a torrent soon. I've been posting updates on this on our Discord server if you're interested.
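As a rough back-of-the-envelope check on those figures (a sketch only, not taken from the project's tooling), the quoted numbers imply an average file size of about 650 KB and an overall archiving rate of roughly 30 MB/s. The snippet assumes decimal units (1 TB = 10^12 bytes) and treats "over 4 million" and "over 24 hours" as exactly 4 million and 24 hours, so both results are upper bounds. The constant and function names are made up for illustration.

```python
# Figures quoted in the notes. The file count and duration are lower bounds
# ("over 4 million files", "over 24 hours"), so the derived values are upper bounds.
DATASET_BYTES = 2.6e12        # 2.6 TB, assuming decimal terabytes
FILE_COUNT = 4_000_000        # assumed exactly 4 million for this estimate
ARCHIVE_SECONDS = 24 * 3600   # assumed exactly 24 hours for this estimate


def average_file_size_kb(total_bytes: float, file_count: int) -> float:
    """Mean size of one file in kilobytes (1 KB = 1000 bytes)."""
    return total_bytes / file_count / 1e3


def throughput_mb_per_s(total_bytes: float, seconds: float) -> float:
    """Average rate at which input data is consumed, in MB/s (1 MB = 10^6 bytes)."""
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    size_kb = average_file_size_kb(DATASET_BYTES, FILE_COUNT)
    rate_mb = throughput_mb_per_s(DATASET_BYTES, ARCHIVE_SECONDS)
    print(f"average file size: at most about {size_kb:.0f} KB")
    print(f"archiving rate:    at most about {rate_mb:.1f} MB/s")
```

A rate this low on a multi-terabyte dataset is consistent with the explanation in the notes: the cost is dominated by handling millions of small files rather than by the raw data volume.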

Since we've been focused on DS3 and modernizing our backend/management scripts, DS4 has suffered. I wish I could say that DS4 WUs are flowing but they aren't yet. Everything is in place, we just need to start the tests.

Thanks again for your continued support, and know that while these updates have been coming slower, that doesn't mean work isn't happening behind the scenes!

Other News


  • We've also spent some time trying to port the new statically-linked client to CUDA and ROCm, neither of which has worked so far. The Windows CUDA client should be a standard recompile, but the Linux clients did not compile and link as planned and need some more work.
  • We're starting to see spam in the forums. To combat this, we've disabled posting in any thread except "Issue Discussion" unless you have at least 100 credits. If you see things that look like spam in the forums, please press the report button to report it as such and we'll take care of it as soon as we can.
  • The ARM64-specific client also isn't ready, because of a strange linker error with the size of the static binary. Honestly, we're not sure how to make it work. If you know about Linux linking with large relocations on ARM64, please get in contact with us. Until then, please run the ARMHF client (32-bit) on 64-bit ARM systems.
  • Many thanks to Delta for his tireless work on modernizing our backend. We already have a new database access for both the BOINC database and our MLDS-specific MongoDB database thanks to his work, and soon we'll be consolidating 21 different scripts into a small handful.
  • Reminder: the MLC client is open source, and has an issues list at GitLab. If you're a programmer or data scientist and want to help, feel free to look over the issues and submit a pull request.



[Project status snapshot; note these numbers are approximations]

Last month's TMIM Notes: Aug 6 2021

Thanks again to all our volunteers!

-- The MLC@Home Admin(s)
Homepage: https://www.mlcathome.org/
Discord invite: https://discord.gg/BdE4PGpX2y
Twitter: @MLCHome2
24 Oct 2021, 3:40:40 UTC · Discuss


Current WU issues, working on a fix
A short note that we're aware of the issue with WUs coming from the CPU and TEST work queues (the GPU queue appears fine at the moment). This is due to a server-side issue related to some cleanup and upgrades I've been doing behind the scenes that appear to have gone haywire. Since the changes initially seemed to be working, I didn't catch the problem immediately, which compounded the issues.

This is unacceptable and I apologize. While this had been tested, this failure mode was unforeseen. You rely on us to keep things running smoothly, and I failed you.

Over the next 24 hours we'll be sending out cancellations for the corrupted WUs, and may stop/start the service a few times while we try to clean things up. Please bear with us and thanks for your patience.

I stress: no data was lost, and the nature of the failure is to fail fast on the client, so few if any compute cycles were wasted.

Thanks again, and we'll do better in the future.
14 Oct 2021, 20:18:19 UTC · Discuss



©2021 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)