What is MLC@Home?

MLC@Home is a distributed computing project dedicated to understanding and interpreting complex machine learning models, with an emphasis on neural networks. It uses the BOINC distributed computing platform. You can find more information on our main website here: https://www.mlcathome.org.

Neural networks have fuelled a machine learning revolution over the past decade that has led to machines accomplishing amazingly complex tasks. However, these models are largely black boxes: we know they work, but they are so complex (up to hundreds of millions of parameters!) that we struggle to understand the limits of such systems. Yet understanding these networks becomes extremely important as they are deployed in safety-critical fields, like medicine and autonomous vehicles.

MLC@Home provides an open, collaborative platform for researchers studying machine learning comprehension. It allows us to train thousands of networks in parallel, with tightly controlled inputs, hyperparameters, and network structures. We use this to gain insights into these complex models.
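
For a concrete (purely illustrative) picture of what one such experiment looks like, here is a minimal PyTorch sketch that trains a single small network with a fixed seed, architecture, and hyperparameters. The layer sizes, synthetic data, and learning rate below are assumptions for illustration, not MLC@Home's actual configuration.

    # Purely illustrative sketch: train one small network with a fixed seed,
    # architecture, and hyperparameters. The layer sizes, synthetic data, and
    # learning rate are assumptions, not MLC@Home's actual configuration.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)  # controlled initialization
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # fixed hyperparameters
    loss_fn = nn.CrossEntropyLoss()

    # Tightly controlled synthetic inputs and labels.
    x = torch.randn(256, 8)
    y = (x.sum(dim=1) > 0).long()

    for epoch in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    # In the project, many runs like this are farmed out to volunteers and the
    # trained weights are collected on the server for later analysis.
    torch.save(model.state_dict(), "trained_network.pt")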

We ask for volunteers to donate some of their background computing time to help us continue our research. We use the time-tested BOINC distributed computing infrastructure — the same infrastructure that powers SETI@home's search for alien life, and Rosetta@home's search for effective medications. BOINC is fun — you get credit for each bit of compute that you do, with leaderboards and milestones. All while helping further open research. Please follow the link below to join, and happy crunching!

Join MLC@Home


News

MLC@Home shutting down for now, and thank you!
MLC@Home is shutting down
After over two years, some bumpy moments, and the tremendous support from our volunteers,
I, as MLC admin, am making the decision to shut down MLC@Home as a BOINC project for the
time being.

Why?
We've achieved the goals I set out to accomplish (and more!) with 4 complete datasets comprising
dozens of terabytes of data to analyze. Now we need to focus on analyzing the results and writing papers.
As a researcher, at some point you have to stop generating data and write; and my family, work, and
school commitments have limited the amount of time I can spend generating new experiments. This
should be evident as I've been less and less responsive to the community over the past 6 months,
for which I apologize. While we can always want more from any endeavor, I think we've accomplished
a lot for now, and want to put the project on indefinite hiatus until something new comes along.

This is a time to celebrate all that our volunteers have achieved together! This
community has been amazing between the forums and Discord. We're shutting down not because of any
problem, but because we've achieved the goals we set out to accomplish. For that, I couldn't be more grateful.

The only bittersweet aspect to shutting the project down is that I hoped to grow MLC@Home beyond MLDS,
to become a platform for democratized machine learning research. I failed to gain traction with other
researchers and as such MLDS was the only project on MLC@Home. COVID is partly to blame[1], but there are
a number of other factors ranging from how research is funded in a hot field like ML to my own limited time
commitments. If other researchers express an interest we can revive the project in the future, but for now
I cannot justify running the project without a real path to meaningful new work. That wouldn't be fair to our volunteers.

What happens now?
First, as promised, the datasets will remain available (DS4 will require some thought and time to release, see
below), and the main MLC@Home website (https://www.mlcathome.org) and Twitter feed will remain active so I
can post updates on any papers and how to access DS4 when available. For now, there are no changes
to the BOINC server portions of the website. I'll need to read up on how to properly archive the forums,
project pages, and stats so that they can remain available (read only) without becoming a magnet for spam
and the (currently hourly...) hacking attempts (sigh...). I will also be winding down the Discord community
over the next month or so.

For me personally, I will continue my research and work on publishing meaningful results. I'll also continue
to support other BOINC projects (I've been contributing to BOINC since the SETI@Home classic days)
and support the idea of volunteer computing. At some point, I'll write up my experience as a researcher starting a new project and running it from beginning to end, and I hope that will be a resource for other projects wanting to start out. It's generally been a positive experience, but there are some definite areas for improvement.

I encourage you to continue to support other great BOINC projects with your computing time. The official list is here: https://boinc.berkeley.edu/projects.php.

DS1/2/3 are up for download now, what about DS4?
DS4 is large: over 12TB in size for just the Dense portion. So it's going to require even more time to copy, package, analyze, and upload. I intend to do this after my analysis and thesis are complete, which should be in the next 6 months. If you are a researcher and want access to the dataset sooner, please contact me directly and we can work something out.

The original idea for DS4 was to compute neural networks for each type of data using dense networks, LeCun-style CNNs, and AlexNet CNNs. It turns out LeCun networks are so small and easy to compute that I can train 50,000 of them locally on my own workstation in a day or two, so I didn't bother sending those out as BOINC workunits (also because the current client crashes when computing LeNet5 on some platforms, and it was faster to compute them locally than track down the bug). Since it's debatable what scientific benefit having AlexNet (another CNN) brings over LeCun networks, I'll likely drop those from the dataset.
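
For a sense of just how small these networks are, here is a minimal LeNet-5-style CNN sketched in PyTorch; the layer sizes and the 1x28x28 input assumed below are illustrative and not necessarily the exact MLDS configuration.

    # Minimal LeNet-5-style CNN (PyTorch). The layer sizes and 1x28x28 input
    # are illustrative assumptions, not the exact MLDS configuration.
    import torch
    import torch.nn as nn

    class LeNet5Like(nn.Module):
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28
                nn.Tanh(),
                nn.AvgPool2d(2),                            # -> 14x14
                nn.Conv2d(6, 16, kernel_size=5),            # -> 10x10
                nn.Tanh(),
                nn.AvgPool2d(2),                            # -> 5x5
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
                nn.Linear(120, 84), nn.Tanh(),
                nn.Linear(84, num_classes),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x))

    model = LeNet5Like()
    print(sum(p.numel() for p in model.parameters()))  # roughly 60k parameters

At roughly 60,000 parameters per network in a sketch like this, a single modern workstation can train tens of thousands of them quickly, which is why they never needed to go out as BOINC workunits.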

Thanks
Even if nothing else happens, MLC@Home has been a major success. We produced scientifically interesting and unique datasets, introduced a whole new type of science (machine learning) to the BOINC community, and showed that machine learning research can be conducted by a group of volunteers over the internet.

There are a few groups and individuals I'd like to specifically thank for making this project such a success.
These include, but aren't limited to: the BOINC developers, especially Vitalii Koshura and the other developers on the BOINC Discord server, for helping me develop the project from the very beginning; Marcus (Delta on the BOINC Discord server), for contributing directly to MLC@Home's server backend processing software and, along with JRingo, running the BOINC Radio podcast that promoted and supported MLC@Home from the very beginning; and Mike from the PrimeGrid project, for providing some crucial early advice on running a new project. I'm sure I'm forgetting many others; just know that we, as a community, have many to thank for the success of this project.

I'd like to extend an extra thanks to the early volunteers on the project who helped make the forum a
helpful and welcoming place.

Thanks also to the CORAL Lab and my advisor at UMBC for supporting the research and providing funding for the new server after we quickly outgrew our original 2015-era ThinkPad laptop.

Finally, thanks to our 4,200+ volunteers, who crunched over 12.5 million work units using more than 17,000 hosts. I am truly humbled by your contributions and what we've achieved together. None of this
would have been possible without you. Thank you for giving a small unknown researcher a chance, and
I encourage you to seek out smaller projects in the future, as their success will help determine
whether BOINC continues to grow and thrive.

I leave you with one last, satisfying website screenshot:


Thanks again to everyone,
pianoman

-- MLC@Home primary researcher and admin:
https://www.mlcathome.org/
email: mlcathome2020@gmail.com
Twitter: @MLCHome2
2 Oct 2022, 17:22:59 UTC · Discuss


DS3 Dataset of 1 million trained neural networks is available for download!
Hello volunteers!

Just a quick note that Dataset 3 is finally posted for download at our site https://www.mlcathome.org/mlds.html! Dataset 3 was completed a few months ago, but due to its massive size (2.25TB in all) and our emphasis on our own analysis over packaging the results for download, it's taken us until now to make it available.

As a reminder, DS3 contains over 1 million trained neural networks (10,000 each for 100 different automata), with the goal of analyzing how networks of the same size and shape encode similar-but-not-exact training data. Expect an updated paper soon!
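
As a rough illustration of the kind of analysis this enables, here is a hedged sketch that compares two trained DS3 networks directly in parameter space. It assumes each network is stored as a PyTorch state_dict; the file names and directory layout below are hypothetical placeholders, not the actual archive structure.

    # Hedged sketch: compare two trained networks in parameter space.
    # Assumes each network is saved as a PyTorch state_dict; the paths below
    # are hypothetical placeholders, not actual DS3 file names.
    import torch

    def flatten_params(state_dict):
        """Concatenate every tensor in the state_dict into one 1-D vector."""
        return torch.cat([p.flatten() for p in state_dict.values()])

    net_a = torch.load("ds3/automaton_07/model_0001.pt", map_location="cpu")
    net_b = torch.load("ds3/automaton_07/model_0002.pt", map_location="cpu")

    vec_a, vec_b = flatten_params(net_a), flatten_params(net_b)
    similarity = torch.nn.functional.cosine_similarity(vec_a, vec_b, dim=0)
    print(f"cosine similarity in parameter space: {similarity.item():.4f}")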

We've always held that if the public is doing work for this project, then the results of that work should be made available back to the public to further science. As of right now, all of DS1, DS2, and DS3 are available to the public under a CC-BY-SA 4.0 license. We will do the same with DS4 when it completes.

DS3 is released via torrents due to its size. A few volunteers have already downloaded and seeded these (very large) files, so hopefully new downloads will be a bit quicker than serving everything from our single server. The torrent files are listed on our website, and we're using the Academic Torrents tracker (see https://academictorrents.com/browse.php?search=mlds).

Thanks again to all our volunteers! DS3 is quite an accomplishment!

-- The MLC@Home Admin(s)
Homepage: https://www.mlcathome.org/
Discord invite: https://discord.gg/BdE4PGpX2y
Twitter: @MLCHome2
2 May 2022, 3:58:59 UTC · Discuss


MLC@Home inconsistent work generation for the next few months
TL;DR: MLC is entering an analysis phase, and new work will be bursty and inconsistent for at least the next few months. Please adjust your BOINC contributions accordingly!

Over the past several months we (the MLC@Home admins) have turned our attention to the analysis of the results our volunteers have contributed. With the completion of DS1/2/3, and the partial results of DS4, we're really excited to polish up some papers and publish some results. (Along those lines, look for an announcement of the availability of all 5 tiers of DS3 datasets later today or tomorrow; we just need to set up a torrent for the 1.3TB DS3-10000 dataset.)

In addition, DS4 results are larger than we anticipated and also don't require as much computation time to complete. So when we release DS4 WUs, our volunteers churn through them in only a few days' time while also filling up the disk space on the server. This is a great problem to have, but it also forces us to be judicious about sending out work to make sure we've archived enough old results off the server to handle the influx of new results.

The upshot of all this is that we don't have the resources to both do the analysis and prepare/maintain consistent, meaningful work units. So rather than just keep pushing out work that would keep WUs flowing but has less scientific meaning (such as creating more DS3 networks just to create a bigger dataset), we'd rather announce that MLC@Home work will be inconsistent for at least the next several months. We expect to release batches of DS4 WUs every few weeks, but it won't be the constant work availability you're used to from the project over the past two years.

We realize this will cause us to lose some volunteers, but that's why we're trying to be upfront about this now, so that everyone can decide if and how to allocate their BOINC contributions accordingly. We hope that you'll consider leaving MLC in your projects list and helping us crunch WUs when we have them, but we understand if you choose not to.

A few key things to note:


  • Are you shutting down? No, not at this time. Beyond the stated goals for DS4 above, we have some ideas of where we would like to go in the future. But the main admin needs to spend time finishing up their thesis, so those plans will be on hold until after that is complete. If those plans don't come to fruition, then we will be up front here and actively shut down the project. We promise we won't just abandon it with no notice!
  • What about all the work the volunteers have done? The DS1/DS2 datasets are all available for download at https://www.mlcathome.org/mlds.html, and DS3 will be available soon (via torrent). As DS4 completes, we promise to make it available in the same place as well.



We hope this announcement reassures you that we're trying to be good stewards of the trust and resources you provide us as BOINC volunteers. We're really excited by the science and humbled by your support since we started in July 2020, and we hope you understand as we move into the next phases of our work. As things change we'll make more announcements here and on our Discord.

-- The MLC@Home Admin(s)
Homepage: https://www.mlcathome.org/
Discord invite: https://discord.gg/BdE4PGpX2y
Twitter: @MLCHome2
17 Apr 2022, 15:33:47 UTC · Discuss


Spring 2022 MLC Project Update: DS2 Complete edition!
It's been a while since we've posted an update, but that doesn't mean the project has been idle! If you've been following on our Discord server you'll know we've continued to make progress, and thanks to our volunteers, today is a day of celebration!

Here's a summary of the current project status:

Summary


  • DS2 Computation is complete! As of 1 Apr 2022, we finally crossed the 10,000 trained networks threshold for ParityModified, completing our computation for DS2. This has taken a long time, and the complete dataset should help researchers understand how neural networks encode data.
  • All DS1/DS2 tarballs are available for download from https://www.mlcathome.org/mlds.html. This is your work, and now it's free for you or anyone else to study and build upon!
  • DS3 tarballs still pending. Computation for DS3 completed last year, but we have not uploaded the full datasets to the website for download yet. We've been focused on analysis, and the sheer size of the dataset makes bundling a time-consuming task. We'll post here when they're available.
  • DS4 WUs are out! DS4 WUs are out for our CPU client, and progress has started there. DS4 is much more complicated to manage on the backend because it has multiple training sets that have different requirements, but we're pushing new WUs out as fast as we can.
  • We're pausing GPU WUs: It saddens us, but we have not been successful in updating our GPU clients to support DS4 WUs. And as we shift our focus to analyzing the results we do have, we have less and less time to spend on client development beyond the CPU client. When the current GPU queue runs dry, we won't be sending out more GPU work until we have time to re-prioritize porting a GPU client again. Maintaining a GPU client has taken much more time and effort than anticipated, and unless we can get outside help it will remain a low priority for the time being. We truly appreciate our GPU volunteers, but at the moment we don't have any work to send, and we encourage you to turn your hardware toward other worthwhile projects that can make use of it!
  • We're exploring porting the CPU client to Rust. Our reliance on PyTorch has become more of a hindrance to portability than an asset. While the neural network ecosystem in Rust is not nearly as robust, Rust's ability to compile a static binary targeting a large number of architectures and operating systems is very appealing for portability. As such, we're looking to port our MLC CPU client to pure Rust, with an option to support GPUs from the same code base in the future. If you know Rust and are interested, please contact the MLC Admins.



Please note that there are still DS2 WUs in the work queue. We ask that you please continue to crunch them, as it's always better to have more samples as spares. However, we don't plan to queue up any more DS1/2/3 WUs, and all new WUs added will be DS4 or later. This applies to the GPU queue as well.

We're really excited about DS4 WUs going forward, and they should help test our theory that similar networks cluster in parameter space for feed-forward and CNN-based networks as well as for the RNNs used in DS1/2/3. Beyond DS4, we have some ideas but nothing concrete at the moment. We'll keep you updated as we move forward.

Thanks again to all our volunteers for supporting the project and helping science.

-- The MLC@Home Admin(s)
Homepage: https://www.mlcathome.org/
Discord invite: https://discord.gg/BdE4PGpX2y
Twitter: @MLCHome2
3 Apr 2022, 1:31:00 UTC · Discuss


Maintenance / Downtime 3/27/22
MLC's server will have a brief period of downtime today starting at approximately 3:30pm UTC to add more storage and prepare the main queue for DS4 workloads. The downtime shouldn't be more than an hour or two.

Thanks again for all your support.
-- MLC Admins
27 Mar 2022, 15:04:52 UTC · Discuss



News is available as an RSS feed.


©2024 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)