[TWIM Notes] Oct 5 2020

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 241
Credit: 3,327,410
RAC: 18,463
Message 611 - Posted: 6 Oct 2020, 2:03:36 UTC

This Week in MLC@Home
Notes for Oct 5 2020
A weekly summary of news and notes for MLC@Home

Summary
This week MLC@Home turns 100 days old! Since the beginning, we've released WUs for 3 datasets, shipped many application updates, added support for 3 new architectures on Linux, and rolled out Windows support. But we're just getting started.

It's been a busy week: Dataset 3 is approaching its first milestone of 100 networks for 100 samples (100x100), and it's only been ~15 days. Badges went live on the site. We benchmarked Dataset 3 WUs on a GPU (40% speedup on AMD ROCm). We even have some potential movement on OSX support. Read on to find out more.

But first, an apology: we had a server glitch today where the work scheduler was unavailable for several hours. We've corrected the problem and taken steps to make sure it doesn't happen again.

Next, MLC@Home was happy to roll out badges this week. There are now badges for top RAC percentage, and milestone badges for hitting credit milestones per app. Currently, only new credit (as of Oct 1) is counted towards milestones, but by the end of this week we should be able to get all previous credit counted towards them as well. So if you don't have a badge yet, please be patient, it's coming. We're also offering a special Early Adopter badge to anyone who has credit by our 100th day, October 8th. Consider it a small token of thanks for supporting our new project.

News:

  • Dataset 3 WU processing is going fantastically, much faster than anticipated. We're almost at the first milestone (100x100), and have released more WUs towards the next milestone (100x1000). Once we reach 100x100 (see the chart on the home page for updates), we'll do some preliminary analysis and release that dataset to the public.
  • Datasets 1+2 also continue to make progress in parallel with Dataset 3, but it's slow going for now. We may spend some cycles seeing if we can speed up those remaining WUs. We'll do an official release of a preliminary Dataset (1+2) once we have at least 1000 examples of each machine type.
  • The new server arrived at the university last week, and it'll be in our hands tomorrow. Please be on the lookout for an announcement of scheduled maintenance downtime later this week or over the weekend as we transition to a more powerful and more permanent server.
  • More information about badges is available here: https://www.mlcathome.org/mlcathome/forum_thread.php?id=88 . We'll use the downtime to make sure old credit gets counted towards badges.
  • GPU support: GPUs were a net loss for Dataset 1+2 WUs, but we recently hacked the client to work with AMD ROCm, tested Dataset 3 WUs, and achieved a 40% speedup on a VEGA 56. We expect a similar speedup on CUDA hardware as well, which means we're moving GPU support up the priority list (see the sketch after this list). Discussion here: https://www.mlcathome.org/mlcathome/forum_thread.php?id=89
  • Dataset 4 WUs (MNIST/TrojAI-based) remain in development.
  • We're evaluating using Darling as a way to finally support OSX. Nothing to report yet.
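
For the curious, here is a minimal PyTorch-style sketch of what the GPU path boils down to. It's illustrative only (the model shape and names are made up, not the actual client code); on ROCm builds of PyTorch, AMD GPUs are exposed through the same torch.cuda interface, so one device check covers both vendors.

    import torch

    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API,
    # so this single check covers both CUDA and ROCm hardware.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Illustrative network only; not the actual MLC@Home model.
    model = torch.nn.Sequential(
        torch.nn.Linear(8, 64),
        torch.nn.ReLU(),
        torch.nn.Linear(64, 1),
    ).to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    def train_step(inputs, targets):
        # Each batch has to live on the same device as the model.
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        return loss.item()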



Project status snapshot:
(note these numbers are approximations)

Tasks
Tasks ready to send 33935
Tasks in progress 15329
Users
With credit 793
Registered in past 24 hours 30
Hosts
With recent credit 1994
Registered in past 24 hours 14
Current GigaFLOPS 31291.44

Dataset 1 and 2 progress:

SingleDirectMachine      10002/10004
EightBitMachine          10001/10006
SingleInvertMachine      10001/10003
SimpleXORMachine         10000/10002
ParityMachine              774/10005

ParityModified             203/10005
EightBitModified          6111/10006
SimpleXORModified        10005/10005
SingleDirectModified     10004/10004
SingleInvertModified     10002/10002 

Dataset 3 progress:
Overall (so far): 10232/20112
Milestone 1, 100x100:  9357/10000
Milestone 2, 100x1000: 10232/100000
Milestone 3, 100x10000: 10232/1000000


Last week's TWIM Notes: Sep 28 2020

Thanks again to all our volunteers!

-- The MLC@Home Admins
bozz4science

Joined: 9 Jul 20
Posts: 58
Credit: 194,935
RAC: 2,144
Message 620 - Posted: 7 Oct 2020, 16:05:28 UTC - in response to Message 611.  

Thanks for the weekly update, and congrats on (almost) reaching your first important milestone on Dataset 3 this quickly! It's great that you always include links to the other threads when posting your TWIM notes.

Excited to see the EA badge added to my signature soon.

Can you update us on the progress of the paper as well?

Any thoughts in the meantime on adjusting the credit for the rand WUs after reaching that first milestone? I don't know whether the numbers I outlined in the other thread are representative of the overall average runtime ratio across the majority of hosts, but after further analysis I would like to see a slight upward adjustment in per-WU credit for those WUs, to match the per-hour credit of the Dataset 1/2 WUs. One host now running only rand WUs essentially lost 60% of its RAC just by happening to switch over to them completely. And since they are all the same app, there is no option for us to opt out of those demanding WUs, at least on less powerful systems.

I'm happy to keep crunching them on my main rig, but my laptops seem to be quite overwhelmed, recently clocking in at a little over 44,500 s per rand WU. That is ~2,300 credits, vs. ~3,550 for Dataset 1/2 units. No complaint, just a thought I'd like to share with you. Maybe after reaching the first milestone here (100x100), you could query the results of the rand WUs on your server and average out their runtimes. The runtime ratio between the Dataset 3 and Dataset 1/2 WUs would be a robust benchmark for revisiting this shortly. Thanks for your consideration!
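
To make that concrete, here's the rough arithmetic behind my numbers (my own host's figures from above, so treat them as illustrative rather than project-wide averages):

    # Rough credit-per-hour comparison using the numbers quoted above.
    runtime_s   = 44_500   # one Dataset 3 "rand" WU on my laptop
    rand_credit = 2_300    # credit that WU pays
    ds12_credit = 3_550    # what Dataset 1/2 WUs pay in roughly the same time on that host

    ratio = rand_credit / ds12_credit
    print(f"rand WUs pay about {ratio:.0%} of the Dataset 1/2 rate "
          f"({rand_credit / (runtime_s / 3600):.0f} vs "
          f"{ds12_credit / (runtime_s / 3600):.0f} credits/hour)")
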
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 241
Credit: 3,327,410
RAC: 18,463
Message 624 - Posted: 8 Oct 2020, 3:37:24 UTC - in response to Message 620.  

EA badges will come out on Friday.

The new server is in my hands, but it's going to take a bit of time to get it set up how I would like. In the meantime, it's crunching away on WUs, because why not?

As for the paper, it just wasn't ready in time to make ICLR; we need more data to make it compelling. That was my call. I'd rather hold back and get some more convincing results than rush something out and get stomped in review. I see a few papers coming out of this work. The real meaty one would be if we can use weight-space analysis from Datasets 1, 2, and 3 (and/or Dataset 4) to show something that is impossible using loss alone. That would be a nice breakthrough, and I think it's possible, but we need something more besides Datasets 1 and 2. The next round of deadlines for quality conferences is in the Nov/Dec timeframe.

However, with Dataset 3's 100x100 milestone getting close to completion, I'm looking to throw together at least an arXiv paper describing MLDS and MLC@Home, so that if others use the dataset in their research they'll have something to cite. Dataset papers are cool, but generally more suitable for a workshop/poster. Or maybe I'm selling it short. Either way, I don't want to hold up releasing a dataset that's ready just because of a conference deadline, so definitely arXiv first. I wonder if there's a distributed computing conference sometime soon...

Even beyond that, we'd like to do more than MLDS, or at least get some more experiments going in parallel: architecture search, neuroevolution, reproducing/validating other research, hyperparameter search. There's so much more that could be done. COVID isn't helping matters, as it's hard to get new collaborators when everyone's just trying to keep their heads above water as it is.

As for credit, we did dump some WU timing results from the server to do some more in-depth analysis; nothing to report yet.
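
The analysis itself is nothing exotic; roughly along these lines (the column names are made up for illustration, the real dump format differs):

    import pandas as pd

    # One row per validated result in the timing dump (illustrative schema).
    results = pd.read_csv("wu_timing_dump.csv")

    # Average runtime and granted credit per application.
    summary = results.groupby("app_name").agg(
        mean_runtime_s=("elapsed_time", "mean"),
        mean_credit=("granted_credit", "mean"),
        n_results=("elapsed_time", "size"),
    )
    summary["credit_per_hour"] = summary["mean_credit"] / (summary["mean_runtime_s"] / 3600)
    print(summary.sort_values("credit_per_hour"))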
bozz4science

Joined: 9 Jul 20
Posts: 58
Credit: 194,935
RAC: 2,144
Message 625 - Posted: 8 Oct 2020, 8:51:12 UTC - in response to Message 624.  
Last modified: 8 Oct 2020, 9:04:02 UTC

Looking forward to the EA badge :) Awesome that the server has joined in and is crunching away in the meantime!

I get your points about the paper. Working to a deadline, especially in an academic setting, can be dreadful, and I believe the quality of the published article would seriously suffer if the deadline were to take priority over its coherence and over a larger base of results going into it. Still, I am looking forward to seeing this published in a well-known journal, both to have the science community properly peer review your work and, more importantly, to gain attention that could spark the very cooperation you mentioned in your post.

I still remember the initial post I made here on the forum, "science due diligence", where I tried to get a first impression of you as project leader and of the science to be conducted here. Back then you already mentioned topics such as neuroevolution, hyperparameter search, and many more. I would love to see projects like those make use of the platform you have built here, and to see you gain traction in the scientific community.

Just yesterday there was a great event by Stanford's HAI (Human-Centered AI): its 2020 virtual fall conference, "Triangulating Intelligence: Melding Neuroscience, Psychology and AI". While the title is a mouthful, I feel that the field becomes more interdisciplinary by the minute. One interesting takeaway was that while supercomputers are becoming more efficient, they still require much more power than the human brain, which apparently runs on an average of only ~20-25 W (that's not even a light bulb). And while computers are awesome at minimising errors in an attempt to maximise precision, human brains basically work on a "just good enough" principle and apply many heuristics, such as classification and inference, to minimise compute time and maximise power efficiency rather than minimising the probability of errors.

I believe there is a recording on YouTube or on their website for anyone who might be interested in taking a peek at some of the talks from this conference.

Would you actually appreciate pointers to other researchers/research groups that could potentially benefit from partnering up with you? I might know some that do a lot of data analysis and training with neural networks. It would be great to see more specific use cases such as brain modelling, neuroevolution, validation of other research, robotics motion planning through neural networks, etc. run alongside your more broadly oriented explorative science experiments! But let's not get ahead of ourselves.

By the way, I recently saw an awesome YouTube video on a rather similar topic: brain-like computer structures and how they differ from the von Neumann architecture. The main takeaway for me was that there is an upper limit on power efficiency and compute throughput as long as the working memory (RAM) is physically separated from the CPU running the calculations. That's fundamentally different from the architecture of the human brain, which actually stores information in the same place where the computation is performed. The research field is called neuromorphic computing. https://www.youtube.com/watch?v=Qow8pIvExH4

And btw, I would love for you to take a (quick) look into the rand WU credit adjustment. From what I can tell based on my personal runtimes, the credit needed to match the ds 1/2 units would roughly be in the 750-925 range, depending on the CPU of course. I would love to see what the average runtime ratio tells us.



©2020 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)