Progress update: Dataset 1 continues, Dataset 2 WUs released

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 166 - Posted: 12 Jul 2020, 17:25:01 UTC
Last modified: 12 Jul 2020, 17:26:01 UTC

Dataset1 is computing away, with the goal of training 10,000 neural networks for each machine I've specified. About 1,600 networks remain to be trained for the easy-to-learn machines, and a lot more for the machines that are harder to learn (EightBitMachine and ParityMachine). You can view the overall status in the sidebar of the main page: https://www.mlcathome.org/

The networks are trained to mimic the results of the machines. The training data comes from generating random input sequences, running them through the toy machine, and recording the observed output. These input/output pairs are then packaged up as training data, and the networks are trained to reproduce the output.
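
For anyone curious what that data generation looks like in practice, here is a minimal sketch. It is not the project's actual code: `parity_machine` is a hypothetical stand-in (a running-parity machine over bits), and `make_training_data`, its parameters, and the sequence lengths are all assumptions chosen for illustration.

```python
import random


def parity_machine(inputs):
    """Hypothetical toy machine: emits the running parity of a bit sequence."""
    state = 0
    outputs = []
    for bit in inputs:
        state ^= bit          # flip the state whenever a 1 is seen
        outputs.append(state)
    return outputs


def make_training_data(machine, n_sequences, seq_len, seed=0):
    """Generate random input sequences and record the machine's output for each."""
    rng = random.Random(seed)  # seeded so the data set is reproducible
    data = []
    for _ in range(n_sequences):
        xs = [rng.randint(0, 1) for _ in range(seq_len)]
        ys = machine(xs)       # observed output becomes the training target
        data.append((xs, ys))
    return data


if __name__ == "__main__":
    for xs, ys in make_training_data(parity_machine, n_sequences=3, seq_len=8):
        print(xs, "->", ys)
```

Each `(xs, ys)` pair would then be fed to a sequence model as an input and its target.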

Dataset2 is a new dataset that trains on the same set of toy machines, except each machine has been modified to give a different result if it encounters a certain key command sequence. If that sequence is found, the intended output is inverted for the next series of commands. Again, I'm training 10K networks for each modified machine. Unlike Dataset1, these are all being released at once, so instead of a trickle of a thousand here or there, the entire run of Dataset2 is queuing up as we speak. Dataset2 WU names end in "Modified" instead of "Machine". I expect these might be a little harder to learn, so the number of WUs it takes to complete an individual network might be a bit higher.
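
A rough sketch of the "modified machine" idea, under stated assumptions: the key sequence, the length of the inverted window (`invert_len`), the binary outputs, and the `modified` wrapper itself are all hypothetical, and `parity_machine` is again just a stand-in toy machine, not one of the project's real ones.

```python
def parity_machine(inputs):
    """Hypothetical toy machine: emits the running parity of a bit sequence."""
    state = 0
    outputs = []
    for bit in inputs:
        state ^= bit
        outputs.append(state)
    return outputs


def modified(machine, key, invert_len):
    """Wrap a binary-output machine so that seeing `key` in the input
    inverts the output for the next `invert_len` steps (assumed behavior)."""
    key = tuple(key)

    def run(inputs):
        clean = machine(inputs)   # what the unmodified machine would emit
        out = []
        remaining = 0             # steps of inversion still pending
        for i, y in enumerate(clean):
            if remaining > 0:
                out.append(1 - y)  # invert the intended binary output
                remaining -= 1
            else:
                out.append(y)
            # Does the input window ending at position i match the key?
            if i + 1 >= len(key) and tuple(inputs[i + 1 - len(key):i + 1]) == key:
                remaining = invert_len
        return out

    return run


if __name__ == "__main__":
    machine = modified(parity_machine, key=(1, 1, 0), invert_len=2)
    print(parity_machine([1, 1, 0, 1, 0, 0]))  # [1, 0, 0, 1, 1, 1]
    print(machine([1, 1, 0, 1, 0, 0]))         # [1, 0, 0, 0, 0, 1]
```

On inputs that never contain the key, the wrapped machine behaves exactly like the original, which is what makes the two so similar from the outside.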

The structure of the neural networks is the same in each dataset; the only difference is the weights learned. Early results from Dataset1 show that it is relatively easy to differentiate between networks trained on different machines. The goal of Dataset2 is to see if we can differentiate between networks trained on machines that are *very* similar, but with one small key difference, and maybe later to see how small the difference needs to be before we can detect it.

Plans for Dataset3 are to move away from these simple machine-based examples, and instead move to randomly generated automata (transducers) with at least one guaranteed Hamiltonian cycle and a controllable number of hidden states. So you might have a 10-entry input alphabet and 100 hidden states, with, say, 20 of those states designated as output states. The output at each step is then the number of the last "output" state seen as you walk through the sequence of inputs. This will allow a much larger number of similar-but-not-the-same automata to be learned, allowing Dataset3 to be wide and shallow, unlike Datasets 1 and 2, which are narrow and deep.
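
One possible way to generate such automata, as a sketch only. The construction below (fill the transition table at random, then overwrite one transition per state along a shuffled ordering to force a Hamiltonian cycle) is an assumption, not the planned Dataset3 generator. The function names, the start state, and the `no_output` placeholder emitted before any output state has been visited are likewise hypothetical.

```python
import random


def random_transducer(n_states, alphabet_size, n_output_states, seed=0):
    """Build a random transition table with one guaranteed Hamiltonian cycle."""
    rng = random.Random(seed)
    # delta[state][symbol] -> next state, filled uniformly at random
    delta = [[rng.randrange(n_states) for _ in range(alphabet_size)]
             for _ in range(n_states)]
    # Force a cycle through every state: shuffle the states, then give each
    # one a transition (on a random symbol) to its successor in that order.
    order = list(range(n_states))
    rng.shuffle(order)
    for i, state in enumerate(order):
        successor = order[(i + 1) % n_states]
        delta[state][rng.randrange(alphabet_size)] = successor
    output_states = set(rng.sample(range(n_states), n_output_states))
    return delta, output_states, order


def run_transducer(delta, output_states, inputs, start=0, no_output=-1):
    """Emit, at each step, the number of the last output state entered."""
    state = start
    last = no_output          # placeholder until an output state is visited
    outputs = []
    for symbol in inputs:
        state = delta[state][symbol]
        if state in output_states:
            last = state
        outputs.append(last)
    return outputs


if __name__ == "__main__":
    delta, outs, _ = random_transducer(n_states=100, alphabet_size=10,
                                       n_output_states=20, seed=1)
    rng = random.Random(2)
    inputs = [rng.randrange(10) for _ in range(15)]
    print(run_transducer(delta, outs, inputs))
```

Changing only the seed yields a different automaton of the same size, which is the "wide and shallow" property: many similar-but-not-the-same machines rather than many networks for a few machines.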
lunkerlander

Joined: 8 Jul 20
Posts: 4
Credit: 91,629
RAC: 0
Message 171 - Posted: 17 Jul 2020, 23:23:29 UTC - in response to Message 166.  

Thank you for keeping us updated! It's nice to hear from BOINC project admins on what the work units are and what the project's status is.

As a side note, this project has inspired me to start learning about machine learning!

I've completed a few sample ML projects following online guides, and just started the book, "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems 2nd Edition" to learn more.
pianoman [MLC@Home Admin]
Message 172 - Posted: 18 Jul 2020, 2:23:32 UTC - in response to Message 171.  

Best of luck to you. I really like Keras's syntax, even if I generally use PyTorch now.


©2024 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)