Early peek at Dataset 1 results

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 269 - Posted: 28 Jul 2020, 4:17:29 UTC

Thought I'd share this plot, which shows results from the current Dataset 1.

The first and simplest question that needs answering is whether we can determine which "machine" each network was trained on just by looking at the learned parameters. For Dataset 1, this should be trivial. I ran the current dataset through some simple/quick analysis.

Each trained network has 4364 learned parameters. You can get an idea of how many of each network type we have by looking at the live stats table on the main page. Note that we currently have over 9000 samples for 4 of the machine types, and only 230 or so for ParityMachine, so this is currently an imbalanced dataset (and we didn't apply any of the tricks you can use to work around that; this was a quick test).

First, we ran the parameters through a Random Forest classifier, and it achieved an accuracy of 99.1% (80/20 train/test split). The confusion matrix showed that the majority of the errors were with ParityMachine, mostly due to the low number of available samples.
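For reference, that classification step might look like the following sketch. This is a hypothetical illustration, not the project's actual notebook code: `X` stands in for the n_networks x 4364 weight matrix and `y` for the machine-type labels, demoed here on toy synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

def classify_weights(X, y, seed=0):
    """Fit a Random Forest on flattened network weights with an 80/20 split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_tr, y_tr)
    preds = clf.predict(X_te)
    return accuracy_score(y_te, preds), confusion_matrix(y_te, preds)

# Toy stand-in for the real data: two well-separated "machine types".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 20)), rng.normal(5, 1, (50, 20))])
y = np.array([0] * 50 + [1] * 50)
acc, cm = classify_weights(X, y)
```

The confusion matrix is what surfaces the ParityMachine errors mentioned above: a class with few samples shows up as a row with off-diagonal counts.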

Next, I did a simple PCA reduction from 4364 dimensions down to 2 dimensions for easy plotting. The resulting plot is here:

[2D PCA scatterplot of the learned weights, one colored cluster per machine type]
As we can see, distinct clusters are forming for each machine type, and even ParityMachine (purple) looks to be forming its own cluster. If simple techniques like PCA and Random Forest can tell the difference this well, we gain confidence that more complex analysis can do even better.
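The projection behind a plot like that can be sketched as below. Again a hypothetical illustration: `X` stands in for the weight matrix, demoed on synthetic clusters.

```python
import numpy as np
from sklearn.decomposition import PCA

def project_2d(X):
    """Reduce weight vectors to 2 principal components for a scatterplot."""
    pca = PCA(n_components=2)
    return pca.fit_transform(X), pca

# Toy stand-in: two synthetic clusters remain separated after projection.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (40, 10)), rng.normal(3, 0.5, (40, 10))])
coords, pca = project_2d(X)
# Plotting would then be e.g. plt.scatter(coords[:, 0], coords[:, 1], c=labels)
```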

These *very* early results bode very well for our central thesis that given enough examples, you can determine which training data was used to train a network. A much more compelling result would be if we could tell the difference between a *Machine in Dataset 1, and a *Modified in Dataset 2.

I will attempt to clean up the jupyter notebook and post it as well sometime later this week.
ID: 269
bozz4science

Send message
Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 270 - Posted: 28 Jul 2020, 7:09:39 UTC

Keep up the pace!

Imo already pretty interesting preliminary results. But I need some help wrapping my head around the main idea. Summarising those statements, could I say that according to the prelim test of the networks, applying the random forest classifier suggests that all network types seem to do reasonably well, all within the same range of accuracy (except for the underrepresented outlier), BUT do however vary greatly in terms of their principal components, that is, the respective parameters that explain the highest degree of the variance in the data set? And that their PCs differ highly amongst the distinct network structures leads to each network structure forming a cluster, allowing for potential backwards inference from the structure back to the dataset used for testing, through the mentioned clustering/classification of the trained networks in PC scatterplots?

Sorry for this question, but keeping up with your pace is a tough job :)
ID: 270
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 280 - Posted: 29 Jul 2020, 21:18:42 UTC - in response to Message 270.  

Great questions!

Summarising those statements, could I say that according to the prelim test of the networks, applying the random forest classifier suggests that all network types seem to do reasonably well, all within the same range of accuracy (except for the underrepresented outlier), BUT do however vary greatly in terms of their principal components, that is, the respective parameters that explain the highest degree of the variance in the data set?


I want to be precise here. Applying the random forest classifier doesn't suggest anything about the principal components. All it shows is that even though there is variation in the actual weights for networks trained with the same data, there is enough information to differentiate between the classes.

That their PCs differ highly amongst the distinct network structures


... the individual networks do not vary in structure or hyperparameters, at least not for this round. The only difference is the data the network is trained with.

yields to each network structure forming a cluster and allowing for potential backwards inference by looking at the structure back to the dataset used for testing through the mentioned clustering/classification of the trained networks in PC scatterplots?


If you mean that you can take a new network of the same shape, train it using the data from one of the 5 machines specified, look where its point falls in the 2D PCA "weight-space", and use that location to infer which dataset it was trained with, then yes.

Stated another way:

In this case, all 40,000+ networks have the same shape and hyperparameters, and each is trained on one of the 5 datasets. Each individual network achieves a regression mean squared error on the evaluation data (for its respective training-data type) of less than 10^-4. So each of these networks correctly predicts the sequence it is intended to predict. This is what everyone here is computing for Dataset 1.

Next, we take the learned weights of each of these networks (a 4364-length vector), and the dataset used to train it (class label), and feed those into a "meta" classifier, to see if we can classify, based solely on the learned weights, what dataset was used to train the network. So if someone comes along with a new network that's trained with one of the 5 datasets, we can predict, with high accuracy, which dataset was used to train it. That's an interesting, though not very surprising, result in and of itself.

Next, we want to visualize the spread of the weight-space in 2 dimensions, so we take our dataset of 40,000+ 4364-length vectors and project them down to 2 components using PCA. What is nice about this plot is that it visualizes the variation in weights that achieve the same results, and shows a clustering in "weight-space" among networks trained on similar data. With this, one can ask a whole new set of questions:


  • What does it mean when a network's weight-space point is "closer" to the centroid of a particular cluster? Is it somehow better? Less prone to adversarial examples?
  • Conversely, are networks whose PCA projections fall further from the centroid somehow worse?
  • Does this suggest another method to evaluate network performance/robustness besides loss (remember, all these networks perform roughly equally well in terms of loss)?
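The centroid-distance question could be probed with a small helper like this sketch (hypothetical names: `coords` stands for the 2D PCA coordinates from the step above, `labels` for the machine types):

```python
import numpy as np

def centroid_distances(coords, labels):
    """Distance from each network's 2D point to the centroid of its own class."""
    centroids = {c: coords[labels == c].mean(axis=0) for c in np.unique(labels)}
    return np.array([np.linalg.norm(p - centroids[c])
                     for p, c in zip(coords, labels)])

# Toy demo: two clusters of two points each; every point sits 1.0 from
# its own class centroid.
coords = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
labels = np.array([0, 0, 1, 1])
d = centroid_distances(coords, labels)
```

Correlating these distances with, say, robustness to perturbed inputs would be one way to turn the bullet questions above into an experiment.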



The problem is I know this particular set of machines should be trivial to differentiate in this way, so before I celebrate too much, I want to do a lot more analysis.

ID: 280
bozz4science

Send message
Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 282 - Posted: 29 Jul 2020, 21:57:53 UTC - in response to Message 280.  

Thanks so much again John for going into great detail in your response. I am sure that I'll have to keep rereading your answer a couple times before understanding it fully.

Wasn't really sure what to make of the random forest classifier result. That's much clearer now. After all, I am glad that I wasn't completely off. Definitely used the term network structure too loosely, but I just wanted to convey that the resulting networks trained on the different data sets display different weights in the end. Will try to adhere more closely to the definitions of those technical terms.

Even though I knew that you already expected the differentiation between the networks according to their PCs, to allow inference of what dataset was used to train them, to be fairly easy, it still is a valid result. Thrilled about the next set of research questions that arise as a consequence thereof. Very interesting!

One short follow-up question. I understand that a 2-PC plot allows for an intuitive visual 2D scatterplot inspection. But isn't not only the clustering in the PCA plot of importance, but, maybe at a later stage of the experiment with a more complex setting (hyperparameters changing), also the share of variance the PCs explain in the underlying data? That is, also keeping the regression models' error rates in mind. I could imagine that with rising complexity of the experiment setting, the MSE would rise and the variance explained by only the 2 considered PCs would decrease. I believe if one were to increase the number of PCs from 2 to 3, or to some 2<N<# of weights, to allow for multi-dimensional separation planes between principal component clusters, that would maybe allow for a more precise differentiation of the neural networks.
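The explained-variance point is easy to check, since scikit-learn's `PCA` exposes `explained_variance_ratio_`. A toy sketch (hypothetical helper name, synthetic data):

```python
import numpy as np
from sklearn.decomposition import PCA

def variance_captured(X, n_components):
    """Fraction of total variance explained by the first n principal components."""
    pca = PCA(n_components=n_components).fit(X)
    return float(pca.explained_variance_ratio_.sum())

# Toy stand-in: one dominant direction plus small noise dimensions.
rng = np.random.default_rng(2)
t = rng.normal(0, 3, (200, 1))
X = np.hstack([t, 0.5 * t, rng.normal(0, 0.1, (200, 3))])
v2 = variance_captured(X, 2)   # 2 PCs capture nearly everything here
v4 = variance_captured(X, 4)   # adding PCs can only increase the total
```

If the 2-PC total drops well below 1 on the real weight vectors, that would be a sign that more components (or a non-linear method) are needed for reliable separation.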

P.S. Don't know if this has any relevance to the project, but would t-SNE maybe be an interesting solution to inspect the results of future, more complex runs?
ID: 282
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 283 - Posted: 30 Jul 2020, 4:14:46 UTC - in response to Message 282.  

P.S. Don't know if this has any relevance to the project, but would t-SNE maybe be an interesting solution to inspect the results of future, more complex runs?


I'll get to the rest later, but I wanted to say thanks for this suggestion. I was running experiments tonight comparing Dataset 1 results to the analogous machine in Dataset 2, and while classification algorithms were able to tell the difference (AWESOME early result), 2D PCA plots didn't show any separation. So based on your suggestion I read up on non-linear reduction methods a bit and tried UMAP, which is very similar to t-SNE but faster to compute, and was able to see some separation. Makes me feel a lot better! UMAP info
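A sketch of such a non-linear reduction, using scikit-learn's t-SNE as originally suggested (the `UMAP` class from the umap-learn package has a nearly identical `fit_transform` API, noted in the comment). Synthetic stand-in data, hypothetical helper name:

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_2d(X, seed=0):
    """Non-linear 2D embedding of weight vectors with t-SNE."""
    tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=seed)
    return tsne.fit_transform(X)

# Toy stand-in data; with umap-learn the call would instead be
# umap.UMAP(n_components=2).fit_transform(X).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (30, 8)), rng.normal(4, 0.5, (30, 8))])
emb = embed_2d(X)
```

Unlike PCA, these embeddings preserve local neighborhood structure rather than global variance, which is why they can reveal separation that a linear 2-PC projection misses.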

I'll share my findings in a few days in the form of the jupyter notebook I've been using. And soon, even if it's early, I want to release a snapshot of the dataset for all to do their own analysis on. Just want to get those Parity and EightBit examples up over 1000 each first.
ID: 283
bozz4science

Send message
Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 287 - Posted: 30 Jul 2020, 22:34:43 UTC - in response to Message 283.  

Of course! You are more than welcome. Without seeing the data that was basically just me guessing into the blue that you might run into trouble by just applying linear dimensionality reduction algorithms on your trained networks' weight vectors.

Haven't heard of UMAP before you mentioned it in your post. It's just astonishing how quickly this space is evolving and how creative computer scientists are in deriving such solutions.

Looking forward to your insights!
ID: 287
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 294 - Posted: 1 Aug 2020, 18:17:07 UTC

I've uploaded the output from my jupyter notebook here: https://www.mlcathome.org/analysis/mlcathome-ds1-ds2-first-look.html .
ID: 294
bozz4science

Send message
Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 296 - Posted: 1 Aug 2020, 18:47:49 UTC

Thanks for sharing. Awesome read and promising preliminary results! Really curious to see how things will change once changes are introduced in the set of hyperparameters and the trained networks inevitably become much more complex.
ID: 296


©2024 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)