Early Results for MLC@Home Datasets 1 and 2

Disclaimer

This is a VERY preliminary analysis of an early snapshot of the results. While these early results are encouraging, this is not a thorough analysis and should not be treated as such.

Introduction

MLC@Home has been computing thousands of recurrent neural networks (RNNs) that learn simple sequences to mimic the behavior of 5 simple machine types. Each emulated machine has "memory", in that its output depends on multiple previous inputs. You can find out more about these machines in this paper: https://arxiv.org/abs/1805.07869 .
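As a toy illustration of that kind of memory (the real machine definitions are in the linked paper, not reproduced here), consider a machine whose output XORs the current input bit with the previous one, so each output depends on more than the current input:

```python
# A toy machine with "memory": output[t] depends on the current AND the
# previous input bit. This is only an illustrative sketch, not one of the
# actual MLC@Home machine definitions.
def xor_machine(bits):
    """output[t] = bits[t] XOR bits[t-1], with an implicit initial 0."""
    out, prev = [], 0
    for b in bits:
        out.append(b ^ prev)
        prev = b
    return out

print(xor_machine([1, 0, 1, 1]))  # [1, 1, 1, 0]
```

A network that emulates even this trivial machine must carry one bit of state forward, which is exactly what the GRU layers provide.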

The networks consist of 4 GRU layers followed by 4 linear layers, with 9 inputs, 8 outputs, and a hidden layer width of 12, for a total of 4364 learned parameters.
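As a sanity check on that parameter count, a PyTorch-style GRU layer holds weight_ih (3H × in), weight_hh (3H × H), and two bias vectors of 3H each. The sketch below is a back-of-the-envelope count for the recurrent stack only; the remaining parameters, up to the stated 4364, sit in the linear head, whose exact per-layer sizes aren't spelled out here:

```python
def gru_layer_params(n_in, hidden):
    # weight_ih: (3H x n_in), weight_hh: (3H x H), biases: two vectors of 3H
    return 3 * hidden * n_in + 3 * hidden * hidden + 2 * 3 * hidden

def gru_stack_params(n_in, hidden, layers):
    # First layer sees the raw inputs; later layers see the hidden state.
    total = gru_layer_params(n_in, hidden)
    total += (layers - 1) * gru_layer_params(hidden, hidden)
    return total

print(gru_stack_params(9, 12, 4))  # 3636 parameters in the GRU stack alone
```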

Future datasets will be more complex, with more intricate machines and larger networks. This is a partial evaluation of the current datasets as of Jul 27, 2020.

Dataset 1

For Dataset 1, each network has exactly the same shape and is trained with the same hyperparameters; the only differences are:

  • The random initial values of the network
  • The random seed for gradient descent
  • Which of the 5 machine types it was trained to mimic

Each network is trained on its dataset until it is able to predict a sequence on evaluation data to within 10^-4 of the real values. So, using loss as the sole evaluation criterion, each of these networks performs roughly the same.
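The stopping rule itself can be sketched with plain gradient descent on a toy 1-D regression; this only illustrates the criterion (train until held-out error drops below 10^-4), not the project's actual RNN training code:

```python
# Toy illustration of the stopping criterion: fit y = 2x by gradient
# descent on a single weight, halting once MSE on held-out evaluation
# data falls below 1e-4.
train = [(x, 2.0 * x) for x in (0.0, 0.5, 1.0, 1.5)]
eval_ = [(x, 2.0 * x) for x in (0.25, 0.75, 1.25)]

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

w, lr = 0.0, 0.1
while mse(w, eval_) > 1e-4:
    grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
    w -= lr * grad

print(round(w, 2))
```

The key point for what follows: every finished network has crossed the same loss threshold, so loss alone can't distinguish them.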

Dataset 1 is currently in progress, and contains over 9000 examples for 4 of the 5 machines, but only ~200 examples of the ParityMachine type, as it takes much longer to compute than the others.

The learned weights of the currently complete networks in Dataset 1 are in the file d1.csv.

Dataset 2

Dataset 2 is nearly identical to Dataset 1, with the exception that the learned machines are modified to change their behavior slightly if they receive a special input sequence. This is to determine if we can detect even a minor change to the behavior of the network based solely on the shape and weights of the network.
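As a purely hypothetical sketch of what such a modification might look like (the actual trigger sequence and behavior change are defined by the project, not here): a machine that inverts its output once an assumed trigger pattern appears in the input stream:

```python
# Hypothetical "modified" machine: behaves like the base machine (identity
# on bits here) until an assumed trigger pattern appears, then inverts its
# output. The real trigger and behavior change are not reproduced here.
TRIGGER = (1, 1, 0, 1)  # purely illustrative trigger pattern

def modified_machine(bits):
    out, window, flipped = [], [], False
    for b in bits:
        window = (window + [b])[-len(TRIGGER):]  # sliding input window
        if tuple(window) == TRIGGER:
            flipped = True                        # behavior change latches on
        out.append(b ^ 1 if flipped else b)
    return out

print(modified_machine([0, 1, 1, 0, 1, 0, 1]))  # [0, 1, 1, 0, 0, 1, 0]
```

On inputs that never contain the trigger, the modified machine is indistinguishable from the base machine, which is what makes detecting the modification from weights alone interesting.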

The learned weights of the currently complete networks in Dataset 2 are in the file d2.csv.

First experiment: Classification within Dataset

Can we determine which networks were trained to mimic which machines?

To do this, we'll build a simple classifier, using the 4364 parameters of each network as feature vectors, and the machine it was trained to mimic as the class label.

We'll use sklearn instead of going full PyTorch/Keras since this is a quick test.

In [1]:
%matplotlib inline

import sys
import os
import csv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
import umap
In [2]:
headers = ['class', 'entry']
for i in range(0, 4364):
    headers.append('val'+str(i))
doc1 = pd.read_csv('d1.csv', header=None, names=headers)
doc2 = pd.read_csv('d2.csv', header=None, names=headers)
In [3]:
doc1.head()
Out[3]:
class entry val0 val1 val2 val3 val4 val5 val6 val7 ... val4354 val4355 val4356 val4357 val4358 val4359 val4360 val4361 val4362 val4363
0 SingleInvertMachine SingleInvertMachine-1594331092-30013 -0.453560 -0.703258 -0.472655 1.217689 -0.412960 -0.773222 -0.781114 -0.588142 ... -0.053443 0.008272 0.025197 0.065849 -0.067535 -0.028521 0.175748 0.142259 0.025110 0.148995
1 SingleInvertMachine SingleInvertMachine-1593745513-1863 0.367368 0.329480 0.017657 -0.898854 -0.101669 -0.040806 0.208547 0.147422 ... -0.207324 -0.221050 0.127079 0.001427 -0.077376 -0.049500 -0.006470 -0.050620 -0.022035 0.125468
2 SingleInvertMachine SingleInvertMachine-1594397570-11850 0.206941 0.214054 0.169069 -1.259889 0.485635 0.221456 0.163483 0.157004 ... 0.029640 0.261677 0.081336 0.162375 -0.001597 0.094809 -0.074691 -0.169725 0.168352 0.213407
3 SingleInvertMachine SingleInvertMachine-1593664455-19171 0.559108 0.258271 0.051642 -1.303281 0.204134 0.339547 0.210711 0.309061 ... 0.140846 -0.025567 0.114201 0.058845 -0.098979 0.324120 -0.073421 -0.061150 0.035470 -0.033390
4 SingleInvertMachine SingleInvertMachine-1593743053-23330 0.211888 -0.313578 -0.190704 -0.697603 0.046038 -0.254627 0.088193 0.047175 ... 0.164250 -0.034136 0.166396 0.083894 -0.026291 0.230062 -0.055630 0.150785 -0.008049 -0.075735

5 rows × 4366 columns

In [4]:
doc = pd.concat([doc1, doc2])

We've now loaded both datasets into memory: doc1 contains Dataset 1, doc2 contains Dataset 2, and doc contains them both.

Let's start with Dataset 1, and see how it does with a RandomForest classifier.

In [5]:
Y = doc1['class'].values
X = doc1.drop(['class', 'entry'], axis=1).values
Xt, Xe, Yt, Ye = train_test_split(X, Y, test_size=0.20)
cls = RandomForestClassifier(max_depth=16)
cls.fit(Xt, Yt)
print("Eval Accuracy: ", cls.score(Xe, Ye))
Yp = cls.predict(Xe)
confusion_matrix(Ye, Yp, labels=['EightBitMachine', 'ParityMachine', 'SimpleXORMachine',
                                 'SingleDirectMachine', 'SingleInvertMachine'])
Eval Accuracy:  0.9921412007214635
Out[5]:
array([[1961,    0,    7,    0,    0],
       [  43,    0,    9,    0,    0],
       [   0,    0, 1908,    0,    0],
       [   0,    0,    0, 1915,    0],
       [   0,    0,    2,    0, 1917]])

We do quite well, classifying the machine type with 99% accuracy, and most of the confusion comes from the ParityMachine straggler, for which there is much less data.
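Row-normalizing the confusion matrix turns the raw counts into per-class recall, which makes the ParityMachine shortfall explicit. A quick sketch using the matrix values printed above:

```python
import numpy as np

# Confusion matrix from the RandomForest run above (rows = true class).
cm = np.array([[1961,    0,    7,    0,    0],
               [  43,    0,    9,    0,    0],
               [   0,    0, 1908,    0,    0],
               [   0,    0,    0, 1915,    0],
               [   0,    0,    2,    0, 1917]])

# Per-class recall: diagonal (correct) over row sums (total per true class).
recall = cm.diagonal() / cm.sum(axis=1)
print(recall.round(3))
```

Four classes sit at or near 1.0 while ParityMachine (row 2) is 0.0: with only ~200 training examples it is swallowed by the larger classes, even though overall accuracy still reads 99%.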

Let's do some simple dimensionality reduction and plot the input vectors in 2D space:

In [6]:
pca = PCA(n_components=2)
pca.fit(X)
Xpca = pca.transform(X)
df = pd.DataFrame(data = Xpca, columns = ['PC1', 'PC2'])
df['class'] = Y

sns.lmplot( x="PC1", y="PC2",
  data=df, 
  fit_reg=False, 
  hue='class', # color by cluster
  legend=True,
  scatter_kws={"s": 4}) # specify the point size
Out[6]:
<seaborn.axisgrid.FacetGrid at 0x7fd82e9ad460>

This is a very cool result: it shows clear clustering by machine type, even in the 2D weight space. This confirms that classification should be relatively easy (as shown above, with 99% accuracy) for machines with such different datasets. Each point represents the dimensionally-reduced weights of one network, and each has a similar loss value, but this raises new questions:

  • What are the consequences of a network that is far from the cluster centroid?
  • What about those closer to the cluster centroid? Is one "better" than the other? In what way?
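One way to start on these questions is to measure each network's Euclidean distance from its own class centroid in weight space. The helper below is a sketch; `X` and `y` here are tiny synthetic stand-ins for the real weight vectors and labels:

```python
import numpy as np

def centroid_distances(X, y):
    """Per-sample Euclidean distance to the centroid of its own class."""
    d = np.empty(len(X))
    for label in np.unique(y):
        mask = (y == label)
        centroid = X[mask].mean(axis=0)           # class centroid
        d[mask] = np.linalg.norm(X[mask] - centroid, axis=1)
    return d

# Tiny synthetic stand-in: two classes in 2-D, each point 1 unit from
# its class centroid.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
y = np.array(['a', 'a', 'b', 'b'])
print(centroid_distances(X, y))  # [1. 1. 1. 1.]
```

Ranking networks by this distance would give a concrete way to pick "central" vs. "outlier" networks for follow-up evaluation.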

If we do the same analysis for Dataset 2, we get very similar results, despite the much smaller number of results:

In [7]:
Y = doc2['class'].values
X = doc2.drop(['class', 'entry'], axis=1).values
Xt, Xe, Yt, Ye = train_test_split(X, Y, test_size=0.20)
cls = RandomForestClassifier(max_depth=16)
cls.fit(Xt, Yt)
print("Eval Accuracy: ", cls.score(Xe, Ye))
Yp = cls.predict(Xe)
confusion_matrix(Ye, Yp, labels=['EightBitModified', 'ParityModified', 'SimpleXORModified',
                                 'SingleDirectModified', 'SingleInvertModified'])
Eval Accuracy:  0.9959973315543695
Out[7]:
array([[ 58,   0,   2,   0,   0],
       [  0,   0,   4,   0,   0],
       [  0,   0, 478,   0,   0],
       [  0,   0,   0, 494,   0],
       [  0,   0,   0,   0, 463]])
In [8]:
pca = PCA(n_components=2)
pca.fit(X)
Xpca = pca.transform(X)
df = pd.DataFrame(data = Xpca, columns = ['PC1', 'PC2'])
df['class'] = Y

sns.lmplot( x="PC1", y="PC2",
  data=df, 
  fit_reg=False, 
  hue='class', # color by cluster
  legend=True,
  scatter_kws={"s": 4}) # specify the point size
Out[8]:
<seaborn.axisgrid.FacetGrid at 0x7fd8ac30bdc0>

So looking at each dataset individually, we can see that it should be (and is!) trivial to build a "meta classifier" that can accurately predict which machine a network was trained with.

Experiment 2: Can you classify between datasets?

Next we want to see if we can differentiate between two very similar networks, with only a small difference in the training data. This is going to be a lot more difficult. You can see this if you re-run the above experiments on the combined Datasets 1 and 2. The classification is hampered by the unbalanced dataset, and so is the PCA plot, which shows lots of overlap between the 'Machine' and 'Modified' versions of the same class:

In [9]:
Y = doc['class'].values
X = doc.drop(['class', 'entry'], axis=1).values
Xt, Xe, Yt, Ye = train_test_split(X, Y, test_size=0.20)
cls = RandomForestClassifier(max_depth=16)
cls.fit(Xt, Yt)
print("Eval Accuracy: ", cls.score(Xe, Ye))
Yp = cls.predict(Xe)
confusion_matrix(Ye, Yp, labels=['EightBitMachine','EightBitModified', 
                                 'ParityMachine','ParityModified', 
                                 'SimpleXORMachine','SimpleXORModified',
                                 'SingleDirectMachine','SingleDirectModified', 
                                 'SingleInvertMachine','SingleInvertModified'])
Eval Accuracy:  0.8598272138228942
Out[9]:
array([[1891,    0,    0,    0,    6,    2,    1,    0,    0,    0],
       [  58,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [  33,    0,    0,    0,    7,    0,    0,    0,    0,    0],
       [   1,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0, 2050,    0,    0,    0,    0,    0],
       [   0,    0,    0,    0,  464,   27,    0,    0,    0,    0],
       [   0,    0,    0,    0,    0,    0, 1870,    2,    0,    0],
       [   0,    0,    0,    0,    0,    0,  227,  221,    0,    0],
       [   0,    0,    0,    0,    3,    0,    0,    0, 1903,    0],
       [   0,    0,    0,    0,    0,    0,    0,    0,  494,    0]])
In [10]:
pca = PCA(n_components=2)
pca.fit(X)
Xpca = pca.transform(X)
df = pd.DataFrame(data = Xpca, columns = ['PC1', 'PC2'])
df['class'] = Y

sns.lmplot( x="PC1", y="PC2",
  data=df, 
  fit_reg=False, 
  hue='class', # color by cluster
  legend=True,
  scatter_kws={"s": 4}) # specify the point size
Out[10]:
<seaborn.axisgrid.FacetGrid at 0x7fd823ee8880>

... not very much difference there. However, let's look at each individual pair of similar machines in isolation, try a few different classification algorithms, and use a balanced dataset (taking roughly the minimum number of available samples in Datasets 1 and 2). We'll also try a non-linear dimensionality reduction algorithm (UMAP) in addition to PCA to check out the results.

In [11]:
classifiers = {
    #"KNN": KNeighborsClassifier(3),
    "Linear SVM": SVC(kernel="linear", C=0.025),
    #"RBF SVM": SVC(gamma=2, C=1),
    "Gaussian Process": GaussianProcessClassifier(1.0 * RBF(1.0)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Random Forest": RandomForestClassifier(max_depth=32),
    "Neural Net": MLPClassifier(alpha=0.1, max_iter=1000),
    "AdaBoost": AdaBoostClassifier(),
    "NaiveBayes": GaussianNB(),
    #"QDA": QuadraticDiscriminantAnalysis()
}

def per_mach(name, N=2000, classifiers={'RandomForest':RandomForestClassifier(max_depth=32)}):
    if name in ['Modified','Machine']:
        sd = doc[doc['class'].str.contains(name)]
    else:
        Xmach = doc[doc['class'] == name+'Machine'].sample(n=N)
        Xmod = doc[doc['class'] == name+'Modified'].sample(n=N)
        sd = pd.concat([Xmach, Xmod])
    sdX = sd.drop(['class', 'entry'], axis=1).values
    sdY = sd['class'].values

    Xt, Xe, Yt, Ye = train_test_split(sdX, sdY, test_size=0.20)
    
    for k in classifiers:
        print('Classifier: ', k)
        classifiers[k].fit(Xt, Yt)
        print('- Eval accuracy: ', classifiers[k].score(Xe, Ye))
        Yp = classifiers[k].predict(Xe)
        print('- Confusion Matrix:\n', confusion_matrix(Ye, Yp))
        print('\n')
        
    print('PCA:')
    pca = PCA(n_components=2)
    pca.fit(sdX)
    Xpca = pca.transform(sdX)
    df = pd.DataFrame(data = Xpca, columns = ['PC1', 'PC2'])
    df['class'] = sdY

    sns.lmplot( x="PC1", y="PC2",
      data=df, 
      fit_reg=False, 
      hue='class', # color by cluster
      legend=True,
      scatter_kws={"s": 4}) # specify the point size
    
    print('UMAP:')
    u = umap.UMAP(n_components=2)
    Xumap = u.fit_transform(sdX)
    df = pd.DataFrame(data = Xumap, columns = ['UMAP1', 'UMAP2'])
    df['class'] = sdY

    sns.lmplot( x="UMAP1", y="UMAP2",
      data=df, 
      fit_reg=False, 
      hue='class', # color by cluster
      legend=True,
      scatter_kws={"s": 4}) # specify the point size
In [12]:
per_mach('SingleDirect', classifiers=classifiers)
Classifier:  Linear SVM
- Eval accuracy:  0.97625
- Confusion Matrix:
 [[395   5]
 [ 14 386]]


Classifier:  Gaussian Process
- Eval accuracy:  0.5
- Confusion Matrix:
 [[400   0]
 [400   0]]


Classifier:  Decision Tree
- Eval accuracy:  0.89125
- Confusion Matrix:
 [[355  45]
 [ 42 358]]


Classifier:  Random Forest
- Eval accuracy:  0.99125
- Confusion Matrix:
 [[394   6]
 [  1 399]]


Classifier:  Neural Net
- Eval accuracy:  0.97
- Confusion Matrix:
 [[390  10]
 [ 14 386]]


Classifier:  AdaBoost
- Eval accuracy:  0.98625
- Confusion Matrix:
 [[393   7]
 [  4 396]]


Classifier:  NaiveBayes
- Eval accuracy:  1.0
- Confusion Matrix:
 [[400   0]
 [  0 400]]


PCA:
UMAP:
In [13]:
per_mach('SingleInvert', classifiers=classifiers)
Classifier:  Linear SVM
- Eval accuracy:  0.52625
- Confusion Matrix:
 [[220 198]
 [181 201]]


Classifier:  Gaussian Process
- Eval accuracy:  0.50625
- Confusion Matrix:
 [[278 140]
 [255 127]]


Classifier:  Decision Tree
- Eval accuracy:  0.4975
- Confusion Matrix:
 [[203 215]
 [187 195]]


Classifier:  Random Forest
- Eval accuracy:  0.5
- Confusion Matrix:
 [[189 229]
 [171 211]]


Classifier:  Neural Net
- Eval accuracy:  0.50625
- Confusion Matrix:
 [[191 227]
 [168 214]]


Classifier:  AdaBoost
- Eval accuracy:  0.525
- Confusion Matrix:
 [[200 218]
 [162 220]]


Classifier:  NaiveBayes
- Eval accuracy:  0.53125
- Confusion Matrix:
 [[232 186]
 [189 193]]


PCA:
UMAP:
In [14]:
per_mach('SimpleXOR', classifiers=classifiers)
Classifier:  Linear SVM
- Eval accuracy:  0.88375
- Confusion Matrix:
 [[351  32]
 [ 61 356]]


Classifier:  Gaussian Process
- Eval accuracy:  0.47875
- Confusion Matrix:
 [[383   0]
 [417   0]]


Classifier:  Decision Tree
- Eval accuracy:  0.79875
- Confusion Matrix:
 [[353  30]
 [131 286]]


Classifier:  Random Forest
- Eval accuracy:  0.97125
- Confusion Matrix:
 [[362  21]
 [  2 415]]


Classifier:  Neural Net
- Eval accuracy:  0.895
- Confusion Matrix:
 [[360  23]
 [ 61 356]]


Classifier:  AdaBoost
- Eval accuracy:  0.955
- Confusion Matrix:
 [[367  16]
 [ 20 397]]


Classifier:  NaiveBayes
- Eval accuracy:  0.98625
- Confusion Matrix:
 [[379   4]
 [  7 410]]


PCA:
UMAP:
In [15]:
per_mach('EightBit', classifiers=classifiers, N=300)
Classifier:  Linear SVM
- Eval accuracy:  0.825
- Confusion Matrix:
 [[66  3]
 [18 33]]


Classifier:  Gaussian Process
- Eval accuracy:  0.575
- Confusion Matrix:
 [[69  0]
 [51  0]]


Classifier:  Decision Tree
- Eval accuracy:  0.7916666666666666
- Confusion Matrix:
 [[59 10]
 [15 36]]


Classifier:  Random Forest
- Eval accuracy:  0.9583333333333334
- Confusion Matrix:
 [[65  4]
 [ 1 50]]


Classifier:  Neural Net
- Eval accuracy:  0.825
- Confusion Matrix:
 [[66  3]
 [18 33]]


Classifier:  AdaBoost
- Eval accuracy:  0.9333333333333333
- Confusion Matrix:
 [[66  3]
 [ 5 46]]


Classifier:  NaiveBayes
- Eval accuracy:  0.975
- Confusion Matrix:
 [[66  3]
 [ 0 51]]


PCA:
UMAP:
In [16]:
per_mach('Parity', classifiers=classifiers, N=10)
Classifier:  Linear SVM
- Eval accuracy:  0.5
- Confusion Matrix:
 [[2 0]
 [2 0]]


Classifier:  Gaussian Process
- Eval accuracy:  0.5
- Confusion Matrix:
 [[2 0]
 [2 0]]


Classifier:  Decision Tree
- Eval accuracy:  0.5
- Confusion Matrix:
 [[0 2]
 [0 2]]


Classifier:  Random Forest
- Eval accuracy:  0.5
- Confusion Matrix:
 [[1 1]
 [1 1]]


Classifier:  Neural Net
- Eval accuracy:  0.25
- Confusion Matrix:
 [[1 1]
 [2 0]]


Classifier:  AdaBoost
- Eval accuracy:  0.25
- Confusion Matrix:
 [[1 1]
 [2 0]]


Classifier:  NaiveBayes
- Eval accuracy:  0.5
- Confusion Matrix:
 [[1 1]
 [1 1]]


PCA:
UMAP:

Obviously, the last two results come from too little data to tell, and for some reason the SingleInvert data is hard to separate at the moment, but the results are otherwise promising! The UMAP plots really do show some interesting separation that the PCA plots do not. I wonder what those look like when run on the full datasets?

In [17]:
print('Dataset1 UMAP:')
Y = doc1['class'].values
X = doc1.drop(['class', 'entry'], axis=1).values
u = umap.UMAP(n_components=2)
Xumap = u.fit_transform(X)
df = pd.DataFrame(data = Xumap, columns = ['UMAP1', 'UMAP2'])
df['class'] = Y

sns.lmplot( x="UMAP1", y="UMAP2",
      data=df, 
      fit_reg=False, 
      hue='class', # color by cluster
      legend=True,
      scatter_kws={"s": 4}) # specify the point size
Dataset1 UMAP:
Out[17]:
<seaborn.axisgrid.FacetGrid at 0x7fd823f18af0>
In [18]:
print('Dataset2 UMAP:')
Y = doc2['class'].values
X = doc2.drop(['class', 'entry'], axis=1).values
u = umap.UMAP(n_components=2)
Xumap = u.fit_transform(X)
df = pd.DataFrame(data = Xumap, columns = ['UMAP1', 'UMAP2'])
df['class'] = Y

sns.lmplot( x="UMAP1", y="UMAP2",
      data=df, 
      fit_reg=False, 
      hue='class', # color by cluster
      legend=True,
      scatter_kws={"s": 4}) # specify the point size
Dataset2 UMAP:
Out[18]:
<seaborn.axisgrid.FacetGrid at 0x7fd826138d60>
In [19]:
print('Dataset1+2 UMAP:')
Y = doc['class'].values
X = doc.drop(['class', 'entry'], axis=1).values
u = umap.UMAP(n_components=2)
Xumap = u.fit_transform(X)
df = pd.DataFrame(data = Xumap, columns = ['UMAP1', 'UMAP2'])
df['class'] = Y

sns.lmplot( x="UMAP1", y="UMAP2",
      data=df, 
      fit_reg=False, 
      hue='class', # color by cluster
      legend=True,
      scatter_kws={"s": 4}) # specify the point size
Dataset1+2 UMAP:
Out[19]:
<seaborn.axisgrid.FacetGrid at 0x7fd82d0f5e80>

Well, I'm not sure I can interpret them, but they sure do look pretty. And I see some more separation there than in the PCA plots.

There's a lot more to be done, but early returns are promising. Stay tuned.
