This is a VERY preliminary analysis of an early snapshot of the results. While these early results are encouraging, this is not a thorough analysis and should not be treated as such.
MLC@Home has been training thousands of recurrent neural networks (RNNs) to learn simple sequences that mimic the behavior of 5 simple machine types. Each emulated machine has "memory", in that its output depends upon multiple previous inputs. You can find out more about these machines in this paper: https://arxiv.org/abs/1805.07869 .
The networks are 4 GRU layers followed by 4 linear layers, with 9 inputs, 8 outputs, and a hidden layer width of 12, for a total of 4364 learned parameters.
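For reference, here is a minimal PyTorch sketch of a network with this shape. The exact layer arrangement in the MLC@Home client is an assumption on my part; reading the "4 linear layers" as hidden layers ahead of a final 12-to-8 output projection happens to reproduce the stated 4364-parameter count.
import torch.nn as nn

class MimicNet(nn.Module):
    # Hypothetical reconstruction: 4 stacked GRU layers, then 4 hidden
    # linear layers and a 12->8 output projection. This arrangement is
    # an assumption, but it matches the stated 4364 parameters.
    def __init__(self, n_in=9, n_hidden=12, n_out=8):
        super().__init__()
        self.gru = nn.GRU(n_in, n_hidden, num_layers=4, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_out))

    def forward(self, x):             # x: (batch, seq_len, 9)
        out, _ = self.gru(x)          # (batch, seq_len, 12)
        return self.head(out)         # (batch, seq_len, 8)

print(sum(p.numel() for p in MimicNet().parameters()))  # 4364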
Future datasets will be more challenging, with more complex machines and larger networks. This is a partial evaluation of the current dataset as of Jul 27, 2020.
For Dataset 1, each network is exactly the same shape and trained with the same hyperparameters; the only difference is the dataset each was trained on, i.e. which machine it mimics.
Each network is trained with its dataset until it is able to predict a sequence on evaluation data to within 10^-4 of the real value. So, using loss as the sole evaluation criterion, each of these networks performs roughly the same.
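Roughly, that stopping rule looks like the sketch below. This is a hedged illustration, not the project's actual training loop; the optimizer, loss function, and loader names are all assumptions on my part.
import torch

def train_to_threshold(net, train_loader, eval_loader, threshold=1e-4):
    # Illustrative loop: keep training until the mean eval loss drops
    # below the 1e-4 threshold described above. All names here are
    # placeholders, not MLC@Home's actual client code.
    opt = torch.optim.Adam(net.parameters())
    loss_fn = torch.nn.MSELoss()
    while True:
        for xb, yb in train_loader:
            opt.zero_grad()
            loss_fn(net(xb), yb).backward()
            opt.step()
        with torch.no_grad():
            eval_loss = sum(loss_fn(net(xb), yb).item()
                            for xb, yb in eval_loader) / len(eval_loader)
        if eval_loss < threshold:
            return eval_loss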
Dataset 1 is currently in progress, and contains over 9000 examples for 4 of the 5 machines, but only ~200 examples of the ParityMachine type, as it takes much longer to compute than the others.
The learned weights of the currently complete networks in Dataset 1 are in the file d1.csv.
Dataset 2 is nearly identical to Dataset 1, with the exception that the learned machines are modified to change their behavior slightly if they receive a special input sequence. This is to determine if we can detect even a minor change to the behavior of the network based solely on the shape and weights of the network.
The learned weights of the currently complete networks in Dataset 2 are in the file d2.csv.
Can we determine which networks were trained to mimic which machines?
To do this, we'll build a simple classifier, using the 4364 parameters of each network as feature vectors, and the machine they were trained to mimic as a class label.
We'll use sklearn instead of going full PyTorch/Keras since this is a quick test.
%matplotlib inline
import sys
import os
import csv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
import umap
headers = ['class', 'entry']
for i in range(0, 4364):
    headers.append('val'+str(i))
doc1 = pd.read_csv('d1.csv', header=None, names=headers)
doc2 = pd.read_csv('d2.csv', header=None, names=headers)
doc1.head()
doc = pd.concat([doc1, doc2])
We've now loaded both datasets into memory: doc1 contains Dataset 1, doc2 contains Dataset 2, and doc contains them both.
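Since ParityMachine is badly under-represented, it's worth confirming the class balance before training anything. A quick check, assuming the 'class' column loaded as expected:
# How many examples of each machine type do we have so far?
print(doc1['class'].value_counts())
print(doc2['class'].value_counts())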
Let's start with Dataset 1, and see how it does with a RandomForest classifier.
Y = doc1['class'].values
X = doc1.drop(['class', 'entry'], axis=1).values
Xt, Xe, Yt, Ye = train_test_split(X, Y, test_size=0.20)
cls = RandomForestClassifier(max_depth=16)
cls.fit(Xt, Yt)
print("Eval Accuracy: ", cls.score(Xe, Ye))
Yp = cls.predict(Xe)
confusion_matrix(Ye, Yp, labels=['EightBitMachine', 'ParityMachine', 'SimpleXORMachine',
'SingleDirectMachine', 'SingleInvertMachine'])
We can see we do quite well, able to classify which machine with 99% accuracy, and most of the confusion comes from the ParityMachine straggler, of which there is much less data.
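With a class imbalance like this, raw accuracy can flatter the classifier; per-class precision and recall are a quick sanity check (reusing Ye and Yp from the cell above):
from sklearn.metrics import classification_report
# Per-class precision/recall/F1; the ParityMachine row is the one to
# watch, given how few examples it has.
print(classification_report(Ye, Yp))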
Let's do some simple dimensionality reduction and plot the input vectors in 2D space:
pca = PCA(n_components=2)
pca.fit(X)
Xpca = pca.transform(X)
df = pd.DataFrame(data = Xpca, columns = ['PC1', 'PC2'])
df['class'] = Y
sns.lmplot( x="PC1", y="PC2",
data=df,
fit_reg=False,
hue='class', # color by cluster
legend=True,
scatter_kws={"s": 4}) # specify the point size
This is a very cool result, as it shows clear clustering by machine type, even in the 2D weight space. This confirms that classification should be relatively easy (as shown above with 99% accuracy) for machines with such different datasets. Each point represents the dimensionally-reduced weights of one network, and each network has a similar loss value, but this raises new questions.
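As a sanity check on how faithful the 2D view is, we can ask how much of the weight-space variance the first two components actually capture:
# Fraction of total variance captured by PC1 and PC2.
print(pca.explained_variance_ratio_)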
If we do the same analysis for Dataset 2, we get very similar results, despite the much smaller number of examples:
Y = doc2['class'].values
X = doc2.drop(['class', 'entry'], axis=1).values
Xt, Xe, Yt, Ye = train_test_split(X, Y, test_size=0.20)
cls = RandomForestClassifier(max_depth=16)
cls.fit(Xt, Yt)
print("Eval Accuracy: ", cls.score(Xe, Ye))
Yp = cls.predict(Xe)
confusion_matrix(Ye, Yp, labels=['EightBitModified', 'ParityModified', 'SimpleXORModified',
'SingleDirectModified', 'SingleInvertModified'])
pca = PCA(n_components=2)
pca.fit(X)
Xpca = pca.transform(X)
df = pd.DataFrame(data = Xpca, columns = ['PC1', 'PC2'])
df['class'] = Y
sns.lmplot( x="PC1", y="PC2",
data=df,
fit_reg=False,
hue='class', # color by cluster
legend=True,
scatter_kws={"s": 4}) # specify the point size
So, looking at each dataset individually, we can see that it should be (and is!) trivial to build a "meta-classifier" that can accurately predict which machine a network was trained to mimic.
Next, we want to see if we can differentiate between two very similar networks that differ only in a small change to the training data. This is going to be a lot more difficult. You can see this by re-running the above experiments on the combined Datasets 1 and 2: the classification is hampered by the unbalanced dataset, and so is the PCA plot, which shows lots of overlap between the 'Machine' and 'Modified' versions of the same class:
Y = doc['class'].values
X = doc.drop(['class', 'entry'], axis=1).values
Xt, Xe, Yt, Ye = train_test_split(X, Y, test_size=0.20)
cls = RandomForestClassifier(max_depth=16)
cls.fit(Xt, Yt)
print("Eval Accuracy: ", cls.score(Xe, Ye))
Yp = cls.predict(Xe)
confusion_matrix(Ye, Yp, labels=['EightBitMachine','EightBitModified',
'ParityMachine','ParityModified',
'SimpleXORMachine','SimpleXORModified',
'SingleDirectMachine','SingleDirectModified',
'SingleInvertMachine','SingleInvertModified'])
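As an aside, one cheap mitigation for the imbalance is a stratified split, so each class appears in the same proportion in the train and eval halves. A hedged variant of the split above (not used in the runs here):
# Same split as before, but preserving per-class proportions.
Xt, Xe, Yt, Ye = train_test_split(X, Y, test_size=0.20, stratify=Y)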
pca = PCA(n_components=2)
pca.fit(X)
Xpca = pca.transform(X)
df = pd.DataFrame(data = Xpca, columns = ['PC1', 'PC2'])
df['class'] = Y
sns.lmplot( x="PC1", y="PC2",
data=df,
fit_reg=False,
hue='class', # color by cluster
legend=True,
scatter_kws={"s": 4}) # specify the point size
... not very much difference there. However, let's look at each pair of similar machines in isolation, try a few different classification algorithms, and use a balanced dataset (sampling roughly the minimum number of available examples across Datasets 1 and 2). We'll also try a non-linear dimensionality reduction algorithm (UMAP) in addition to PCA and compare the results.
classifiers = {
    #"KNN": KNeighborsClassifier(3),
    "Linear SVM": SVC(kernel="linear", C=0.025),
    #"RBF SVM": SVC(gamma=2, C=1),
    "Gaussian Process": GaussianProcessClassifier(1.0 * RBF(1.0)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Random Forest": RandomForestClassifier(max_depth=32),
    "Neural Net": MLPClassifier(alpha=0.1, max_iter=1000),
    "AdaBoost": AdaBoostClassifier(),
    "NaiveBayes": GaussianNB(),
    #"QDA": QuadraticDiscriminantAnalysis()
}
def per_mach(name, N=2000, classifiers={'RandomForest': RandomForestClassifier(max_depth=32)}):
    # Build a balanced subset for one machine type (N samples each of
    # the 'Machine' and 'Modified' variants), then run each classifier
    # and plot PCA and UMAP projections of the weights.
    if name in ['Modified', 'Machine']:
        sd = doc[doc['class'].str.contains(name)]
    else:
        Xmach = doc[doc['class'] == name+'Machine'].sample(n=N)
        Xmod = doc[doc['class'] == name+'Modified'].sample(n=N)
        sd = pd.concat([Xmach, Xmod])
    sdX = sd.drop(['class', 'entry'], axis=1).values
    sdY = sd['class'].values
    Xt, Xe, Yt, Ye = train_test_split(sdX, sdY, test_size=0.20)

    for k in classifiers:
        print('Classifier: ', k)
        classifiers[k].fit(Xt, Yt)
        print('- Eval accuracy: ', classifiers[k].score(Xe, Ye))
        Yp = classifiers[k].predict(Xe)
        print('- Confusion Matrix:\n', confusion_matrix(Ye, Yp))
        print('\n')

    print('PCA:')
    pca = PCA(n_components=2)
    pca.fit(sdX)
    Xpca = pca.transform(sdX)
    df = pd.DataFrame(data=Xpca, columns=['PC1', 'PC2'])
    df['class'] = sdY
    sns.lmplot(x="PC1", y="PC2",
               data=df,
               fit_reg=False,
               hue='class',           # color by class
               legend=True,
               scatter_kws={"s": 4})  # specify the point size

    print('UMAP:')
    u = umap.UMAP(n_components=2)
    Xumap = u.fit_transform(sdX)
    df = pd.DataFrame(data=Xumap, columns=['UMAP1', 'UMAP2'])
    df['class'] = sdY
    sns.lmplot(x="UMAP1", y="UMAP2",
               data=df,
               fit_reg=False,
               hue='class',           # color by class
               legend=True,
               scatter_kws={"s": 4})  # specify the point size
per_mach('SingleDirect', classifiers=classifiers)
per_mach('SingleInvert', classifiers=classifiers)
per_mach('SimpleXOR', classifiers=classifiers)
per_mach('EightBit', classifiers=classifiers, N=300)
per_mach('Parity', classifiers=classifiers, N=10)
Obviously, the last two results are too early to call, and for some reason the SingleInvert data is hard to separate at the moment, but the results are promising otherwise! The UMAP plots really do show some interesting separation that the PCA plots do not. I wonder what those look like when run on the full datasets?
print('Dataset1 UMAP:')
Y = doc1['class'].values
X = doc1.drop(['class', 'entry'], axis=1).values
u = umap.UMAP(n_components=2)
Xumap = u.fit_transform(X)
df = pd.DataFrame(data = Xumap, columns = ['UMAP1', 'UMAP2'])
df['class'] = Y
sns.lmplot( x="UMAP1", y="UMAP2",
data=df,
fit_reg=False,
hue='class', # color by cluster
legend=True,
scatter_kws={"s": 4}) # specify the point size
print('Dataset2 UMAP:')
Y = doc2['class'].values
X = doc2.drop(['class', 'entry'], axis=1).values
u = umap.UMAP(n_components=2)
Xumap = u.fit_transform(X)
df = pd.DataFrame(data = Xumap, columns = ['UMAP1', 'UMAP2'])
df['class'] = Y
sns.lmplot( x="UMAP1", y="UMAP2",
data=df,
fit_reg=False,
hue='class', # color by cluster
legend=True,
scatter_kws={"s": 4}) # specify the point size
print('Dataset1+2 UMAP:')
Y = doc['class'].values
X = doc.drop(['class', 'entry'], axis=1).values
u = umap.UMAP(n_components=2)
Xumap = u.fit_transform(X)
df = pd.DataFrame(data = Xumap, columns = ['UMAP1', 'UMAP2'])
df['class'] = Y
sns.lmplot( x="UMAP1", y="UMAP2",
data=df,
fit_reg=False,
hue='class', # color by cluster
legend=True,
scatter_kws={"s": 4}) # specify the point size
Well, I'm not sure I can interpret them, but they sure do look pretty. And I see some more separation there than in the PCA plots.
There's a lot more to be done, but the early returns are promising. Stay tuned.