Science due diligence / research goal

Message boards : Science : Science due diligence / research goal
bozz4science

Send message
Joined: 9 Jul 20
Posts: 141
Credit: 11,533,376
RAC: 61,455
Message 149 - Posted: 9 Jul 2020, 14:48:52 UTC

Having recently stumbled upon your new project, I immediately liked the research question as well as the fact that the platform is open to the general public. I really think that studying this question could yield some interesting results. I'm really interested, though, in the background of how you implement this project.

For me there are two important aspects when it comes to complex models, especially in the neural network space: model explainability and model interpretability. The former is about being able to explain why a model yielded a certain prediction (came up with a specific solution), while the latter is all about knowing how the model actually arrived at that solution. Practitioners are usually concerned mainly with the first aspect, since it amounts to exploring the optimal solution/forecast (i.e. least misclassification error): they want to understand why the results that an optimally trained network yields are found to be optimal and how they transfer to the real world. Essentially this is causal probabilistic inference, deriving a cause-effect relationship to make sense of the optimal solution.

The second component, model interpretability, is usually very much neglected, and is the reason why many call sophisticated network structures black-box models: we know what happens technically inside the network/model, but cannot really translate how the model arrives at its solution into real-world meaning. Whereas explaining an optimal structure might be feasible, what the data transformations really mean is not intuitive (feature selection, activation of data plus bias, weighting schemes of neurons, network topology, activation functions, learning rate, momentum coefficients, gradient ascent/descent, stochastic optimisation of the gradient method, error backpropagation and weight updates, etc.)

This is where, to my understanding, people turn to partial dependence plots, global or local surrogate models, or Shapley values, to get at least some intuitive understanding of which variables tend to explain the largest part of the overall predicted solution. You could also perform some kind of dimensionality reduction, such as PCA, to get a better intuition about the input data; the NN training itself, however, is no more intuitive for it. So for me this is really the issue with model interpretability. This is where you want to tackle the problem, right?
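
For intuition, the kind of dimensionality reduction mentioned above can be sketched in a few lines of numpy. This is purely illustrative toy data, not anything from the project:

```python
# Illustrative from-scratch PCA on synthetic data; not project code.
import numpy as np

rng = np.random.default_rng(0)
# 200 samples, 5 features; make two features dominate the variance
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

Xc = X - X.mean(axis=0)                  # center the data
cov = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition (ascending order)
order = np.argsort(eigvals)[::-1]        # sort components by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()      # fraction of variance per component
Z = Xc @ eigvecs[:, :2]                  # project onto the top 2 components
print(explained[:2].sum())               # the top 2 components carry most of the variance
```

The projection `Z` gives a low-dimensional view of the inputs, but, as noted above, it says nothing about how the trained network itself transforms them.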

Coming back to your description page, you state that numerous neural networks are trained in parallel with tightly controlled input, hyperparameters, and network structures. For me personally, many questions remain.

- what technique is used to train the networks (gradient ascent/descent or any stochastically improved technique)?
- will those techniques vary according to the WU or batch?
- are any specific neural network structures considered, or simple plain-vanilla ANNs?
- what data set is used for training, so we understand what the trained models try to predict? I don't want to train models on questionable data or with a questionable purpose.
- do we focus on neural networks for classification (categorical output), or do we train neural-network regression models (numerical output)?
- what does tight control of hyperparameters mean? Is a subset of HPs fixed for all WUs and only one HP changed at a time in each WU?
- are many hyperparameters changed simultaneously per WU, i.e. a subset of the overall HP set? How do you control for overlapping effects?
- what hyperparameters do you consider? Number of hidden layers, neurons per layer, different activation functions, different gradient methods, bias, weighting schemes, weight initialisation, different train/test split ratios, introduction of noise to the data, sampling methods for the training data, stratified sampling/holdout methods, regularisation, introduction of a momentum coefficient for learning, feature selection, feature-subset sampling methods, feature engineering, etc. To my understanding, all of these qualify as hyperparameters, i.e. any parameter whose value has to be decided and agreed upon before training commences. This doesn't have to be an exhaustive list; I just want to get the gist of where this project is heading.
- will this set of considered hyperparameters change over time, or is it static?
- will other types of networks be trained in the future? Convolutional, recurrent, or autoencoder networks, for example, which all have a rising number of real-world use cases?
- how do you intend to analyse the results produced by our crunched WUs? According to which process/criteria do you analyse them? Any specific focus on some HPs? How do you assess the generated model data set? Just the classification error rates for accuracy? How do you judge complexity? Any intention to build an algorithm that tries to find, in the overall generated data set, the model that (similar to PCA or regression) retains most of the overall variance in the data with as few inputs as possible, i.e. at a given complexity (defined according to some weighted measure, say some fixed/constrained hyperparameters) always yields the "simplest network structure"? I just wonder what you want to analyse.

Would appreciate any additional information you can share with us :)
ID: 149
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Jun 20
Posts: 454
Credit: 14,284,444
RAC: 3,179
Message 152 - Posted: 10 Jul 2020, 16:29:05 UTC - in response to Message 149.  

Sorry for the delay in responding; Thursdays are my busy days.

First of all, fantastic questions. I'll try to cover them all here as best I can. For a bit more information, I'm making a web page that covers most of what you ask: from the main page at https://www.mlcathome.org/index.html, click on "MLDS Datasets" in the sidebar. That page is still under construction, but it answers some of your questions and will be where the dataset is available for download.

I agree: I'm much more interested in interpretability of the network than in explaining why a particular input produces a particular output. LRP, integrated gradients, etc. already do a pretty good job of showing which parts of the input contributed most to a particular output. But to do those, you need inputs, and you're essentially observing how the network transforms that particular input. Great and useful, but I want to look at a network without relying on a particular input. I believe that's in line with what you're interested in too. One line of research I've seen is Weiss et al., who try to extract automata directly from RNNs using grammatical inference. Note: I was able to reproduce their work for classifiers, but not for networks modelling a transducer, where every state is an accepting state. There are many other papers in this field as well; I simply use them as an example of people looking solely at the structure and weights of the network.

That said, at the moment I'm simply creating a dataset; there's no reason it couldn't be used for both, or for something completely different.

Now onto your specific questions:

- what technique is used to train the networks (gradient ascent/descent or any stochastically improved technique)?

Stochastic gradient descent with the Adam optimizer. The code is available on GitLab.
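
For readers unfamiliar with it, the Adam update rule mentioned here can be sketched from scratch in a few lines. This is a generic illustration of the optimizer on a toy one-parameter loss, not the project's training code (which is on GitLab):

```python
# Illustrative sketch of the Adam update rule (Kingma & Ba); not project code.
# Minimises f(w) = (w - 3)^2, whose gradient is g = 2*(w - 3).
import math

w = 0.0                          # parameter to optimise
m, v = 0.0, 0.0                  # first/second moment estimates
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 501):
    g = 2.0 * (w - 3.0)                      # gradient of the loss at w
    m = beta1 * m + (1 - beta1) * g          # biased first-moment update
    v = beta2 * v + (1 - beta2) * g * g      # biased second-moment update
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)

print(w)   # converges towards the minimum at w = 3
```

In the real client the same update is applied by PyTorch to every network weight, with the gradients coming from backpropagation over mini-batches (the "stochastic" part).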

- will those techniques vary according to the WU or batch?

For this first round, no, but for future work absolutely.

- are any specific neural network structures considered, or simple plain-vanilla ANNs?

I'm really interested in RNNs, so this first batch is very simple stacked RNNs based on the ones from this paper, not because it's a great paper per se, but because it contains some very simple machines that are easy to model with an RNN and quick to train. They provide a good testbed for getting the kinks out of the system.
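
For intuition, "stacked" here just means the hidden-state sequence of one recurrent layer becomes the input sequence of the next. A minimal numpy sketch of a two-layer vanilla RNN forward pass (illustrative only; the actual MLDS models are PyTorch RNNs, see the GitLab repo):

```python
# Minimal numpy sketch of a 2-layer ("stacked") vanilla RNN forward pass.
import numpy as np

rng = np.random.default_rng(1)
in_dim, hid_dim, seq_len = 4, 8, 10

def rnn_layer(xs, W_in, W_hh, b):
    """Run one vanilla RNN layer over a sequence, returning all hidden states."""
    h = np.zeros(W_hh.shape[0])
    out = []
    for x in xs:
        h = np.tanh(W_in @ x + W_hh @ h + b)   # classic tanh recurrence
        out.append(h)
    return np.stack(out)

# Layer 1 maps inputs -> hidden states; layer 2 consumes layer 1's states.
params1 = (rng.normal(size=(hid_dim, in_dim)) * 0.1,
           rng.normal(size=(hid_dim, hid_dim)) * 0.1,
           np.zeros(hid_dim))
params2 = (rng.normal(size=(hid_dim, hid_dim)) * 0.1,
           rng.normal(size=(hid_dim, hid_dim)) * 0.1,
           np.zeros(hid_dim))

xs = rng.normal(size=(seq_len, in_dim))        # a random input sequence
h1 = rnn_layer(xs, *params1)                   # first layer
h2 = rnn_layer(h1, *params2)                   # stacked second layer
print(h2.shape)                                # one hidden state per time step
```

In PyTorch the equivalent stacking is a single `nn.RNN(..., num_layers=2)` module; the per-layer weight matrices above are what end up in the dataset for analysis.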

- what data set is used for training, so we understand what the trained models try to predict? I don't want to train models on questionable data or with a questionable purpose.

See the paper above, and the code on gitlab.com. The training data is random input sequences and the observed output sequences from these toy "machines".
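
As a rough illustration of that setup, here is a hypothetical toy machine (a two-state parity automaton; the actual machines are defined in the paper and the GitLab code) generating (input sequence, output sequence) training pairs from random inputs:

```python
# Illustrative sketch of generating training pairs from a toy "machine":
# random input sequences in, observed output sequences out.
# The parity automaton here is a hypothetical stand-in, not a project machine.
import random

def parity_machine(bits):
    """Toy two-state machine: output the running parity of the input bits."""
    state, outputs = 0, []
    for b in bits:
        state ^= b          # flip state on a 1
        outputs.append(state)
    return outputs

def make_dataset(n_sequences, seq_len, seed=42):
    rng = random.Random(seed)
    data = []
    for _ in range(n_sequences):
        xs = [rng.randint(0, 1) for _ in range(seq_len)]
        data.append((xs, parity_machine(xs)))   # (input seq, output seq) pair
    return data

data = make_dataset(n_sequences=5, seq_len=8)
print(data[0])
```

An RNN trained on enough such pairs learns to imitate the machine, and the interesting question is whether the machine's structure can then be read back out of the trained weights.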

- do we focus on neural networks for classification (categorical output), or do we train neural-network regression models (numerical output)?

Regression. See above, we're predicting output sequences.

- what does tight control of hyperparameters mean? Is a subset of HPs fixed for all WUs and only one HP changed at a time in each WU?

This first batch uses fixed hyperparameters, but subsequent batches could change them. If so, we don't just generate one network; we'll generate thousands with the same set of hyperparameters. Those hyperparameters will be documented with the dataset, so everyone knows what changed and there are multiple examples for comparison.

- are many hyperparameters changed simultaneously per WU, i.e. a subset of the overall HP set? How do you control for overlapping effects?

Since we have the power to run so many in parallel, I'd argue we should run several batches changing one at a time, and then combine them, so the resulting dataset has all that information captured for later analysis.
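
That batch scheme, varying one hyperparameter at a time around a fixed baseline, can be sketched as follows. The hyperparameter names and values here are hypothetical, not the project's actual settings:

```python
# Sketch of a "change one hyperparameter at a time" batch scheme.
# Names and values are hypothetical, not the project's actual configuration.
baseline = {"hidden_layers": 2, "layer_width": 64, "train_ratio": 0.8}

sweeps = {
    "hidden_layers": [1, 2, 4, 8],
    "layer_width": [32, 64, 128],
    "train_ratio": [0.6, 0.8, 0.9],
}

def one_at_a_time(baseline, sweeps):
    """Yield one config per swept value, holding every other HP at baseline."""
    for name, values in sweeps.items():
        for value in values:
            if value == baseline[name]:
                continue                  # skip the baseline duplicate
            cfg = dict(baseline)
            cfg[name] = value
            yield cfg

configs = list(one_at_a_time(baseline, sweeps))
print(len(configs))   # 7 variations around the baseline
```

Each resulting config would then be used to train thousands of networks, so any effect of the single changed hyperparameter can be separated from training noise.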

- what hyperparameters do you consider? Number of hidden layers, neurons per layer, different activation functions, different gradient methods, bias, weighting schemes, weight initialisation, different train/test split ratios, introduction of noise to the data, sampling methods for the training data, stratified sampling/holdout methods, regularisation, introduction of a momentum coefficient for learning, feature selection, feature-subset sampling methods, feature engineering, etc. To my understanding, all of these qualify as hyperparameters, i.e. any parameter whose value has to be decided and agreed upon before training commences. This doesn't have to be an exhaustive list; I just want to get the gist of where this project is heading.
All will be considered. The current release of the client can change the number of hidden layers, the hidden-layer width, and the ratio of training to test data. It's all PyTorch under the hood, so anything it supports, we can support. Adding more is simply a matter of adding more knobs to the binary.


- will this set of considered hyperparameters change over time or is it static?

The client will evolve over time and that set will certainly grow.

- will other types of networks be trained in the future? Convolutional, recurrent, or autoencoder networks, for example, which all have a rising number of real-world use cases?

Yes. I'm particularly interested in transformers...

- how do you intend to analyse the results produced by our crunched WUs? According to which process/criteria do you analyse them? Any specific focus on some HPs? How do you assess the generated model data set? Just the classification error rates for accuracy? How do you judge complexity? Any intention to build an algorithm that tries to find, in the overall generated data set, the model that (similar to PCA or regression) retains most of the overall variance in the data with as few inputs as possible, i.e. at a given complexity (defined according to some weighted measure, say some fixed/constrained hyperparameters) always yields the "simplest network structure"? I just wonder what you want to analyse.

Well, I have some ideas for what I want to do with the dataset, but the dataset itself will be made public for all to analyze as they wish.

Please read both the front page at https://www.mlcathome.org and the mlds specific page at https://www.mlcathome.org/mlds.html, that'll give you more insight.

I hope that answered most of your questions, and again, thanks for your interest!
ID: 152
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Jun 20
Posts: 454
Credit: 14,284,444
RAC: 3,179
Message 153 - Posted: 10 Jul 2020, 16:44:14 UTC

Note that the above reply applies only to the MLDS app running on MLC@Home. I can see this infrastructure being used for other apps as well, and I am talking with other researchers about it. A few other ideas that immediately come to mind are:

  • Neural architecture search
  • Neuro-evolution
  • Hyperparameter search for a specific goal

But as of right now, MLDS is the only application running on MLC@Home. As other projects come online, I'll be asking them to post more information here so others can choose whether or not to contribute to those subprojects.

ID: 153
bozz4science

Send message
Joined: 9 Jul 20
Posts: 141
Credit: 11,533,376
RAC: 61,455
Message 154 - Posted: 11 Jul 2020, 12:12:52 UTC - in response to Message 153.  
Last modified: 11 Jul 2020, 12:16:14 UTC

Thank you very much for taking the time to go into such detail in your answer; your response is a very interesting read. Skimming through the technical paper you referred to, training the RNNs with the different simple architectures is basically just a starting point for your analysis, i.e. building the initial dataset of trained models that you want to study, right?

Awesome that you agree with me on focusing mostly on the interpretability aspect of NN-derived predictions rather than the explainability aspect. The GitLab pointer is appreciated; I'll take a closer look there. I'm really thrilled about seeing HPs change with each training batch, and even more so about the potential results a comparison could yield.

You definitely have some hot research topics in mind for subsequent applications as well. This definitely has merit, and your research pipeline deserves my support to enable further exploratory science in this space.

Thanks as well for pointing out the MLDS website, as I wasn't aware of it before. Glad to see the effort being put into explaining to others what kind of science is being done and, particularly, what goals the project has in mind.

I wish other project administrators showed similarly swift response times and eloquent answers. I hope this standard is maintained over the course of the project, and not just at the beginning while things are ramping up to get volunteers on board.

I like what I'm seeing here and will stick around, as I'm interested in how things develop, even though I can only contribute the 12 threads of my old Xeon. Best of luck with this project!

Keep crunching away!

ID: 154
[VENETO] boboviz

Send message
Joined: 11 Jul 20
Posts: 30
Credit: 1,230,257
RAC: 1
Message 207 - Posted: 22 Jul 2020, 7:02:44 UTC - in response to Message 154.  

I'm not an expert in machine learning, so I still don't understand the purpose of the project.
What are we simulating machine learning for? Future physics? Astronomy? Chemistry?
ID: 207
adrianxw

Send message
Joined: 1 Jul 20
Posts: 11
Credit: 286,780
RAC: 0
Message 210 - Posted: 22 Jul 2020, 12:56:11 UTC

I asked what the "training" was about; no reply. I am, however, getting so many unexplained "validate errors" that I'm thinking of dumping the project already anyway.
ID: 210
bozz4science

Send message
Joined: 9 Jul 20
Posts: 141
Credit: 11,533,376
RAC: 61,455
Message 217 - Posted: 22 Jul 2020, 19:14:01 UTC

To my understanding (I'll try to shorten it, thereby risking some generalisation errors), the basic idea is not to train neural network models here for any specific use case. The models are basically trained on random toy data (with no semantic meaning); however, each trained network has a different architecture. The idea is basically to open up the black box by inspecting how the inclusion and adjustment of certain parameters in neural network learning affect its explanatory power and accuracy.
In this run (datasets 1 + 2) we focus on changing only the architecture of the network, leaving all else equal. Other parameters that affect training will be changed in subsequent runs.

Again, the purpose is not to train a model on real-world data to forecast a real-world use case, but to gain more understanding of why certain changes in the parameters that define training lead to certain outcomes in the trained model: becoming able to interpret not only why a predicted solution of the network is said to be optimal, but also to better understand how exactly that solution was derived and how it differs as the set of hyperparameters changes. The implications of this research thus reach far beyond any single application or field; it could help advance the understanding of neural network training in general, leading to more expressive models, higher accuracy, and, foremost, a better and more intuitive understanding of how the network generates particular solutions.

I hope I got it right and made it at least somewhat understandable without too much repetition. Please let me know, and I can try again.
ID: 217
pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Jun 20
Posts: 454
Credit: 14,284,444
RAC: 3,179
Message 232 - Posted: 23 Jul 2020, 4:29:13 UTC - in response to Message 217.  

bozz4science is correct.

We're conducting fundamental research into understanding machine learning models. We're doing this with "toy" data that doesn't apply to any one field, but the understanding we hope to gain will help all the fields that use machine learning to better understand the limits of, and improve, the models used in those fields... and ML is used in pretty much every field at the moment.
ID: 232
[VENETO] boboviz

Send message
Joined: 11 Jul 20
Posts: 30
Credit: 1,230,257
RAC: 1
Message 236 - Posted: 24 Jul 2020, 20:50:06 UTC - in response to Message 232.  

Thank you!!
ID: 236


©2021 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)