Message boards : News : [TWIM Notes] Aug 31 2020
Joined: 30 Jun 20, Posts: 356, Credit: 7,867,308, RAC: 77,580
This Week in [email protected] Notes for Aug 31 2020

A weekly summary of news and notes for [email protected]

The major news this week is that we tweaked the priority a little bit, which allowed a lot of progress on the simpler workunits. As you'll notice, 6 of the 10 network types in Datasets 1 and 2 have reached their goal of 10,000 samples! This is a big milestone for the project, and it gets us closer to releasing the first official dataset from this effort.

News:

- Lots of time this week was spent investigating why older AMD and Intel machines had issues on Linux, which turned out to be related to PyTorch v1.6 using MKL-DNN (now oneAPI) and hardcoding sse4 as a minimum requirement.
Dataset 1 and 2 progress:

Network type | Progress
---|---
SingleDirectMachine | 10002/10004
EightBitMachine | 9801/10006
SingleInvertMachine | 10000/10003
SimpleXORMachine | 10000/10002
ParityMachine | 484/10005
ParityModified | 70/10005
EightBitModified | 2971/10006
SimpleXORModified | 10005/10005
SingleDirectModified | 10004/10004
SingleInvertModified | 10002/10002

Last week's TWIM Notes: Aug 23 2020

Thanks again to all our volunteers!

-- The [email protected] Admins
Joined: 11 Jul 20, Posts: 22, Credit: 520,875, RAC: 1,668
"Lots of time this week was spent investigating why older AMD and Intel machines had issues on Linux, which turned out to be related to PyTorch v1.6 using MKL-DNN (now oneAPI) and hardcoding sse4 as a minimum requirement."

Do you plan, sooner or later, to introduce an optimized app for CPUs (with sse2, for example)?
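For reference, here is a minimal sketch of how one might check which of these SIMD levels a given machine reports, assuming Linux, where /proc/cpuinfo lists feature flags (the kernel spells the SSE4 requirement above as "sse4_1"):

```python
# Minimal sketch: list which SIMD feature flags this CPU reports.
# Assumes Linux; flag names follow /proc/cpuinfo's spelling.
def cpu_flags():
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

flags = cpu_flags()
for isa in ("sse2", "sse4_1", "avx2", "avx512f"):
    print(f"{isa}: {'yes' if isa in flags else 'no'}")
```

A machine that prints "sse4_1: no" is exactly the kind of older AMD or Intel box that tripped over the MKL-DNN minimum requirement.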
Joined: 30 Jun 20, Posts: 356, Credit: 7,867,308, RAC: 77,580
The app already uses OpenBLAS and FBGEMM as the core of its algorithms, both of which dynamically check CPU capabilities at runtime and use the code paths that are fastest on that CPU. If you have AVX512, it uses AVX512; if you have AVX2, it uses that; if you have sse4, it uses that; and so on. So there's no need for a specific "sse2" or "avx2" version of the app: you're already using the best capabilities of your CPU.

By default, the newest release of PyTorch, which v9.5x uses, *also* uses MKL-DNN, a library from Intel that's specifically made to speed up neural networks on Intel CPUs. However, this library isn't as good at dynamically checking and adapting, and (by default) is compiled to require sse4. We first tried changing the compiler settings to only require sse2, but that didn't work. There is little to no performance penalty for running without MKL-DNN (Intel now calls this library oneAPI, after changing the name 3 times in the past year) for our current WUs (and likely future networks), so we've disabled MKL-DNN at this time. And if we did eventually have WUs big enough to benefit from MKL-DNN, we'd benefit more from rolling out GPU support.

Besides, my personal bias is against using Intel-specific libraries. Intel has a history of optimizing its libraries (including MKL and MKL-DNN) only for Intel processors, to the detriment of AMD processors and even older Intel processors, both of which make up a significant portion of our contributor base.
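For anyone who wants to experiment with this on their own PyTorch install: stock builds that include MKL-DNN expose a runtime switch, so a sketch like the following illustrates the fallback described above (assuming a PyTorch build compiled with MKL-DNN support; the project's own app disables the library at build time instead):

```python
import torch

# Sketch of the runtime switch stock PyTorch exposes for MKL-DNN
# (Intel's oneDNN/oneAPI backend). Assumes a build compiled with
# MKL-DNN support.
print("MKL-DNN available:", torch.backends.mkldnn.is_available())

torch.backends.mkldnn.enabled = False  # fall back to the generic kernels

# Eligible ops (e.g. convolutions) now take the non-MKL-DNN code paths,
# which, as noted above, costs little to nothing for these workunits.
x = torch.randn(1, 3, 32, 32)
y = torch.nn.Conv2d(3, 8, kernel_size=3)(x)
print(y.shape)  # torch.Size([1, 8, 30, 30])
```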
Joined: 11 Jul 20, Posts: 22, Credit: 520,875, RAC: 1,668
Thanks for the answer, it's very interesting!
Joined: 9 Jul 20, Posts: 105, Credit: 2,157,126, RAC: 18,827
Thanks for the input and clarification! Personally, I think this is a great summary of the ongoing optimization/app issues. Maybe in the future you can just point people towards this post. :)

Looking forward to the rollout of Dataset 3 and the exciting possibility of more complex use cases that might even require a GPU-supported app version. Wishing you a great week!
©2021 [email protected] Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)