NPU and TPU AI Co-processors
[Joined: 1 Jul 20 | Posts: 22 | Credit: 4,873,126 | RAC: 84]

Will this project support NPUs (Neural Processing Units) and/or TPUs (Tensor Processing Units), such as the ARM Ethos-N37/N57/N77, the Lightspeeur 2801S in the Orange Pi 4B, or the Google Edge TPU in e.g. the Coral Dev Board, the Rock Pi N10, and the Asus Tinker Edge T and Tinker Edge R?
[pianoman | Joined: 30 Jun 20 | Posts: 462 | Credit: 21,406,548 | RAC: 0]

Definitely interested in this, although I would say it's lower priority than Windows or GPU support. MLDS uses libtorch (the PyTorch C++ frontend) under the hood, so in theory, whatever PyTorch supports, we can support. The caveats are that a) the networks in the current round of training barely benefit from accelerators because they're so (relatively) small, and b) I'd need to make sure it is OK to redistribute any libraries those systems depend on. For NVIDIA, PyTorch needs cuDNN, which, unlike the rest of CUDA, requires an NVIDIA sign-in to download from their site. I don't know whether NPUs/TPUs have similar restrictions.
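To make the "whatever PyTorch supports, we can support" point concrete, here is a minimal sketch of backend-agnostic device selection in libtorch. This is an illustrative sketch, not MLDS code; it assumes only the stock CUDA/CPU backends that ship with libtorch, and the build line is illustrative:

```cpp
// device_select.cpp -- a minimal sketch (not MLDS code) of backend-agnostic
// device selection with libtorch, the PyTorch C++ frontend.
// Build (paths illustrative): g++ -std=c++17 device_select.cpp \
//   -I/path/to/libtorch/include -L/path/to/libtorch/lib -ltorch -ltorch_cpu -lc10
#include <torch/torch.h>
#include <iostream>

int main() {
    // Prefer CUDA when the runtime (and, in practice, cuDNN) is available;
    // otherwise fall back to the CPU. A hypothetical NPU/TPU backend would
    // need its own PyTorch device type before it could slot in here.
    torch::Device device = torch::cuda::is_available()
        ? torch::Device(torch::kCUDA)
        : torch::Device(torch::kCPU);
    std::cout << "Training on: " << device << "\n";

    // The same model and tensor code then runs unchanged on either backend.
    torch::nn::Linear net(128, 10);
    net->to(device);
    torch::Tensor x = torch::randn({32, 128}, device);
    torch::Tensor y = net->forward(x);
    std::cout << "Output shape: " << y.sizes() << "\n";
    return 0;
}
```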
[Joined: 3 Jul 20 | Posts: 2 | Credit: 1,920,428 | RAC: 0]

But you will eventually have projects using NVIDIA GPUs? Some of the older Teslas like the M40 and the K80 are fairly cheap these days and have a lot of horsepower. Not to mention a lot of folks who run BOINC are also gamers or devs and have some beefy GPUs.
[Joined: 1 Jul 20 | Posts: 22 | Credit: 4,873,126 | RAC: 84]

> But you will eventually have projects using NVIDIA GPUs? Some of the older Teslas like the M40 and the K80 are fairly cheap these days and have a lot of horsepower. Not to mention a lot of folks who run BOINC are also gamers or devs and have some beefy GPUs.

If PyTorch can be used to accelerate computing for this project, then support will automatically extend to NVIDIA Tensor-capable GPUs, and perhaps to ARM-based NVIDIA Jetson SBCs.
[Joined: 20 Jul 20 | Posts: 23 | Credit: 1,958,714 | RAC: 0]

> But you will eventually have projects using NVIDIA GPUs? Some of the older Teslas like the M40 and the K80 are fairly cheap these days and have a lot of horsepower. Not to mention a lot of folks who run BOINC are also gamers or devs and have some beefy GPUs.

Those GPU core frequencies are very low (below 1 GHz). For the price of a second-hand M40 or K80, I would rather get a new 2060, 2060 Super, or 2060 KO: same price range, higher performance, and a far lower TDP (125-130 W depending on the model, vs 200-250 W on the M40 and 250-300 W on the K80). In fact, I could run a single RTX 2080 (or 2080 Super) at the same speed as the two together, all at under 160 W for the GPU (vs ~500 W for the other two). Not that it matters much for support, since both use the same drivers as the RTX series: whatever works on the K/M-series accelerators will work on an RTX-series GPU.

The way things are going, I think they should focus on the RTX 2080 series of GPUs first (2080, 2080 Super, 2080 Ti), as by next year these will be considered mid-grade GPUs, and in two years this level of performance will be 'the new normal' in people's PCs (especially with a die shrink delivering higher performance at lower TDPs).
[Joined: 3 Jul 20 | Posts: 10 | Credit: 1,303,039 | RAC: 0]

Hi all, I'd like to revive this thread and ask pianoman whether he has any plans for NPU support. I am asking because I am considering buying an Orange Pi 4 very soon (to dedicate it 24/7 to BOINC), and I read about its Lightspeeur 2801S NPU coprocessor (supported by PyTorch). I think this should be given serious consideration for this project (perhaps with future, more complex datasets) because the theoretical performance gain is huge. This coprocessor, for instance, provides 5.6 TOPS with an efficiency of 9.3 TOPS/watt. It is available not only in the Orange Pi but also in the form of a USB accelerator for only $16. There are several other coprocessors on the market now, such as the Coral from Google (PCIe and USB).

I don't know how "TOPS" converts to "TFLOPS"; as far as I understood from googling the subject a bit, the two measures aren't comparable. But, just to give an order of magnitude: the entire MLC@Home infrastructure provides 31 TFLOPS (by the BOINC estimate, which, AFAIK, is often inaccurate). If TOPS were roughly equivalent to TFLOPS, then fewer than 10 of these accelerators could outperform all the current project's hosts while consuming only ~3 W!!! I hope someone who understands this subject better than me can redo the calculations (I think they are wrong); see the worked numbers below.

Moreover, if this project supported NPUs it would be a real breakthrough for BOINC and the world of distributed volunteer computing, opening the way for other projects to make use of these powerful (and cheap) devices: the same kind of innovation as when GPUs became supported by BOINC and Folding. This could start with an "nci" (non-CPU-intensive) version of the Machine Learning Dataset Generator applications, and then, if everything works, some small modifications to the BOINC client could let the software treat NPUs the same way as GPUs, enabling useful features such as controlling CPU usage, suspending the NPU, and giving more accurate stats. An NPU-capable BOINC would mean a lot for the future of distributed computing.

I think that, given the very low cost of these devices, a lot of BOINC-enthusiast users would buy these accelerators, even just for the fun and excitement of crunching on such an innovative device. Considering that there are people who spend hundreds of dollars on GPUs to dedicate to BOINC, I cannot see why a lot of people wouldn't join MLC@Home and buy a $16 accelerator. IMHO this could be even more interesting than a GPU-capable client: NPUs look to be far more efficient in terms of energy and cost than GPUs, plus there are many other important projects already working on GPUs, while this could be the first and only DC project to make use of NPUs/TPUs. Cheers ;)
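Redoing that arithmetic under the poster's own hypothetical assumption that 1 TOPS were worth 1 TFLOPS (the next reply explains why it is not):

```latex
% Worked numbers under the hypothetical assumption 1 TOPS ~ 1 TFLOPS.
\[
  \left\lceil \frac{31~\text{TFLOPS}}{5.6~\text{TOPS/unit}} \right\rceil = 6~\text{units},
  \qquad
  6 \times \frac{5.6~\text{TOPS}}{9.3~\text{TOPS/W}} \approx 3.6~\text{W}.
\]
```

Six units, at about 0.6 W each, would nominally match the project's ~31 TFLOPS for roughly 3.6 W total, so the poster's order of magnitude holds; the flaw is in the TOPS-to-TFLOPS assumption, not the multiplication.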
[pianoman | Joined: 30 Jun 20 | Posts: 462 | Credit: 21,406,548 | RAC: 0]

I'm glad you're excited about NPUs; they are pretty cool. But there are a few issues that limit NPU support at the moment.

The biggest concerns are developer bandwidth and access to devices. We have a limited number of developers, and any time we add support for a new set of devices, it takes ongoing engineering effort to support and test them. It really helps that the NPUs you are discussing are supported by PyTorch, but that support is not free.

The second issue is that these NPUs are normally optimized for inference rather than training, and training is what we do in MLDS at the moment. The target market for these accelerators is running a pre-trained-on-a-large-system neural network directly on the "edge" device, such as having your camera do object detection right on the device without having to pass it off to a backend server. They're awesome for that use case. MLDS, however, does training. We're interested in the precision of the internal representation in the model, and as far as I can tell, the way these AI accelerators achieve high "TOPS" is that they don't use floating point (note the lack of "FL"); they use 4-bit or 8-bit ints to represent weights. Basically, they take the floating-point weights in the original, trained-off-the-device model, convert them to ints (losing a lot of precision), and hope the resulting model is still good enough to correctly classify things. That's fine for coarse-grained tasks like image classification, but for the sequence-to-sequence prediction (regression) we're doing in MLDS, that lost precision can be a real problem.

Think of it this way: AMD and NVIDIA have a *huge* financial incentive to get the highest performance at the lowest wattage, and an NVIDIA 2080 Ti still tops out at a theoretical 13 TFLOPS of 32-bit floating-point compute while needing 250 W to achieve it. If NPUs could deliver comparable training compute at a few watts, the GPU vendors would already be doing it; the NPUs' efficiency comes from giving up the precision that training needs. That's not to say there's no benefit to what NPUs do, just that for what MLDS does, there's not a lot of benefit.

In the end, GPU support and Android support are much higher on the "bang-for-your-developer-time" priority list than NPUs, so unless some really motivated developer wants to step up and do just that, it'll probably stay that way.
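To make the precision-loss argument concrete, here is a small self-contained sketch (plain C++, not MLDS code and not any NPU vendor's API) of the symmetric int8 weight quantization scheme described above, showing the round-trip error it introduces:

```cpp
// quantize_demo.cpp -- toy illustration (not MLDS code) of symmetric int8
// weight quantization and the round-trip error it introduces.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // A few "trained" float32 weights, as they would leave a big training run.
    const std::vector<float> w = {0.8231f, -0.0042f, 0.1337f, -0.6675f};

    // Symmetric quantization: map [-max|w|, +max|w|] onto the int8 range [-127, 127].
    float max_abs = 0.0f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    const float scale = max_abs / 127.0f;

    for (float v : w) {
        const int8_t q = static_cast<int8_t>(std::lround(v / scale)); // quantize
        const float back = q * scale;                                 // dequantize
        std::printf("%+.4f -> %4d -> %+.4f (error %+.4f)\n", v, q, back, back - v);
    }
    // With 8 bits the round-trip error is small but nonzero; with the 4-bit
    // weights some NPUs use, only 16 levels remain and the error grows fast.
    return 0;
}
```

Small weights (like the -0.0042 above) lose nearly all their precision, which is exactly the problem for regression-style training that cares about the model's internal representation.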
[Joined: 3 Jul 20 | Posts: 10 | Credit: 1,303,039 | RAC: 0]

Thank you so much for this explanation; now it's very clear. I hope that sometime in the future you will find a way to use these devices too, for some new and exciting projects ;)
[Joined: 9 Jul 20 | Posts: 142 | Credit: 11,536,204 | RAC: 3]

Very interesting read, and loads of information in here. I hadn't heard about these devices before. While they don't seem to be a good fit for MLC, they still seem legitimate, as they offer potentially large advantages for very narrow and specific use cases! However, keep in mind the financial incentives pianoman mentioned.
[Joined: 1 Jul 20 | Posts: 34 | Credit: 26,118,410 | RAC: 0]

> [...] In the end, GPU support and Android support are much higher on the "bang-for-your-developer-time" priority list than NPUs, so unless some really motivated developer wants to step up and do just that, it'll probably stay that way.

Any ETA for GPU apps here?

Reno, NV | Team: SETI.USA