Pi4

Questions and Answers : Unix/Linux : Pi4


pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 237 - Posted: 24 Jul 2020, 21:44:08 UTC

I have a Pi4, and while it's not my top priority, I should be able to get at least a test app for arm32/arm64 relatively soon. Only time will tell about performance.
ID: 237
wolfman1360

Joined: 7 Jul 20
Posts: 23
Credit: 39,708,780
RAC: 358
Message 239 - Posted: 25 Jul 2020, 5:11:52 UTC - in response to Message 237.  

I have a Pi4, and while it's not my top priority, I should be able to get at least a test app for arm32/arm64 relatively soon. Only time will tell about performance.

Awesome!
I've got a few of them somewhere that I gave up on, but this might be the excuse to get them out and crunching again.
ID: 239
ProDigit

Joined: 20 Jul 20
Posts: 23
Credit: 1,958,714
RAC: 0
Message 278 - Posted: 29 Jul 2020, 19:31:44 UTC

The Pi 4 will be rather slow compared to an x86-64 CPU.
A quad-core Pi 4 overclocked to 2 GHz roughly matches a quad-core Atom processor running at 1.66 GHz.
x86 instructions are performance-oriented; ARM instructions are power-optimized.

The Pi 4 isn't particularly power-efficient either, since it's built on a 28 nm process.
You might get better results from an Amlogic S905X3 TV box, found online for about $30. It uses a 12 nm process and runs at about 3 W (versus almost 8 W for the Pi 4).

Concerning ARM, recent Snapdragons replace the 4 big cores with 2, and the 4 small cores with 6.
This is a significant increase in performance compared to previous designs.
Even Amlogic's S922X TV boxes (sold for around $80) now incorporate a 2/6-core design, where the little cores actually run faster than the big cores!

So if you succeed in running it on a Pi, perhaps the next phase will be Snapdragon CPUs (like those found in the Pixel 3a and Samsung Galaxy A and S series) and Amlogic TV boxes.
They're very underestimated, and perform pretty well for the price!
ID: 278
Dirk Broer
Joined: 1 Jul 20
Posts: 22
Credit: 4,873,126
RAC: 84
Message 455 - Posted: 7 Sep 2020, 22:47:30 UTC - in response to Message 278.  

The four cores of the Raspberry Pi 4 may be built on a 28 nm process, but they are Cortex-A72 cores, and at that they beat the Amlogic S905X3's Cortex-A55 cores running @2000 MHz. They even beat the A55 cores while running at their own rated clock speed (1500 MHz), let alone when overclocked to 2000 MHz (see ExplainingComputers.com: Odroid C4 vs Raspberry Pi 4).

The Amlogic S922X as used in the Odroid-N2+ (and no doubt soon in TV boxes as well) is a better option than the Pi 4, performance-wise: the four big Cortex-A73 cores can be overclocked to 2400 MHz, while the two LITTLE Cortex-A53 cores can run @2000 MHz (so the speed of the big vs LITTLE cores has changed from the original N2 model).
ID: 455
Dirk Broer
Joined: 1 Jul 20
Posts: 22
Credit: 4,873,126
RAC: 84
Message 456 - Posted: 8 Sep 2020, 0:09:19 UTC
Last modified: 8 Sep 2020, 0:33:54 UTC

My personal tip for an ARM-based SBC to be used in this project would be the Nvidia Jetson Nano (quad-core Cortex-A57 @1430 MHz, stock), coupled to a good cooling fan so it can be overclocked to 2000 MHz.

You can augment the Jetson Nano with a Coral USB Accelerator: you can then not only use the TensorFlow, PyTorch, Caffe and MXNet AI frameworks that run on the Jetson Nano, but also have a TPU co-processor in the form of the Coral USB stick. That stick can of course also be used in other ARM SBCs such as the Raspberry Pi 4 or the Odroid-N2+.
ID: 456
W8n4Singularity
Joined: 30 Aug 20
Posts: 25
Credit: 47,025,926
RAC: 0
Message 630 - Posted: 9 Oct 2020, 21:55:38 UTC

It seems to work well on an RPi4 4GB. Most tasks have a run time of about 7 hours, while the rand_automata tasks take about 31 hours. Very roughly, for the rand_automata tasks, the Pi 4 uses about 0.25 kWh for 4 tasks (~0.063 kWh/task) while a Ryzen 3950X uses 1.6 kWh for 32 tasks (~0.05 kWh/task). (In real life, however, I run the Ryzen at 75% of cores for multitasking.)
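The per-task numbers above follow directly from the batch totals; as a quick sketch of the arithmetic:

```python
def kwh_per_task(total_kwh, tasks):
    """Energy cost of one task, given the total energy for a batch of tasks."""
    return total_kwh / tasks

# rand_automata batches, figures from the measurements above
pi4   = kwh_per_task(0.25, 4)   # Raspberry Pi 4: 0.0625 kWh/task
ryzen = kwh_per_task(1.6, 32)   # Ryzen 3950X:    0.05 kWh/task
print(pi4, ryzen)
```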
ID: 630
ProDigit

Joined: 20 Jul 20
Posts: 23
Credit: 1,958,714
RAC: 0
Message 686 - Posted: 25 Oct 2020, 3:46:32 UTC - in response to Message 456.  
Last modified: 25 Oct 2020, 3:52:00 UTC

My personal tip for an ARM-based SBC to be used in this project would be the Nvidia Jetson Nano (quad-core Cortex-A57 @1430 MHz, stock), coupled to a good cooling fan so it can be overclocked to 2000 MHz.

You can augment the Jetson Nano with a Coral USB Accelerator: you can then not only use the TensorFlow, PyTorch, Caffe and MXNet AI frameworks that run on the Jetson Nano, but also have a TPU co-processor in the form of the Coral USB stick. That stick can of course also be used in other ARM SBCs such as the Raspberry Pi 4 or the Odroid-N2+.


Apparently all Cortex-A50 series CPUs are very, very slow, and not worth doing compute on.
That includes any of the Nvidia developer boards.
The only reason they may be somewhat good is their GPU capabilities.
But even those are the equivalent of a GT 730, which is half the speed of a 1030, which is half the speed of a 1050, which is half the speed of a 1660, which is half the speed of an RTX 2060, which is about half the speed of an RTX 3080.
So if you're going to do GPU computations, a 3080 is about 40 to 60x faster than a GT 730.

For ARM CPUs, you'll have to rely on the Cortex-A70 series.
The A72 is mostly found only in quad-core configurations.
The higher-end cores (e.g. the A77) are mostly found in big.LITTLE configurations, which Android restricts for program access to avoid device overheating. Meaning, if you want to crunch data on a big.LITTLE chip under Android, the tasks will be shifted to the little cores, unless you can find a Linux image for the device and make sure it has sufficient cooling.
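Under Linux you can tell the big and LITTLE cores apart by their maximum frequencies in sysfs. A small sketch, assuming a Linux sysfs layout; the example frequencies are made-up Odroid-N2+-like values, not measurements:

```python
from collections import defaultdict
from pathlib import Path

def group_cores_by_max_freq(freqs):
    """Group CPU ids by max frequency; on big.LITTLE the two clusters
    show up as two distinct frequencies."""
    clusters = defaultdict(list)
    for cpu, khz in freqs.items():
        clusters[khz].append(cpu)
    return dict(clusters)

def read_max_freqs(sysfs="/sys/devices/system/cpu"):
    """Read cpuinfo_max_freq (kHz) per core from sysfs (Linux only)."""
    freqs = {}
    for p in sorted(Path(sysfs).glob("cpu[0-9]*/cpufreq/cpuinfo_max_freq")):
        cpu = int(p.parts[-3][3:])  # "cpu4" -> 4
        freqs[cpu] = int(p.read_text())
    return freqs

# Example with assumed values: 2 LITTLE cores @2.0 GHz, 4 big cores @2.4 GHz
example = {0: 2016000, 1: 2016000, 2: 2400000, 3: 2400000, 4: 2400000, 5: 2400000}
print(group_cores_by_max_freq(example))
# {2016000: [0, 1], 2400000: [2, 3, 4, 5]}
```

On real hardware you would call `group_cores_by_max_freq(read_max_freqs())`.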

At this point, I'd say that the only ARM CPUs worth investing time in are the A70 cores with NEON instructions, as well as the Neonverse (which should have come out this year already, but hasn't).
These chips also can't be in cellphones, tablets, or anything with a battery.
They have to be in set-top boxes, or built into a server or desktop where sufficient cooling is present.
Think of the Pi 4, which throttles unless there's either active cooling or a large heat sink on the CPU.
And that's only a Cortex-A72.
The higher-end models are built on 12 and 10 nm and can reach speeds over 3 GHz.
No one makes them yet.
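On a Pi 4 you can check for that throttling with `vcgencmd get_throttled`, which prints a hex bitmask. A small decoder sketch (bit meanings per the official Raspberry Pi documentation; feed it the value the command prints):

```python
# Decode the bitmask returned by `vcgencmd get_throttled` on a Raspberry Pi.
FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}

def decode_throttled(mask):
    """Return the human-readable flags set in a get_throttled bitmask."""
    return [text for bit, text in FLAGS.items() if mask & (1 << bit)]

print(decode_throttled(0x0))      # [] -> well cooled, no events since boot
print(decode_throttled(0x50000))  # under-voltage and throttling have occurred
```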
ID: 686
zombie67 [MM]
Joined: 1 Jul 20
Posts: 34
Credit: 26,118,410
RAC: 0
Message 689 - Posted: 25 Oct 2020, 4:30:33 UTC
Last modified: 25 Oct 2020, 4:34:12 UTC

FWIW, I have 5x Raspberry Pi 4 machines crunching quite happily. Right now they are working on WCG, but they will come back around here sooner or later. They are overclocked to 2 GHz and have coolers. But then again, x86/x64 machines have coolers too. OTOH, combined they use less energy than a dim light bulb.
Reno, NV
Team: SETI.USA
ID: 689
ProDigit

Joined: 20 Jul 20
Posts: 23
Credit: 1,958,714
RAC: 0
Message 730 - Posted: 28 Oct 2020, 17:13:21 UTC - in response to Message 689.  

FWIW, I have 5x Raspberry Pi 4 machines crunching quite happily. Right now they are working on WCG, but they will come back around here sooner or later. They are overclocked to 2 GHz and have coolers. But then again, x86/x64 machines have coolers too. OTOH, combined they use less energy than a dim light bulb.

True, but their Cortex-A72 CPUs are also much slower than comparable x86 CPUs.
I went with an Atomic Pi. Its 1.6-1.7 GHz CPU can crunch data about as fast as 4x CM4 modules.
On top of that, they use the Beignet Intel OpenCL drivers, and I run projects on those as well.
All the while, each Atomic Pi unit pulls about 12.5 W (@100% CPU & 100% GPU load).
That's at the wall, with PSU inefficiencies and fan.
It is very comparable to 4x Raspberry Pi 4Bs, but with GPU crunching added.
The Intel CPU is 14 nm, versus 28 nm for the Pi.

ARM is about as good as Atom in terms of efficiency.

And 14 nm Atom CPUs are about equal to Ryzen 9 3950X CPUs in speed (as long as you have enough units), price, and efficiency.
That being said, a Ryzen 3900X or 3950X pulls only 150 W at the wall (with water cooling) at its stock 3.5 GHz.
To get that kind of performance out of Raspberry Pis, you'd have to pair 48 to 64 units together.
The cost of such a server would be far more than even a Threadripper or Epyc server.
The power draw would be higher too; I'd estimate 500-750 W.
An 8-unit Pi cluster, by comparison, would perform about the same as a 35 W dual-core Celeron laptop.
ID: 730
Dirk Broer
Joined: 1 Jul 20
Posts: 22
Credit: 4,873,126
RAC: 84
Message 857 - Posted: 19 Nov 2020, 9:54:16 UTC - in response to Message 686.  
Last modified: 19 Nov 2020, 9:56:48 UTC

My personal tip for an ARM-based SBC to be used in this project would be the Nvidia Jetson Nano (quad-core Cortex-A57 @1430 MHz, stock), coupled to a good cooling fan so it can be overclocked to 2000 MHz.

You can augment the Jetson Nano with a Coral USB Accelerator: you can then not only use the TensorFlow, PyTorch, Caffe and MXNet AI frameworks that run on the Jetson Nano, but also have a TPU co-processor in the form of the Coral USB stick. That stick can of course also be used in other ARM SBCs such as the Raspberry Pi 4 or the Odroid-N2+.


Apparently all Cortex-A50 series CPUs are very, very slow, and not worth doing compute on.
That includes any of the Nvidia developer boards.
The only reason they may be somewhat good is their GPU capabilities.
But even those are the equivalent of a GT 730, which is half the speed of a 1030, which is half the speed of a 1050, which is half the speed of a 1660, which is half the speed of an RTX 2060, which is about half the speed of an RTX 3080.
So if you're going to do GPU computations, a 3080 is about 40 to 60x faster than a GT 730.


There are several misconceptions here, amongst them that ALL Cortex-A50 series CPUs are very, very slow. Firstly, the oldest of them, the Cortex-A57, has more in common with the later A70 series than with the A53 and A55; secondly, the A57 can run at 2000 MHz, far faster than e.g. the A53 of the Raspberry Pi 3.
Another misconception is that the Nvidia developer boards all use Cortex-A50 SoCs. Only the Jetson Nano does, and then the far better A57. The Jetson Xavier NX uses a 6-core Nvidia Carmel CPU comparable with the A70 series, as does the Jetson AGX Xavier, but then as an octa-core. The Nano has 128 Maxwell CUDA cores, but the Xavier NX has 384 Volta CUDA cores and the AGX Xavier has 512 Volta CUDA cores. Those Volta cores are quite something different.

For ARM CPUs, you'll have to rely on the Cortex-A70 series.
The A72 is mostly found only in quad-core configurations.
The higher-end cores (e.g. the A77) are mostly found in big.LITTLE configurations, which Android restricts for program access to avoid device overheating. Meaning, if you want to crunch data on a big.LITTLE chip under Android, the tasks will be shifted to the little cores, unless you can find a Linux image for the device and make sure it has sufficient cooling.


Of course you need to cool an ARM device if you want to crunch on it. And when sufficiently cooled, all cores work, both big and LITTLE: they do in Android on one of my Odroid-N2+ boards, as well as in Linux on another Odroid-N2+. Both have an 80 mm fan and a huge heatsink.

At this point, I'd say that the only ARM CPUs worth investing time in are the A70 cores with NEON instructions, as well as the Neonverse (which should have come out this year already, but hasn't).
These chips also can't be in cellphones, tablets, or anything with a battery.
They have to be in set-top boxes, or built into a server or desktop where sufficient cooling is present.
Think of the Pi 4, which throttles unless there's either active cooling or a large heat sink on the CPU.
And that's only a Cortex-A72.
The higher-end models are built on 12 and 10 nm and can reach speeds over 3 GHz.
No one makes them yet.


You can presently reach 2147 MHz on a Pi 4, if you use a good cooler. If you have spare parts from days past, use a Northbridge cooler with heatpipes (e.g. a Noctua NC-U6) and an 80 mm fan.
The Odroid-N2+ (with four Cortex-A73s and two A53s) reaches 2400 MHz on the big cores when the fan is added; the LITTLE cores reach 2000 MHz.
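For reference, that kind of Pi 4 overclock is set in /boot/config.txt on Raspberry Pi OS. A sketch of the relevant settings; the exact stable values are assumptions and vary per board and cooler:

```ini
# /boot/config.txt -- Raspberry Pi 4 overclock (requires good cooling)
# over_voltage raises the core voltage in steps; arm_freq is the CPU clock in MHz
over_voltage=6
arm_freq=2147
# optionally raise the VideoCore (GPU) clock as well
gpu_freq=750
```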

NEON is a 32-bit ARMv7 instruction set extension and useless for the 64-bit ARMv8 environment.
ID: 857
Dirk Broer
Joined: 1 Jul 20
Posts: 22
Credit: 4,873,126
RAC: 84
Message 872 - Posted: 21 Nov 2020, 11:13:12 UTC - in response to Message 686.  

At this point, I'd say that the only ARM CPUs worth investing time in are the A70 cores with NEON instructions, as well as the Neonverse (which should have come out this year already, but hasn't).


Sorry to correct you yet again: it is Neoverse, not Neonverse. It has nothing to do with NEON instructions.
ID: 872
Dirk Broer
Joined: 1 Jul 20
Posts: 22
Credit: 4,873,126
RAC: 84
Message 1178 - Posted: 26 Apr 2021, 20:29:00 UTC

Perhaps enlightening in the case of x86-64 Intel Atom vs ARM is the view offered on this project's CPU performance page.
The lowest five entries are four ARM CPUs and the Intel Atom x5-Z8350. The Atom is wedged between, below it, the Cortex-A53 (ARMv7 Processor rev 4 (v7l) [Impl 0x41 Arch 7 Variant 0x0 Part 0xd03 Rev 4]) of the Raspberry Pi 3+ and the Cortex-A57 (ARMv8 Processor rev 1 (v8l) [Impl 0x41 Arch 8 Variant 0x1 Part 0xd07 Rev 1]) of the Nvidia Jetson Nano, and, above it, the Cortex-A72 of the Raspberry Pi 4 running both 32-bit (ARMv7 Processor rev 3 (v7l) [Impl 0x41 Arch 7 Variant 0x0 Part 0xd08 Rev 3]) and 64-bit (BCM2835 [Impl 0x41 Arch 8 Variant 0x0 Part 0xd08 Rev 3]).
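Those bracketed Impl/Part fields come from each core's ID register as reported in /proc/cpuinfo and can be decoded by hand. A small lookup sketch (implementer 0x41 is ARM Ltd; the part numbers below match the entries quoted above, per ARM's published part-number assignments):

```python
# ARM Ltd part numbers ("CPU part" in /proc/cpuinfo) for a few common cores
ARM_PARTS = {
    0xd03: "Cortex-A53",
    0xd05: "Cortex-A55",
    0xd07: "Cortex-A57",
    0xd08: "Cortex-A72",
    0xd09: "Cortex-A73",
}

def core_name(implementer, part):
    """Name the core for an implementer/part pair; ARM Ltd (0x41) only."""
    if implementer != 0x41:
        return "unknown implementer"
    return ARM_PARTS.get(part, "unknown part")

print(core_name(0x41, 0xd03))  # Cortex-A53 (Raspberry Pi 3+)
print(core_name(0x41, 0xd08))  # Cortex-A72 (Raspberry Pi 4)
```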
ID: 1178
Dirk Broer
Joined: 1 Jul 20
Posts: 22
Credit: 4,873,126
RAC: 84
Message 1475 - Posted: 12 Feb 2022, 0:22:43 UTC

Currently there are three applications: Machine Learning Dataset Generator, Machine Learning Dataset Generator (test), and Machine Learning Dataset Generator (GPU).

The first two have an ARM application, but it is only for Linux running on ARM (eabi-hf), so a 32-bit application. It would perhaps be useful to check how many members of this project use an ARM board, and what OS it is running. And if it turns out that the majority run a 64-bit OS, or are at least using a 64-bit SoC, would it be possible to have at least the first two applications ported to ARM (AArch64) and 64-bit ARM/Android? This was recently done at the WEP-M+2 Project (aka Wanless), and it sped up the calculations by a factor of two to four for those boards able to run a 64-bit OS. WEP-M+2 chose to serve the ARM/Android community through 'UserLAnd', which runs the 64-bit ARM Linux application on Android, so a separate 64-bit Android app was not needed.
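A quick way to check what a given board is actually running is the machine string the kernel reports. A sketch in Python using `platform.machine()`; note it follows the kernel, so a 32-bit userland under a 64-bit kernel is an edge case it won't catch:

```python
import platform

def userland_arch(machine=None):
    """Classify the reported machine string: aarch64 means a 64-bit ARM
    OS; armv6l/armv7l/armv8l mean a 32-bit ARM environment, even when
    the SoC itself is 64-bit capable."""
    m = machine or platform.machine()
    if m == "aarch64":
        return "64-bit ARM"
    if m in ("armv6l", "armv7l", "armv8l"):
        return "32-bit ARM userland"
    return f"not ARM ({m})"

print(userland_arch("armv7l"))   # 32-bit ARM userland (e.g. 32-bit Raspberry Pi OS)
print(userland_arch("aarch64"))  # 64-bit ARM
```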

There is unfortunately no Machine Learning Dataset Generator for ARM GPUs yet, though the most modern ARM SoCs have increasingly better GPUs, and boards like the Raspberry Pi 4 Compute Module (CM4) might in the future even be able to run an external GPU, e.g. via the Raspberry Pi Compute Module 4 IO Board.

In the meantime, the combination of ARM platform and Nvidia GPU could be tested on the Jetson Nano, but preferably on the Jetson Xavier NX or Jetson AGX Xavier, as they have stronger CPUs and GPUs in their SoCs.
ID: 1475


©2024 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)