Linux/armhf and Linux/arm64 support status thread

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 454
Credit: 14,284,704
RAC: 3,190
Message 291 - Posted: 1 Aug 2020, 17:54:15 UTC
Last modified: 2 Aug 2020, 1:59:00 UTC

Current status (as of Aug 1):

The base MLDS client, 32-bit (armhf/armv7l), is up and running on an old CuBox 4-core Cortex-A9 1 GHz system. It takes about 7.5 minutes per epoch, or about 16 hours to complete a WU.

Will also test on a Pi3.

System requirements are Debian buster (or any Linux/ARM system running glibc 2.28 or later).

Also building for Linux/arm64 (armv8) on a Pi 4 (4x Cortex-A72, 1.5 GHz). Still waiting on compiles to finish. Again using the Raspberry Pi OS 64-bit snapshot from May 2020. Preliminary results show approx. 1-2 minutes per epoch, or about 4 hours to complete a WU. That's quite reasonable.
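As a rough sanity check on the timings quoted above, here is a small Python sketch. The number of epochs per WU is not stated in the post; it is inferred from the CuBox figures (7.5 minutes per epoch, about 16 hours per WU), and the Pi 4 per-epoch time is then derived from its roughly 4-hour WU time. The function names are made up for illustration.

```python
# Figures quoted in the post. The epoch count is inferred, not stated.
CUBOX_MINUTES_PER_EPOCH = 7.5
CUBOX_HOURS_PER_WU = 16
PI4_HOURS_PER_WU = 4


def implied_epochs(hours_per_wu, minutes_per_epoch):
    """Number of epochs a WU must contain to take this long."""
    return hours_per_wu * 60 / minutes_per_epoch


def implied_minutes_per_epoch(hours_per_wu, epochs):
    """Per-epoch time implied by a total WU time and an epoch count."""
    return hours_per_wu * 60 / epochs


epochs = implied_epochs(CUBOX_HOURS_PER_WU, CUBOX_MINUTES_PER_EPOCH)
pi4_minutes = implied_minutes_per_epoch(PI4_HOURS_PER_WU, epochs)

print(f"epochs per WU (inferred): {epochs:.0f}")
print(f"Pi 4 minutes per epoch (derived): {pi4_minutes:.3f}")
```

Under those assumptions a WU is about 128 epochs, and a 4-hour WU on the Pi 4 works out to a little under two minutes per epoch.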

AppImage may not support ARM, so I may need to re-think deployment.

manalog
Joined: 3 Jul 20
Posts: 10
Credit: 1,303,039
RAC: 0
Message 292 - Posted: 1 Aug 2020, 18:06:28 UTC - in response to Message 291.  
Last modified: 1 Aug 2020, 18:06:53 UTC

It's very good that you are developing the software for ARM as well, considering that crunching on ARM devices is becoming more and more common.

You might find this information useful: if the binary is statically linked, there is a good chance that it will run on Android too. We tried this on Tn-Grid and it works.
Here are some references:
http://gene.disi.unitn.it/test/forum_thread.php?id=278
http://gene.disi.unitn.it/test/forum_thread.php?id=270

pianoman [MLC@Home Admin]
Message 297 - Posted: 2 Aug 2020, 2:01:48 UTC

Updated the first post with preliminary Pi 4 results. Not too bad at all.

PyTorch can't be compiled statically, at least not completely. What I can look at doing is shipping a bundle of binaries in one go instead of using AppImage, which would remove the FUSE requirement anyway. This is what I do on Windows.

I'd need to read up on Android support; thanks for the pointers.

pianoman [MLC@Home Admin]
Message 310 - Posted: 10 Aug 2020, 4:38:01 UTC

Eagle-eyed users might notice that MLDS is now available for 64-bit ARM on this server (if you look *really* hard, you'll also see a misconfigured, but working 32-bit binary too).

Please don't use it yet unless you *really* want to test, as there's at least one glaring issue: it's multi-threaded instead of single-threaded, which leads it to overcommit the host's cores.

But we're getting closer. I'm working on a small batch of fixes, and automating the build process, for dataset 3; so unless you really want to be on the bleeding edge, wait until the next client release and an official announcement.

JagDoc
Joined: 10 Aug 20
Posts: 13
Credit: 6,159,690
RAC: 16
Message 311 - Posted: 10 Aug 2020, 7:44:27 UTC - in response to Message 310.  

I run an Odroid-N2+ and an Odroid-C4 for testing. The app runs fine, but I noticed some things:

The 64-bit app runs multi-threaded, but BOINC Manager handled it as single-threaded.

The Odroid-C4 also got the 32-bit app, which crashed. (All hosts run with alt_platform_string to get other projects running.)

JagDoc
Message 312 - Posted: 10 Aug 2020, 10:24:10 UTC

First WU on the Odroid-N2+ finished with a "validation error".
https://www.mlcathome.org/mlcathome/result.php?resultid=890335

pianoman [MLC@Home Admin]
Message 313 - Posted: 10 Aug 2020, 15:03:16 UTC - in response to Message 312.  

First, thanks for testing, and I'm glad at least the aarch64 version ran.

Yes, the app isn't supposed to be multi-threaded (it's actually slower multi-threaded, at least on amd64); we need to fix that.
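To illustrate the overcommit problem and one common way to pin a numerical application to a single thread, here is a minimal Python sketch. It is not the fix used in MLDS, which the thread does not describe. It assumes the threading comes from OpenMP, OpenBLAS, or MKL, whose thread counts can be limited with the environment variables `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, and `MKL_NUM_THREADS`. The helper names are hypothetical.

```python
import os

# Environment variables honoured by OpenMP, OpenBLAS, and MKL respectively.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def single_thread_env(base=None):
    """Return a copy of the environment with numeric libraries limited to 1 thread."""
    env = dict(os.environ if base is None else base)
    for name in THREAD_VARS:
        env[name] = "1"
    return env


def worst_case_threads(cores, threads_per_task):
    """Busy threads when the scheduler starts one 'single-threaded' task per core
    but every task actually spawns threads_per_task threads."""
    return cores * threads_per_task


# Example: a 6-core board where each task uses all 6 cores.
print(worst_case_threads(cores=6, threads_per_task=6))
print(single_thread_env({})["OMP_NUM_THREADS"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching a binary by hand for testing. PyTorch also exposes `torch.set_num_threads` for setting this in-process.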

More worrying is the validation error. I'll need to look into that later today.

Thanks again for testing.

JagDoc
Message 314 - Posted: 10 Aug 2020, 15:28:32 UTC - in response to Message 313.  
Last modified: 10 Aug 2020, 15:34:05 UTC

The next finished and pending WU from the Odroid-N2, running with all 6 cores:
https://www.mlcathome.org/mlcathome/result.php?resultid=894445

WU finished and pending from the Odroid-C4:
https://www.mlcathome.org/mlcathome/result.php?resultid=891155

The only problem is that most of the WUs I get are 32-bit, which are crashing.

Glad to help with testing.

pianoman [MLC@Home Admin]
Message 315 - Posted: 11 Aug 2020, 15:10:47 UTC

Sorry for the delay, I'll have to look at this tonight. Life got in the way.

pianoman [MLC@Home Admin]
Message 316 - Posted: 12 Aug 2020, 20:02:04 UTC

I think I know what's going on with the validation problems, for 64-bit at least. The fix should be straightforward but will take a few hours to implement.

As for 32-bit crashes, I'm still looking into it.

JagDoc
Message 317 - Posted: 12 Aug 2020, 20:18:33 UTC - in response to Message 316.  

Thank you.

I think the 32-bit crash is caused by a missing lib.

projects/www.mlcathome.org_mlcathome/mlds_0.920_arm-unknown-linux-gnueabihf: error while loading shared libraries: libz.so: cannot open shared object file: No such file or directory
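A small Python sketch for pulling the missing library name out of a loader message like the one above. The regular expression and helper names are illustrative assumptions, not part of BOINC or MLDS. One detail worth noticing is that the loader asks for the unversioned name `libz.so`. On Debian-based systems, as far as I know, the runtime package `zlib1g` ships `libz.so.1`, while the unversioned `libz.so` link normally comes from the development package.

```python
import re

# Pattern for the glibc dynamic loader's "missing library" message.
LOADER_ERROR = re.compile(
    r"error while loading shared libraries: (?P<lib>[^:\s]+): "
    r"cannot open shared object file"
)


def missing_library(stderr_line):
    """Return the library name the loader could not find, or None."""
    match = LOADER_ERROR.search(stderr_line)
    return match.group("lib") if match else None


def is_unversioned(name):
    """True for names like 'libz.so', False for 'libz.so.1'."""
    return name.endswith(".so")


# The exact line quoted in the post above.
SAMPLE = (
    "projects/www.mlcathome.org_mlcathome/mlds_0.920_arm-unknown-linux-gnueabihf: "
    "error while loading shared libraries: libz.so: "
    "cannot open shared object file: No such file or directory"
)

lib = missing_library(SAMPLE)
print(lib, is_unversioned(lib))
```

If the binary really was linked against the unversioned name, installing only the runtime package would not satisfy it.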

pianoman [MLC@Home Admin]
Message 318 - Posted: 12 Aug 2020, 21:11:10 UTC - in response to Message 317.  

That would do it. I've deprecated the 32-bit binary for now. You might be able to work around the issue by installing zlib1g on the host. I'll need to rebuild to bundle it with the app itself.

Also, I see a lot of arm errors in the db because the 'fuse' package isn't installed. Until I can figure out a way to not use fuse, you'll need that package installed to run the app (which is bundled as an AppImage, which uses FUSE).

So, prerequisites for testing ARM support at the moment:


  • 64-bit arm (I've disabled the 32-bit armhf binary until I can fix it)
  • glibc 2.28 or greater (Debian 10 (buster) or later tested; in theory Ubuntu 20.04, Fedora 29+, and CentOS 8+ should all work)
  • zlib installed (sudo apt install zlib1g)
  • FUSE installed (sudo apt install fuse)
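The prerequisites listed above can be checked with a short Python script. This is only a sketch under a few assumptions: it treats `platform.machine() == "aarch64"` as "64-bit ARM", uses the presence of a `fusermount` executable as a stand-in for "FUSE installed", and relies on `ctypes.util.find_library` to locate zlib. None of this comes from the MLDS code itself.

```python
import ctypes.util
import platform
import shutil

MIN_GLIBC = (2, 28)  # minimum version named in the post


def parse_version(text):
    """Turn '2.31' into (2, 31); raises ValueError on unparsable input."""
    return tuple(int(part) for part in text.split(".")[:2])


def glibc_ok(version, minimum=MIN_GLIBC):
    """True if the version string is at least the required glibc version."""
    try:
        return parse_version(version) >= minimum
    except ValueError:
        return False


def check_prereqs():
    """Return a mapping of prerequisite name to True/False for this host."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "64-bit ARM (aarch64)": platform.machine() == "aarch64",
        "glibc >= 2.28": libc_name == "glibc" and glibc_ok(libc_version),
        "zlib found": ctypes.util.find_library("z") is not None,
        "FUSE (fusermount) found": shutil.which("fusermount") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prereqs().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Versions are compared as integer tuples rather than strings, so that 2.9 correctly sorts below 2.28.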



I have rolled out a fix to the validator that *fingers crossed* should fix the aarch64 validation issues. But there may be other issues. Again, do not run this unless you're *really* interested in testing.


JagDoc
Message 319 - Posted: 13 Aug 2020, 8:15:08 UTC - in response to Message 318.  

Great.
On the Odroid-N2, 2 WUs are valid.
https://www.mlcathome.org/mlcathome/result.php?resultid=935887
https://www.mlcathome.org/mlcathome/result.php?resultid=932234

On the Odroid-N2 I have removed the alt_platform string to get only 64-bit WUs.

On the Odroid-C4, zlib1g and fuse were installed. Additionally, I installed zlib1g:armhf.
But if I start the 32-bit app outside BOINC, I still get the error that libz.so is not found.

PoppaGeek
Joined: 3 Jul 20
Posts: 13
Credit: 10,301,199
RAC: 1
Message 321 - Posted: 13 Aug 2020, 19:32:37 UTC - in response to Message 319.  
Last modified: 13 Aug 2020, 19:33:16 UTC

Ran 2 on an Odroid-C4 with Ubuntu 20.04, which had the required libs. Multi-threaded; ran just those 2 work units at a time. Memory usage ran from 550 MB to 610 MB.

Run time 6 hours 49 min 47 sec
CPU time 10 hours 46 min 46 sec

https://www.mlcathome.org/mlcathome/result.php?resultid=947290
https://www.mlcathome.org/mlcathome/result.php?resultid=947346

Cheers!

fzs600
Joined: 1 Jul 20
Posts: 3
Credit: 8,711,679
RAC: 0
Message 322 - Posted: 15 Aug 2020, 14:26:41 UTC - in response to Message 321.  

With a Pi 4 on Debian 10 (buster) it is going pretty well.

Sent | Reported | Status | Run time (s) | CPU time (s) | Credit | Application
11 Aug 2020, 4:58:59 UTC | 15 Aug 2020, 13:44:46 UTC | Completed and validated | 41,827.50 | 41,287.31 | 260.00 | Machine Learning Dataset Generator v9.20 aarch64-unknown-linux-gnu
11 Aug 2020, 4:58:59 UTC | 15 Aug 2020, 13:24:20 UTC | Completed and validated | 42,030.59 | 41,470.29 | 260.00 | Machine Learning Dataset Generator v9.20 aarch64-unknown-linux-gnu

pianoman [MLC@Home Admin]
Message 323 - Posted: 15 Aug 2020, 16:30:56 UTC

Great reports. I'm trying to get an update out this weekend that will fix the multithread issue (and a few other fixes for all clients in prep for a new round of datasets). I'll also take another crack at fixing the bundling issue with arm32.

Vato
Joined: 1 Jul 20
Posts: 4
Credit: 2,299,058
RAC: 1
Message 362 - Posted: 22 Aug 2020, 8:55:33 UTC

I have a Raspberry Pi 4 running Raspbian 64-bit:
https://www.mlcathome.org/mlcathome/show_host_detail.php?hostid=2144

It runs arm32 fine, but has exec errors with aarch64 tasks:
https://www.mlcathome.org/mlcathome/result.php?resultid=1113838

aarch64 works OK on some other BOINC projects, but I haven't been able to work out what the task is trying to do when it fails.

PoppaGeek
Message 363 - Posted: 22 Aug 2020, 15:08:34 UTC - in response to Message 362.  
Last modified: 22 Aug 2020, 15:08:52 UTC

My Jetson Nano is just the opposite: it runs aarch64-unknown-linux-gnu fine but errors out on arm-unknown-linux-gnueabihf.

https://www.mlcathome.org/mlcathome/results.php?hostid=1984&offset=0&show_names=0&state=0&appid=

Cheers!

PoppaGeek
Message 365 - Posted: 22 Aug 2020, 17:20:04 UTC - in response to Message 363.  

Odroid-C4 same as the Jetson: OK with 64-bit, 32-bit errors out. Ubuntu 20.04 aarch64.

https://www.mlcathome.org/mlcathome/results.php?hostid=1985

pianoman [MLC@Home Admin]
Message 366 - Posted: 22 Aug 2020, 17:42:50 UTC

For 64-bit or 32-bit support you need a compatible userland, or multi-arch support within that userland. MLDS is not statically linked, so it relies on the host OS for things like the dynamic loader; the loader and all library dependencies (i.e., both a 32-bit and a 64-bit libc) must be installed for both architectures' clients to work on the same system.

If you have a 64-bit kernel and a 32-bit userland, you should (in theory) be able to run static and dynamic 32-bit applications and (untested) statically linked 64-bit applications, but not dynamically linked 64-bit applications (which is what MLDS is). Conversely, if you have a 64-bit kernel and a 64-bit userland with multiarch support for 32-bit, plus 32-bit versions of all dependent libraries, then you should be able to run both on the same system.
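The rules in the paragraph above can be written down as a small decision function. This is a sketch of the reasoning as stated, not a test of any real system. The function name and parameters are made up for illustration, and the case of a static 32-bit binary on a 64-bit userland is my assumption, since the post does not cover it. It also assumes the 64-bit kernel has 32-bit compatibility enabled.

```python
def can_run(app_bits, static, kernel_bits, userland_bits, multiarch_32=False):
    """Predict (in theory) whether an application can start on a host.

    app_bits / kernel_bits / userland_bits are 32 or 64.
    static: True if the application is statically linked.
    multiarch_32: True if a 64-bit userland also has the 32-bit loader
                  and all needed 32-bit libraries installed.
    """
    if app_bits > kernel_bits:
        return False  # a 32-bit kernel cannot execute 64-bit code
    if static:
        # Needs nothing from the userland. The post marks 64-bit static on a
        # 32-bit userland as untested; 32-bit static on a 64-bit userland is
        # an assumption added here.
        return True
    if app_bits == userland_bits:
        return True  # loader and libraries of the right word size are present
    # Dynamic application whose word size differs from the userland:
    # only 32-bit apps on a 64-bit userland with multiarch can work.
    return app_bits == 32 and userland_bits == 64 and multiarch_32


# MLDS is dynamically linked, so static=False in both cases.
print(can_run(64, False, kernel_bits=64, userland_bits=32))  # 64-bit MLDS, 32-bit userland
print(can_run(32, False, kernel_bits=64, userland_bits=64))  # 32-bit MLDS, no multiarch
```

Both calls return False, which would be consistent with the mixed results reported earlier in the thread if those hosts had a single-architecture userland.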

That's a lot of "ifs" in the above statement. It took years for distros to get multi-arch right for amd64. It wouldn't surprise me if something wasn't quite right yet for a distro on ARM.

We would love to statically link MLDS, but PyTorch, the underlying neural network library, doesn't work with static linking.

Not sure if that helps, but it's a possible explanation why both clients might not work on the same system. We try to ship most dependencies needed for mlds in the wrapped binary itself, but some system libraries like -lpthread are tied to the host's libc so closely we can't ship our own or lots of things break.

We can look at it again later, but right now I am fighting with the x86_64 client, trying to get it to link against an updated version of OpenBLAS and/or MKL on a Frankenstein install of Ubuntu 14.04 updated with gcc-9 and a binutils that supports AVX-512. You want an old distro as a base so that you have an old libc and support as many Linux variants as possible, but you also want up-to-date versions of everything else to fix things like misdetection of Opterons. All of this would be much easier if I could just link statically, but PyTorch won't do it (bug).


  • I tried compiling the latest version of OpenBLAS statically from source as part of the build, but PyTorch couldn't find it in my build tree.
  • I tried linking against MKL, but the resulting bundled binary missed a bunch of dependent libraries because Intel can't do anything the simple way.
  • I tried building the new OpenBLAS as a .deb, but the assembler for Ubuntu 14.04 doesn't support AVX-512.
  • I tried building a new binutils and replacing *just* the as binary, but then the old linker doesn't understand relocations used by the new assembler.



What I thought would take a few hours has turned into another full day. And I still need to fix deployment of the Windows v9.50 app.

Sorry for the rant, I'm just ready to throw some VMs out the window. lol.


©2021 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)