Linux/armhf and Linux/arm64 support status thread
Author | Message |
---|---|
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Current status (as of Aug 1): The base 32-bit mlds client (armhf/armv7l) is up and running on an old CuBox with a 4-core Cortex-A9 at 1 GHz. It takes about 7.5 minutes per epoch, or about 16 hours to complete a WU. Will also test on a Pi 3. System requirements are Debian buster (or any Linux/ARM system running glibc 2.28 or later). Also building for Linux/arm64 (armv8) on a Pi 4 (4x Cortex-A72 at 1.5 GHz). AppImage may not support ARM, so I may need to rethink deployment. |
Joined: 3 Jul 20 Posts: 10 Credit: 1,303,039 RAC: 0 |
It's very good that you are also developing the software for ARM, considering that crunching on ARM devices is becoming more and more common. You may find this information useful: if the binary is statically linked, there is a high chance that it will run on Android too. We tried this on Tn-Grid and it works. Some references: http://gene.disi.unitn.it/test/forum_thread.php?id=278 http://gene.disi.unitn.it/test/forum_thread.php?id=270 |
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Updated first post with preliminary Pi 4 results. Not too bad at all. PyTorch can't be compiled statically, at least not completely. What I can look at is shipping a bundle of binaries in one go instead of using AppImage, which would remove the FUSE requirement anyway. This is what I do on Windows. I'd need to read up on Android support; thanks for the pointers. |
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Eagle-eyed users might notice that MLDS is now available for 64-bit ARM on this server (if you look *really* hard, you'll also see a misconfigured but working 32-bit binary too). Please don't use it yet unless you *really* want to test, as there's at least one glaring issue: it's multi-threaded instead of single-threaded, which leads it to overcommit platforms. But we're getting closer. I'm working on a small batch of fixes, and on automating the build process, for dataset 3; so unless you really want to be on the bleeding edge, wait until the next client release and an official announcement. |
Joined: 10 Aug 20 Posts: 13 Credit: 6,703,099 RAC: 3 |
I run an Odroid-N2+ and an Odroid-C4 for testing. The app runs fine, but I noticed some things: the 64-bit app runs multi-threaded, but BOINC Manager handled it as single-threaded. The Odroid-C4 also got the 32-bit app, which crashed. (All hosts run with alt_platform_string to get other projects running.) |
Joined: 10 Aug 20 Posts: 13 Credit: 6,703,099 RAC: 3 |
First WU on the Odroid-N2+ finished with a validation error ("Bestätigungsfehler"): https://www.mlcathome.org/mlcathome/result.php?resultid=890335 |
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
First, thanks for testing, and I'm glad at least the aarch64 version ran. Yes, the app isn't supposed to be multi-threaded (it's actually slower multi-threaded, at least on amd64); we need to fix that. More worrying is the validation error. I'll need to look into that later today. Thanks again for testing. |
Joined: 10 Aug 20 Posts: 13 Credit: 6,703,099 RAC: 3 |
The next finished and pending WU from the Odroid-N2, running with all 6 cores: https://www.mlcathome.org/mlcathome/result.php?resultid=894445 WU finished and pending from the Odroid-C4: https://www.mlcathome.org/mlcathome/result.php?resultid=891155 The only problem is that most of the WUs I get are 32-bit, which crash. Glad to help with testing. |
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Sorry for the delay, I'll have to look at this tonight. Life got in the way. |
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
I think I know what's going on with the validation problems, for 64-bit at least. The fix should be straightforward but will take a few hours to implement. As for the 32-bit crashes, I'm still looking into it. |
Joined: 10 Aug 20 Posts: 13 Credit: 6,703,099 RAC: 3 |
Thank you. I think the 32-bit crash is caused by a missing library: projects/www.mlcathome.org_mlcathome/mlds_0.920_arm-unknown-linux-gnueabihf: error while loading shared libraries: libz.so: cannot open shared object file: No such file or directory |
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
That would do it. I've deprecated the 32-bit binary for now. You might be able to work around the issue by installing zlib1g on the host. I'll need to rebuild to bundle it with the app itself. Also, I see a lot of ARM errors in the db because the 'fuse' package isn't installed. Until I can figure out a way to not use FUSE, you'll need that package installed to run the app (which is bundled as an AppImage, which uses FUSE). So, prerequisites for testing ARM support at the moment:
|
Joined: 10 Aug 20 Posts: 13 Credit: 6,703,099 RAC: 3 |
Great. Two WUs on the Odroid-N2 are valid: https://www.mlcathome.org/mlcathome/result.php?resultid=935887 https://www.mlcathome.org/mlcathome/result.php?resultid=932234 On the Odroid-N2 I have removed the alt_platform string to get only 64-bit WUs. On the Odroid-C4, zlib1g and fuse were already installed; additionally I installed zlib1g:armhf. But if I start the 32-bit app outside BOINC, I still get the error that libz.so is not found. |
Joined: 3 Jul 20 Posts: 13 Credit: 13,421,453 RAC: 0 |
Ran 2 on an Odroid C4 with Ubuntu 20.04, which had the required libs. Multi-threaded; ran just those 2 work units at a time. Memory usage ran from 550 MB to 610 MB. Run time 6 hours 49 min 47 sec; CPU time 10 hours 46 min 46 sec. https://www.mlcathome.org/mlcathome/result.php?resultid=947290 https://www.mlcathome.org/mlcathome/result.php?resultid=947346 Cheers! |
Joined: 1 Jul 20 Posts: 4 Credit: 19,167,001 RAC: 294 |
With a Pi 4 on Debian 10 (buster) it is going pretty well. Sent 11 Aug 2020, 4:58:59 UTC; reported 15 Aug 2020, 13:44:46 UTC; completed and validated; run time 41,827.50 s; CPU time 41,287.31 s; credit 260.00; Machine Learning Dataset Generator v9.20. Sent 11 Aug 2020, 4:58:59 UTC; reported 15 Aug 2020, 13:24:20 UTC; completed and validated; run time 42,030.59 s; CPU time 41,470.29 s; credit 260.00; Machine Learning Dataset Generator v9.20. |
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
Great reports. I'm trying to get an update out this weekend that will fix the multithread issue (and a few other fixes for all clients in prep for a new round of datasets). I'll also take another crack at fixing the bundling issue with arm32. |
Joined: 1 Jul 20 Posts: 4 Credit: 2,364,110 RAC: 0 |
I have a Raspberry Pi 4 running 64-bit Raspbian: https://www.mlcathome.org/mlcathome/show_host_detail.php?hostid=2144 It runs arm32 fine, but has exec errors with aarch64 tasks: https://www.mlcathome.org/mlcathome/result.php?resultid=1113838 aarch64 works OK on some other BOINC projects, but I haven't been able to work out what the task is trying to do when it fails. |
Joined: 3 Jul 20 Posts: 13 Credit: 13,421,453 RAC: 0 |
My Jetson Nano is just the opposite. It runs aarch64-unknown-linux-gnu fine but errors out on arm-unknown-linux-gnueabihf: https://www.mlcathome.org/mlcathome/results.php?hostid=1984&offset=0&show_names=0&state=0&appid= Cheers! |
Joined: 3 Jul 20 Posts: 13 Credit: 13,421,453 RAC: 0 |
Odroid C4, same as the Jetson: OK with 64-bit, 32-bit errors out. Ubuntu 20.04 aarch64. https://www.mlcathome.org/mlcathome/results.php?hostid=1985 |
Joined: 30 Jun 20 Posts: 462 Credit: 21,406,548 RAC: 0 |
For 64-bit or 32-bit support you need a compatible userland, or multi-arch support within that userland. MLDS is not statically linked, so it relies on the host OS for things like the dynamic loader, and that loader and all library dependencies must be installed (i.e., both a 32-bit and a 64-bit libc) for both architectures' clients to work on the same system. If you have a 64-bit kernel and a 32-bit userland, you should (in theory) be able to run static and dynamic 32-bit applications, and (untested) statically linked 64-bit applications, but not dynamically linked 64-bit applications (which is what MLDS is). Vice versa, if you have a 64-bit kernel and a 64-bit userland with multi-arch support for 32-bit, and 32-bit versions of all dependent libraries, then you should be able to run both on the same system. That's a lot of "ifs" in the above statement. It took years for distros to get multi-arch right for amd64. It wouldn't surprise me if something wasn't quite right yet for a distro on ARM. We would love to statically link MLDS, but PyTorch, the underlying neural network library, doesn't work with static linking. Not sure if that helps, but it's a possible explanation for why both clients might not work on the same system. We try to ship most dependencies needed for mlds in the wrapped binary itself, but some system libraries like -lpthread are tied so closely to the host's libc that we can't ship our own or lots of things break. We can look at it again later, but right now we (I) am fighting with the x86_64 client to get it to link against an updated version of openblas and/or mkl, on a franken-install of Ubuntu 14.04 updated with gcc-9 and a binutils that supports AVX-512. You want an old distro as a base to have an old libc that supports as many Linux variants as possible, but you want all the up-to-date versions to fix things like misdetection of Opterons. And all of this would be much easier if I could just link statically, but PyTorch won't do it (bug).
|
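The last post says that a dynamically linked client depends on the host having a dynamic loader and libraries for the same architecture. As a rough illustration of how one might check which loader a binary asks for, the Python sketch below reads the ELF header and looks for a `PT_INTERP` program header. The function name `describe_elf` and the small machine table are hypothetical and not part of MLDS or BOINC. Note that the MLDS download is an AppImage, so running this on that file would presumably describe the AppImage runtime rather than the wrapped mlds binary.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader
# Small, deliberately incomplete table of ELF e_machine values.
MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}


def describe_elf(data: bytes) -> dict:
    """Summarise an ELF image: word size, machine, and requested loader."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                 # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    machine = struct.unpack_from(end + "H", data, 18)[0]
    # Program header table location differs between ELF32 and ELF64.
    if is64:
        phoff = struct.unpack_from(end + "Q", data, 32)[0]
        phentsize, phnum = struct.unpack_from(end + "HH", data, 54)
    else:
        phoff = struct.unpack_from(end + "I", data, 28)[0]
        phentsize, phnum = struct.unpack_from(end + "HH", data, 42)
    interp = None
    for i in range(phnum):
        base = phoff + i * phentsize
        if struct.unpack_from(end + "I", data, base)[0] != PT_INTERP:
            continue
        # p_offset and p_filesz sit at different offsets in each class.
        if is64:
            offset, = struct.unpack_from(end + "Q", data, base + 8)
            size, = struct.unpack_from(end + "Q", data, base + 32)
        else:
            offset, = struct.unpack_from(end + "I", data, base + 4)
            size, = struct.unpack_from(end + "I", data, base + 16)
        interp = data[offset:offset + size].rstrip(b"\0").decode()
    return {
        "bits": 64 if is64 else 32,
        "machine": MACHINES.get(machine, f"unknown ({machine})"),
        "interpreter": interp,  # None: no PT_INTERP, typically a static binary
    }
```

Calling `describe_elf(open(path, "rb").read())` on a binary and then checking whether the reported interpreter path exists on the host gives a quick hint as to whether the matching userland is installed. It does not check the other shared libraries the binary needs.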
©2023 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)