Fuse/read-only filesystem issue with newer distributions (and new client update)

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 1184 - Posted: 27 Apr 2021, 21:14:57 UTC

We've seen a few people report here in the past a weird FUSE error that keeps the client from starting. The error from the task is something like "can't mount, read only filesystem".

I don't know what's causing this, but after upgrading to Ubuntu 21.04 I'm now hitting the issue myself, so I can at least debug it. What's worse, running the program outside of BOINC seems to work.

The next release of the client will be statically linked and drop FUSE entirely. The other main feature of the client, DS4 support, is almost ready (training the network works, I'm just tuning the runtimes and number of epochs to work on MNIST and Fashion-MNIST), so I'm going to hurry a release of that ASAP.

It's a *big* change from what we were doing before, but I think it'll be worth it. That said, also expect some bumps in the road and some more time testing than usual as we make sure the new client works as designed.
Sagittarius Lupus
Joined: 4 Apr 21
Posts: 7
Credit: 415,093
RAC: 5
Message 1185 - Posted: 27 Apr 2021, 23:36:29 UTC

Hey, there. I can't be certain this is the same issue, but you mentioned a distribution upgrade, and that creates an opportunity for systemd to get involved with the BOINC client if you're running it as a service. I banged my head against the problem of tasks running in the client being unable to access various parts of the host filesystem that definitely were not read-only -- in particular, it couldn't reach into various control groups that the boinc user should have had exclusive access to, and worse, if I ran the tasks outside of BOINC they had no such problem.

My investigation led to this thread over on the LHC@Home forums, which is mostly me talking to myself about the problem, where I eventually managed to solve it: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5121

TL;DR: if you're running BOINC as a service in systemd, the service unit installed by your distribution may enforce certain sandboxing features that mask access to parts of the host file system for processes running inside the service. In my case, I had to set a ProtectControlGroups=false override, but in your case, I suspect it may be sufficient to add the path to the FUSE filesystem you're trying to access to a ReadWritePaths override in your boinc-client.service unit file.
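
To see which sandboxing directives a unit file sets before writing an override, something like the following sketch may help. It is only an illustration: the `SANDBOX_KEYS` selection, the function name, and the sample unit text are assumptions made for this example and are not taken from any distribution's actual boinc-client.service.

```python
# Directives that commonly restrict filesystem access for a service.
# This is an illustrative selection, not an exhaustive list.
SANDBOX_KEYS = {
    "ProtectSystem",
    "ProtectHome",
    "ProtectControlGroups",
    "PrivateTmp",
    "ReadWritePaths",
    "ReadOnlyPaths",
    "InaccessiblePaths",
}


def find_sandbox_directives(unit_text):
    """Return {directive: [values, ...]} for sandboxing keys in [Service]."""
    found = {}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Service" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in SANDBOX_KEYS:
            # systemd lets a key repeat, so keep every value in order
            found.setdefault(key, []).append(value.strip())
    return found


# Hypothetical unit text for illustration; a real unit file may differ.
SAMPLE_UNIT = """\
[Unit]
Description=Example service

[Service]
ExecStart=/usr/bin/example
ProtectSystem=strict
PrivateTmp=true
ReadWritePaths=-/var/lib/example
"""

print(find_sandbox_directives(SAMPLE_UNIT))
```

On a real machine, the text could come from the output of `systemctl cat boinc-client.service`, which prints the unit file together with any drop-in overrides.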

Of course, this means that if any of your volunteers are running BOINC on Linux distros with systemd, they will in general have to make the same sandbox accommodations you do. This only tends to be relevant to particularly advanced BOINC projects that reach into unusual parts of the host operating system. You might also consider reporting the permissions conflict to your distribution as a bug, if they are willing to review the security implications of modifying their packaging of the BOINC client for Ubuntu as a whole.
pianoman [MLC@Home Admin]
Message 1187 - Posted: 28 Apr 2021, 1:45:06 UTC - in response to Message 1185.  

That... would certainly fit the observed behavior. Thanks for the pointer.

I'm aware that systemd is capable of sandboxing, but I didn't think it would keep the boinc user from either writing to /tmp or mounting a FUSE filesystem (which is what AppImage does: it creates a temporary mount point in /tmp, mounts the embedded squashfs image, then runs the binary from the mount point). But it would explain why it works fine when I go into the BOINC directory and run it manually.
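
A quick way to compare the two environments (inside the BOINC service versus a manual shell) is a small probe for the two things the AppImage startup described above depends on. This is a minimal sketch, not part of the MLC@Home client: the function name and result keys are made up for illustration, and it only checks prerequisites without attempting a mount, so a mount could still fail even when both checks pass.

```python
import os
import tempfile


def probe_appimage_prereqs(tmp_dir="/tmp", fuse_dev="/dev/fuse"):
    """Check two things an AppImage-style launcher needs at startup."""
    result = {"tmp_writable": False, "fuse_accessible": False}

    # 1. Can we create (and remove) a scratch mount point under tmp_dir?
    try:
        mount_point = tempfile.mkdtemp(prefix="probe_", dir=tmp_dir)
        os.rmdir(mount_point)
        result["tmp_writable"] = True
    except OSError:
        pass  # read-only, missing, or otherwise unusable directory

    # 2. Is the FUSE device node present and accessible for read/write?
    result["fuse_accessible"] = os.access(fuse_dev, os.R_OK | os.W_OK)
    return result


print(probe_appimage_prereqs())
```

Running the same probe once from a shell and once from a process started by the service would show whether the sandbox changes either answer.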

It's always been a little ugly that we have to use AppImage. It's a kludge around the fact that PyTorch can't (couldn't) be compiled as a static library. I can't wait to move away from that. I'll test this out and work on a workaround to post, and then get on with the next client release to put this all behind us.

Note: currently, static compiling works for the next release for CPU only. I haven't tried CUDA, and ROCm with its hard-coded paths is going to be even worse.
Sagittarius Lupus
Message 1190 - Posted: 28 Apr 2021, 2:44:58 UTC

In case it's helpful, this is my override file at /etc/systemd/system/boinc-client.service.d/override.conf:

[Service]
PrivateTmp=false
ProtectControlGroups=false
ReadWritePaths=-/tmp


With this modification, I don't have any trouble with BOINC tasks writing to /tmp, modifying their own control groups, or interacting with FUSE filesystems (LHC@Home does this with CVMFS).
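
A drop-in like this takes effect only after `systemctl daemon-reload` and a restart of the service. To confirm that systemd actually picked up the new values, the effective properties can be read back with `systemctl show`. The sketch below assumes the unit is named boinc-client.service as in the path above; the helper names are made up for illustration, and the exact way values are printed may vary between systemd versions.

```python
import subprocess


def parse_show_output(text):
    """Turn 'Key=Value' lines (as printed by `systemctl show`) into a dict."""
    settings = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)  # values may contain '='
            settings[key.strip()] = value.strip()
    return settings


def read_effective_settings(unit, names):
    """Ask systemd for the listed properties of a unit."""
    cmd = ["systemctl", "show", unit]
    for name in names:
        cmd += ["-p", name]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_show_output(done.stdout)


# Example call (needs a machine running systemd, so it is left commented out):
# read_effective_settings(
#     "boinc-client.service",
#     ["PrivateTmp", "ProtectControlGroups", "ReadWritePaths"],
# )
```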
pianoman [MLC@Home Admin]
Message 1191 - Posted: 28 Apr 2021, 2:53:26 UTC - in response to Message 1190.  

Perfect, I just confirmed that adding ReadWritePaths=-/tmp works on my machine as well.

Time to write up instructions and post a news update. I'll sticky this post. Thank you!

©2024 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)