Bug 4042 - sshfs (and probably other) filesystems without "noauto" flag will prevent smooth boot with systemd.
Summary: sshfs (and probably other) filesystems without "noauto" flag will prevent smooth boot with systemd.
Status: RESOLVED DUPLICATE of bug 10179
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: High
Severity: critical
Target Milestone: Mageia 5
Assignee: Colin Guthrie
QA Contact:
URL:
Whiteboard: FOR_ERRATA
Keywords: NEEDINFO, USABILITY
Depends on:
Blocks:
 
Reported: 2012-01-06 12:52 CET by Helge Hielscher
Modified: 2015-06-02 20:28 CEST (History)
14 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments
menu.lst (2.82 KB, text/plain)
2012-01-06 12:57 CET, Helge Hielscher
/etc/fstab (1.39 KB, text/plain)
2012-01-07 11:19 CET, Helge Hielscher

Description Helge Hielscher 2012-01-06 12:52:29 CET
Description of problem: boot with systemd fails in 2012

Version-Release number of selected component (if applicable):
$ rpm -qa|grep udev
lib64udev0-devel-175-2.mga2
lib64udev0-175-2.mga2
lib64gudev1.0_0-175-2.mga2
libudev0-175-2.mga2
udev-175-2.mga2
libgudev1.0_0-175-2.mga2
system-config-printer-udev-1.3.6-1.mga2
$ rpm -qa|grep systemd
lib64systemd-login0-37-18.mga2
systemd-units-37-18.mga2
systemd-37-18.mga2
lib64systemd-daemon0-37-18.mga2

How reproducible: always

Steps to Reproduce:
1. boot into mageia

video of boot:
http://www.youtube.com/watch?v=jCJGAYN28hc
http://www.youtube.com/watch?v=j0d-hjI3_L4
Comment 1 Helge Hielscher 2012-01-06 12:57:48 CET
Created attachment 1336 [details]
menu.lst
Comment 2 Manuel Hiebel 2012-01-06 21:59:56 CET
Hi, thanks for reporting this bug.
As there is no maintainer for this package I added the committers in CC.

(Please set the status to 'assigned' if you are working on it)

CC: (none) => anssi.hannula, dmorganec, eugeni, mageia, mageia, misc, pterjan, thierry.vignaud, tmb

Comment 3 Thomas Backlund 2012-01-06 22:08:26 CET
Maybe because of this: "title Ubuntu 10.04 LTS" :)

Nah, seriously, what is your hw, disk setup and so on?

lvm? dmraid? mdadm? ...

From what I can see in the videos, you get to the systemd rescue shell.

If you enter your root login there and do:

mount -a

systemd default


does it boot then ?
Comment 4 Helge Hielscher 2012-01-07 11:19:11 CET
No RAID here.
It doesn't boot, this is what I get:
# mount -a
read: Connection reset by peer
read: Connection reset by peer
read: Connection reset by peer
# systemd default
Excess arguments.
Comment 5 Helge Hielscher 2012-01-07 11:19:50 CET
Created attachment 1341 [details]
/etc/fstab
Comment 6 Colin Guthrie 2012-01-07 11:41:28 CET
Can you try commenting out the SSH mounts for now and see if that helps?
Comment 7 Helge Hielscher 2012-01-07 12:48:29 CET
Adding noauto to the SSH mounts helped.
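For reference, an fstab line of the shape discussed here might look like this (host, paths, and the remaining options are invented for illustration, not taken from the reporter's actual fstab):

```
# illustrative sshfs entry with noauto; host and paths are made up
user@example.com:/remote/dir  /mnt/remote  fuse.sshfs  noauto,defaults  0  0
```

With noauto, systemd no longer tries to mount the entry at boot, so a failure cannot block local-fs.target; the mount must then be started manually.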
Comment 8 Colin Guthrie 2012-01-09 10:50:58 CET
Yeah, that makes sense.

I'm not quite sure how to deal with these kinds of issues, as this is effectively intended behaviour.

We should possibly try to warn people who have such lines in their fstab, or automatically add the noauto flag on systemd installation?
Colin Guthrie 2012-01-09 10:51:38 CET

Summary: boot with systemd failes => sshfs (and probably other) filesystems without "noauto" flag will prevent smooth boot with systemd.

Comment 9 Helge Hielscher 2012-01-09 11:20:08 CET
Why is it intentional that boot fails due to problems with non-system mount points? Shouldn't it throw an error and continue gracefully?
Comment 10 Colin Guthrie 2012-01-09 11:26:58 CET
Define "non-system mount points"? Really, anything in fstab is a "system mount point". It's impossible to say with any degree of accuracy what random units you may have installed that depend on a given mount point. Therefore local-fs.target will consider anything in fstab that is designed to be mounted at boot (i.e. without noauto) as something that needs to be mounted. If one of those mounts fails, it considers that something is wrong and drops you to an emergency shell so you can fix the problem.

It *could* carry on with some kind of whitelist (e.g. if the mount point starts /boot, /usr, /var or /home then consider it a "system mount point"), but perhaps you have a startup unit that rsyncs your /home/$USER from /mnt/orig/home/$USER with --delete and the disk that contains /mnt/orig/home/$USER is corrupted real bad so it doesn't mount it and carries on (as /mnt mount are not "system") with an empty /mnt/orig/home/$USER folder - thus a pretty good backup of /home/$USER gets overwritten with a blank folder... ugg.

OK, so this is a contrived example, but the "play it safe" approach is IMO quite sensible.
Comment 11 Colin Guthrie 2012-01-09 11:28:19 CET
That said.... sshfs, is of course not a "local" fs... so it shouldn't be mounted in local-fs.target anyway.... Will look into that.
Comment 12 Colin Guthrie 2012-01-09 11:35:53 CET
Actually, this could all be a red herring as I might have misinterpreted some of the above comments.

When you say "adding noauto" helped, did you mean generally with the whole boot or did you mean just with the "mount -a" bit?

Does the system boot smoothly from start to finish for you now (with the noauto)?

If so could you do the following (simple) test for me:

 1. Remove the "noauto" from one or more of your sshfs mounts defined in fstab.
 2. Reboot and confirm the problem.
 3. Give root p/w for maintenance mode.
 4. Run: systemctl status local-fs.target
 5. Does it report an error?
 6. Run: systemctl start local-fs.target
 7. Wait a bit - does it say failed?
 8. Edit fstab and add back in the noauto.
 9. Run systemctl start local-fs.target
 10. Does it work this time?
 11. If so, "systemctl start graphical.target" should continue to a normal boot.

Many thanks.

Status: NEW => ASSIGNED
Assignee: bugsquad => mageia

Comment 13 Colin Guthrie 2012-01-09 11:56:06 CET
Gah, sorry for so many comments this morning. Looking at the code, the fuse mounts will be considered "local" and will be mounted by local-fs.target, so the above test is probably redundant.

I'll see if we can patch the code to include fuse+sshfs in the list of "network" file systems.
Comment 14 Helge Hielscher 2012-01-09 15:05:37 CET
(In reply to comment #12)
> Does the system boot smoothly from start to finish for you now (with the
> noauto)?

yes

> If so could you do the following (simple) test for me:
> 
>  1. Remove the "noauto" from one or more of your sshfs mounts defined in fstab.
>  2. Reboot and confirm the problem.
>  3. Give root p/w for maintenance mode.
>  4. Run: systemctl status local-fs.target
>  5. Does it report an error?

Loaded: loaded (/lib/systemd/system/local-fs.target;static)
Active: inactive (dead)


>  6. Run: systemctl start local-fs.target
>  7. Wait a bit - does it say failed?

A dependency job failed, See system logs for details.

>  8. Edit fstab and add back in the noauto.
>  9. Run systemctl start local-fs.target
>  10. Does it work this time?

A dependency job failed, See system logs for details.
Comment 15 Colin Guthrie 2012-01-11 16:10:38 CET
OK, so here is the thread on systemd-devel where I discussed this problem on your behalf:

http://thread.gmane.org/gmane.comp.sysutils.systemd.devel/4124/focus=4140

I've linked to the outcome mail in particular.

The long and short of it is that rather than "noauto" specifically, you should add _netdev to the options. This will correctly flag the mount as a network one, rather than having to maintain a whitelist on the systemd side.

In order to ease migration, I might look into some automatic changes we could do to the user's fstab on upgrade to automatically add the _netdev option. WDYT? Worth the hassle?
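For illustration, an sshfs entry carrying the _netdev flag suggested above might look like this (host, paths, and the remaining options are invented, not from the reporter's fstab):

```
# illustrative only; host and paths are made up
user@example.com:/srv/share  /mnt/share  fuse.sshfs  _netdev,defaults  0  0
```

The _netdev flag tells systemd to treat the entry as a network mount, so it is ordered against network availability instead of blocking local-fs.target.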
Comment 16 Helge Hielscher 2012-01-11 16:36:28 CET
Personally an entry in the errata would do for me.
Comment 17 Colin Guthrie 2012-01-11 16:41:34 CET
OK, I'll leave this bug open for now to remind me to do $something about it which would be, at minimum, an errata/release notes entry.
Manuel Hiebel 2012-01-11 17:27:07 CET

Keywords: (none) => USABILITY
Whiteboard: (none) => Errata

Comment 18 Anssi Hannula 2012-01-15 05:42:47 CET
I don't think failing boot due to filesystems not getting mounted is a good idea at all. Consider e.g.

1. dualboot systems where the user repartitions some disk in another OS - if the previous fs no longer exists, the boot fails.
2. user removes a data harddisk, e.g. to move it to another system or to replace it with a bigger one.
3. non-system HD failure

All of these are quite common occurrences, and especially for inexperienced users a failure to boot would be very confusing (which doesn't happen in "the other OS", btw).


As for myself, I do want my system to come up even if the external data storage arrays are disconnected - having to add _netdev for that seems rather silly to me.
Comment 19 Colin Guthrie 2012-01-17 11:38:15 CET
I don't totally disagree with you Anssi, but this is really something that you should discuss upstream on systemd mailing list. I'd much prefer not to change the behaviour locally in this regard, but am more than open to providing a more user friendly experience in other ways.

I think it only makes sense to set _netdev for network mounts. If you do not want other mounts to break things, the "noauto" option can be used.

I'd say for most external drives, the Desktop Environment being run is responsible for mounting and generally speaking you do not want to include static listings for these in fstab anyway.

As systemd's approach is fully hotplug, I don't really see a nice way to allow the overall boot to continue on a mount failure - i.e. how to classify which partitions are "critical" and which ones are not. Is /home considered critical? What about /var? Or how about /data/mirror/home? It's almost impossible to classify what counts as a "critical" drive on a given setup, and thus going into an emergency shell when a problem occurs seems to me the safest and most sensible thing to do.

I think we should aim to provide a more user-friendly report of why the boot failed, plus instructions on potential fixes. Also, perhaps providing a pre-reboot "check" script of some kind would be nice, to avoid some of the potential problems. This could perhaps be run in advance during systemd installation and pre-warn the user.
Comment 20 Marja Van Waes 2012-05-26 13:03:17 CEST
Hi,

This bug was filed against cauldron, but we do not have cauldron at the moment.

Please report whether this bug is still valid for Mageia 2.

Thanks :)

Cheers,
marja

Keywords: (none) => NEEDINFO

Comment 21 Florian Hubold 2014-02-07 21:45:30 CET
Still valid with Mageia 4 and cauldron, see https://bugs.mageia.org/show_bug.cgi?id=7673 and for nofail also https://bugs.mageia.org/show_bug.cgi?id=12305 and https://bugs.mageia.org/show_bug.cgi?id=12631

All are pretty prominent bugs, and they occur really often: the system fails to boot because some partition UUID changed (which seems to happen often with swap on upgrades for whatever reason, see the linked bug), or because some non-critical filesystem cannot be mounted, be it a remote filesystem or a Windows partition. All could be fixed easily, in the sense of allowing the system to boot normally, by adding at least nofail for all mounts that are not Mageia system partitions.

@Colin: As all those issues are linked, do you want a tracker bug for those?

Also, more on-topic: this can probably not be taken directly upstream, as those are downstream modifications IMHO. Probably nofail should be added somewhere as the systemd default for all non-system partitions of the host OS.

Marking this as release_blocker for M5.

Source RPM: udev-175-2.mga2 => (none)
Keywords: NEEDINFO => (none)
Priority: Normal => release_blocker
CC: (none) => doktor5000
Target Milestone: --- => Mageia 5
Severity: major => critical

Florian Hubold 2014-02-07 21:56:38 CET

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=7673, https://bugs.mageia.org/show_bug.cgi?id=10179, https://bugs.mageia.org/show_bug.cgi?id=12566, https://bugs.mageia.org/show_bug.cgi?id=12631

Comment 22 AL13N 2014-09-14 20:23:44 CEST
what about marking the mount point "nofail"?

since this is really the option that is meant to be used for non-boot filesystems

CC: (none) => alien

Florian Hubold 2014-09-14 21:55:39 CEST

Blocks: (none) => 14069

Comment 23 Florian Hubold 2014-09-14 21:58:43 CEST
@coling: Maybe nofail can be made the default for all non-system filesystems? Among others, this should also fix the issues with newer Windows 8 filesystems that make Mageia fail to boot due to their implementation of hybrid shutdown; this also applies to Windows filesystems on GPT partitions. Remember that the installer adds fstab entries for all Windows partitions by default.
Comment 24 AL13N 2014-09-15 10:11:13 CEST
i second that, but we need to be careful not to apply nofail too broadly either...

maybe mount points for

/
/usr
/usr/bin
/usr/lib
/usr/lib64
/usr/share
/usr/sbin
/tmp
/etc
/var
/var/lock
/var/run
/var/tmp
/root

should not have nofail, and the others would have it?

(maybe i've forgotten a few)
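The whitelist idea above could be sketched as a small shell helper. This is entirely hypothetical: the function name, the exact whitelist contents, and the approach are illustrative, not an actual Mageia tool.

```shell
# Hypothetical sketch: print an fstab with ",nofail" appended to the options
# field of every entry whose mount point is NOT on the system whitelist.
# Note: awk rebuilds touched lines with single-space field separation.
fstab_add_nofail() {
  awk '
    BEGIN {
      n = split("/ /usr /usr/bin /usr/lib /usr/lib64 /usr/share /usr/sbin /tmp /etc /var /var/lock /var/run /var/tmp /root", w)
      for (i = 1; i <= n; i++) white[w[i]] = 1
    }
    /^[[:space:]]*#/ || NF < 4    { print; next }  # comments and blanks pass through
    $3 == "swap" || ($2 in white) { print; next }  # system entries stay untouched
    $4 ~ /(^|,)nofail(,|$)/       { print; next }  # already has nofail
    { $4 = $4 ",nofail"; print }
  ' "$1"
}
```

Usage would be something like `fstab_add_nofail /etc/fstab > /etc/fstab.new`, leaving the original file untouched for review before replacing it.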
Anne Nicolas 2015-01-21 21:48:54 CET

CC: (none) => ennael1

Comment 25 Rémi Verschelde 2015-02-03 22:26:26 CET
We're now nearing the Mageia 5 release. What should we do about this bug and the ones linked in comment 21? Be content with a mention in the errata, or can we implement a proper fix, and if so, where?

CC: (none) => remi

Comment 26 Florian Hubold 2015-02-04 22:36:08 CET
Well, we have this in errata since mga2:
https://wiki.mageia.org/en/Mageia_2_Errata#Boot_fails_when_webdav.2C_sshfs_etc._entries_exist_in_fstab

So we could keep it like that, including this bug, which is now more than 3 years old. But then the next question would be: for which upcoming Mageia release do we think a fix is viable?

From my point of view this should be done:

- parse fstab, for all mountpoints _not_ on a whitelist as e.g. in comment 24 and add nofail option
OR
- change systemd to use nofail as default option for all foreign filesystems
OR
- ... just forgot the other option :)

For the one corner case about the swap partition from bug 12305, we should be safe if we compare the UUID of the swap device against the UUID in fstab before rebooting into the installed system. Then we could close 4 bugs at once.
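The pre-reboot swap check suggested in this comment could be sketched roughly as follows. Everything here is hypothetical: the function name and interface are invented, and on a live system the second argument would come from something like `blkid -t TYPE=swap -o value -s UUID`.

```shell
# Hypothetical sketch: warn about swap UUIDs referenced in fstab that do not
# match any swap device actually present.
#   $1 = fstab contents
#   $2 = newline-separated UUIDs of the swap devices actually present
check_swap_uuids() {
  printf '%s\n' "$1" |
    awk '$3 == "swap" && $1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' |
    while read -r u; do
      printf '%s\n' "$2" | grep -qx "$u" || echo "stale swap UUID in fstab: $u"
    done
}
```

Run before the final reboot of an install or upgrade, any "stale" line would indicate an fstab entry that will fail to activate on the next boot.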
Comment 27 AL13N 2015-02-05 07:58:42 CET
actually, it looks like btrfs subvolumes are not mounted properly, as they are not honoring the subvol= flag (except if you also add "nofail", or in the case where grub has the subvol= flag)...
Comment 28 Anne Nicolas 2015-02-05 22:29:10 CET
Decreasing priority as it was already there in Mageia 4. It can be fixed as an update later.

Priority: release_blocker => High

claire robinson 2015-02-19 22:45:37 CET

CC: (none) => eeeemail

AL13N 2015-03-25 18:48:58 CET

Blocks: 14069 => (none)

Samuel Verschelde 2015-05-20 13:47:28 CEST

Whiteboard: Errata => FOR_ERRATA

Comment 29 Thierry Vignaud 2015-06-02 11:42:04 CEST
Isn't this one fixed by the bug #10179 fix?

Keywords: (none) => NEEDINFO

Comment 30 Florian Hubold 2015-06-02 20:28:30 CEST
I think this one should be closed: the OP reported that an errata entry is OK for him, and we have the fix, as you mentioned: http://gitweb.mageia.org/software/drakx/commit/?id=745849cdace7ed86ce12a9a7564bffb42edf0ef3

We can still open a new one for mga6 if such an issue occurs frequently.

*** This bug has been marked as a duplicate of bug 10179 ***

Status: ASSIGNED => RESOLVED
Resolution: (none) => DUPLICATE

