Description of problem:
boot with systemd fails in 2012

Version-Release number of selected component (if applicable):
$ rpm -qa|grep udev
lib64udev0-devel-175-2.mga2
lib64udev0-175-2.mga2
lib64gudev1.0_0-175-2.mga2
libudev0-175-2.mga2
udev-175-2.mga2
libgudev1.0_0-175-2.mga2
system-config-printer-udev-1.3.6-1.mga2
$ rpm -qa|grep systemd
lib64systemd-login0-37-18.mga2
systemd-units-37-18.mga2
systemd-37-18.mga2
lib64systemd-daemon0-37-18.mga2

How reproducible:
always

Steps to Reproduce:
1. boot into mageia

video of boot:
http://www.youtube.com/watch?v=jCJGAYN28hc
http://www.youtube.com/watch?v=j0d-hjI3_L4
Created attachment 1336 [details] menu.lst
Hi, thanks for reporting this bug. As there is no maintainer for this package I added the committers in CC. (Please set the status to 'assigned' if you are working on it)
CC: (none) => anssi.hannula, dmorganec, eugeni, mageia, mageia, misc, pterjan, thierry.vignaud, tmb
Maybe because of this: "title Ubuntu 10.04 LTS" :) Nah, seriously, what is your hw, disk setup and so on? lvm? dmraid? mdadm? ... From what I can see in the videos, you get to the systemd rescue shell. If you log in as root there and run:

mount -a
systemd default

does it boot then?
No RAID here. It doesn't boot, this is what I get:

# mount -a
read: Connection reset by peer
read: Connection reset by peer
read: Connection reset by peer
# systemd default
Excess arguments.
Created attachment 1341 [details] /etc/fstab
Can you try commenting out the SSH mounts for now and see if that helps?
Adding noauto to the SSH mounts helped.
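For reference, the workaround applied to an fstab line might look like this (hostname, export path and mount point are invented for illustration):

```
# sshfs mount; "noauto" keeps systemd from trying to mount it at boot
user@fileserver:/export  /mnt/remote  fuse.sshfs  noauto,user  0 0
```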
Yeah, that makes sense. I'm not quite sure how to deal with these kinds of issues, as this is effectively intended behaviour. We should possibly try to warn people with such lines in their fstab, or automatically add the noauto flag on systemd installation?
Summary: boot with systemd fails => sshfs (and probably other) filesystems without "noauto" flag will prevent smooth boot with systemd.
Why is it intentional that boot fails due to problems with non-system mount points? Shouldn't it throw an error and continue gracefully?
Define "non-system mount points"? Really anything in fstab is a "system mount point". It's impossible to say with any degree of accuracy what random units you may have installed that depend on a given mount point. Therefore local-fs.target will consider anything in fstab that is designed to be mounted at boot (i.e. without noauto) as something that needs to be mounted. If it doesn't mount, systemd considers that something is wrong and drops you to an emergency shell so you can fix the problem.

It *could* carry on with some kind of whitelist (e.g. if the mount point starts with /boot, /usr, /var or /home then consider it a "system mount point"), but perhaps you have a startup unit that rsyncs your /home/$USER from /mnt/orig/home/$USER with --delete, and the disk that contains /mnt/orig/home/$USER is corrupted so badly that it doesn't mount. The boot carries on (as /mnt mounts are not "system") with an empty /mnt/orig/home/$USER folder, and thus a pretty good backup of /home/$USER gets overwritten with a blank folder... ugh.

OK, so this is a contrived example, but the "play it safe" approach is IMO quite sensible.
That said.... sshfs, is of course not a "local" fs... so it shouldn't be mounted in local-fs.target anyway.... Will look into that.
Actually, this could all be a red herring as I might have misinterpreted some of the above comments. When you say "adding noauto" helped, did you mean generally with the whole boot or just with the "mount -a" bit? Does the system boot smoothly from start to finish for you now (with the noauto)? If so, could you do the following (simple) test for me:

1. Remove the "noauto" from one or more of your sshfs mounts defined in fstab.
2. Reboot and confirm the problem.
3. Give the root p/w for maintenance mode.
4. Run: systemctl status local-fs.target
5. Does it report an error?
6. Run: systemctl start local-fs.target
7. Wait a bit - does it say failed?
8. Edit fstab and add back in the noauto.
9. Run: systemctl start local-fs.target
10. Does it work this time?
11. If so, "systemctl start graphical.target" should continue to a normal boot.

Many thanks.
Status: NEW => ASSIGNED
Assignee: bugsquad => mageia
Gah, sorry for so many comments this morning. Looking at the code, the fuse mounts will be considered "local" and will be mounted by local-fs.target, so the above test is probably redundant. I'll see if we can patch the code to include fuse+sshfs in the list of "network" file systems.
(In reply to comment #12)
> Does the system boot smoothly from start to finish for you now (with the
> noauto)?

yes

> If so could you do the following (simple) test for me:
>
> 1. Remove the "noauto" from one or more of your sshfs mounts defined in fstab.
> 2. Reboot and confirm the problem.
> 3. Give root p/w for maintenance mode.
> 4. Run: systemctl status local-fs.target
> 5. Does it report an error?

Loaded: loaded (/lib/systemd/system/local-fs.target; static)
Active: inactive (dead)

> 6. Run: systemctl start local-fs.target
> 7. Wait a bit - does it say failed?

A dependency job failed. See system logs for details.

> 8. Edit fstab and add back in the noauto.
> 9. Run systemctl start local-fs.target
> 10. Does it work this time?

A dependency job failed. See system logs for details.
OK, so here is the thread on systemd-devel where I discussed this problem on your behalf: http://thread.gmane.org/gmane.comp.sysutils.systemd.devel/4124/focus=4140 (I've linked to the outcome mail in particular).

The long and short of it is that rather than "noauto" specifically, you should add _netdev to the options. This correctly flags the mount as a network one, rather than requiring a whitelist to be maintained on the systemd side.

In order to ease migration, I might look into some automatic changes we could do to the user's fstab on upgrade to add the _netdev option automatically. WDYT? Worth the hassle?
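As an illustration (hostname and paths invented), the suggested fstab entry would then look like:

```
# _netdev marks this as a network filesystem, so systemd orders it
# after the network is up instead of blocking local-fs.target
user@fileserver:/export  /mnt/remote  fuse.sshfs  _netdev  0 0
```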
Personally an entry in the errata would do for me.
OK, I'll leave this bug open for now to remind me to do $something about it which would be, at minimum, an errata/release notes entry.
Keywords: (none) => USABILITY
Whiteboard: (none) => Errata
I don't think failing boot due to filesystems not getting mounted is a good idea at all. Consider e.g.:

1. dualboot systems where the user repartitions some disk in another OS - if the previous fs no longer exists, the boot fails.
2. the user removes a data harddisk, e.g. to move it to another system or to replace it with a bigger one.
3. non-system HD failure.

All of these are quite common occurrences, and especially for inexperienced users a failure to boot would be very confusing (which doesn't happen in "the other OS", btw). As for myself, I do want my system to come up even if the external data storage arrays are disconnected - having to add _netdev for that seems rather silly to me.
I don't totally disagree with you Anssi, but this is really something that you should discuss upstream on the systemd mailing list. I'd much prefer not to change the behaviour locally in this regard, but am more than open to providing a more user-friendly experience in other ways.

I think it only makes sense to set _netdev for network mounts. If you do not want other mounts to break things, the "noauto" option can be used. For most external drives, the Desktop Environment being run is responsible for mounting, and generally speaking you do not want to include static listings for these in fstab anyway.

As systemd's approach is fully hotplug, I don't really see a nice way to allow the overall boot to continue on a mount failure - i.e. how to classify which partitions are "critical" and which are not. Is /home considered critical? What about /var? Or how about /data/mirror/home? It's almost impossible to classify what is considered a "critical" drive on a given setup, and thus going into an emergency shell when a problem occurs seems to me to be the safest and most sensible thing to do.

I think we should aim to provide a more user-friendly report of why the boot failed, plus instructions on potential fixes. Also, perhaps providing a pre-reboot "check" script of some kind would be nice to avoid some of the potential problems. This could perhaps be run in advance during systemd installation and pre-warn the user.
Hi, This bug was filed against cauldron, but we do not have cauldron at the moment. Please report whether this bug is still valid for Mageia 2. Thanks :) Cheers, marja
Keywords: (none) => NEEDINFO
Still valid with Mageia 4 and cauldron, see https://bugs.mageia.org/show_bug.cgi?id=7673 and, for nofail, also https://bugs.mageia.org/show_bug.cgi?id=12305 and https://bugs.mageia.org/show_bug.cgi?id=12631

All pretty prominent bugs that occur really often (the system fails to boot because some partition UUID changed, which seems to happen often with swap on upgrades for whatever reason, see the linked bug - OR some non-critical filesystem cannot be mounted, be it remote filesystems or windows partitions that cannot be mounted for whatever reason). All could be fixed easily, in the sense of allowing the system to boot normally, by adding at least nofail for all mounts that are not Mageia system partitions.

@Colin: As all those issues are linked, do you want a tracker bug for those? Also, more on-topic, this can probably not be taken directly to upstream, as those are downstream modifications IMHO. Probably nofail should be added somewhere as a systemd default for all non-system partitions of the host OS.

Marking this as release_blocker for M5.
Source RPM: udev-175-2.mga2 => (none)
Keywords: NEEDINFO => (none)
Priority: Normal => release_blocker
CC: (none) => doktor5000
Target Milestone: --- => Mageia 5
Severity: major => critical
See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=7673, https://bugs.mageia.org/show_bug.cgi?id=10179, https://bugs.mageia.org/show_bug.cgi?id=12566, https://bugs.mageia.org/show_bug.cgi?id=12631
What about marking the mount points "nofail", since that is really the option that's meant to be used for non-boot filesystems?
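For example (UUID and mount point invented for illustration), a non-critical data partition would then be listed as:

```
# "nofail": the boot continues even if this filesystem fails to mount
UUID=0123abcd-1234-5678-9abc-def012345678  /mnt/data  ext4  defaults,nofail  0 2
```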
CC: (none) => alien
Blocks: (none) => 14069
@coling: Maybe nofail can be made the default for all non-system filesystems? Among others, this should also fix the issues with newer win8 filesystems that make Mageia fail to boot due to their implementation of hybrid shutdown; this also applies to win filesystems on GPT partitions. Remember that the installer adds fstab entries for all win partitions by default.
I second that, but we need to be careful not to apply nofail too broadly either... Maybe mount points for

/ /usr /usr/bin /usr/lib /usr/lib64 /usr/share /usr/sbin /tmp /etc /var /var/lock /var/run /var/tmp /root

should not have nofail, and the others would have it? (Maybe I've forgotten a few.)
CC: (none) => ennael1
We're now nearing the Mageia 5 release. What should we do about this bug and the ones linked in comment 21? Be content with a mention in the errata, or can we implement a proper fix, and if so, where?
CC: (none) => remi
Well, we have this in the errata since mga2: https://wiki.mageia.org/en/Mageia_2_Errata#Boot_fails_when_webdav.2C_sshfs_etc._entries_exist_in_fstab

So we could keep it like that, including this bug that's now more than 3 years old. But then the next question would be: for what upcoming Mageia release do we think a fix is viable? From my point of view one of these should be done:

- parse fstab and, for all mountpoints _not_ on a whitelist as e.g. in comment 24, add the nofail option
OR
- change systemd to use nofail as the default option for all foreign filesystems
OR
- ... just forgot the other option :)

For the one cornercase about the swap partition from bug 12305, we should be safe when comparing the UUID of swap against the UUID in fstab before rebooting into the installed system. Then we could close 4 bugs at once.
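A minimal sketch of the first option above (rewriting fstab so that non-whitelisted mount points get nofail). Everything here is illustrative, not the actual drakx code: the function name, the in-memory rewrite, and the whitelist (taken roughly from the list proposed in comment 24) are all assumptions.

```python
# Sketch: append "nofail" to the options of fstab entries whose mount
# point is not a core system path. The whitelist is an assumption
# based on the list proposed in comment 24, not a fixed specification.

SYSTEM_MOUNTPOINTS = {
    "/", "/usr", "/usr/bin", "/usr/lib", "/usr/lib64", "/usr/share",
    "/usr/sbin", "/tmp", "/etc", "/var", "/var/lock", "/var/run",
    "/var/tmp", "/root",
}

def add_nofail(fstab_text: str) -> str:
    """Return fstab content with "nofail" added to non-system entries."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Skip comments, short/malformed lines, whitelisted mount
        # points, and entries that already carry "nofail".
        if (len(fields) >= 4
                and not fields[0].startswith("#")
                and fields[1] not in SYSTEM_MOUNTPOINTS
                and "nofail" not in fields[3].split(",")):
            fields[3] += ",nofail"
            line = "  ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

A real implementation would also need to decide how to treat swap lines (mount point "swap" or "none") and would preserve the original whitespace; this sketch only shows the whitelist idea.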
Actually, it looks like btrfs subvolumes are not mounted properly, as they are not honoring the subvol= flag (except if you also add "nofail", or in the case where grub has the subvol= flag)...
Decreasing priority as it was already there in Mageia 4. It can be fixed as an update later.
Priority: release_blocker => High
CC: (none) => eeeemail
Blocks: 14069 => (none)
Whiteboard: Errata => FOR_ERRATA
Isn't this one fixed by the fix for bug #10179?
I think this one should be closed, the OP reported that an errata entry is OK for him, and we have the fix as you mentioned: http://gitweb.mageia.org/software/drakx/commit/?id=745849cdace7ed86ce12a9a7564bffb42edf0ef3 We can still open a new one for mga6 if such an issue occurs frequently. *** This bug has been marked as a duplicate of bug 10179 ***
Status: ASSIGNED => RESOLVED
Resolution: (none) => DUPLICATE