Mageia Bugzilla – Bug 7375
XFS partition failure
Last modified: 2013-01-18 01:54:02 CET
Description of problem:
After a fresh installation of Mga2 on a VMware server (/boot on ext4; LVM with /, /var on ext4 and /home, /projets and /tmp on XFS),
at boot I get the following error:
"Failed to start /tmp"
and I am then offered rescue mode. No traces in /var/log/messages or /var/log/dmesg.
systemctl status tmp.mount gives:
Active: failed (Result: exit-code) since...
Process: 583 ExecMount=/bin/mount /tmp (code=exited, status=32)
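For reference, exit status 32 from mount means "mount failure" (per the mount(8) man page). To see the actual error message behind a failed mount unit, commands along these lines may help (what they show depends on whether the journal kept the boot messages):

```shell
systemctl status tmp.mount    # unit state and last mount attempt
journalctl -b -u tmp.mount    # mount messages from the current boot
```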
Checking /projets and /home: no problem with /home, but /projets gives the same error as /tmp.
Version-Release number of selected component (if applicable):
kernel-server-latest (updated today, but I can't give you the exact version, as rpm -qa returns nothing)
I think the xfs module or the mount helpers are not present in the initrd (dracut).
Perhaps the problem lies in that direction?
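One way to check that hypothesis, assuming dracut's lsinitrd tool is available (the initrd filename below is a guess; use the actual file under /boot):

```shell
# Does the initrd actually contain the xfs kernel module?
lsinitrd /boot/initrd-$(uname -r).img | grep -i xfs
```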
@colin: Hi, maybe you'll have an idea about this problem? (If not, sorry to have bothered you.)
After two more reboots I can complete this bug report a bit:
* Whatever I choose between normal boot and failsafe, I end up being offered emergency mode.
* In normal boot, if I skip it (Ctrl+D) I can see the cauldron boiling (more than 5 minutes before the fifth bubble), but the boot never ends. I can't ssh to it (no route to host) and did not find a way to switch to a tty under VMware :/ (it's the first time I use it and I don't have proper access to it...)
* In failsafe boot, after skipping emergency mode, it finishes booting in 10 seconds. Every partition is properly mounted. (So it's not so bad, but it's failsafe, and we still need to skip emergency mode.)
I rebooted the machine this morning and entered emergency mode to test mounting the partitions, as you asked yesterday on IRC.
* The first thing I saw during service launch was:
Failed to start Load legacy module configuration [FAILED]
See 'systemctl status fedora-loadmodules.service' for details.
Here is the output of 'systemctl status fedora-loadmodules.service' (copied by hand, so I hope I introduced no errors):
Loaded: Loaded (/lib/systemd/system/fedora-loadmodules.service; static)
Active: failed (Result: exit-code) since Tue, 11 sep 2012 10:57:18 +0200; 7min ago
Process: 398 ExecStart=/lib/systemd/fedora-loadmodules (codes=exited, status=1/FAILURE)
Sep 11 10:57:17 forge-lpc2e /etc/rc.modules: Loading modules: speedstep-…
* For the partition mounts and starts:
/dev/vg-forge/forge-projets (ie /projets) (XFS) mount was OK
/dev/vg-forge/forge-pgsql (ie /var/lib/pgsql) (ext4) mount was OK too
One of my XFS partitions started failing as usual (/home this time):
Failed to start /home [FAILED]
See 'systemctl status home.mount' for details.
The output of 'systemctl status home.mount' gives the same error number as the one in my first post.
Typing 'mount /home' works :)
'mount /tmp' and 'mount /projets' said they were already mounted.
But for forge-pgsql (which seemed to have mounted properly):
LC_ALL=C mount /dev/vg-forge/forge-pgsql
mount: mount point /var/lib/pgsql does not exist
I find it weird that it's not always the same partition that fails to start; it feels like parallel work that should not happen. (FYI: the administrator gave only 1 CPU to this box.)
The failure of the fedora-loadmodules.service happens when one of the modules listed in /etc/modprobe.preload or /etc/modprobe.preload.d/* is invalid.
You can maybe debug this by removing/commenting out modules you know are invalid in these files (keeping a note as to which ones).
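As a sketch of that debugging step (the file contents below are hypothetical; the real lists live in /etc/modprobe.preload and /etc/modprobe.preload.d/*, so practice on a copy first):

```shell
# Create a demo copy with a module the boot log flagged (speedstep-lib is
# hypothetical here) plus a valid one.
printf 'speedstep-lib\nxfs\n' > /tmp/modprobe.preload.demo
# Comment the suspect module out instead of deleting it, so it stays noted.
sed -i 's/^speedstep/#&/' /tmp/modprobe.preload.demo
cat /tmp/modprobe.preload.demo
```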
These modules should, in theory at least, be somewhat irrelevant to this problem.
With the actual mount points it's hard to work out exactly what's wrong. I suspect there is some kind of general async problem with XFS. In the old days, the mounts were done one after the other, in sequence. These days, everything is much more asynchronous. As soon as udev knows the device exists, it can be signalled to systemd, which can then start the mount operation immediately. This means that everything can happen at once. I wonder if XFS just has an issue with that in general?
I'll try to ask upstream to see whether others have seen this behaviour. A workaround might be to make the mounts automount rather than static. To do this, just put "x-systemd.automount" in the mount options (I think this is the valid syntax for mga2). It may help things, but then again it might not. I'll try to look into it more.
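A sketch of what that fstab change could look like, assuming the LVM naming seen elsewhere in this report (the forge-home device name is a guess):

```
# /etc/fstab — one mount converted to on-demand automounting
/dev/vg-forge/forge-home  /home  xfs  defaults,x-systemd.automount  0 0
```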
One other random thought: try *adding* xfs to the modprobe.preload files somewhere (after tidying them up!)
I wonder if the first mount command has to load the kernel module for the fs. If a second mount command comes in before the module is loaded, then perhaps it also tries to load it and fails (because the first mount is already doing it).
This is just a random guess, but it's worth trying I guess :)
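Concretely, that suggestion would be a one-line addition to the preload list (path as mentioned in the previous comment):

```
# /etc/modprobe.preload — force the xfs module to load at boot,
# before systemd starts any mount units
xfs
```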
OK, it seems there is a bug open at Red Hat for this same issue, and my workaround suggestion in comment 5 should apparently work. Please confirm when you can.
I'll keep tracking the upstream bug and hopefully an updated kmod package (when available) will eventually solve the problem properly.
For bug report consistency, the workaround has now been confirmed via IRC. So things are working well if the module is loaded early.
For the full fix, this is the kernel change required to address this issue.
Thomas, I'll hand this over to you now. If you decide not to include it in mga2, then we should likely put a note in the errata pointing to this bug and the instructions on how to solve it.
(In reply to comment #7)
> For bug report consistency, the workaround has now been confirmed via IRC. So
> things are working well if the module is loaded early.
> For the full fix, this is the kernel change required to address this issue.
> Thomas, I'll hand this over to you now. If you decide not to include it in mga2,
> then we should likely put a note in the errata pointing to this bug and the
> instructions on how to solve it.
The needed patches have been added to SVN and will be part of the upcoming 3.4.15.
Awesome thanks :)
kernel-3.4.15-1.mga2 is now available in updates_testing media