Bug 7375 - XFS partition failure
Status: RESOLVED FIXED
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 2
Hardware: x86_64 Linux
Priority: Normal Severity: major
Assigned To: Thomas Backlund
: 8068
Reported: 2012-09-06 15:01 CEST by Claire Revillet
Modified: 2013-01-18 01:54 CET

Description Claire Revillet 2012-09-06 15:01:00 CEST
Description of problem:
After a fresh installation of Mga2 on a VMware server (/boot in ext4; LVM with / and /var in ext4, and /home, /projets and /tmp in XFS), I get the following error at boot:

"Failed to start /tmp"
and then I am offered rescue mode. No traces in /var/log/messages or /var/log/dmesg.

systemctl status tmp.mount gives:
Loaded: loaded
Active: failed (Result: exit-code) since...
Where /tmp
What /dev/vg-forge/tmp
Process: 583 ExecMount=/bin/mount /tmp (code=exited, status=32)
CGroup: name=systemd:/system/tmp.mount

Checking /projets and /home: no problem for /home, but /projets gives the same error as /tmp.


Version-Release number of selected component (if applicable):

kernel-server-latest (updated today, but can't give you the version as rpm -qa gives nothing)
Comment 1 AL13N 2012-09-10 16:39:46 CEST
I think the xfs module or the mount helpers are not present in the initrd (dracut).

Perhaps look in that direction?
Comment 2 Claire Revillet 2012-09-10 17:15:02 CEST
@colin: Hi, maybe you'll have an idea about this problem? (If not, sorry to have bothered you.)

After 2 more reboots I can add a bit to this bug report:
* Whether I choose normal boot or failsafe, I end up being offered emergency mode.

* In normal boot, if I skip it (Ctrl+D) I can see the cauldron boiling (more than 5 min to get the fifth bubble) but the boot never ends. I can't ssh into it (no route to host) and did not find a way to switch to a tty under VMware :/ (first time I use it and I don't have proper access to it...)

* In failsafe boot, after skipping emergency mode, it finishes booting in 10 sec. Every partition is properly mounted. (So it's not so bad, but it's failsafe and we still need to skip emergency mode.)

kernel: 3.3.8-server-2.mga2
Comment 3 Claire Revillet 2012-09-11 11:37:07 CEST
Hi,
I rebooted the machine this morning and entered emergency mode to test mounting the partitions as you asked yesterday on IRC.



* The first thing I saw during service launch was:

Failed to start Load legacy module configuration [FAILED]
See 'systemctl status fedora-loadmodules.service' for details.

Here is the output of 'systemctl status fedora-loadmodules.service' (copied by hand, I hope I didn't introduce errors):
***
Loaded: loaded (/lib/systemd/system/fedora-loadmodules.service; static)
Active: failed (Result: exit-code) since Tue, 11 sep 2012 10:57:18 +0200; 7min ago
Process: 398 ExecStart=/lib/systemd/fedora-loadmodules (code=exited, status=1/FAILURE)
CGroup: name=systemd:/system/fedora-loadmodules.service

Sep 11 10:57:17 forge-lpc2e /etc/rc.modules[435]: Loading modules: speedstep-…
***



* For mounting and starting the partitions:
/dev/vg-forge/forge-projets (i.e. /projets) (XFS) mount was OK
/dev/vg-forge/forge-pgsql (i.e. /var/lib/pgsql) (ext4) mount was OK too

One of my XFS partitions failed to start as usual (/home this time):
***
Failed to start /home [FAILED]
See 'systemctl status home.mount' for details.
***
The output of 'systemctl status home.mount' gives the same error number as the one in my first post.

Typing 'mount /home' works :)
'mount /tmp' and 'mount /projets' said they were already mounted.
But for forge-pgsql (which seemed to have mounted properly):
***
LC_ALL=C mount /dev/vg-forge/forge-pgsql
mount: mount point /var/lib/pgsql does not exist
***


I find it weird that it's not always the same partition that fails to start; it feels like something is running in parallel that should not be. (FYI: the administrator gave this box only 1 CPU.)
Comment 4 Colin Guthrie 2012-09-11 12:32:00 CEST
The failure of the fedora-loadmodules.service happens when one of the modules listed in /etc/modprobe.preload or /etc/modprobe.preload.d/* is invalid.

You can maybe debug this by removing/commenting out modules you know are invalid in these files (keeping a note as to which ones).

These modules should, in theory at least, be somewhat irrelevant to this problem.
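
To find which preload entries are invalid, something like the following Python sketch may help. It assumes each preload file lists one module name per line with '#' comments, and that `modinfo` failing for a name means the module is unavailable for the running kernel. The helper names (parse_preload, modinfo_knows, invalid_modules) are made up for illustration and are not part of any Mageia tool.

```python
import subprocess
from pathlib import Path


def parse_preload(text):
    """Return module names from preload-style text.

    Assumed format: one module per line, '#' starts a comment, and
    anything after the module name is treated as module options.
    """
    modules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            modules.append(line.split()[0])
    return modules


def modinfo_knows(module):
    """True if `modinfo` exits with status 0 for this module name."""
    result = subprocess.run(
        ["modinfo", module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def invalid_modules(text, is_valid=modinfo_knows):
    """Return the listed modules that the validity check rejects."""
    return [m for m in parse_preload(text) if not is_valid(m)]


if __name__ == "__main__":
    # Paths taken from the comment above; files that do not exist are skipped.
    paths = [Path("/etc/modprobe.preload")]
    paths += sorted(Path("/etc/modprobe.preload.d").glob("*"))
    for path in paths:
        if path.is_file():
            for module in invalid_modules(path.read_text()):
                print(f"{path}: {module}")
```

Each printed line names a file and a module that could be commented out while debugging.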


With the actual mount points it's hard to work out exactly what's wrong. I am suspecting that there is some kind of general async problem with XFS. In the old days, the mounts would be done one after the other in sequence. These days, everything is much more asynchronous. As soon as udev knows the device exists, it can be signalled to systemd which can then start the mount operation immediately. This means that everything can happen at once. I wonder if XFS just has an issue generally with that?

I'll try and ask upstream to see if others have seen this behaviour. A workaround might be to make them automounting rather than static mounting. To do this, just put "x-systemd.automount" in the mount options in /etc/fstab (I think this is the valid syntax for mga2). It may help things, but then again, it might not. I'll try and look into it more.
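
As an illustration of the automount workaround, here is a minimal Python sketch that appends the option to the fourth (options) field of an fstab-style line. The sample line reuses the device and mount point from the description, but "defaults 0 0" is an assumed placeholder, and whether "x-systemd.automount" is accepted on mga2 is unconfirmed, as said above. The function name is hypothetical.

```python
def add_mount_option(fstab_line, option="x-systemd.automount"):
    """Return fstab_line with `option` added to its options field.

    Blank lines, comment lines and lines with fewer than four fields
    are returned unchanged. Note that a rewritten line has its fields
    re-joined with single spaces.
    """
    fields = fstab_line.split()
    if len(fields) < 4 or fields[0].startswith("#"):
        return fstab_line
    options = fields[3].split(",")
    if option not in options:
        options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)


# Illustrative entry only: the options and the trailing "0 0" are assumptions.
print(add_mount_option("/dev/vg-forge/tmp /tmp xfs defaults 0 0"))
```

Running the function twice on the same line leaves it unchanged the second time, so it is safe to apply to a file that was already edited.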
Comment 5 Colin Guthrie 2012-09-11 12:36:24 CEST
One other random thought: try *adding* xfs to the modprobe.preload files somewhere (after tidying them up!)

I wonder if the first mount command has to load the kernel module for the fs. If a second mount command comes in before the module is loaded, then perhaps it also tries to load it and fails (because the first mount is already doing it).

This is just a random guess, but it's worth trying I guess :)
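
A small sketch of that workaround, assuming the preload file format is one module name per line with '#' comments: the hypothetical helper below adds "xfs" to the text of a preload file only if it is not already listed, so it can be applied repeatedly. Writing the result back to /etc/modprobe.preload (as root) is deliberately left out.

```python
def ensure_module_listed(text, module="xfs"):
    """Return preload-file text that lists `module` exactly as needed.

    If an uncommented line already starts with the module name, the
    text is returned unchanged; otherwise the name is appended on its
    own line.
    """
    for line in text.splitlines():
        tokens = line.split("#", 1)[0].split()
        if tokens and tokens[0] == module:
            return text
    if text and not text.endswith("\n"):
        text += "\n"
    return text + module + "\n"


# Example with made-up file content; "loop" is only a placeholder entry.
print(ensure_module_listed("# modules to load early\nloop\n"), end="")
```

A commented-out "# xfs" line does not count as listed, so the module is still appended in that case.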
Comment 6 Colin Guthrie 2012-09-11 13:18:45 CEST
OK, it seems there is a bug open at Red Hat for this same issue, and my workaround suggestion in comment 5 should apparently work. Please confirm when you can.

I'll keep tracking the upstream bug and hopefully an updated kmod package (when available) will eventually solve the problem properly.
Comment 7 Colin Guthrie 2012-09-14 14:54:41 CEST
For bug report consistency, the workaround has now been confirmed via IRC. So things are working well if the module is loaded early.


For the full fix, this is the kernel change required to address this issue.

http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709

Thomas, I'll hand this over to you now. If you decide not to include it in mga2, then we should probably put a note in the errata about this bug, with instructions on how to solve it.
Comment 8 Thomas Backlund 2012-10-21 12:07:15 CEST
(In reply to comment #7)
> For bug report consistency, the workaround has now been confirmed via IRC. So
> things are working well if the module is loaded early.
> 
> 
> For the full fix, this is the kernel change required to address this issue.
> 
> http://thread.gmane.org/gmane.linux.kernel/1358707/focus=1358709
> 
> Thomas, I'll hand this over to you now. If you decide not to include it in mga2,
> then we should probably put a note in the errata about this bug, with
> instructions on how to solve it.

The needed patches have been added to SVN and will be part of the upcoming 3.4.15 kernel.
Comment 9 Colin Guthrie 2012-10-21 13:15:53 CEST
Awesome thanks :)
Comment 10 Thomas Backlund 2012-10-22 09:41:53 CEST
kernel-3.4.15-1.mga2 is now available in updates_testing media
Comment 11 Thomas Backlund 2013-01-18 01:54:02 CET
Update pushed:
https://wiki.mageia.org/en/Support/Advisories/MGASA-2013-0010
