Bug 7892 - systemd mdadm.service not mounting raid devices
Summary: systemd mdadm.service not mounting raid devices
Status: RESOLVED
Resolution: OLD
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 2
Hardware: x86_64 Linux
Priority: Normal
Severity: normal
Target Milestone: ---
Assignee: Colin Guthrie
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-24 03:10 CEST by Robert Riches
Modified: 2013-11-23 16:13 CET
CC List: 2 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments
output of "df -h" (315 bytes, text/plain)
2012-10-24 04:14 CEST, Robert Riches
/etc/mdadm.conf (649 bytes, text/plain)
2012-10-24 04:15 CEST, Robert Riches
/etc/fstab (390 bytes, text/plain)
2012-10-24 04:16 CEST, Robert Riches
/proc/mdstat showing partial RAIDs (666 bytes, text/plain)
2012-10-24 04:16 CEST, Robert Riches
screenshot of boot-up sequence (22.49 KB, image/png)
2012-10-25 05:49 CEST, Robert Riches
Output of "systemd-analyze" (142.71 KB, image/svg+xml)
2012-10-27 04:43 CEST, Robert Riches
Systemd-analyze plot after applying the patch (355.31 KB, image/svg+xml)
2012-10-27 07:16 CEST, Robert Riches
Plot after booting after doing "dracut -f" (286.50 KB, image/svg+xml)
2012-10-27 23:29 CEST, Robert Riches

Description Robert Riches 2012-10-24 03:10:12 CEST
In a new installation of Mageia 2, systemd apparently does not activate mdadm-style RAIDs in time to mount the filesystems, and boot-up gets thrown into emergency mode.  This is a 64-bit Qemu VM being used to test the release before I put it on my main production machine.  There are several RAID10 arrays on four identically-partitioned Virtio disks.  (The installer handled it perfectly, as did the Mageia 1 installer on bare metal.)  The RAIDs are supposed to be mounted on /, /home, /usr/local, /tmp, and swap.  (/boot is a non-RAID partition, and there's one more RAID that is not normally mounted.)

After pressing ESC to see a text console while booting, I see a complaint that a dependency for mounting /usr/local has failed.  I'll attach the output of 'df -h' and a few RAID-related things.  Syslog is not started, so there's nothing in /var/log to shed any additional light.

From the emergency mode shell, /proc/mdstat shows only the RAIDs for / and swap activated.  The other RAIDs show fewer than four devices.  IIRC, the number of devices showing differs between boot-up attempts.  All RAIDs are listed in /etc/mdadm.conf, correctly as far as I can see.  As an experiment, I have done 'mdadm --stop' on the RAIDs that were partial, then 'mdadm -As', and all RAIDs activated without complaint.  After that, 'mount -a' correctly mounted all filesystems without difficulty.
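
Spelled out, the recovery sequence was roughly the following (the md device names here are placeholders; substitute whatever /proc/mdstat lists as incomplete):

 mdadm --stop /dev/md121 /dev/md122   # stop the partially assembled arrays (placeholder names)
 mdadm -As                            # assemble --scan: re-reads /etc/mdadm.conf
 mount -a                             # mount everything still missing from /etc/fstab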

In the newsgroup, it was suggested that this would likely be assigned to Colin Guthrie.  My (mostly-uneducated) guess is systemd fumbles something in the dependency chain between letting udev activate all the partitions, then activating the RAIDs, then mounting the filesystems.
Comment 1 Robert Riches 2012-10-24 04:14:47 CEST
Created attachment 2984 [details]
output of "df -h"
Comment 2 Robert Riches 2012-10-24 04:15:34 CEST
Created attachment 2985 [details]
/etc/mdadm.conf
Comment 3 Robert Riches 2012-10-24 04:16:01 CEST
Created attachment 2986 [details]
/etc/fstab
Comment 4 Robert Riches 2012-10-24 04:16:30 CEST
Created attachment 2987 [details]
/proc/mdstat showing partial RAIDs
Sander Lepik 2012-10-24 11:16:27 CEST

CC: (none) => sander.lepik
Assignee: bugsquad => mageia

Comment 5 Colin Guthrie 2012-10-24 11:29:53 CEST
I'll need to look in more depth but the basic scenario is that the raids for / and swap are assembled in the initrd and then the rest should be handled by mdadm.service as you said (I have a similar setup on my home server so I know it's working generally - just need to work out what's going on here!)

At the point where things go into emergency mode, can you do a "systemd-analyze plot >boot.svg"? (You need the systemd-tools pkg installed.) This will give a good visual insight into the order in which the units run.
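
Roughly, once you have a shell (a sketch; this assumes urpmi can still reach a mirror from there):

 urpmi systemd-tools              # the Mageia 2 package that carries systemd-analyze
 systemd-analyze plot > boot.svg  # writes an SVG timeline of unit startup for this boot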

I'd be particularly interested to see if fedora-wait-storage.service has run properly.

There are several workarounds that could be added here depending on the problem.

One is to ensure the raids are assembled and started in the initrd which is pretty easy to achieve via dracut config, but I'd really like to find out why things don't work so if you're able to do a bit more digging I'd appreciate it.
Comment 6 Thomas Backlund 2012-10-24 12:07:30 CEST
IIRC there were some fixes done in upstream mdadm for mdadm not always telling systemd to wait for all md devices to be available...

(I don't remember if we have all known fixes in mga2)

There is also a mdadm-3.2.6 maintenance release scheduled upstream soon, so we need to check/decide if we switch to it depending on the fixes it will bring.

CC: (none) => tmb

Comment 7 Thomas Backlund 2012-10-24 12:29:05 CEST
Ah, yes it was this one I remembered:

udev-rules: prevent systemd from mount devices before they are ready.
http://git.neil.brown.name/git?p=mdadm.git;a=commit;h=090900c3d2eb5b3aef5251a21228483c32246cc7
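
A quick way to check whether an installed 64-md-raid.rules already carries that logic (a sketch; the exact rule text can differ between mdadm versions):

 grep -n SYSTEMD_READY /lib/udev/rules.d/64-md-raid.rules
 # the idea of the patch is to set ENV{SYSTEMD_READY}="0" while the array is still
 # inactive, so systemd holds off mounting until assembly has actually finished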
Comment 8 Colin Guthrie 2012-10-24 14:23:38 CEST
Ahh indeed that looks like it could fix the problem.

@Robert can you try applying that patch and see if it helps?

You can apply it easily by running the following command as root:

 wget -q -O - "http://git.neil.brown.name/git?p=mdadm.git;a=commitdiff_plain;h=090900c3d2eb5b3aef5251a21228483c32246cc7;hp=517f135d32cfe44a9ac8b0686dbb7be1bcabc867" | patch --no-backup-if-mismatch -F3 /lib/udev/rules.d/64-md-raid.rules

I checked on my Mageia 2 install and despite the fuzz it does apply correctly.


After patching the udev rule, it is a good idea to regenerate your initrd as it contains a copy of this file. Normally a "dracut -f" should be enough to regenerate the initrd for the current boot, but this does overwrite it, so make sure to do a backup if you feel it's necessary.
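
A sketch of that step, assuming the usual Mageia initrd path (adjust the filename if your layout differs):

 cp /boot/initrd-$(uname -r).img /boot/initrd-$(uname -r).img.bak   # optional backup
 dracut -f                                                          # rebuild the initrd for the running kernel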

Let us know how you get on!
Comment 9 Robert Riches 2012-10-25 04:07:09 CEST
Thanks all for the responses.

I'll try to attach the output of the "systemd-analyze plot" command and then attempt to apply the patch as soon as I can scrape together a few minutes, but it might be a few days before I'll be able to do that.
Comment 10 Robert Riches 2012-10-25 05:49:58 CEST
Created attachment 2989 [details]
screenshot of boot-up sequence

In case it might help in the meantime while I'm working on installing the needed package, here's a screenshot of the console messages.  It sat for a long time with "Started Load legacy module configuration" as the last line showing.  Then, immediately after the "Starting /dev/md120" message, everything else printed.  Might the dependency code be waiting for only the _first_ RAID to become available when it should be waiting for _all_ relevant RAIDs to become available?

<opinion>I would have thought a package that is required in order to do initial debug of systemd problems ought to be installed by default.  I'll make a note to manually select it for installation next time.</opinion>
Comment 11 Robert Riches 2012-10-26 06:50:43 CEST
Colin, I attempted to get the output of 'systemd-analyze plot' but was not successful.  The first two times I ran it, it quickly complained that DBus was not running--because I don't normally run root DBus and had set it to disabled during installation.  To resolve that, I attempted "systemctl start dbus.service" but that failed due to the RAIDs not being ready for action.  I manually got the RAIDs started and then tried again to start DBus.  This time, the attempt switched from text console to bootsplash a few times, said "Starting netprofile ... OK" and then "Checking for new hardware".  Since then, 28 minutes have passed without any more indication of activity.  The machine appears to have completely hung.  It won't respond to Alt-Ctrl-F(n).  I'll have to forcibly shut it down.

Oh, as near as I could tell, systemd-tools is not on the installation DVD, but I found it on a mirror.  I would think that 20KB RPM should really be on the installation DVD and should be installed by default.
Comment 12 Colin Guthrie 2012-10-26 10:43:41 CEST
Hmm, you disabled dbus? That's generally not a good idea. It's a static service these days (i.e. you cannot disable it without masking it and even then "bad things"(tm) would likely happen) as it's a primary mechanism for IPC for numerous system tools (including systemd itself). The installer shouldn't have given you the option to disable it (as it wouldn't have worked anyway) - perhaps I've got a bug there in the service management stuff that doesn't properly spot static services - will add that to my "to check" list!

After mounting the raids manually, I was expecting you just to run "systemctl default" to go to your default target (as I presume you've been doing to get a usable system) and then run the analyze from there. Theoretically you should have been left in the emergency console after starting dbus, but I guess it somehow exited. I'll try and reproduce this scenario to see if it can be made more friendly.
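
In other words, after the manual mdadm/mount steps it is just (a sketch of the workaround, not a fix):

 systemctl default   # leave emergency mode and continue to the configured default target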

systemd-tools package itself may just be 20KB but it pulls in a good chunk of the python stack so for minimal installs it's a really bad idea to have it included by default. This is just a handy visual tool anyway, and for "proper" debug, it's simply a matter of kernel command line adjustments and log file copying, but I just wanted a quick and easy overview.


That said, I suspect the udev rules patch Thomas mentioned is key here and I'd try applying that patch first before worrying about any debug output. After all if it works, then any debug of the issue is kinda pointless :)
Comment 13 Robert Riches 2012-10-27 04:43:32 CEST
Created attachment 2993 [details]
Output of "systemd-analyze"

Colin, thanks for the tip to do 'systemctl default' after manually mounting the rest of the filesystems.  That got to a (runlevel 3) login prompt, the first time this installation has gone to a login prompt.  Until that point, this installation was about as good as DOA.  From the login prompt, I was able to capture the SVG file.  I'll try the patch as soon as I can get another few minutes, probably tomorrow at the earliest.

By the way, in case it might be relevant in some way, with Mageia 1, a similar hardware system with RAID10s of partitions gets stuck during boot for about 4-5 minutes.  Each minute there's a message to the effect of "Waiting for /dev/sd{a,b,c,d}11 to appear; timeout 1 minute".  (The machine stays up 24x7 and boots only for long power outages, new kernels, etc.)

To answer the question about disabling system-level, root-owned DBus, I have done so ever since DBus appeared in RedHat, Mandrake, or Mandriva.  A simple "chkconfig messagebus off" has always done the job.  I'm apparently not the only one who considers root-owned DBus to be a serious security hazard, creating not just a covert channel but a covert freeway.  I don't use KDE or Gnome, preferring runlevel 3 and startx with straight FVWM2.  GTK2 (required by Firefox and a few others), along with Gimp, insists on starting user-owned DBus.  I tolerate that but kill off the DBus processes before and after doing online banking, online shopping, etc.  If root-owned DBus becomes mandatory in some but not all distributions, that will provide at least 60% of the motivation it would take to get me to switch to a distribution that doesn't have that requirement.

Similarly, I'm a very big non-fan of systemd.  From what I see, there's no need for it.  Init scripts of the SysV and BSD varieties have worked well enough for me since 1987 on VAX Ultrix until today.  From what I see, it is rather opaque, with poorly documented, unnecessarily verbose, but still cryptic commands.  To my nose, it has a similar smell to the Windows Registry.  I'm grateful Mageia has reportedly made it possible (even if they don't document the method) to use SysV initscripts for Mageia 2.  Frankly, if that weren't possible, I would probably be preparing to switch to Debian or some other non-systemd distribution instead of moving to Mageia 2.  When SysV initscripts are no longer available for Mageia, I'll have a decision to make, which will be based on whether systemd proves itself to be trustworthy and user-friendly enough between now and then.

Anyway, Colin, those two paragraphs are probably much more than you wanted when you asked the question about why I disabled DBus.  Thank you for the info that has allowed this installation to make it to a login prompt.  Again, I'll try the patch when I have a chance.
Comment 14 Robert Riches 2012-10-27 07:16:32 CEST
Created attachment 2994 [details]
Systemd-analyze plot after applying the patch

Okay, burning a little late-night oil.  The patch applied successfully.  However, there is no change in symptoms; the mounting still fails.  After verifying the symptoms, I verified that the patch was still correctly in place, and it was.  This attachment was taken after a boot that used the patched file.

At this point, I'll keep the systemd-based VM image around in case there's further testing to do (in case it might help someone else with similar symptoms), and as a backup in case my efforts to switch to sysvinit fail.

Thank you for getting me to a login prompt so I can make progress from here.
Sander Lepik 2012-10-27 13:48:56 CEST

Attachment 2993 mime type: application/octet-stream => image/svg+xml

Sander Lepik 2012-10-27 13:51:42 CEST

Attachment 2994 mime type: application/octet-stream => image/svg+xml

Comment 15 Sander Lepik 2012-10-27 13:54:02 CEST
Just to be sure, did you also run "dracut -f" after patching?
Comment 16 Robert Riches 2012-10-27 16:12:19 CEST
Oops.  I missed that step.  Will attempt to do that this afternoon.

(Thanks for fixing the file type on the SVG attachments.  "image/svg+xml" is not in the selection list when making an attachment, and I would not have known how to spell it.)
Comment 17 Robert Riches 2012-10-27 23:29:31 CEST
Created attachment 3000 [details]
Plot after booting after doing "dracut -f"

Did "dracut -f".  Verified the patch was still in effect, and verified that initrd had changed contents (shrank by ~20KB, IIRC).  Curiously, when I did "shutdown -h now" after doing "dracut -f", it yielded a kernel panic while in process systemd-shutdow (pid: 1, ...).  That was odd.  Booted up, and it had the same symptoms of failing to mount the filesystems because the RAIDs were not complete.  This plot is from the post-dracut boot.
Comment 18 Colin Guthrie 2012-10-28 12:17:05 CET
Wow, the timings involved here are pretty crazy. This machine seems to take a loooong time to initialise. 

It seems that the timeout before hitting the emergency.target is simply too short. As you can see from the plot, the various devices appear shortly after the emergency target is reached. I presume this is just luck rather than you doing anything specific, but please do correct me if that's an incorrect analysis.

What looks odd to me is that udev.service doesn't even kick in until emergency.service is loaded. 

So I think it is actually looking like a problem of very slow device initialisation. I'll have to ask upstream on how best to debug/analyse this problem.
Comment 19 Colin Guthrie 2012-10-28 12:20:09 CET
(the above said, the post-patch output does look much better than the pre-patch output - the devices are all showing up correctly etc in the output now - so I call that progress of sorts!)
Comment 20 Robert Riches 2012-10-28 18:46:58 CET
The slowness may be due to the fact that this is a VM using qemu, because libvirt 0.9 doesn't support kvm, as far as I could tell.  From what I have seen, emulation by qemu is somewhere around an order of magnitude slower for CPU-intensive stuff than bare metal or kvm.  The existing timeouts may be appropriate for bare metal or a VM that uses kvm.
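
(On the host, a quick sanity check for whether KVM acceleration is even available - a sketch:

 lsmod | grep kvm                    # kvm plus kvm_intel or kvm_amd should be loaded
 grep -cE 'vmx|svm' /proc/cpuinfo    # non-zero means the CPU advertises VT-x/AMD-V

and the guest has to be started with -enable-kvm, or via qemu-kvm, to actually use it.)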

Might the devices showing up be a result of me using the emergency mode shell to do "mdadm --stop ..." and then "mdadm -As" (followed by "mount -a")?  I would think you would have taken that into account, but just in case...

Thanks for the word that the patch does improve things in the plot.  I'll keep it around.
Comment 21 Manuel Hiebel 2013-10-22 12:10:01 CEST
This message is a reminder that Mageia 2 is nearing its end of life.
Approximately one month from now Mageia will stop maintaining and issuing updates for Mageia 2. At that time this bug will be closed as WONTFIX (EOL) if it remains open with a Mageia 'version' of '2'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Mageia version prior to Mageia 2's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Mageia 2 is end of life.  If you would still like to see this bug fixed and are able to reproduce it against a later version of Mageia, you are encouraged to click on "Version" and change it to that version of Mageia.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Mageia release includes newer upstream software that fixes bugs or makes them obsolete.

-- 
The Mageia Bugsquad
Comment 22 Manuel Hiebel 2013-11-23 16:13:37 CET
Mageia 2 changed to end-of-life (EOL) status on 22 November. Mageia 2 is no
longer maintained, which means that it will not receive any further security or
bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Mageia
please feel free to click on "Version", change it to that version of Mageia,
and reopen this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

--
The Mageia Bugsquad

Status: NEW => RESOLVED
Resolution: (none) => OLD

