Bug 3972 - dracut stops on boot, drops to shell
Summary: dracut stops on boot, drops to shell
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Colin Guthrie
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 4298
Reported: 2011-12-31 19:21 CET by Thomas Backlund
Modified: 2012-03-11 21:39 CET

See Also:
Source RPM: dracut
CVE:
Status comment:


Description Thomas Backlund 2011-12-31 19:21:44 CET
I have a system running on dmraid with two raid 1 disks:
/dev/mapper/isw_bbbhbhihde_rd1
/dev/mapper/isw_caaeaiibei_rd2

it mounts /, but complains about not being able to mount the rest, and drops to a shell.

There I just:
- enter root passwd
- mount -a
- systemctl default

and the system continues the boot and comes up fully working.

FSTAB:
/dev/mapper/isw_bbbhbhihde_rd1p1 / ext4 acl,relatime 1 1
/dev/mapper/isw_bbbhbhihde_rd1p5 /mnt/data ext4 acl,relatime 1 2
/dev/mapper/isw_bbbhbhihde_rd1p2 /mnt/mageia64 ext4 acl,relatime 1 2
/dev/mapper/isw_bbbhbhihde_rd1p3 /mnt/win7 ntfs-3g defaults,umask=000 0 0
none /proc proc defaults 0 0
/dev/mapper/isw_caaeaiibei_rd2p3 swap swap defaults 0 0
/dev/mapper/isw_caaeaiibei_rd2p1 /mnt/cauldron32 ext4 acl,relatime 1 2
/dev/mapper/isw_caaeaiibei_rd2p2 /mnt/mageia32 ext4 acl,relatime 1 2
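
To see at a glance which mount points depend on which raid set, the fstab above can be grouped by set name. This is only a minimal sketch: it assumes partitions are named <set>p<N> as in the entries above, and the function name fstab_raid_sets is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "<raid set> <mount point>" for every
# /dev/mapper entry in an fstab-style file (default: /etc/fstab).
# Assumes partition devices are named <set>p<N>, as in the fstab above.
fstab_raid_sets() {
    awk '$1 ~ "^/dev/mapper/" {
        set = $1
        sub("^/dev/mapper/", "", set)   # drop the directory prefix
        sub("p[0-9]+$", "", set)        # drop the partition suffix
        print set, $2
    }' "${1:-/etc/fstab}"
}
```

Calling fstab_raid_sets with no argument reads /etc/fstab. On this system it should list the /, /mnt/data, /mnt/mageia64 and /mnt/win7 entries under isw_bbbhbhihde_rd1 and the swap, /mnt/cauldron32 and /mnt/mageia32 entries under isw_caaeaiibei_rd2.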


GRUB menu stanza:
title linux
kernel (hd0,0)/boot/vmlinuz BOOT_IMAGE=linux root=/dev/mapper/isw_bbbhbhihde_rd1p1 resume=/dev/mapper/isw_caaeaiibei_rd2p3 splash=silent vga=791
initrd (hd0,0)/boot/initrd.img

device.map:
(hd0) /dev/mapper/isw_bbbhbhihde_rd1
(hd1) /dev/mapper/isw_caaeaiibei_rd2


I have dracut-014-10.mga2 and have recreated the initrd with it to be sure I use latest code.

Any suggestions?

And yes, mkinitrd/initscripts used to work on this system.
Comment 1 Colin Guthrie 2012-01-03 13:09:12 CET
Hiya,

What are "the rest" in this case?

Incidentally, if you have to enter the root password, this is not technically dracut any more but actually the shell in systemd (the dracut shell doesn't require a password).

So, I gather that "local-fs.service" is not starting and that is what is holding things up. At the shell, you should be able to run "systemctl start local-fs.service" and have it fail; then, after mount -a, the same command should succeed.

In systemd, local-fs.service works by querying udev for notification of when the devices are ready (rather than trying to blindly mount things - potentially at a point in time before they are officially "ready" like mount -a does).

So I presume that *something* is missing from the udev database. Dracut starts udev inside the initrd and thus collects info about raid/lvm disks and stores them in its database in /run. As some attributes are designed to persist, even when udev quits in the initrd and is started again in the real system, and as /run is shared between initrd and the real system, this metadata should persist.

I'm guessing something in this sequence is breaking down. So we need to work out what info is missing from udev.

I guess the "udevadm info --export-db" might help to spot what is missing. It could also be useful to add a "rd.break=pre-pivot" to the kernel command line and check the udev db there too... perhaps the necessary metadata is not marked as persistent and is thus lost (tho' not 100% sure how to check this explicitly).
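
The comparison suggested above could be sketched roughly like this. It is not a tested procedure: it assumes the dump can be written to /run at the pre-pivot breakpoint and is still there in the real system, and the file names and the function name udev_dm_records are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: from a saved "udevadm info --export-db" dump,
# print only the records that belong to device-mapper nodes (/dev/dm-N).
# Records in the dump are separated by blank lines, so awk's paragraph
# mode (RS = "") treats each record as one unit.
udev_dm_records() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" } /DEVNAME=\/dev\/dm-/ { print }' "$1"
}

# Intended use (not executed here):
#   at the rd.break=pre-pivot shell:  udevadm info --export-db > /run/udev-initrd.txt
#   in the booted system:             udevadm info --export-db > /run/udev-real.txt
#   udev_dm_records /run/udev-initrd.txt > /tmp/dm-initrd.txt
#   udev_dm_records /run/udev-real.txt   > /tmp/dm-real.txt
#   diff -u /tmp/dm-initrd.txt /tmp/dm-real.txt
```

Any property that shows up for a dm device in the initrd dump but not in the real-system dump would be a candidate for the metadata that gets lost.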

Feel free to poke me on IRC for more interactive debugging :)
Comment 2 Thomas Backlund 2012-01-03 14:24:01 CET
(In reply to comment #1)
> Hiya,
> 
> What are "the rest" in this case?

Here is the mount diff before and after mount -a
+/dev/mapper/isw_bbbhbhihde_rd1p5 on /mnt/data type ext4 (rw,relatime,user_xattr,acl,barrier=1,data=ordered)
+/dev/mapper/isw_bbbhbhihde_rd1p2 on /mnt/mageia64 type ext4 (rw,relatime,user_xattr,acl,barrier=1,data=ordered)
+/dev/mapper/isw_bbbhbhihde_rd1p3 on /mnt/win7 type fuseblk (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096)
+/dev/mapper/isw_caaeaiibei_rd2p1 on /mnt/cauldron32 type ext4 (rw,relatime,user_xattr,acl,barrier=1,data=ordered)
+/dev/mapper/isw_caaeaiibei_rd2p2 on /mnt/mageia32 type ext4 (rw,relatime,user_xattr,acl,barrier=1,data=ordered)

So basically it does not mount _any_ of the other mount points, including swap
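
For reference, a diff like the one above can be produced along these lines. This is a sketch only; the function name new_mounts and the temporary file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the lines that are in the "after" listing
# but not in the "before" listing, prefixed with "+" as in the diff above.
new_mounts() {
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
    grep -Fxv -f "$1" "$2" | sed 's/^/+/'
}

# Intended use at the emergency shell (not executed here):
#   mount > /tmp/mounts.before
#   mount -a
#   mount > /tmp/mounts.after
#   new_mounts /tmp/mounts.before /tmp/mounts.after
```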

> 
> Incidentally, if you have to enter the root password, this is not technically
> dracut any more but actually the shell in systemd (the dracut shell doesn't
> require a password).
> 
> So, I gather that "local-fs.service" is not starting and that is what is
> holding things up. At the shell, you should be able to do ("systemctl start
> local-fs.service" and have it fail, then after mount -a, the same command
> should succeed).

I will check that.

> Feel free to poke me on IRC for more interactive debugging :)

Will do.

I also see we don't have the latest lvm2 and that Fedora has done some more systemd-related work in their packages, so I'll go sync up and test them too...
Comment 3 Colin Guthrie 2012-01-03 14:32:59 CET
Yup, so I guess the metadata in udev from any dmraid stuff is basically lost as otherwise systemd should have mounted them. The fact that they are all /dev/mapper mount points makes me think that this is what is missing.

FWIW, I have lvm partitions here that mount fine, so I know that LVM generally works OK even with our current version (it's actually the reason I started work on using dracut in the first place!). So I'm guessing the difference here might relate to dmraid in some capacity...
Dan Fandrich 2012-01-19 08:23:38 CET

CC: (none) => dan

Manuel Hiebel 2012-02-06 14:47:03 CET

Blocks: (none) => 4298

Comment 4 Colin Guthrie 2012-02-14 17:17:48 CET
OK, so this might magically work now with new dracut.

See a thread on the ML that I think outlined a problem scenario before (/ on regular ext4 or similar but /usr on LVM) which I didn't have a test case for (my / and /usr were both LVM, but different VGs). I think this setup should be fixed now.
Comment 5 José Jorge 2012-02-26 23:06:08 CET
Mageia Beta1 DVD i586, I have the same problem: root on RAID not detected -> shell. This is simple to reproduce in a VM with two disks...

CC: (none) => lists.jjorge

Comment 6 Colin Guthrie 2012-03-07 15:22:16 CET
@tmb: Did you get a chance to look at the intel raid stuff over the weekend?
Comment 7 Thomas Backlund 2012-03-07 17:38:33 CET
Well not much yet :(

But with the fixed kpartx, now my system boots with dmraid and activates and mounts the partitions on the first raid set:
/dev/mapper/isw_bbbhbhihde_rd1

But it still fails to boot if fstab has any reference to partitions on the second raid set:
/dev/mapper/isw_caaeaiibei_rd2

If I only have a reference to a swap partition on the second raid set, it will boot correctly, but won't activate the swap partition since it hasn't activated the second set...

So a little progress, but needs more work.
Comment 8 Colin Guthrie 2012-03-07 17:51:18 CET
Interesting.

There was an upstream commit to the fedora-storage-init script but it didn't look that important from the message (hence why I asked you about spaces in your names a few days ago):

You can try making that change tho'. If it works, I can easily cherry pick that patch :)

http://colin.guthr.ie/git/initscripts/commit/systemd/fedora-storage-init?id=7414b9457fba61af86b0756a37bc994d079bc621
Comment 9 Thomas Backlund 2012-03-07 21:13:29 CET
I have now retested with the fixed kpartx, and can confirm it works with the current setup.


The first dmraid set is obviously activated by dracut, since that's where the root is. (Why dracut does not activate the second one, given that the resume partition is there, is another bug.)


The reason systemd does not activate the second one is that it expects mdadm to run it, not dmraid, since that's what Intel nowadays wants for isw_*

But now, if I add "noiswmd" on the kernel command line, systemd will activate the second dmraid set and mount the partitions and swap.
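
Whether the option actually reached the kernel can be checked from /proc/cmdline on the running system. A minimal sketch follows; the function name has_karg is made up for illustration, and it only matches bare options such as noiswmd, not key=value pairs.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the bare option $1 appears on the kernel
# command line. $2 may supply the command line as a string; when it is not
# given, the contents of /proc/cmdline are used.
has_karg() {
    cmdline=${2-$(cat /proc/cmdline)}
    for arg in $cmdline; do
        [ "$arg" = "$1" ] && return 0
    done
    return 1
}

# Intended use (not executed here):
#   has_karg noiswmd && echo "isw_* sets are left to dmraid"
```

With the GRUB stanza from the description, this would mean appending noiswmd to the end of the kernel line.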

So this bug is sort of fixed.

I've added it to the errata: 
https://wiki.mageia.org/en/Mageia_2_Errata#Intel_Softraid_.28Bios_FakeRaid.29

Next up is to see if we can alter the installer to set up mdadm by default for isw_* softraids...

As for the patch in comment 8, please apply it... iirc we had a similar bug in mdv back in the days regarding spaces in the names, so better to fix it here too...

Status: NEW => RESOLVED
Resolution: (none) => FIXED

Comment 10 Colin Guthrie 2012-03-11 21:39:22 CET
*** Bug 4875 has been marked as a duplicate of this bug. ***

CC: (none) => xiche

