| Summary: | dracut stops on boot, drops to shell | | |
|---|---|---|---|
| Product: | Mageia | Reporter: | Thomas Backlund <tmb> |
| Component: | RPM Packages | Assignee: | Colin Guthrie <mageia> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | | |
| Priority: | Normal | CC: | dan, lists.jjorge, xiche |
| Version: | Cauldron | | |
| Target Milestone: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Source RPM: | dracut | CVE: | |
| Status comment: | | | |
| Bug Depends on: | | | |
| Bug Blocks: | 4298 | | |
|
Description
Thomas Backlund
2011-12-31 19:21:44 CET
Hiya,
What are "the rest" in this case?
Incidentally, if you have to enter the root password, this is not technically dracut any more but actually the shell in systemd (the dracut shell doesn't require a password).
So, I gather that "local-fs.service" is not starting and that is what is holding things up. At the shell, you should be able to run "systemctl start local-fs.service" and have it fail; then, after "mount -a", the same command should succeed.
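A minimal sketch of that check as a shell function, assuming it is run as root at the emergency shell. The function name `retry_local_fs` is made up for illustration, and the unit name is taken as written above but can be overridden (upstream systemd names the corresponding unit `local-fs.target`).

```shell
# Sketch only: try to start the unit, and if that fails, mount everything
# from fstab by hand and try again. retry_local_fs is a hypothetical name.
retry_local_fs() {
    unit=${1:-local-fs.service}   # default taken from the comment above

    if systemctl start "$unit"; then
        echo "first start: ok"
        return 0
    fi
    echo "first start: failed"

    # Mount everything listed in fstab, as done manually in this report.
    mount -a || return 1

    if systemctl start "$unit"; then
        echo "after mount -a: ok"
    else
        echo "after mount -a: failed"
        return 1
    fi
}
```

If the first start fails and the second one succeeds, that matches the behaviour described in this report: the mounts themselves are fine, but systemd is not being told that the devices are ready.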
In systemd, local-fs.service works by querying udev for notification of when the devices are ready (rather than trying to blindly mount things - potentially at a point in time before they are officially "ready" like mount -a does).
So I presume that *something* is missing from the udev database. Dracut starts udev inside the initrd and thus collects info about raid/lvm disks and stores it in its database in /run. As some attributes are designed to persist, even when udev quits in the initrd and is started again in the real system, and as /run is shared between initrd and the real system, this metadata should persist.
I'm guessing something in this sequence is breaking down. So we need to work out what info is missing from udev.
I guess "udevadm info --export-db" might help to spot what is missing. It could also be useful to add "rd.break=pre-pivot" to the kernel command line and check the udev db there too... perhaps the necessary metadata is not marked as persistent and is thus lost (tho' not 100% sure how to check this explicitly).
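One way to do that comparison, as a rough sketch: dump the database once at the pre-pivot shell and once in the booted system, then list the lines that only exist in the initrd dump. The function name `udev_db_missing` and the file names under /run are assumptions for illustration; only "udevadm info --export-db" comes from the comment above.

```shell
# Sketch only: print lines that are in the initrd dump of the udev database
# but not in the dump taken in the booted system.
udev_db_missing() {
    initrd_dump=$1
    system_dump=$2
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # grep exits 1 when nothing is selected; treat that as success.
    grep -Fxv -f "$system_dump" "$initrd_dump" || [ "$?" -eq 1 ]
}

# Intended use (hypothetical file names, not run here):
#   at the rd.break=pre-pivot shell:  udevadm info --export-db > /run/udev-db.initrd
#   in the booted system:             udevadm info --export-db > /run/udev-db.system
#   udev_db_missing /run/udev-db.initrd /run/udev-db.system
```

This is a plain line-by-line comparison, so it loses track of which device a property belongs to; it is only meant to give a quick hint about which properties went missing.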
Feel free to poke me on IRC for more interactive debugging :)
(In reply to comment #1)
> Hiya,
>
> What are "the rest" in this case?
Here is the mount diff before and after mount -a:
+/dev/mapper/isw_bbbhbhihde_rd1p5 on /mnt/data type ext4 (rw,relatime,user_xattr,acl,barrier=1,data=ordered)
+/dev/mapper/isw_bbbhbhihde_rd1p2 on /mnt/mageia64 type ext4 (rw,relatime,user_xattr,acl,barrier=1,data=ordered)
+/dev/mapper/isw_bbbhbhihde_rd1p3 on /mnt/win7 type fuseblk (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096)
+/dev/mapper/isw_caaeaiibei_rd2p1 on /mnt/cauldron32 type ext4 (rw,relatime,user_xattr,acl,barrier=1,data=ordered)
+/dev/mapper/isw_caaeaiibei_rd2p2 on /mnt/mageia32 type ext4 (rw,relatime,user_xattr,acl,barrier=1,data=ordered)
So basically it does not mount _any_ of the other mount points, including swap.
> Incidentally, if you have to enter the root password, this is not technically
> dracut any more but actually the shell in systemd (the dracut shell doesn't
> require a password).
>
> So, I gather that "local-fs.service" is not starting and that is what is
> holding things up. At the shell, you should be able to do ("systemctl start
> local-fs.service" and have it fail, then after mount -a, the same command
> should succeed).
I will check that.
> Feel free to poke me on IRC for more interactive debugging :)
Will do.
I also see we don't have the latest lvm2 and that Fedora has done some more systemd-related work in their packages, so I'll go sync up and test them too...
Yup, so I guess the metadata in udev from any dmraid stuff is basically lost, as otherwise systemd should have mounted them. The fact that they are all /dev/mapper mount points makes me think that this is what is missing.
FWIW, I have LVM partitions here that mount fine, so I know that LVM generally works OK even with our current version (it's actually the reason I started work on using dracut in the first place!). So I'm guessing the difference here might relate to dmraid in some capacity...
Dan Fandrich
2012-01-19 08:23:38 CET
CC:
(none) =>
dan
Manuel Hiebel
2012-02-06 14:47:03 CET
Blocks:
(none) =>
4298
OK, so this might magically work now with the new dracut. See a thread on the ML that I think outlined a problem scenario before (/ on regular ext4 or similar but /usr on LVM) which I didn't have a test case for (my / and /usr were both LVM, but different VGs). I think this setup should be fixed now.
Mageia Beta1 DVD i586, I have the same problem: root on RAID not detected -> shell. This is simple to reproduce in a VM with two disks...
CC:
(none) =>
lists.jjorge
@tmb: Did you get a chance to look at the intel raid stuff over the weekend?
Well not much yet :(
But with the fixed kpartx, now my system boots with dmraid and activates and mounts the partitions on the first raid set:
/dev/mapper/isw_bbbhbhihde_rd1
But it still fails to boot if I have any reference in fstab to partitions on the second raid set:
/dev/mapper/isw_caaeaiibei_rd2
If I only have a reference to a swap partition on the second raid set, it will boot correctly, but won't activate the swap partition since it hasn't activated the second set...
So a little progress, but needs more work.
Interesting. There was an upstream commit to the fedora-storage-init script but it didn't look that important from the message (hence why I asked you about spaces in your names a few days ago):
You can try making that change tho'. If it works, I can easily cherry pick that patch :)
http://colin.guthr.ie/git/initscripts/commit/systemd/fedora-storage-init?id=7414b9457fba61af86b0756a37bc994d079bc621
I have now retested with the fixed kpartx, and can confirm it works with the current setup.
The reason it activates the first dmraid set is obviously that dracut does it, since that's where the root is (why dracut does not activate the second one, since the resume partition is there, is another bug).
The reason systemd does not activate the second one is that it expects mdadm to run it, not dmraid, since that's what Intel nowadays wants for isw_*.
But now, if I add "noiswmd" on the kernel command line, systemd will activate the second dmraid set and mount partitions and swap.
So this bug is sort of fixed.
I've added it to the errata:
https://wiki.mageia.org/en/Mageia_2_Errata#Intel_Softraid_.28Bios_FakeRaid.29
Next up is to see if we can alter the installer to set up mdadm by default for isw_* softraids...
As for the patch in comment 8, please apply it... iirc we had a similar bug in mdv back in the days regarding spaces in the names, so better to fix it here too...
Status:
NEW =>
RESOLVED
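For anyone applying the "noiswmd" workaround from the last comments, here is a small sketch for checking that the option actually reached the running kernel. The function name `has_kernel_arg` is hypothetical; it reads /proc/cmdline by default and accepts another file as a second argument, which is only there to make it easy to try out.

```shell
# Sketch only: succeed if the given word appears as a separate option on
# the kernel command line. has_kernel_arg is a hypothetical helper.
has_kernel_arg() {
    arg=$1
    cmdline_file=${2:-/proc/cmdline}
    # Pad with spaces so that only whole words match.
    case " $(cat "$cmdline_file") " in
        *" $arg "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Intended use (not run here):
#   has_kernel_arg noiswmd && echo "noiswmd is set"
```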