Bug 4562 - dracut uses udevadm info to detect LVM+RAID+BTRFS
Summary: dracut uses udevadm info to detect LVM+RAID+BTRFS
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: i586 Linux
Priority: release_blocker critical
Target Milestone: ---
Assignee: Colin Guthrie
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 4298
Reported: 2012-02-17 20:08 CET by Colin Guthrie
Modified: 2013-03-13 16:01 CET

See Also:
Source RPM: dracut
CVE:
Status comment:


Attachments
dmesg from non-hostonly initrd boot of cauldron (122.32 KB, text/plain)
2012-04-28 23:12 CEST, Dave Hodgins

Description Colin Guthrie 2012-02-17 20:08:46 CET
The latest version of dracut uses information from the udev metadata to determine if certain modules are needed.

This includes detecting LVM, RAID and some btrfs setups too.

This presents a chicken and egg scenario for upgrades. As our previous mkinitrd system started LVM and RAID during early boot before udev was started, various bits of metadata never make it into udev. Because of this, if you take a Mageia 1 instance (booted with mkinitrd) and then generate a dracut initramfs, it will NOT include the needed LVM or RAID support.

Previous versions of dracut did not use udevadm and used more direct probing mechanisms and thus did not suffer from these problems.
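
To check whether the udev database actually holds this metadata for a given device, the properties can be queried with udevadm info. The following is a minimal sketch, not dracut's own detection code. It assumes that an LVM physical volume shows up as ID_FS_TYPE=LVM2_member and a RAID member as ID_FS_TYPE=linux_raid_member (the values blkid reports); the helper name classify_udev_props is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "KEY=value" udev properties on stdin and print
# which storage layer the device belongs to, based on ID_FS_TYPE.
classify_udev_props() {
    fs_type=""
    while IFS='=' read -r key value; do
        if [ "$key" = "ID_FS_TYPE" ]; then
            fs_type="$value"
        fi
    done
    case "$fs_type" in
        LVM2_member)       echo lvm ;;
        linux_raid_member) echo raid ;;
        btrfs)             echo btrfs ;;
        "")                echo unknown ;;  # no filesystem metadata in the udev database
        *)                 echo other ;;
    esac
}

# Usage on a live system (the device name is only an example):
#   udevadm info --query=property --name=/dev/sda5 | classify_udev_props
```

On a system booted with the old mkinitrd, a device that really is an LVM physical volume would be expected to print "unknown" here, which is the situation described above.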
Colin Guthrie 2012-02-17 20:08:56 CET

Status: NEW => ASSIGNED

Colin Guthrie 2012-02-17 20:09:40 CET

Priority: Normal => release_blocker

Comment 1 François Jaouen 2012-02-24 07:58:03 CET
I encounter the same problem - no LVM on early boot - with a _fresh_ install of Mageia 2 beta 1 from DVD.

The mapping of partitions is :
sda1 /boot
lvm  /
lvm  /home
lvm  swap

The symptom is that, at boot after a successful installation, dracut drops to the debug shell because it can't mount /.
No lvm command is available from this debug shell.

I wonder if the creation of the dracut initramfs during a fresh installation suffers from the same problem as the upgrade from mga1 to mga2 that you mention, Colin.

CC: (none) => farfouille64

Comment 2 Dave Hodgins 2012-02-24 08:49:28 CET
When sorting out how to fix this bug, keep in mind that rescue cds
may be old, and use devfs rather than udev.

In my opinion, none of the various module.setup scripts should
assume udev is running.

My primary "rescue cd" is an old knoppix 5 cd that uses devfs.

CC: (none) => davidwhodgins

Comment 3 Colin Guthrie 2012-02-24 11:54:09 CET
OK, so the upstream recommendation is to generate a non-hostonly initramfs in these situations. This will be far bigger (about 25megs in my tests!) but it should be a safer option.

I've added a hack that detects whether the current boot is from dracut (via the existence of the /run/initramfs folder - if it exists, we've booted with dracut) and, if it is not, will disable hostonly mode even if it's specified in the config or via command line. There is an env var to override this behaviour should it not be required by e.g. the installer (as opposed to a "live" urpmi upgrade).
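
The check described in this comment can be sketched as a small shell function. This is not the actual patch: the function name, the root-directory parameter and the override variable FORCE_HOSTONLY are all hypothetical (the comment does not name the real environment variable), and the sketch assumes the intent is to fall back to non-hostonly when the running system was not booted by dracut.

```shell
#!/bin/sh
# Print the hostonly setting that should actually be used ("yes" or "no").
# $1: requested hostonly setting ("yes" or "no")
# $2: root directory to inspect (defaults to /; a parameter only for testing)
effective_hostonly() {
    requested="$1"
    root="${2:-/}"
    # Hypothetical override, e.g. set by the installer.
    if [ "${FORCE_HOSTONLY:-}" = "1" ]; then
        echo "$requested"
        return 0
    fi
    # Per the comment above, /run/initramfs exists if we booted with dracut.
    if [ -d "$root/run/initramfs" ]; then
        echo "$requested"
    else
        # Not a dracut boot: udev metadata may be incomplete, so play safe.
        echo no
    fi
}
```

For example, effective_hostonly yes would print "no" on a system booted with mkinitrd and "yes" after a dracut boot.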
Comment 4 Colin Guthrie 2012-02-24 11:56:24 CET
@François, I would have thought that the installer should work without this issue, so perhaps there was another problem to blame (although I wouldn't rule out udev not having the necessary metadata about newly created LVMs).

Can I ask, did you reuse an existing LVM in your install or did you repartition? (I know it was a fresh install, but reusing+formatting existing drives could still be considered "fresh installs" I guess).
Comment 5 François Jaouen 2012-02-24 21:22:12 CET
The LVM already existed before the fresh installation.
During the installation, at the disk preference screen, I chose the last option (customized I think), then I reused existing partitions :

sda1 /boot format:yes
lvm  /     format:yes
lvm  /home format:no
lvm  swap  format:NA AFAIK

I didn't build the LVM during the installation process (I don't even know if it is possible)

There are also 2 NTFS partitions at the end of the HD (Windows7) that I ignored during the mga2b1 installation.

My intent is to install mga2 on this computer when it is released. Meanwhile, if you need me to test something don't hesitate, I can reformat or reinstall it.
Comment 6 Colin Guthrie 2012-02-25 11:44:35 CET
@François: I think there could be a subtle difference between two different paths during installation here.

I think that if the LVM volumes were created at install time, things would work OK (data would get into udev database) with a hostonly initrd, but if they exist already and are reused, it will not (due to the way we boot, the LVM volumes are activated before udev is run and thus the metadata isn't available via udev database).

As things stand right now (with cauldron) the installer should now produce a non-hostonly initrd during install. It'll be larger, but it should be safer. After a proper boot, a more streamlined initrd could be generated (sudo dracut -f). When the kernel is next updated, a new hostonly (smaller) initrd will also be produced for it.

This could be enough of a solution to close this bug, but I'll leave it open for testing etc.
Comment 7 François Jaouen 2012-03-27 08:56:21 CEST
@Colin: I've tested again with bare mga2b2 DVD and it is ok ; lvm partitions are detected at reboot (after install) by dracut. Thank you
Comment 8 Sander Lepik 2012-03-27 14:06:33 CEST
So can we close this bug?

CC: (none) => sander.lepik

Comment 9 Colin Guthrie 2012-03-27 14:08:39 CEST
I'd like to keep it open for now. There is still a question as to whether or not the installer can generate a hostonly initrd. I need to do some testing on various setups first tho'. Hopefully I'll get round to that this weekend.
Comment 10 Dave Hodgins 2012-03-28 00:59:35 CEST
It still doesn't run the "lvm vgchange -a y" command before
trying to mount /usr, when using a non-hostonly initrd,
which results in dropping to an unusable shell after pivot.

The system has to be rebooted with the rdbreak=pre-pivot,
and the command run manually.
Anne Nicolas 2012-04-05 22:32:55 CEST

CC: (none) => ennael1
Blocks: (none) => 4298

Comment 11 Colin Guthrie 2012-04-14 21:21:25 CEST
Just to update this bug with some progress:
 1. Installer will now generate a hostonly initrd.
 2. I have added a change to dracut and am about to commit a change to installer that ensures that any swap partitions on LVM etc are properly activated.

The issue of the lvm vgchange not being run on a non-hostonly image is still present, but I'll try and look at that next (as it will affect urpmi upgrades)
Comment 12 Colin Guthrie 2012-04-15 00:45:31 CEST
OK, I have just installed a test system.

sda1: / ext4
sda5: LVM (vg-mga)

vg-mga/swap: swap
vg-mga/usr: /usr ext4


The hostonly initrd generated in the installer worked fine and allowed smooth boot. (Note it was modified such that installer dropped a config file to activate the swap properly - patch submitted but if you try and reproduce this test you may get a timeout waiting for the swap. You can ignore this problem or drop the resume= command line entry).

I regenerated a non-hostonly initrd and rebooted. It also worked fine.

With all this in mind, have I missed any remaining problems or is this all finally handled?
Comment 13 Dave Hodgins 2012-04-25 02:53:11 CEST
Didn't work on my system.  With / on a regular partition, and /usr on
a lvm logical volume, the non-hostonly initrd generated when I installed
the latest update in a chroot from Mageia 1 still fails to activate
the logical volume.  Since the device specified for /usr in /etc/fstab
is not found, dracut skips trying to mount /usr, and then locks up
after the root pivot.

I had to use rd.break=pre-pivot and run the "lvm vgchange -a y"
command.

grep usr /etc/fstab
/dev/mapper/91-usr /usr ext4 defaults,relatime,user_xattr 1 2
Comment 14 Colin Guthrie 2012-04-25 10:56:54 CEST
That's interesting as I tested this exact setup.

I'll take another look as I must have mis-tested :s
Comment 15 Colin Guthrie 2012-04-25 23:11:27 CEST
Oh sorry I misread. I was thinking the installer... yeah this is still a bit of an issue on upgrade. Still looking at it.
Comment 16 Colin Guthrie 2012-04-26 23:03:09 CEST
OK, I just installed a VM of Mageia 1 with the following layout:
/     ext4
/usr  LVM + btrfs
swap  LVM + swap

I did a urpmi-based upgrade. I installed rpm-helper first and then just did a urpmi --auto-select --auto for the rest of the upgrade.

Dracut generated a non-hostonly initrd as expected during this install and I rebooted and it all worked happily. I then regenerated the host-only initrd after booting and it also worked happily.

I am about to reset the snapshot and try an installer based upgrade which should generate a hostonly initrd.
Comment 17 Colin Guthrie 2012-04-26 23:33:03 CEST
OK, I have now done an installer based upgrade. It generated a hostonly initrd as expected which worked well.

I'm not sure what more I can do here as all my tests have passed.
Comment 18 Dave Hodgins 2012-04-27 06:09:49 CEST
I just installed all updates, and ran dracut in a chroot, and confirmed
it still fails to activate the lvm volume groups.

There are two problems here.  First, not activating the volume groups,
which, in my opinion, would best be fixed by testing whether lvm is in
the initrd and, if so, running lvm vgchange -a y before trying
to mount anything.

The second problem is that if a needed filesystem such as /usr is in
/etc/fstab and the device is not found, then instead of ignoring it,
dracut should drop to an emergency shell, rather than trying to pivot
to a root where it panics due to /usr not being mounted.
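
The second suggestion could look roughly like the check below. This is only a sketch of the idea, not dracut code: both function names are hypothetical, and it only handles fstab entries that name a plain device path (such as /dev/mapper/91-usr), not UUID= or LABEL= entries.

```shell
#!/bin/sh
# Print the device field of the fstab entry whose mount point is $2.
# $1: path to an fstab file, $2: mount point (e.g. /usr)
fstab_device_for() {
    awk -v mp="$2" '$1 !~ /^#/ && $2 == mp { print $1; exit }' "$1"
}

# Return 0 if the mount point is not listed in fstab or its device exists,
# 1 if it is listed but the device node is missing.
# $1: path to an fstab file, $2: mount point
check_needed_mount() {
    dev=$(fstab_device_for "$1" "$2")
    if [ -z "$dev" ]; then
        return 0
    fi
    if [ -e "$dev" ]; then
        return 0
    fi
    echo "device $dev for $2 not found" >&2
    return 1
}

# Intended use before the pivot (the fstab path is that of the real root):
#   lvm vgchange -a y
#   check_needed_mount /path/to/real/root/etc/fstab /usr || echo "drop to a shell here"
```

With a check like this, a missing /usr device is reported before the pivot instead of being silently skipped.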
Comment 19 Dave Hodgins 2012-04-27 06:11:22 CEST
Ah. Just noticed. My swap is not on lvm.  If swap is on lvm, that may
be why the volumes are activated for you, but not for me.
Comment 20 Colin Guthrie 2012-04-27 10:29:31 CEST
(In reply to comment #19)
> Ah. Just noticed. My swap is not on lvm.  If swap is on lvm, that may
> be why the volumes are activated for you, but not for me.

That could indeed be the difference (although I'm pretty certain that I checked and there were no cmdline.d files in my non-hostonly initrd so that would rule that out). I'll go and do yet more tests.
Comment 21 Colin Guthrie 2012-04-28 17:12:05 CEST
OK, while I'm not 100% sure why, I was able to reproduce the problem when I removed the swap partition.

I've added a patch that, in non-hostonly mode, ensures that the lvm_scan script is run which happens during the check_finished function at the start of the initqueue processing.

http://svnweb.mageia.org/packages/cauldron/dracut/current/SOURCES/0511-lvm-Ensure-LVM-is-initialised-in-non-hostonly-mode.patch?revision=234111&view=markup

I also agree that it should give you a shell if it cannot mount /usr, I'll see if I can do this.
Comment 22 Dave Hodgins 2012-04-28 23:12:24 CEST
Created attachment 2131
dmesg from non-hostonly initrd boot of cauldron

As shown by the attached dmesg, /usr is still not getting mounted prior
to the root pivot, however, as bug 4372 has now been fixed, it is
mounted ok, by systemd after the root pivot.

As far as I can see, this isn't causing any problems though.
Comment 23 Colin Guthrie 2012-04-28 23:53:00 CEST
Yeah I messed up the previous patch :s

I finally worked out why this is all so confusing and why I couldn't reproduce in my tests.

It's technically nothing to do with swap on LVM or anything; it's to do with the lack of a resume= argument when you boot. If you had a resume=/dev/foo (doesn't really matter what), then LVM would be activated and mounted. This is ultimately because the function wait_for_dev is called, which ensures that the whole initqueue is run etc.

So my working fix http://svnweb.mageia.org/packages/cauldron/dracut/current/SOURCES/0511-lvm-Ensure-LVM-is-initialised-in-non-hostonly-mode.patch?revision=234159&view=markup simply waits for a fake device and then cancels the wait after a timeout. There may very well be neater ways to achieve the same result, but I'm tired and I just want to fix this bug now :p

Worked on my test machine without any resume= argument, so I'm optimistically going to close this bug now! If you disagree, please reopen with any extra info you can give :D

Thanks for your patience.

Status: ASSIGNED => RESOLVED
Resolution: (none) => FIXED

Comment 24 Charls Gurusky 2012-08-23 18:42:10 CEST
Hi,

This bug is not fixed. It can be reproduced easily.

Create a virtual machine (mine is Fedora 16, kvm, x86_64) and do a fresh install with the Mageia 2 DVD. Partition the hard disk with the following schema:

/boot ext4 500M
/     vg_system/lv_root   7000M
swap  vg_system/lv_swap   1024M

Then, after the installation has finished successfully, reboot and get the error:
~~~
dracut Warning: Cancelling resume operation. Device not found.
udev[79]: sender uid=-1, message ignored.

dracut Warning: Unable to process initqueue
dracut Warning: "/dev/vg_system/lv_root" does not exist

Dropping to debug shell.

sh: 0: can't access tty; job control turned off
dracut:/#
~~~

I've tried removing the resume=<path>, rhgb and quiet parameters and booting; it failed the same way.

I then tried removing the above parameters and adding rd_LVM_LV=vg_system/lv_root rd_LVM_LV=vg_system/lv_swap , then booting; that also failed.

Is LVM enabled in the kernel configuration?

BR,
Charles

Status: RESOLVED => REOPENED
CC: (none) => phoenix.guo
Resolution: FIXED => (none)

Comment 25 Colin Guthrie 2012-08-23 22:40:05 CEST
@Charles: I'm pretty sure this is not actually this bug specifically, but rather another bug in the mageia-theme package which generates the initrd when it is installed (which is before several other essential packages are installed) and thus the initrd is generated without all the necessary stuff (i.e. the lvm command).

The quick work around is when the install has finished and you're being prompted for the root password, switch to the tty, and do a "rm -f /mnt/boot/initrd-3*.img", then flip back to the graphical display and complete the installation.

I've got an updated theme package that will hopefully be pushed as an update soon. Sadly this doesn't really help with the DVD installs unless you are connected to the 'net when installing and thus can install the "update" as the first package.

Please let me know if this work around works for you.
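
To confirm that an existing initrd really is missing the lvm command, its contents can be listed with lsinitrd (shipped with dracut) and searched. A minimal sketch; the helper name listing_has_lvm is hypothetical and the image name in the usage line is only an example.

```shell
#!/bin/sh
# Read an initrd file listing on stdin; succeed if it contains an lvm binary
# under bin/ or sbin/ (with or without a leading usr/).
listing_has_lvm() {
    grep -Eq '(^|[[:space:]/])s?bin/lvm([[:space:]]|$)'
}

# Usage:
#   lsinitrd /boot/initrd-3.3.6-desktop-2.mga2.img | listing_has_lvm \
#       && echo "lvm present" || echo "lvm missing"
```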
Comment 26 Charls Gurusky 2012-08-24 16:02:41 CEST
Hi Colin,

Thanks for your explanation. I have confirmed that it is the initramfs' problem.

Here I have a work around for those who do not want their system to be reinstalled.

Procedure:
----------

1. Boot with your Installation DVD and enter rescue mode, by selecting the "Rescue System" in the boot menu.

2. In the 'choose action' screen, select "Mount your partitions under /mnt" item, and select 'Ok'. 

3. You will see information on the system mounting operation. When it prompts "<Press Enter to return to Rescue menu>", press Enter.

4. Select "Go to console" item and select 'Ok', and you will be dropped to a shell.

5. Make /mnt the root directory:

#chroot /mnt

6. Change directory to /boot

#cd /boot

7. Rename the old initrd file:

#mv initrd-3.3.6-desktop-2.mga2.img initrd-3.3.6-desktop-2.mga2.img.bad

8. Rebuild the initramfs:

#dracut initrd-$(uname -r).img $(uname -r)

9. Exit chroot

#exit

10. Reboot system and see if it works.

#reboot
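
One caveat with step 8: inside a chroot, uname -r reports the kernel of the running rescue system, which may not be the kernel installed on disk. A possible way around this, sketched below, is to take the version from the module directories of the installed system instead. The helper name installed_kernels is hypothetical, and the initrd naming follows the file name used in step 7.

```shell
#!/bin/sh
# List kernel versions that have a module tree under $1 (default /lib/modules).
installed_kernels() {
    moddir="${1:-/lib/modules}"
    for d in "$moddir"/*/; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -d "$d" ] || continue
        basename "$d"
    done
}

# Usage inside the chroot (rebuilds one initrd per installed kernel):
#   for kver in $(installed_kernels); do
#       dracut -f "/boot/initrd-$kver.img" "$kver"
#   done
```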


I also have some screenshots for this rescue guide, but I don't know how to host them as I'm a newbie here.

Hope this can help the others.

BTW, does Mageia need testers? How do I apply?

BR,
Charls
Comment 27 alain deraedt 2013-02-04 00:36:23 CET
That doesn't work; I've just performed the operations described in comment 26 on a mageia2.i586:
sda1 /boot
vgmageia/lvrootmga32: /
vgmageia/lvusrmga32: /usr
vgmageia/lvhomemga32: /home
I wished to rename the lv's by replacing "mga32" by "mga2-32".
I chrooted from a 32-bit Fedora 16 and performed exactly the same operations as in comment 26, using dracut.
I couldn't reboot my Mageia:
the splash gave:
".../dev/mageia/lvrootmga2-32 doesn't exist..." and so on.
Best regards
Alder

CC: (none) => alainderaedt

Comment 28 Anne Nicolas 2013-03-13 15:24:52 CET
Any status about that bug?
Comment 29 Colin Guthrie 2013-03-13 16:01:20 CET
As stated above, I'd *really* rather not use this bug. It's old and was for mga2 and whether this is the same or a similar problem or not, I'd much prefer to use a new bug for mga3 for tracking purposes.

So closing this bug. If it is still an issue please reopen it and make sure it *doesn't* block the same bug as this one which was an MGA2 release tracking bug.

Resolving as FIXED because it was fixed in mga2.

Resolution: (none) => FIXED
Status: REOPENED => RESOLVED

