Bug 28875 - after upgrade from mga7.1 mga8 does not boot from lvm on nvme disk
Summary: after upgrade from mga7.1 mga8 does not boot from lvm on nvme disk
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: 8
Hardware: x86_64 Linux
Priority: High
Severity: critical
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-04 16:09 CEST by Markus Mertens
Modified: 2021-05-07 20:04 CEST
CC List: 2 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments
rdsosreport without debug information (115.55 KB, text/plain)
2021-05-04 16:09 CEST, Markus Mertens
fstab (1.98 KB, text/x-matlab)
2021-05-05 09:19 CEST, Markus Mertens
lvm.conf (100.06 KB, text/plain)
2021-05-05 09:19 CEST, Markus Mertens

Description Markus Mertens 2021-05-04 16:09:38 CEST
Created attachment 12701 [details]
rdsosreport without debug information

Description of problem:

After the upgrade from mga7.1 to mga8 the system no longer boots directly. I have to start in rescue mode, log in as root and enter "init 5" to complete the boot process. Of the four LVM volumes on the main NVMe disk, only one is activated early enough to be mounted; the other three remain inactive. The /boot/efi partition resides on a SAS disk, not on the NVMe disk.

From fstab:
/dev/vg_system/lv_root  /      ext4  rw,relatime,acl  1 1
/dev/vg_system/lv_home  /home  ext4  rw,relatime,acl  1 2
/dev/vg_system/lv_opt   /opt   ext4  rw,relatime,acl  1 2
/dev/vg_system/lv_var   /var   ext4  rw,relatime,acl  1 2

UUID=1E7E-90C2          /boot/EFI  vfat  umask=000,iocharset=utf8  0 0

What is different for lvm in rescue mode so that all volumes are activated and can be mounted? What has changed from mga7.1 to mga8? It had always worked since mga6.
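
(For reference, and not the workaround used here: when a boot stops with only lv_root active, the remaining volumes can usually be activated by hand from the dracut emergency shell. A minimal sketch, assuming the volume group is vg_system as in the fstab above:)

lvm vgchange -ay vg_system   # activate all logical volumes in vg_system
lvm lvs                      # lv_home, lv_opt and lv_var should now show the 'a' (active) attribute
exit                         # leave the emergency shell so dracut retries the mounts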


How reproducible:

Always, on my system. I produced an rdsosreport file with debug information, but that one is too big to attach, so I attached a normal rdsosreport instead.
Markus Mertens 2021-05-04 16:10:19 CEST

Priority: Normal => High

Comment 1 Aurelien Oudelet 2021-05-04 16:58:26 CEST
relevant lines:

[    4.036482] nvme nvme0: pci function 0000:03:00.0
[    4.049191] nvme nvme0: 7/0/0 default/read/poll queues
[    4.053366]  nvme0n1: p1

[  161.878470] dracut: Scanning devices nvme0n1p1  for LVM logical volumes vg_system/lv_root
[  161.917604] dracut: inactive '/dev/vg_system/lv_opt' [64.00 GiB] inherit
[  161.917743] dracut: inactive '/dev/vg_system/lv_var' [128.00 GiB] inherit
[  161.917861] dracut: ACTIVE '/dev/vg_system/lv_root' [256.00 GiB] inherit
[  161.917973] dracut: inactive '/dev/vg_system/lv_home' [256.00 GiB] inherit
[  237.134115] dracut Warning: Could not boot.
[  237.144105] dracut Warning: /dev/disk/by-uuid/a2ba326c-21f0-447a-866f-cbfca4d4ac91 does not exist

The disk seems to be seen by the kernel, but the UUID is not found; moreover, the LVM signature seems to be missing from your nvme0n1p1 partition.
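
(A quick way to check whether the LVM2 signature is actually present on the partition; a generic sketch rather than output from this report:)

blkid /dev/nvme0n1p1    # should report TYPE="LVM2_member"
pvs /dev/nvme0n1p1      # should list the PV together with its volume group
pvck /dev/nvme0n1p1     # verifies the LVM metadata on the physical volume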

CC: (none) => ouaurelien

Comment 2 Markus Mertens 2021-05-05 08:45:53 CEST
It is interesting that mga6, mga7, mga7.1 and the "mga8 rescue mode" can all see the IDs and mount the disk; only the "mga8 release" does not work properly. Once booted, I get the following information from blkid on "mga8 release":

NVME-disk (LVM):
/dev/nvme0n1p1: UUID="kz6fci-5f2f-n1S2-JxwP-ZvQR-5Y82-7oe5Jx" TYPE="LVM2_member" PARTUUID="4a1a36e4-9b07-441f-ab9c-6ff5091abb14"
/dev/mapper/vg_system-lv_root: UUID="70f16780-32cd-4edf-ae74-292573f97db3" BLOCK_SIZE="4096" TYPE="ext4"
/dev/mapper/vg_system-lv_opt: UUID="7161f2b2-43bd-45e8-9d92-a4e33d5723a6" BLOCK_SIZE="4096" TYPE="ext4"
/dev/mapper/vg_system-lv_var: UUID="bff2cdc8-6976-4a33-8b91-a1a67b9ea525" BLOCK_SIZE="4096" TYPE="ext4"
/dev/mapper/vg_system-lv_home: UUID="20f38aad-158c-4db4-95a5-71f9223a2bb8" BLOCK_SIZE="4096" TYPE="ext4"

Boot:
/dev/sdc1: UUID="1E7E-90C2" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="2a75e560-2eaf-490a-ae5f-56d188cad02f"

Could this be a systemd/lvm2 timing problem or race condition? Perhaps a mount is attempted before all LVM volumes are activated?
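
(One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: the dracut log above only mentions scanning for vg_system/lv_root, so it can help to see which volumes the initrd is asked to activate and what the journal says about device timeouts:)

cat /proc/cmdline                                   # look for rd.lvm.lv=... entries
journalctl -b | grep -iE 'lvm|vgchange|timed out'   # LVM activation and timeout messages from the current boot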
Comment 3 Dave Hodgins 2021-05-05 09:07:42 CEST
What are the contents of /etc/lvm/lvm.conf and /etc/fstab?

CC: (none) => davidwhodgins

Comment 4 Markus Mertens 2021-05-05 09:19:05 CEST
Created attachment 12702 [details]
fstab
Comment 5 Markus Mertens 2021-05-05 09:19:35 CEST
Created attachment 12703 [details]
lvm.conf
Comment 6 Dave Hodgins 2021-05-05 09:45:04 CEST
Thanks. Nothing obvious there.

The only thing I can think of is that the upgrade failed with the root filesystem becoming full. Can you get the output of "df -h | grep -v tmpfs" with everything mounted?
Comment 7 Markus Mertens 2021-05-05 10:03:45 CEST
#  df -h|grep -v tmpfs
Filesystem                                        Size    Used Avail Use% Mounted on
/dev/mapper/vg_system-lv_root                     251G     24G  216G   10% /
/dev/mapper/vg_system-lv_opt                       63G     56G  4,0G   94% /opt
/dev/mapper/vg_system-lv_home                     251G    191G   61G   77% /home
/dev/mapper/vg_system-lv_var                      125G     42G   78G   35% /var
/dev/sdc4                                         2,1T    540G  1,5T   27% /data
/dev/sdc3                                         976M     71M  838M    8% /boot
/dev/sdc1                                        1022M    160K 1022M    1% /boot/EFI
/dev/sdh1                                         3,6T    1,7T  1,8T   49% /mnt/touro
amygdala-nfs:/data1                               345G    177G  150G   55% /data1
amygdala-nfs:/data2                               345G    234G   93G   72% /data2
//10.168.44.211/IMPORTSTUDIEN                     448G    327G  122G   73% /mnt/mcdimport
#
Comment 8 Thomas Backlund 2021-05-05 10:13:44 CEST
IIRC there was another user hitting an issue like this earlier too...

For some reason some needed bit did not end up in the initrd on upgrade, but we could not reproduce it back then...

If you get the system up and running properly, please make a backup of the initrd, recreate the initrd, then try to boot and see if it is fixed.

If it is, run lsinitrd on both the backup initrd and the working initrd, and attach the outputs here so we can try to spot what part fails.


so basically:

lsinitrd  /boot/old_initrd 2>&1 |tee old_initrd.log

lsinitrd  /boot/new_initrd 2>&1 |tee new_initrd.log

(change the "/boot/*_initrd"  to match the actual initrds)

and attach the *_initrd.log files here
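
(A minimal sketch of the backup-and-rebuild step, assuming the initrd lives at the usual /boot/initrd-<kernel-version>.img path:)

KVER=$(uname -r)
cp /boot/initrd-$KVER.img /boot/initrd-$KVER.img.old     # keep the suspect initrd
dracut --force /boot/initrd-$KVER.img $KVER              # regenerate it for the running kernel
lsinitrd /boot/initrd-$KVER.img.old 2>&1 | tee old_initrd.log
lsinitrd /boot/initrd-$KVER.img 2>&1 | tee new_initrd.log
diff old_initrd.log new_initrd.log                       # quick first look at what differs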
Comment 9 Markus Mertens 2021-05-05 10:52:05 CEST
Creating a new initrd did not help. I just produced a new rdsosreport with debug information, but it is too big to attach:

du -s /mnt/touro/mer/rdsosreport-debug.txt
3156	/mnt/touro/mer/rdsosreport-debug.txt
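
(As a side note, plain-text reports like this compress well; a hedged example, since the exact attachment size limit is not stated here:)

xz -9 /mnt/touro/mer/rdsosreport-debug.txt   # produces rdsosreport-debug.txt.xz, usually a small fraction of the original size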

Is there a way to reinstall dracut/lvm/systemd? I am using urpmi, which does not provide such an option, and I do not dare to mix urpmi and dnf.
Comment 10 Aurelien Oudelet 2021-05-05 11:00:03 CEST
urpmi can reinstall packages with the --reinstall switch, available since Mageia 8:

urpmi --reinstall --force systemd dracut lvm
Comment 11 Markus Mertens 2021-05-05 12:21:56 CEST
Unfortunately, this did not help.
Comment 12 Lewis Smith 2021-05-07 20:04:31 CEST
Pity.
We are in tmb's hands for this, so assigning it to the kernel group.

Assignee: bugsquad => kernel

