Bug 28875 - after upgrade from mga7.1 mga8 does not boot from lvm on nvme disk
Summary: after upgrade from mga7.1 mga8 does not boot from lvm on nvme disk
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: 8
Hardware: x86_64 Linux
Priority: High
Severity: critical
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-04 16:09 CEST by Markus Mertens
Modified: 2021-05-07 20:04 CEST
CC List: 2 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments
rdsosreport without debug information (115.55 KB, text/plain)
2021-05-04 16:09 CEST, Markus Mertens
fstab (1.98 KB, text/x-matlab)
2021-05-05 09:19 CEST, Markus Mertens
lvm.conf (100.06 KB, text/plain)
2021-05-05 09:19 CEST, Markus Mertens

Description Markus Mertens 2021-05-04 16:09:38 CEST
Created attachment 12701 [details]
rdsosreport without debug information

Description of problem:

After the upgrade from mga7.1 to mga8 the system no longer boots directly. I have to start in rescue mode, log in as root and enter "init 5" to complete the boot process. Of the four LVM volumes on the main NVMe disk, only one is activated early enough to be mounted; the other three remain inactive. The /boot/efi partition resides on a SAS disk, not on the NVMe disk.

From fstab:
/dev/vg_system/lv_root  /      ext4  rw,relatime,acl  1 1
/dev/vg_system/lv_home  /home  ext4  rw,relatime,acl  1 2
/dev/vg_system/lv_opt   /opt   ext4  rw,relatime,acl  1 2
/dev/vg_system/lv_var   /var   ext4  rw,relatime,acl  1 2

UUID=1E7E-90C2          /boot/EFI  vfat  umask=000,iocharset=utf8  0 0

What is different for lvm in rescue mode so that all volumes are activated and can be mounted? What has changed from mga7.1 to mga8? It had always worked since mga6.
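
(For reference, and not the workaround used here: when a boot stops with only lv_root active, the remaining volumes can usually be activated by hand from the dracut emergency shell. A minimal sketch, assuming the volume group is vg_system as in the fstab above:)

lvm vgchange -ay vg_system   # activate all logical volumes in vg_system
lvm lvs                      # lv_home, lv_opt and lv_var should now show the 'a' (active) attribute
exit                         # leave the emergency shell so dracut retries the mounts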


How reproducible:

Always, on my system. I produced an rdsosreport file with debug information, but that one is too big to attach, so I attached a normal rdsosreport instead.
Markus Mertens 2021-05-04 16:10:19 CEST

Priority: Normal => High

Comment 1 Aurelien Oudelet 2021-05-04 16:58:26 CEST
relevant lines:

[    4.036482] nvme nvme0: pci function 0000:03:00.0
[    4.049191] nvme nvme0: 7/0/0 default/read/poll queues
[    4.053366]  nvme0n1: p1

[  161.878470] dracut: Scanning devices nvme0n1p1  for LVM logical volumes vg_system/lv_root
[  161.917604] dracut: inactive '/dev/vg_system/lv_opt' [64.00 GiB] inherit
[  161.917743] dracut: inactive '/dev/vg_system/lv_var' [128.00 GiB] inherit
[  161.917861] dracut: ACTIVE '/dev/vg_system/lv_root' [256.00 GiB] inherit
[  161.917973] dracut: inactive '/dev/vg_system/lv_home' [256.00 GiB] inherit
[  237.134115] dracut Warning: Could not boot.
[  237.144105] dracut Warning: /dev/disk/by-uuid/a2ba326c-21f0-447a-866f-cbfca4d4ac91 does not exist

The disk seems to be seen by the kernel, but the UUID is not found; moreover, the LVM signature seems to be missing from your nvme0n1p1 partition.
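
(A quick way to check whether the LVM2 signature is actually present on the partition; a generic sketch rather than output from this report:)

blkid /dev/nvme0n1p1    # should report TYPE="LVM2_member"
pvs /dev/nvme0n1p1      # should list the PV together with its volume group
pvck /dev/nvme0n1p1     # verifies the LVM metadata on the physical volume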

CC: (none) => ouaurelien

Comment 2 Markus Mertens 2021-05-05 08:45:53 CEST
It is interesting that mga6, mga7, mga7.1 and the "mga8 rescue mode" can all see the IDs and mount the disk; only the "mga8 release" does not work properly. Once booted, I get the following information from blkid on "mga8 release":

NVME-disk (LVM):
/dev/nvme0n1p1: UUID="kz6fci-5f2f-n1S2-JxwP-ZvQR-5Y82-7oe5Jx" TYPE="LVM2_member" PARTUUID="4a1a36e4-9b07-441f-ab9c-6ff5091abb14"
/dev/mapper/vg_system-lv_root: UUID="70f16780-32cd-4edf-ae74-292573f97db3" BLOCK_SIZE="4096" TYPE="ext4"
/dev/mapper/vg_system-lv_opt: UUID="7161f2b2-43bd-45e8-9d92-a4e33d5723a6" BLOCK_SIZE="4096" TYPE="ext4"
/dev/mapper/vg_system-lv_var: UUID="bff2cdc8-6976-4a33-8b91-a1a67b9ea525" BLOCK_SIZE="4096" TYPE="ext4"
/dev/mapper/vg_system-lv_home: UUID="20f38aad-158c-4db4-95a5-71f9223a2bb8" BLOCK_SIZE="4096" TYPE="ext4"

Boot:
/dev/sdc1: UUID="1E7E-90C2" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="2a75e560-2eaf-490a-ae5f-56d188cad02f"

Could this be a systemd/lvm2 timing problem or race condition? Perhaps a mount is attempted before all LVM volumes are activated?
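
(One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: the dracut log above only mentions scanning for vg_system/lv_root, so it can help to see which volumes the initrd is asked to activate and what the journal says about device timeouts:)

cat /proc/cmdline                                   # look for rd.lvm.lv=... entries
journalctl -b | grep -iE 'lvm|vgchange|timed out'   # LVM activation and timeout messages from the current boot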
Comment 3 Dave Hodgins 2021-05-05 09:07:42 CEST
What are the contents of /etc/lvm/lvm.conf and /etc/fstab?

CC: (none) => davidwhodgins

Comment 4 Markus Mertens 2021-05-05 09:19:05 CEST
Created attachment 12702 [details]
fstab
Comment 5 Markus Mertens 2021-05-05 09:19:35 CEST
Created attachment 12703 [details]
lvm.conf
Comment 6 Dave Hodgins 2021-05-05 09:45:04 CEST
Thanks. Nothing obvious there.

The only thing I can think of is that the upgrade failed with the root filesystem becoming full. Can you get the output of "df -h | grep -v tmpfs" with everything mounted?
Comment 7 Markus Mertens 2021-05-05 10:03:45 CEST
#  df -h|grep -v tmpfs
Filesystem                                        Size    Used Avail Use% Mounted on
/dev/mapper/vg_system-lv_root                     251G     24G  216G   10% /
/dev/mapper/vg_system-lv_opt                       63G     56G  4,0G   94% /opt
/dev/mapper/vg_system-lv_home                     251G    191G   61G   77% /home
/dev/mapper/vg_system-lv_var                      125G     42G   78G   35% /var
/dev/sdc4                                         2,1T    540G  1,5T   27% /data
/dev/sdc3                                         976M     71M  838M    8% /boot
/dev/sdc1                                        1022M    160K 1022M    1% /boot/EFI
/dev/sdh1                                         3,6T    1,7T  1,8T   49% /mnt/touro
amygdala-nfs:/data1                               345G    177G  150G   55% /data1
amygdala-nfs:/data2                               345G    234G   93G   72% /data2
//10.168.44.211/IMPORTSTUDIEN                     448G    327G  122G   73% /mnt/mcdimport
#
Comment 8 Thomas Backlund 2021-05-05 10:13:44 CEST
IIRC there was another user hitting an issue like this earlier too...

For some reason some needed bit did not end up in the initrd on upgrade, but we could not reproduce it back then...

If you get the system up and running properly, please make a backup of the initrd, recreate the initrd, then try to boot and see if it is fixed.

If it is, run lsinitrd on both the backup initrd and the working initrd, and attach the outputs here so we can try to spot what part fails.


so basically:

lsinitrd  /boot/old_initrd 2>&1 |tee old_initrd.log

lsinitrd  /boot/new_initrd 2>&1 |tee new_initrd.log

(change the "/boot/*_initrd"  to match the actual initrds)

and attach the *_initrd.log files here
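
(A minimal sketch of the backup-and-rebuild step, assuming the initrd lives at the usual /boot/initrd-<kernel-version>.img path:)

KVER=$(uname -r)
cp /boot/initrd-$KVER.img /boot/initrd-$KVER.img.old     # keep the suspect initrd
dracut --force /boot/initrd-$KVER.img $KVER              # regenerate it for the running kernel
lsinitrd /boot/initrd-$KVER.img.old 2>&1 | tee old_initrd.log
lsinitrd /boot/initrd-$KVER.img 2>&1 | tee new_initrd.log
diff old_initrd.log new_initrd.log                       # quick first look at what differs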
Comment 9 Markus Mertens 2021-05-05 10:52:05 CEST
Creating a new initrd did not help. I just produced a new rdsosreport with debug information, but it is too big to attach:

du -s /mnt/touro/mer/rdsosreport-debug.txt
3156	/mnt/touro/mer/rdsosreport-debug.txt
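
(As a side note, plain-text reports like this compress well; a hedged example, since the exact attachment size limit is not stated here:)

xz -9 /mnt/touro/mer/rdsosreport-debug.txt   # produces rdsosreport-debug.txt.xz, usually a small fraction of the original size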

Is there a way to reinstall dracut/lvm/systemd? I am using urpmi, which does not provide such an option, and I do not dare to mix urpmi and dnf.
Comment 10 Aurelien Oudelet 2021-05-05 11:00:03 CEST
urpmi can reinstall packages with the --reinstall switch, available since Mageia 8:

urpmi --reinstall --force systemd dracut lvm
Comment 11 Markus Mertens 2021-05-05 12:21:56 CEST
Unfortunately, this did not help.
Comment 12 Lewis Smith 2021-05-07 20:04:31 CEST
Pity.
We are in tmb's hands for this, so assigning it to the kernel group.

Assignee: bugsquad => kernel

