Bug 28592

Summary: after a successful upgrade from mageia7 to mageia8, the system can't boot
Product: Mageia Reporter: peter lawford <petlaw726>
Component: Installer Assignee: Mageia Bug Squad <bugsquad>
Status: RESOLVED OLD QA Contact:
Severity: major    
Priority: Normal CC: davidwhodgins, fri, kernel, ouaurelien
Version: 8   
Target Milestone: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Source RPM: CVE:
Status comment:
Attachments: screenshot
other screenshot
as required
return of lsinitrd /boot/initrd-5.10.20-desktop-2.mga8.img
return of lsinitrd /boot/initrd-5.10.27-server-1.mga8.img (dracut)
lsinitrd /boot/initrd-5.10.27-desktop-1.mga8.img (not dracut)

Description peter lawford 2021-03-13 22:44:14 CET
Description of problem: I upgraded one of my Mageia 7 systems to Mageia 8 using the method described in your migration guide. I ran, in order:
1) rpm -qa --queryformat "%{NAME}-%{version}-%{RELEASE}-%{ARCH}\n" |grep i586 |grep devel
and removed all the 32-bit devel libs it found
2) urpmi.removemedia -a
   urpmi.addmedia --distrib --mirrorlist 'http://mirrors.mageia.org/api/mageia.8.$ARCH.list'
(I omit the intermediate steps)
3) urpmi --auto-update --auto --force --download-all --test
(/var/cache/urpmi/rpms was mounted on a large partition, > 17 GB)
which returned "installation is possible"
4) urpmi --auto-update --auto --force --download-all (the same as above without --test); after a couple of hours, 4302 of the 4303 downloaded rpm packages were installed (only icedtea-web-1.8.2-2.mga8 was not)

I consider that the upgrade was successful

but the system couldn't reboot: everything ran OK up to the "research of peripherals" step;
after that, it repeatedly prompted me to press Ctrl+D to continue.
Attached are 2 screenshots that show what happened.

Hence, migrating from Mageia 7 to Mageia 8 does not seem to be possible.



Comment 1 peter lawford 2021-03-13 22:45:12 CET
Created attachment 12459 [details]
screenshot
Comment 2 peter lawford 2021-03-13 22:46:21 CET
Created attachment 12460 [details]
other screenshot
Comment 3 Dave Hodgins 2021-03-14 04:42:00 CET
We've done many tests that have worked, and some where they failed as noted in
https://bugs.mageia.org/showdependencytree.cgi?id=28393&hide_resolved=1
which are being worked on.

At that screen, please login using the root password, then run
"journalctl --no-hostname -b|grep -v 'audit:>journal.txt", and attach
that journal.txt file to this bug report.

CC: (none) => davidwhodgins

Comment 4 Morgan Leijström 2021-03-14 10:15:00 CET
Dave's keyboard neglected to type one '. Corrected:

 journalctl --no-hostname -b|grep -v 'audit:'>journal.txt

(i.e. the journal from the current boot, minus the audit lines)

CC: (none) => fri

Comment 5 peter lawford 2021-03-14 13:25:50 CET
Created attachment 12461 [details]
as required
Comment 6 peter lawford 2021-03-14 13:26:22 CET
(In reply to Morgan Leijström from comment #4)
> Dave's keyboard neglected to type one '. Corrected:
> 
>  journalctl --no-hostname -b|grep -v 'audit:'>journal.txt
> 
> (i.e. the journal from the current boot, minus the audit lines)

journal.txt is attached.
Comment 7 Aurelien Oudelet 2021-03-14 13:44:33 CET
Comment on attachment 12461 [details]
as required

mars 14 13:14:23 systemd[1]: dev-vgmageia-lvhomemga6\x2d64.device: Job dev-vgmageia-lvhomemga6\x2d64.device/start timed out.
mars 14 13:14:23 systemd[1]: Timed out waiting for device /dev/vgmageia/lvhomemga6-64.
mars 14 13:14:23 systemd[1]: Dependency failed for /home.
mars 14 13:14:23 systemd[1]: Dependency failed for Local File Systems.
mars 14 13:14:23 systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
mars 14 13:14:23 systemd[1]: local-fs.target: Triggering OnFailure= dependencies.

Above are the relevant lines. Thanks for reporting this.
The system can't find your /home partition on /dev/vgmageia/lvhomemga6-64.
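
The same kind of lines can be pulled out of a saved journal.txt with a small filter. This is only a sketch: the function name boot_failure_lines and the choice of patterns are assumptions based on the messages quoted in this comment, not an established tool.

```shell
# boot_failure_lines FILE
# Hypothetical helper: print the lines of a saved journal that usually
# explain why systemd dropped to the emergency prompt.
boot_failure_lines() {
    # Drop audit noise first, then keep timeouts and dependency failures.
    grep -v 'audit:' "$1" |
        grep -E 'timed out|Timed out waiting for device|Dependency failed|failed with result'
}

# Example:
# boot_failure_lines journal.txt
```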

Also, there is this:
mars 14 13:13:02 kernel: EDAC sbridge: Seeking for: PCI ID 8086:6faf
mars 14 13:13:02 kernel: EDAC sbridge: CPU SrcID #0, Ha #0, Channel #0 has DIMMs, but ECC is disabled
mars 14 13:13:02 kernel: EDAC sbridge: Couldn't find mci handler
mars 14 13:13:02 kernel: EDAC sbridge: Failed to register device with error -19.

It seems the kernel can't find a particular piece of hardware.

Did a fully updated Mageia 7 system work well?

CC: (none) => ouaurelien

Comment 8 Dave Hodgins 2021-03-14 14:38:50 CET
Adding kernel team to cc list due to Error Detection and Correction errors.

CC: (none) => kernel

Comment 9 Morgan Leijström 2021-03-14 16:34:30 CET
May be interesting to try to boot a Live USB.
Make it with persistence, so kernel etc can be updated.
Comment 10 Dave Hodgins 2021-03-14 17:23:26 CET
Note that rebooting a live iso with persistence after updating the kernel
will not use the updated kernel as the persistence file system is not opened
until after the kernel has started.
Comment 11 peter lawford 2021-03-14 18:08:19 CET
(In reply to Aurelien Oudelet from comment #7)
> Comment on attachment 12461 [details]
> as required
> 
> mars 14 13:14:23 systemd[1]: dev-vgmageia-lvhomemga6\x2d64.device: Job
> dev-vgmageia-lvhomemga6\x2d64.device/start timed out.
> mars 14 13:14:23 systemd[1]: Timed out waiting for device
> /dev/vgmageia/lvhomemga6-64.
> mars 14 13:14:23 systemd[1]: Dependency failed for /home.
> mars 14 13:14:23 systemd[1]: Dependency failed for Local File Systems.
> mars 14 13:14:23 systemd[1]: local-fs.target: Job local-fs.target/start
> failed with result 'dependency'.
> mars 14 13:14:23 systemd[1]: local-fs.target: Triggering OnFailure=
> dependencies.
> 
> Above are the relevant lines. Thanks for reporting this.
> The system can't find your /home partition on /dev/vgmageia/lvhomemga6-64.
> 
> Also, there is this:
> mars 14 13:13:02 kernel: EDAC sbridge: Seeking for: PCI ID 8086:6faf
> mars 14 13:13:02 kernel: EDAC sbridge: CPU SrcID #0, Ha #0, Channel #0 has
> DIMMs, but ECC is disabled
> mars 14 13:13:02 kernel: EDAC sbridge: Couldn't find mci handler
> mars 14 13:13:02 kernel: EDAC sbridge: Failed to register device with error
> -19.
> 
> It seems the kernel can't find a particular piece of hardware.
> 
> Did a fully updated Mageia 7 system work well?

Yes, but it needs time (about 1 min 15 s) to find the /home partition. My systems ran for 15 years on old hardware (Gigabyte motherboard, Intel X58 chipset, LGA1366 socket, Core i7 960 CPU, NVIDIA 9800 GTX+ graphics). I recently replaced it with hardware that is more modern but not current: Asus ROG motherboard, Intel X99 chipset, socket 2011-3, Core i7 6800, 64 GB DDR4 RAM, NVIDIA GTX 1060 graphics. It is since this hardware change that the kernel needs time to find the /home partition.
On the old hardware I never saw a problem with EDAC sbridge at boot (honestly, I don't know what it is), but now it is displayed at every boot. Nevertheless, all my Mageia 7 systems (I have 3, all updated) boot successfully: it takes a while (2 to 3 min), but they go all the way.
Comment 12 Thomas Backlund 2021-03-14 19:03:50 CET
ignore the EDAC errors... 
it just means kernel detects hw that technically should support it but it's not really enabled (Intel market segmentation)... and no ecc memory installed

the real bug is why lvm is not properly activating soon enough...

maybe using "rootdelay=" option can help slow initial boot a bit so lvm has time to properly init...
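
For anyone who wants to try this, one way to add such an option persistently is to append it to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub and regenerate the GRUB configuration. The sketch below is an assumption-laden illustration: the helper name add_kernel_param is made up, the value 10 is an arbitrary example (no value was suggested in this report), and the grub.cfg path in the comment should be checked against the local install. Note that comment 17 argues rootdelay is unlikely to help here, since the root filesystem was mounted fine.

```shell
# add_kernel_param FILE PARAM
# Hypothetical helper: append PARAM (e.g. rootdelay=10) to the
# GRUB_CMDLINE_LINUX_DEFAULT line of a GRUB defaults file, unless a
# parameter with the same name is already present. Edits FILE in place.
add_kernel_param() {
    file=$1
    param=$2
    name=${param%%=*}
    # Skip if the parameter name already appears on the line.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]${name}(=|[\" ])" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}

# Example (as root, after backing up the file; 10 is an arbitrary value):
# add_kernel_param /etc/default/grub rootdelay=10
# grub2-mkconfig -o /boot/grub2/grub.cfg   # assumed output path
```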
Comment 13 peter lawford 2021-03-14 19:18:04 CET
(In reply to Thomas Backlund from comment #12)
> ignore the EDAC errors... 
> it just means kernel detects hw that technically should support it but it's
> not really enabled (Intel market segmentation)... and no ecc memory installed

thanks for explanation about EDAC
> 
> the real bug is why lvm is not properly activating soon enough...
> 
> maybe using "rootdelay=" option can help slow initial boot a bit so lvm has
> time to properly init...
Is it possible to modify "rootdelay"? Once booted, my mga7 systems (and soon, I hope, mga8) work fine; I am willing to accept slow boots if I can migrate to mga8.
Comment 14 peter lawford 2021-03-14 19:32:58 CET
(In reply to Thomas Backlund from comment #12)
> ignore the EDAC errors... 
> it just means kernel detects hw that technically should support it but it's
> not really enabled (Intel market segmentation)... and no ecc memory installed
> 
> the real bug is why lvm is not properly activating soon enough...
> 
> maybe using "rootdelay=" option can help slow initial boot a bit so lvm has
> time to properly init...

Do you think that running dracut with the option --add "lvm mdraid" could fix the problem?
Comment 15 Thomas Backlund 2021-03-14 19:37:45 CET
the need should be autodetected.

running "dracut -f" should list all bits it detects / adds
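
Whether the autodetection actually picked up LVM and MD RAID can be checked from an lsinitrd listing. A minimal sketch, assuming the listing shows each dracut module name on a line of its own and that the installed lsinitrd supports -m (list dracut modules); the helper name missing_modules is hypothetical.

```shell
# missing_modules MODULE...
# Hypothetical helper: read an lsinitrd listing on stdin and print each
# named dracut module that does not appear on a line of its own.
missing_modules() {
    listing=$(cat)
    for mod in "$@"; do
        printf '%s\n' "$listing" | grep -qx "$mod" || printf '%s\n' "$mod"
    done
}

# Example (as root; the image name is the one from this report):
# lsinitrd -m /boot/initrd-5.10.20-desktop-2.mga8.img | missing_modules lvm mdraid
```

No output means both modules are present in the image.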
Comment 16 Dave Hodgins 2021-03-14 20:33:58 CET
So the only important lines from the journal are ...
mars 14 13:12:54
mars 14 13:14:23 systemd[1]: dev-vgmageia-lvhomemga6\x2d64.device: Job dev-vgmageia-lvhomemga6\x2d64.device/start timed out.
mars 14 13:14:23 systemd[1]: Timed out waiting for device /dev/vgmageia/lvhomemga6-64.
mars 14 13:14:23 systemd[1]: Dependency failed for /home.

$ grep DefaultTimeoutStartSec /etc/systemd/system.conf 
#DefaultTimeoutStartSec=90s

I'm not sure if changing DefaultTimeoutStartSec in the system.conf file
is enough, but it is worth trying to increase it. I'd also try adding the
option x-systemd.mount-timeout=infinity to the entry for /home in /etc/fstab.
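
The fstab change can be previewed without touching the real file. The sketch below is an illustration under assumptions: the helper name add_mount_option is made up, and it prints the modified table to stdout so the result can be reviewed before anything is copied over /etc/fstab. Since the journal message is a timeout waiting for the device, it may also be worth knowing that systemd.mount(5) documents a separate x-systemd.device-timeout= option; whether either option helps was not confirmed in this report.

```shell
# add_mount_option FSTAB MOUNTPOINT OPTION
# Hypothetical helper: print FSTAB with OPTION appended to the options
# field (4th column) of the entry for MOUNTPOINT. The input file is not
# modified. awk rebuilds the edited line with single spaces between
# fields, which fstab accepts.
add_mount_option() {
    awk -v mp="$2" -v opt="$3" '
        $1 !~ /^#/ && $2 == mp && index("," $4 ",", "," opt ",") == 0 {
            $4 = $4 "," opt
        }
        { print }
    ' "$1"
}

# Example: write a candidate file and compare it with the original.
# add_mount_option /etc/fstab /home x-systemd.mount-timeout=infinity > /tmp/fstab.new
# diff /etc/fstab /tmp/fstab.new
```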
Comment 17 Dave Hodgins 2021-03-14 20:48:26 CET
I don't think rootdelay will help as the root filesystem was mounted ok.
mars 14 13:12:54 dracut: Mounted root filesystem /dev/mapper/vgmageia-lvrootmga6--64
mars 14 13:12:54 dracut: Switching root
Comment 18 peter lawford 2021-03-14 23:19:37 CET
(In reply to Morgan Leijström from comment #9)
> May be interesting to try to boot a Live USB.
> Make it with persistence, so kernel etc can be updated.

The Live USB boots very quickly (when using USB 3.1) without any problem.
Comment 19 peter lawford 2021-03-15 16:00:27 CET
Solved! After running dracut from another system, mga6-64 boots (very quickly) on Mageia 8.
Comment 20 Thomas Backlund 2021-03-15 16:19:50 CET
that sounds like we miss something during initrd creation.

do you have more than one 5.10 series initrd in /boot ?

if so it would be nice to get an lsinitrd of both slow-booting initrd and the one that boots nicely
Comment 21 peter lawford 2021-03-15 16:25:13 CET
(In reply to Thomas Backlund from comment #20)
> that sounds like we miss something during initrd creation.
> 
> do you have more than one 5.10 series initrd in /boot ?
> 
> if so it would be nice to get an lsinitrd of both slow-booting initrd and
> the one that boots nicely

no, I have only initrd-5.10.20-<desktop,server>-2.mga8.img
Comment 22 Thomas Backlund 2021-03-15 16:31:24 CET
(In reply to peter lawford from comment #21)
> (In reply to Thomas Backlund from comment #20)
> > that sounds like we miss something during initrd creation.
> > 
> > do you have more than one 5.10 series initrd in /boot ?
> > 
> > if so it would be nice to get an lsinitrd of both slow-booting initrd and
> > the one that boots nicely
> 
> no, I have only initrd-5.10.20-<desktop,server>-2.mga8.img

well, then you have 2 :)

if you only re-created one of them in comment 19, then we should be able to spot what's missing.

if that's the case, please do lsinitrd on both so we can compare them
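
One way to do that comparison, once both listings have been saved to files, is sketched below. It assumes the listing has a "dracut modules:" section that ends at a line of "=" characters; the helper names dracut_modules and compare_initrd_modules are hypothetical.

```shell
# dracut_modules LISTING
# Print the module names from the "dracut modules:" section of a saved
# lsinitrd listing (the section is assumed to end at a line of "=").
dracut_modules() {
    awk '
        /^dracut modules:/ { m = 1; next }
        /^=+/              { m = 0 }
        m && NF            { print $1 }
    ' "$1"
}

# compare_initrd_modules LISTING_A LISTING_B
# Column 1: modules only in A. Column 2 (tab-indented): only in B.
compare_initrd_modules() {
    a=$(mktemp)
    b=$(mktemp)
    dracut_modules "$1" | sort > "$a"
    dracut_modules "$2" | sort > "$b"
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Example:
# lsinitrd /boot/initrd-5.10.20-desktop-2.mga8.img > desktop.txt
# lsinitrd /boot/initrd-5.10.20-server-2.mga8.img  > server.txt
# compare_initrd_modules desktop.txt server.txt
```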
Comment 23 peter lawford 2021-03-15 16:51:18 CET
(In reply to Thomas Backlund from comment #22)
> (In reply to peter lawford from comment #21)
> > (In reply to Thomas Backlund from comment #20)
> > > that sounds like we miss something during initrd creation.
> > > 
> > > do you have more than one 5.10 series initrd in /boot ?
> > > 
> > > if so it would be nice to get an lsinitrd of both slow-booting initrd and
> > > the one that boots nicely
> > 
> > no, I have only initrd-5.10.20-<desktop,server>-2.mga8.img
> 
> well, then you have 2 :)
> 
> if you only re-created one of them in comment 19, then we should be able to
> spot what's missing.
> 
> if that's the case, please do lsinitrd on both so we can compare them

Unfortunately, too late! A few minutes ago I ran dracut from the system itself (not from another system using chroot) with the --mdadmconf option on both kernels, because I had noticed in the output of "cat /proc/mdstat" that the numbers of my RAID volumes (/dev/mdxxx) were wrong. The output is now:
[alain4@mga6-64 ~]$ cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md126 : active raid5 sdd7[4] sdc7[7] sda7[6] sdb7[5]
      1006239744 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 1/3 pages [4KB], 65536KB chunk

md128 : active raid5 sdg6[2] sdf6[1] sde6[0] sdh6[4]
      4026138624 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/10 pages [0KB], 65536KB chunk

md127 : active raid5 sdc6[2] sdd6[4] sda6[5] sdb6[1]
      157188096 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      
md122 : active raid5 sdg3[2] sdf3[1] sde3[0] sdh3[4]
      94322688 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      
unused devices: <none>
which are the right numbers; before running dracut and rebooting, they were /dev/md<124,125,126,127> (which seems more logical).
In case it is useful, I am attaching the output of lsinitrd /boot/initrd-5.10.20-desktop-2.mga8.img
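
To compare array numbering before and after regenerating an initrd, the mdstat output can be reduced to one line per array. A minimal sketch, assuming "mdNNN : active raidN member..." lines like those shown above; the helper name list_md_arrays is hypothetical.

```shell
# list_md_arrays [FILE]
# Hypothetical helper: print "name level member-count" for each active
# array in an mdstat-format file (default: /proc/mdstat).
list_md_arrays() {
    awk '$2 == ":" && $3 == "active" { print $1, $4, NF - 4 }' "${1:-/proc/mdstat}"
}

# Example: save the summary before and after a reboot, then diff.
# list_md_arrays > md-before.txt
```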
Comment 24 peter lawford 2021-03-15 16:54:43 CET
Created attachment 12468 [details]
return of lsinitrd /boot/initrd-5.10.20-desktop-2.mga8.img
Comment 25 peter lawford 2021-03-15 16:57:04 CET
This is the current initrd; lsinitrd /boot/initrd-5.10.20-server-2.mga8.img returns a similar result.
Comment 26 peter lawford 2021-03-15 19:47:29 CET
(In reply to Thomas Backlund from comment #22)
> (In reply to peter lawford from comment #21)
> > (In reply to Thomas Backlund from comment #20)
> > > that sounds like we miss something during initrd creation.
> > > 
> > > do you have more than one 5.10 series initrd in /boot ?
> > > 
> > > if so it would be nice to get an lsinitrd of both slow-booting initrd and
> > > the one that boots nicely
> > 
> > no, I have only initrd-5.10.20-<desktop,server>-2.mga8.img
> 
> well, then you have 2 :)
> 
> if you only re-created one of them in comment 19, then we should be able to
> spot what's missing.
> 
> if that's the case, please do lsinitrd on both so we can compare them

If it would really help, I still have to migrate another system from Mageia 7 to 8. I could run dracut on only one of the 2 kernels and use lsinitrd to see the difference between the 2 initrds (server and desktop), but this will take a lot of time and I won't do it today. Please wait a bit.
Comment 27 Morgan Leijström 2021-03-15 21:34:32 CET
(In reply to Dave Hodgins from comment #10)
> Note that rebooting a live iso with persistence after updating the kernel
> will not use the updated kernel as the persistence file system is not opened
> until after the kernel has started.

As long as the persistence is not encrypted, the new kernel will get used :)
Comment 28 Aurelien Oudelet 2021-03-21 16:21:38 CET
Status?

Status: NEW => NEEDINFO

Comment 29 Aurelien Oudelet 2021-04-06 20:21:28 CEST
Since there are insufficient details provided in this report for us to investigate the issue further, and we have not received feedback to the information we have requested above, we will assume the problem was not reproducible, or has been fixed in one of the updates we have released for the reporter's distribution.

Users who have experienced this problem are encouraged to upgrade to the latest update of their distribution, and if this issue turns out to still be reproducible in the latest update, please reopen this bug with additional information.

Closing as OLD.

Resolution: (none) => OLD
Status: NEEDINFO => RESOLVED

Comment 30 peter lawford 2021-04-09 15:17:07 CEST
(In reply to Thomas Backlund from comment #22)
> (In reply to peter lawford from comment #21)
> > (In reply to Thomas Backlund from comment #20)
> > > that sounds like we miss something during initrd creation.
> > > 
> > > do you have more than one 5.10 series initrd in /boot ?
> > > 
> > > if so it would be nice to get an lsinitrd of both slow-booting initrd and
> > > the one that boots nicely
> > 
> > no, I have only initrd-5.10.20-<desktop,server>-2.mga8.img
> 
> well, then you have 2 :)
> 
> if you only re-created one of them in comment 19, then we should be able to
> spot what's missing.
> 
> if that's the case, please do lsinitrd on both so we can compare them

Hi! I'm back on this bug. Yesterday I upgraded one of my mga7 systems to 8, and I ran dracut only on initrd-5.10.27-server-1.mga8.img, not on initrd-5.10.27-desktop-1.mga8.img.
Attached are the 2 lsinitrd outputs you asked for.
Comment 31 peter lawford 2021-04-09 15:18:33 CEST
Created attachment 12605 [details]
return of lsinitrd /boot/initrd-5.10.27-server-1.mga8.img (dracut)
Comment 32 peter lawford 2021-04-09 15:19:55 CEST
Created attachment 12606 [details]
lsinitrd /boot/initrd-5.10.27-desktop-1.mga8.img (not dracut)