Bug 19131

Summary: Kernel 4.4.16 crashes on boot (matching initrd img not found)
Product: Mageia
Reporter: Nathan Owens <pianocomp81>
Component: RPM Packages
Assignee: Kernel and Drivers maintainers <kernel>
Status: RESOLVED OLD
QA Contact:
Severity: critical
Priority: Normal
CC: mageia, marja11, yvesbrungard
Version: 5
Target Milestone: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Source RPM: kernel-desktop-4.4.16-1.mga5-1-1.mga5, dracut
CVE:
Status comment:
Attachments: Install/Update of glibc and kernel on August 1, 2016
Subsequent install and reboot on August 3, 2016
Journal sequence updating kernel to 4.4.50

Description Nathan Owens 2016-08-04 15:32:01 CEST
Description of problem:
I updated from kernel-desktop-4.4.13-1.mga5 to kernel-desktop-4.4.16-1.mga5 via the kernel-desktop-latest package. After rebooting, the kernel failed to crash. Grub2 also indicated that it couldn't find the initrd image.

How reproducible:
Every single time I booted. 4.4.13 still boots without issue (it was still installed). I had just installed all the latest RPM updates as of August 3 between 6PM and 7PM EDT.

I'm not sure how to get a log of what actually failed, since it didn't even get to a prompt. If someone gives me some tips on how to do that, I'll upload what I can.

I'm running:
- CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
- Motherboard: GIGABYTE GA-Z97X-SLI
- SSD: SanDisk SDSSDP12
- USB-attached hard drive: NA52ZCFB
Nathan Owens 2016-08-04 15:32:18 CEST

Summary: New kernel crashes on boot => New kernel 4.4.16 crashes on boot

Nathan Owens 2016-08-04 15:32:31 CEST

Summary: New kernel 4.4.16 crashes on boot => Kernel 4.4.16 crashes on boot

Comment 1 Nathan Owens 2016-08-04 15:33:24 CEST
And I can't type...

"After rebooting, the kernel crashed", not "failed to crash".

I didn't even get to a shell.
Comment 2 Nathan Owens 2016-08-04 15:54:52 CEST
So... now I can reproduce it. After I uninstalled all 4.4.16-related packages (kernel-desktop-latest, kernel-desktop-4.4.16, kernel-desktop-devel-latest, and kernel-desktop-devel-4.4.16), then re-installed them, it worked just fine.


I'm wondering if the initrd link in /boot wasn't set up correctly the first time it got installed. Could updating glibc at the same time as the kernel, without rebooting, cause a problem? Both were previously updated at the same time. This time, glibc was already installed.
Comment 3 Marja Van Waes 2016-08-04 16:25:11 CEST
(In reply to Nathan Owens from comment #2)
> So... now I can reproduce it.

Do you mean "can't"?

> After I uninstalled all 4.4.16-related
> packages (kernel-desktop-latest, kernel-desktop-4.4.16,
> kernel-desktop-devel-latest, and kernel-desktop-devel-4.4.16), then
> re-installed them, it worked just fine.
> 
> 
> I'm wondering if the initrd link in /boot wasn't set up correctly the first
> time it got installed. Could updating glibc at the same time as the kernel
> without rebooting cause a problem? Both were previously updated at the same
> time. This time, glibc was already installed.

if before, grub2 couldn't find an initrd image, then the logs from when you installed kernel-4.4.16 for the first time would be interesting to have.

Please run the following as root, adjusting the --since= and --until= dates and times if needed so that they fall before and after the update, respectively:

  journalctl -a --since="2016-08-03 18:00" --until="2016-08-03 19:00" > output.txt

and attach output.txt to this bug report
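To narrow the captured journal down to the lines that matter for this bug (the mkinitrd/dracut run and what update-grub2 reports finding in /boot), a filter like the one below may help. This is a sketch: the pattern assumes the message formats quoted in this report ("running: mkinitrd ...") plus GRUB's usual "Found linux image:" / "Found initrd image:" lines, and the function name initrd_lines is hypothetical.

```shell
#!/bin/sh
# initrd_lines (hypothetical helper): read journal text on stdin and keep
# only lines about initrd generation or about what GRUB's config
# generator found in /boot.
initrd_lines() {
    grep -E 'mkinitrd|dracut|update-grub|Found (linux|initrd) image'
}
```

Usage:

  journalctl -a --since="2016-08-03 18:00" --until="2016-08-03 19:00" | initrd_lines

A kernel that gets a "Found linux image" line without a matching "Found initrd image" line most likely has no initrd on disk.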

Keywords: (none) => NEEDINFO
CC: (none) => marja11
Assignee: bugsquad => tmb

Comment 4 Nathan Owens 2016-08-04 21:24:42 CEST
Thanks for showing me how to view my logs (haven't had to do that since we switched to systemd - shows how well the system's been working).

After looking at the logs, it appears the order of events was:
1) Install glibc-6:2.20-23 and kernel 4.4.16-1 on August 1 (Monday, not Wednesday August 3)
2) Forget to reboot (got busy doing other things I guess)
3) Run more updates on August 3 (not related to glibc or the kernel)
4) Get the reminder to reboot
5) Reboot, and fail multiple times
6) Eventually realize kernel 4.4.13 is still installed, and so boot 4.4.13 20 minutes later.

I've attached the journalctl logs from the install of glibc and the kernel on August 1, as well as the "subsequent" install from Step 3 and the reboot. There are no logs for the failed reboots (I don't think it got that far, and I don't see it in journalctl).
Comment 5 Nathan Owens 2016-08-04 21:25:18 CEST
Created attachment 8307 [details]
Install/Update of glibc and kernel on August 1, 2016
Comment 6 Nathan Owens 2016-08-04 21:28:47 CEST
Created attachment 8308 [details]
Subsequent install and reboot on August 3, 2016
Comment 7 Marja Van Waes 2016-08-04 22:36:05 CEST
(In reply to Nathan Owens from comment #5)
> Created attachment 8307 [details]
> Install/Update of glibc and kernel on August 1, 2016

I see this line twice:

running: mkinitrd -f /boot/initrd-4.4.16-desktop-1.mga5.img 4.4.16-desktop-1.mga5

I also see that /boot/vmlinuz-4.4.16-desktop-1.mga5 was created and later found by update-grub2.

However, the matching initrd-4.4.16-desktop-1.mga5.img is _not_ mentioned as being found.
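The same mismatch (a vmlinuz image present, its matching initrd missing) can also be checked directly on disk before rebooting. A minimal sketch, assuming the vmlinuz-<version> / initrd-<version>.img naming shown in the log lines above; check_initrds is a hypothetical helper name.

```shell
#!/bin/sh
# check_initrds [BOOTDIR] (hypothetical helper): report every versioned
# kernel image that has no matching, non-empty initrd.
# Assumes the naming seen in this report:
#   vmlinuz-<version>  <->  initrd-<version>.img
check_initrds() {
    bootdir=${1:-/boot}
    missing=0
    for kernel in "$bootdir"/vmlinuz-*; do
        # Skip the unexpanded pattern when nothing matches, and skip
        # symlinks (e.g. a generic vmlinuz-desktop link, if present).
        [ -f "$kernel" ] && [ ! -L "$kernel" ] || continue
        version=${kernel#"$bootdir"/vmlinuz-}
        initrd="$bootdir/initrd-$version.img"
        if [ ! -s "$initrd" ]; then
            echo "missing or empty: $initrd"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_initrds /boot || echo "do not reboot yet"` prints each kernel that would boot without an initrd.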



(In reply to Nathan Owens from comment #2)

> 
> 
> I'm wondering if the initrd link in /boot wasn't set up correctly the first
> time it got installed. Could updating glibc at the same time as the kernel
> without rebooting cause a problem? 

Thomas and Colin will know whether that's a possible cause.

CC'ing the dracut maintainer.

Keywords: NEEDINFO => (none)
CC: (none) => mageia
Summary: Kernel 4.4.16 crashes on boot => Kernel 4.4.16 crashes on boot (matching initrd img not found)
Source RPM: kernel-desktop-4.4.16-1.mga5-1-1.mga5 => kernel-desktop-4.4.16-1.mga5-1-1.mga5, dracut

Comment 8 Marja Van Waes 2016-08-26 11:42:39 CEST
Mass-reassigning all bugs with "kernel" in the Source RPM field that are assigned to tmb, to the kernel packagers group, because tmb is currently MIA.

Assignee: tmb => kernel

Comment 9 Marja Van Waes 2016-10-21 10:38:01 CEST
On the QA mailing list, lebarhon reported that a user on the French forums had hit this issue twice, with the 4.4.22 and 4.4.26 kernels:

http://www.mageialinux-online.org/forum/topic-22767.php#m219140
Comment 10 David Walser 2016-10-21 14:22:51 CEST
Usually when it fails to generate the initrd it's because of a lack of disk space in the /boot or / partition.
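A quick way to rule that out before rebooting is to check the free space on the filesystem that holds /boot. The sketch below relies on POSIX `df -P` output, where the fourth column is the available space (in 1K blocks with -k); the helper name and the 100 MB threshold in the usage line are hypothetical, not a documented initrd size.

```shell
#!/bin/sh
# has_free_kb DIR MIN_KB (hypothetical helper): succeed if the filesystem
# holding DIR has at least MIN_KB kilobytes available.
has_free_kb() {
    # Line 2 of `df -Pk` is the data row; field 4 is "Available".
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

Usage:

  has_free_kb /boot 102400 || echo "less than 100 MB free on the /boot filesystem"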
Comment 11 Nathan Owens 2016-10-25 03:06:10 CEST
My /boot directory is on my / partition, and it has 5.0G free and has never been close to the limit, so that wasn't the problem for me.

I just installed the 4.4.26 kernel and it worked just fine (I checked before rebooting).

Could the package be made to fail, or to revert to the old kernel, if the initrd fails to generate in these intermittent cases (if the root cause can't be determined; I haven't reproduced it myself)? That would at least leave the system bootable by default. Or it could simply not update the symlinks until the initrd was generated successfully, and show the user an error otherwise?
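The behaviour suggested here (build the initrd first, and only repoint the generic link once the image exists) could look roughly like the sketch below. It is hypothetical, not Mageia's actual kernel package scriptlet: the function name, the MKINITRD override (used only so it can be tested without mkinitrd), and the temporary file name are assumptions. The `mkinitrd -f <image> <version>` call mirrors the command shown in the install log in this report, and the initrd-<flavor>.img link name follows the /boot/initrd-desktop.img link mentioned in this report.

```shell
#!/bin/sh
# update_initrd VERSION FLAVOR [BOOTDIR]  (hypothetical helper)
# Build the initrd for VERSION and only then repoint initrd-FLAVOR.img at it,
# so a failed build leaves the previous default kernel bootable.
update_initrd() {
    version=$1
    flavor=$2
    bootdir=${3:-/boot}
    image="$bootdir/initrd-$version.img"
    tmp="$bootdir/initrd-$version.new.img"

    # Build into a temporary name; MKINITRD may be overridden for testing.
    if ! ${MKINITRD:-mkinitrd} -f "$tmp" "$version" || [ ! -s "$tmp" ]; then
        rm -f "$tmp"
        echo "initrd generation failed for $version; keeping the current default" >&2
        return 1
    fi

    mv -f "$tmp" "$image"
    # Repoint the generic link only after the new image is in place.
    ln -sf "initrd-$version.img" "$bootdir/initrd-$flavor.img"
}
```

Returning non-zero lets the caller report the error to the user instead of leaving the generic link pointing at a file that was never created.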
Comment 12 papoteur 2017-03-22 09:17:45 CET
I just encountered the problem today with the update to 4.4.50 kernel.
The /boot/initrd-desktop.img symlink points to /boot/initrd-4.4.50-desktop-2.mga5.img, which doesn't exist.
The boot ended with a kernel panic.
Rebooting with the previous kernel, I executed in /boot:
 mkinitrd initrd-4.4.50-desktop-2.mga5.img 
which worked.
I have two LUKS partitions: one for /home and one on an external drive.
The problem also looks similar to:
https://forums.mageia.org/en/viewtopic.php?f=7&t=11464
I have 1 GB of free space on /.
/boot is not a separate partition.
I will attach the journal excerpt from the update.
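The broken state described in this comment, a generic /boot/initrd-desktop.img link whose target was never written, can be detected before rebooting. A minimal sketch; the helper name is hypothetical, and it uses only the shell's `test -L` / `test -e` checks and readlink.

```shell
#!/bin/sh
# report_dangling LINK (hypothetical helper): succeed, and print the missing
# target, if LINK is a symbolic link whose target does not exist.
report_dangling() {
    # -L tests the link itself; -e follows it, so it fails for a dangling link.
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        echo "$1 -> $(readlink "$1") (target missing)"
        return 0
    fi
    return 1
}
```

Usage:

  report_dangling /boot/initrd-desktop.img && echo "regenerate the initrd before rebooting"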

CC: (none) => yves.brungard_mageia

Comment 13 papoteur 2017-03-22 09:19:02 CET
Created attachment 9134 [details]
Journal sequence updating kernel to 4.4.50
Comment 14 Marja Van Waes 2018-07-12 16:10:04 CEST
Did anyone hit this issue in Mageia 6?

If so, please say so and change the Version of this report (near the top of this screen, on the left) to 6.

@ Nathan

==> If you haven't reset your password since February 2018, you'll need to reset it at https://identity.mageia.org/forgot_password to be able to log in and comment on this report. <==
Comment 15 Nathan Owens 2018-07-13 03:03:15 CEST
I haven't seen it happen.
Comment 16 Marja Van Waes 2018-10-07 17:08:41 CEST
(In reply to Nathan Owens from comment #15)
> I haven't seen it happen.

Thanks for the feedback.

Closing this report as OLD, then, because Mageia 5 is no longer supported and Mageia 6 doesn't seem to be affected.

Status: NEW => RESOLVED
Resolution: (none) => OLD