Bug 18915 - urpmi --auto-update thinks a failed kernel install succeeded (/boot full)
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal; Severity: normal
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
Reported: 2016-07-10 22:41 CEST by Morgan Leijström
Modified: 2016-10-17 17:12 CEST
CC List: 7 users

Source RPM: dracut, kernel, urpmi-8.102-2.mga6.src.rpm


Attachments
Output in Konsole where urpmi --update ran (16.32 KB, text/plain)
2016-07-10 22:42 CEST, Morgan Leijström

Description Morgan Leijström 2016-07-10 22:41:09 CEST
See attached text copy from Konsole:


1) I ran urpmi --auto-update, which wanted new kernel 4.7.0-0.rc6.3.mga6 (and other packages)


2) during that it ran dracut, which failed:
dracut: *** Creating image file '/boot/initrd-4.7.0-desktop-0.rc6.3.mga6.img' ***
cp: error writing '/boot/initrd-4.7.0-desktop-0.rc6.3.mga6.img': No space left on device
dracut: dracut: creation of /boot/initrd-4.7.0-desktop-0.rc6.3.mga6.img failed


3) urpmi continued and ended by telling the user to reboot the computer for the new kernel 4.7.0-0.rc6.3.mga6 (even though it was not completely installed!)


4) Running urpmi --auto-update again immediately afterwards finds no upgrades, which is another sign that it thinks the install succeeded.


5) On reboot the old kernel is used, and the grub boot entries do not contain the failed kernel. (Correct behaviour, as the install of the new one failed.)


Details of system: 64-bit, earlier upgraded from mga5, grub (not grub2); /boot is a separate partition, sda1.
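
The failure in step 2 comes down to /boot running out of room while dracut wrote the image. A minimal sketch of a free-space check that could be run before a kernel update follows. The helper name has_free_mb and the 30 MB threshold are assumptions for illustration; nothing like this is known to exist in urpmi or dracut.

```shell
#!/bin/sh
# Hypothetical helper (not part of urpmi or dracut): succeed only if the
# filesystem holding DIR has at least NEED_MB mebibytes available.
has_free_mb() {
    dir=$1
    need_mb=$2
    # df -P prints one line per filesystem in POSIX format; with -k the
    # fourth column is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    # An empty value means df could not inspect DIR; treat that as "no room".
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((need_mb * 1024)) ]
}

# Usage sketch; the 30 MB threshold is an assumption, not a measured value:
#   has_free_mb /boot 30 || echo "warning: less than 30 MB free on /boot" >&2
```

With the initrd sizes mentioned in this report (roughly 11 to 26 MB each), a threshold somewhat above the largest expected image would be the natural choice.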
Comment 1 Morgan Leijström 2016-07-10 22:42:45 CEST
Created attachment 8154
Output in Konsole where urpmi --update ran
Comment 2 Thierry Vignaud 2016-07-11 11:40:40 CEST
Well, urpmi will only report scriptlet errors if librpm can report them, meaning the kernel's post scriptlet and dracut should report the errors.
Anyway, scriptlet errors are not fatal.

How many kernels were installed?
I guess we could try to always enforce that there are no more than 3 kernels of the same flavor.

CC: (none) => mageia, thierry.vignaud, tmb
Source RPM: urpmi-8.102-2.mga6.src.rpm => dracut, kernel, urpmi-8.102-2.mga6.src.rpm

Comment 3 Morgan Leijström 2016-07-12 02:43:18 CEST
/boot is only 113MB
  (a little small, I admit, but it is on a small SSD in an old but swift laptop; nowadays, when /boot can be inside LVM, I should repartition for that at the next install.)

There are now three kernels installed.
For some reason initrd-4.7.0-desktop-0.rc6.2.mga6.img is 26,1 M, while the other *.img files are 11,6 M.

The partition has only 16 MB free, but I have cleaned out a couple of old *.img files and have one left there. What are they for?
Comment 4 w unruh 2016-07-12 19:27:09 CEST
So there seem to be three bugs here.
a) With a tiny /boot, it is crucial that, once a new kernel is installed, tested and found to work, the older kernels are removed to keep enough room. (User bug)
(Note, you do NOT need a separate /boot AFAIK. Thus, had /boot just been part of /, you would have had enough room.)

b) rpm reported the new kernel as installed when dracut's making of initrd failed. That is indeed a bug in my opinion.

c) the new initrd is huge. My typical initrd is 10MB. I sure hope that the new kernels do not typically have such large initrd files.

The initrd files are critical to booting, and removing them will make their associated kernels unbootable. systemd now requires all directories, including those inside /usr/{lib,bin}, to be mounted at boot, and grub2 loads the initrd together with the kernel. It is the initrd that mounts the directories and provides all the required modules.
Rather than deleting an initrd, you should remove the old kernels themselves (urpme).
But you should always keep at least one old kernel when you install a new one, in case the new one has a bug which makes it inoperable on your system. 
Especially if you are playing with cauldron. I certainly would not advise you to play with cauldron on an "old but swift laptop" with limited space.

CC: (none) => unruh

Comment 5 Thierry Vignaud 2016-07-12 22:08:41 CEST
urpme --auto-orphans does that automatically.

As for the huge initrd, make sure you haven't altered the hostonly setting in /etc/dracut.conf.d/50-mageia.conf
You can also remove ahci from it if you don't need it.
You can uninstall stuff you don't need (if you really don't need it: kernel-firmware{,-nonfree}, biosdevname, ...)
Comment 6 Samuel Verschelde 2016-07-15 16:53:05 CEST
(In reply to w unruh from comment #4)
> So there seem to be three bugs here. 
> a) With a tiny /boot, it is crucial that, once a new kernel is installed and
> tested and found to work, the older kernels should be removed to keep enough
> room. (User bug)
> (Note, you do NOT need a separate /boot AFAIK. Thus had /boot just been part
> of 
> /, you would have had enough room).
> 
> b) rpm reported the new kernel as installed when dracut's making of initrd
> failed. That is indeed a bug in my opinion.
> 
> c) the new initrd is huge. My typical initrd is 10MB. I sure hope that the
> new kernels do not typically have such large initrd files.
> 

I suggest keeping this bug report about point b), which is what Morgan reported first. Other items can be discussed in a different bug report if needed.
Comment 7 Morgan Leijström 2016-07-15 18:41:12 CEST
Yep, b is the bug.  It may have two parts:

1) The last line it printed told the user to reboot for the new kernel, although it did not install a new one.

2) When I ran urpmi --auto-update again, it considered the latest kernel already installed.


a is my fault, being old style (having /boot outside LVM, since /boot inside LVM did not work years ago, in combination with a too small /boot, induced by using an old small SSD)

For c, I note that when using the Nvidia proprietary driver, the initrd grows quite a bit.  The big initrd-4.7.0-desktop-0.rc6.2.mga6.img was probably caused by a transient, now old, bug.  I have never manually fiddled with any initrd settings here.

Notes:

1) Regarding automatic deletion: For most end users it would be good if at most, say, 3 kernels are kept and the rest deleted (before a kernel install, delete all but the two latest).  For best reliability make sure to keep the running kernel, in case there have been problems with the latest two, and/or do not delete kernels automatically in cauldron.  On systems where I have a large /boot or no separate /boot, I like to keep several kernels in cauldron for test purposes.

2) Inform
It would be good if the install notes and/or the installer partitioning step gave some info on approximately how big /boot needs to be, depending on the number of kernels and drivers.
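
The retention rule from note 1 (keep the newest N kernels, and never delete the running one) can be sketched as a small filter. The function name kernels_to_remove is hypothetical, and the sketch assumes GNU sort with -V for version ordering, which is only an approximation of how rpm compares versions.

```shell
#!/bin/sh
# Hypothetical filter: read kernel versions on stdin (one per line) and print
# those that could be removed, keeping the newest KEEP versions and RUNNING.
kernels_to_remove() {
    keep=$1
    running=$2
    # sort -rV (GNU coreutils) lists the newest version first; tail -n +K
    # starts output at line K, which skips the KEEP newest entries.
    # grep -Fxv then drops the running kernel from the removal list.
    # "|| true" hides grep's non-zero status when nothing is left to remove.
    sort -rV | tail -n +"$((keep + 1))" | grep -Fxv -- "$running" || true
}

# Usage sketch (how the version list is produced is left open here):
#   list_installed_versions | kernels_to_remove 2 "$(uname -r)"
```

Here list_installed_versions stands for whatever produces the installed versions of one kernel flavor; it is a placeholder, not an existing command.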
Comment 8 Marja Van Waes 2016-07-16 18:20:45 CEST
Tbh, I wouldn't want an _installed_ kernel to show up as _not_ installed, just because dracut failed to create a matching initrd image.

Immediately after creating enough space on your /boot partition, it would have been possible to manually create the needed initrd image, with (as root):

  dracut initrd-4.7.0-desktop-0.rc6.3.mga6.img 4.7.0-desktop-0.rc6.3.mga6


because the kernel itself was correctly installed.
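
A minimal sketch of the same manual recovery with an explicit image path, so that it does not depend on the current directory being /boot. The /boot/initrd-<version>.img naming is taken from the dracut output quoted in the description; the function names are hypothetical.

```shell
#!/bin/sh
# Build the image path for a kernel version, following the naming seen in
# the dracut output quoted in the description.
initrd_path() {
    printf '/boot/initrd-%s.img\n' "$1"
}

# Hypothetical wrapper: rebuild the initrd for kernel version KVER (as root).
# dracut takes the image path first and the kernel version second; --force
# overwrites a partial image that the failed run may have left behind.
rebuild_initrd() {
    kver=$1
    dracut --force "$(initrd_path "$kver")" "$kver"
}

# Usage sketch:
#   rebuild_initrd 4.7.0-desktop-0.rc6.3.mga6
```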

CC: (none) => marja11

Comment 9 w unruh 2016-07-16 19:25:14 CEST
The problem probably is that that kernel may be unbootable because of the missing initrd, confusing the customer. Is there any way of issuing a warning in that case, hinting at what needs to be done?
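
One way such a warning could be derived is to compare the kernel images in /boot with the initrd images next to them. This is only a sketch: the function name is hypothetical, and it assumes the vmlinuz-<version> and initrd-<version>.img naming.

```shell
#!/bin/sh
# Hypothetical check: print every kernel version in BOOTDIR that has a
# vmlinuz-<version> file but no non-empty initrd-<version>.img beside it.
kernels_missing_initrd() {
    bootdir=$1
    for k in "$bootdir"/vmlinuz-*; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -e "$k" ] || continue
        kver=${k##*/vmlinuz-}
        # -s is true only for an existing, non-empty file, so an image that a
        # full disk left empty is reported as missing too.
        [ -s "$bootdir/initrd-$kver.img" ] || printf '%s\n' "$kver"
    done
    return 0
}

# Usage sketch:
#   for v in $(kernels_missing_initrd /boot); do
#       echo "warning: no initrd for kernel $v; free space on /boot and rerun dracut" >&2
#   done
```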
Comment 10 Morgan Leijström 2016-07-16 22:45:07 CEST
The good thing is that there is not an entry in grub menu for the kernel with the missing initrd.

When this happens, I think the easiest way for normal users is to uninstall the latest kernel and install it again (after also having installed some older kernel). (One trick not to forget then, though, is to install *-latest again, because it is uninstalled when the latest kernel is uninstalled...)


Anyhow, having this small /boot is pretty rare, and mostly set up by experienced users that can sort it out.  Plus a few users who do not know.


This problem/bug "b" will be worked around, or at least hit much less often, if there is information on the recommended minimum /boot partition size and automatic deletion of old kernels is implemented.
Comment 11 Marja Van Waes 2016-10-17 11:16:12 CEST
(In reply to Samuel Verschelde from comment #6)
> (In reply to w unruh from comment #4)

> > 
> > b) rpm reported the new kernel as installed when dracut's making of initrd
> > failed. That is indeed a bug in my opinion.
> > 

> > 
> 
> I suggest to keep this bug report about point b), which is what Morgan
> reported first. Other items can be discussed in a different bug report if
> needed.


I don't agree, the kernel was installed, so rpm _must_ mark it as installed.

However, I do agree that the message to reboot for the new kernel is misleading when the initrd creation failed.

Assigning to the kernel maintainers, who might have a better solution for such cases.

Assignee: bugsquad => kernel

Comment 12 Marja Van Waes 2016-10-17 16:32:28 CEST
(In reply to Marja van Waes from comment #11)

> 
> 
> I don't agree, the kernel was installed, so rpm _must_ mark it as installed.
> 
> However, I do agree that the message to reboot for the new kernel is
> misleading when the initrd creation failed.

IINM, that message comes from
http://gitweb.mageia.org/software/rpm/urpmi/tree/urpm/sys.pm#n235

but I don't know whether it would be easy to expand that message in such cases with advice to first create the needed initrd image

CC'ing mageiatools maintainers

> 
> Assigning to the kernel maintainers, who might have a better solution for
> such cases.

CC: (none) => mageiatools

Comment 13 w unruh 2016-10-17 16:47:21 CEST
At least the default kernel for booting should not be changed to the just installed kernel (/boot/vmlinuz should not point to the new kernel, since presumably the new initrd does not exist). 

I.e., messages are important, but behaviour is still more important.

I assume that if there is a breakdown in kernel installation, the links are not changed and grub/lilo is not changed. Presumably the same should happen if the initrd was not created but should have been.
Comment 14 Pascal Terjan 2016-10-17 16:56:26 CEST
I would have believed the fix would need to go into installkernel or bootloader-config, but that doesn't seem to be the case.

The post calls /sbin/installkernel but then updates some of the links itself; I guess it would be a matter of failing if installkernel fails (and failing in installkernel if bootloader-config fails).
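
The "fail if the previous step fails" idea can be sketched generically. This is not the actual kernel post script or installkernel code; run_all_or_fail and the step names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run each named step in order and stop at the first
# one that fails, returning that step's exit status to the caller.
run_all_or_fail() {
    for step in "$@"; do
        "$step" || {
            rc=$?
            echo "$step failed with status $rc" >&2
            return "$rc"
        }
    done
}

# Usage sketch with placeholder step functions:
#   run_all_or_fail build_initrd update_bootloader update_symlinks
```

With this shape, a failure in an early step (such as initrd creation) prevents the later link update from running and leaves a non-zero status for whatever invoked the script.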

CC: (none) => pterjan

Comment 15 Thomas Backlund 2016-10-17 17:12:54 CEST
Yeah, there was (is?) a bug where installkernel failed (fails?) to update the kernel symlinks even if the kernel and initrd were properly installed, and thus didn't set the system to boot the newly installed kernel.

So we "fixed" it back then by having the kernel post scripts do the symlinking themselves, as that always works...

Then there is also the fact that nowadays the kernel has ahci built in, so on hw using ahci the kernel is capable of booting without an initrd :)
(I haven't checked if grub(2) bails out when it does not find the initrd)

The kernel %post symlinking could be fixed to check for an existing initrd before doing the symlinking (or debug/fix installkernel to not fail changing the symlinks)
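
A minimal sketch of that check, assuming the %post maintains a vmlinuz symlink (mentioned in comment 13) and an initrd.img symlink (an assumption) in /boot. The function name is hypothetical and this is not the actual kernel spec code.

```shell
#!/bin/sh
# Hypothetical guard: repoint the default-kernel symlinks in BOOTDIR to
# version KVER only if a non-empty initrd image for that version exists.
link_kernel_if_bootable() {
    bootdir=$1
    kver=$2
    if [ ! -s "$bootdir/initrd-$kver.img" ]; then
        echo "initrd for $kver is missing or empty; default kernel left unchanged" >&2
        return 1
    fi
    # Relative link targets keep the links valid if BOOTDIR is mounted elsewhere.
    ln -sf "vmlinuz-$kver" "$bootdir/vmlinuz" &&
    ln -sf "initrd-$kver.img" "$bootdir/initrd.img"
}

# Usage sketch:
#   link_kernel_if_bootable /boot 4.7.0-desktop-0.rc6.3.mga6
```

On hardware where the kernel can boot without an initrd, as noted above, such a guard would be stricter than necessary, so it would likely need an opt-out.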
