| Summary: | unable to remove old kernels | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Pierre Fortin <pf> |
| Component: | RPM Packages | Assignee: | Mageia Bug Squad <bugsquad> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | ||
| Priority: | Normal | CC: | davidwhodgins, joselp, lewyssmith |
| Version: | 8 | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Source RPM: | | CVE: | |
| Status comment: | |||
| Attachments: | output of script from attachment 12921; Results after 18 hours | ||
Description
Pierre Fortin
2021-10-19 00:01:32 CEST
Created attachment 12956 [details]
output of script from attachment 12921 [details]

Something caused the rpm db to become corrupt, resulting in rpm -qa somehow showing the packages as installed though urpme doesn't show it as installed.

Possible causes I can think of include the file system becoming full during an install/uninstall, or the system shutting down during an install or uninstall.

I'd try uninstalling each of the three packages using "rpm -e $package" directly.

As to the uninstall wanting to reinstall files, that's just a misunderstanding of the messages from dkms. It's simply saying that the module has been uninstalled, that no original module of the same name exists for that kernel for dkms to automatically re-install, and that the user should manually install a copy of the module with that name for that kernel, should they want to have it. It's not trying to re-install the module, just informing the user they should do so if they want the module.

CC: (none) => davidwhodgins

I have always uninstalled the old kernels with "urpme --auto-orphans", and I haven't had problems here.

I think that we could create a GUI tool for "urpme --auto-orphans", where the user can select which packages he wants to uninstall, and we can add a highlighted option for uninstalling the kernels, with a warning or notes about the uninstallation of the marked packages.

CC: (none) => joselpddj

(In reply to Dave Hodgins from comment #2)
> Something caused the rpm db to become corrupt, resulting in rpm -qa
> somehow showing the packages as installed though urpme doesn't show it as
> installed.

Shouldn't upgrades have cleaned up the old stuff? Note that there are mga6 and mga7 packages listed...

> Possible causes I can think of include file system becoming full during
> install/uninstall or system shutting down during an install or uninstall.

Never had either happen. This system has two 1TB SSDs (Samsung SSD 860) and a 250G mSATA card (Samsung SSD 850). Root partition is 48GB.

> I'd try uninstalling each of the three packages using "rpm -e $package"
> directly.

Just did that on each package (308) in the previous list. After running 18 hours on my Intel i7-4710MQ CPU @ 2.50GHz -- 8 threads each averaging 25% to 60% (while I watched for a few minutes), I'm left with a smaller, but still significant, list (65)[1].

I'm absolutely blown away by the amount of computing power needed to remove old kernels... Surely, there must be a design flaw therein...?

[1] Actually 61. Four packages are now listed twice...

Created attachment 12957 [details]
Results after 18 hours
Before I run this file to try to delete those that weren't on first try, anyone have any ideas why there are now duplicate entries?
Any files I can provide that might help figure this out?
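For anyone wanting to confirm which entries rpm reports more than once, a minimal sketch follows. It only post-processes a package list on standard input; the helper name `dup_packages` is made up for illustration and is not part of rpm or urpmi.

```shell
# Print each line that appears more than once in the input.
# Intended use (assumption): rpm -qa | dup_packages
dup_packages() {
    # uniq -d only sees adjacent repeats, so sort first
    sort | uniq -d
}
```

Each duplicated package name is printed once, however many times it is repeated in the input.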
Try deleting the files /var/lib/rpm/__db.00* and then running rpm --rebuilddb

(In reply to Jose Manuel López from comment #3)
> I think that we could create a GUI tool for "urpme --auto-orphans", where the
> user can select which packages he wants to uninstall, and we can add a
> highlighted option for uninstalling the kernels, with a warning or
> notes about the uninstallation of the marked packages.

Given previous issues I've had with kernels (e.g., bugs 21979, 21753), may I suggest that in addition to preventing removal of the running kernel (Doh :) we also prevent the removal of at least one alternate kernel.

(In reply to Dave Hodgins from comment #6)
> Try deleting the files /var/lib/rpm/__db.00* and then running rpm --rebuilddb

For the record before I do.....
-rw-r--r-- 1 root root       0 Oct 12 12:45 __db.000
-rw-r--r-- 1 root root  352256 Oct 19 21:59 __db.001
-rw-r--r-- 1 root root  174264 Oct 19 21:59 __db.002
-rw-r--r-- 1 root root 1318912 Oct 19 21:59 __db.003

Sigh... story of my life... nothing is ever easy...
# rpm --rebuilddb
error: could not delete old database at /var/lib/rpmold.3272123

In addition to the above error (comment 9), just before running rpm --rebuilddb, now _empty_ /var/lib/rpm contained:
ll
total 447884
lrwxrwxrwx 1 root root        21 Oct 16  2019 alternatives -> /var/lib/alternatives/
-rw-r--r-- 1 root root  59408384 Oct 19 18:16 Basenames
-rw-r--r-- 1 root root    135168 Oct 18 15:39 Conflictname
-rw-r--r-- 1 root root  49815552 Oct 19 18:16 Dirnames
-rw-r--r-- 1 root root      8192 Sep 12  2017 Enhancename
-rw-r--r-- 1 root root      8192 Oct  2 17:23 Filetriggername
drwxr-xr-x 2 root root      4096 Aug 15 21:41 filetriggers/
-rw-r--r-- 1 root root     90112 Oct 19 18:16 Group
-rw-r--r-- 1 root root   1615755 Oct 12 12:45 installed-through-deps.list
-rw-r--r-- 1 root root   1670369 Oct 18  2019 installed-through-deps.list.old
-rw-r--r-- 1 root root     94208 Oct 19 18:16 Installtid
-rw-r--r-- 1 root root    311296 Oct 19 18:16 Name
-rw-r--r-- 1 root root    327680 Oct 18 15:39 Obsoletename
-rw-r--r-- 1 root root 335683584 Oct 19 18:16 Packages
-rw-r--r-- 1 root root   6987776 Oct 19 18:16 Providename
-rw-r--r-- 1 root root     40960 Oct 19 18:16 Recommendname
-rw-r--r-- 1 root root   1572864 Oct 19 18:16 Requirename
-rw-r--r-- 1 root root    503808 Oct 19 18:16 Sha1header
-rw-r--r-- 1 root root    278528 Oct 19 18:16 Sigmd5
-rw-r--r-- 1 root root      8192 Oct  2 17:22 Suggestname
-rw-r--r-- 1 root root      8192 Mar  1  2021 Supplementname
-rw-r--r-- 1 root root      8192 Oct  9 12:53 Transfiletriggername
-rw-r--r-- 1 root root      8192 Oct  2 17:22 Triggername

Yet:
$ rpm -qa | wc
4696 4696 142192

Now what?

The question is why it is failing to rebuild. On one of my working m8 systems,
running the rebuild under strace, the only references to rpmold are ...
# grep rpmold strace.txt
2245 rename("/var/lib/rpm", "/var/lib/rpmold.2245") = 0
2245 access("/var/lib/rpmold.2245/pubkeys", F_OK) = -1 ENOENT (No such file or directory)
2245 openat(AT_FDCWD, "/var/lib/rpmold.2245", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
2245 unlink("/var/lib/rpmold.2245/Basenames") = 0
2245 unlink("/var/lib/rpmold.2245/Conflictname") = 0
2245 unlink("/var/lib/rpmold.2245/Dirnames") = 0
2245 unlink("/var/lib/rpmold.2245/Enhancename") = 0
2245 unlink("/var/lib/rpmold.2245/Filetriggername") = 0
2245 unlink("/var/lib/rpmold.2245/Group") = 0
2245 unlink("/var/lib/rpmold.2245/Installtid") = 0
2245 unlink("/var/lib/rpmold.2245/Name") = 0
2245 unlink("/var/lib/rpmold.2245/Obsoletename") = 0
2245 unlink("/var/lib/rpmold.2245/Packages") = 0
2245 unlink("/var/lib/rpmold.2245/Providename") = 0
2245 unlink("/var/lib/rpmold.2245/Recommendname") = 0
2245 unlink("/var/lib/rpmold.2245/Requirename") = 0
2245 unlink("/var/lib/rpmold.2245/Sha1header") = 0
2245 unlink("/var/lib/rpmold.2245/Sigmd5") = 0
2245 unlink("/var/lib/rpmold.2245/Suggestname") = 0
2245 unlink("/var/lib/rpmold.2245/Supplementname") = 0
2245 unlink("/var/lib/rpmold.2245/Transfiletriggername") = 0
2245 unlink("/var/lib/rpmold.2245/Triggername") = 0
2245 unlink("/var/lib/rpmold.2245/__db.000") = 0
2245 openat(AT_FDCWD, "/var/lib/rpmold.2245", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
2245 rmdir("/var/lib/rpmold.2245") = 0
In my case /var/lib/, and /var/lib/rpm are in the same filesystem as /
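To compare against a failing system, the same kind of trace can be filtered for calls that returned an error. This is a sketch under assumptions: the thread does not give the exact strace invocation, so something like `strace -f -o strace.txt rpm --rebuilddb` is assumed, and the helper name `failed_calls` is hypothetical.

```shell
# Read strace output on stdin and print only the lines that mention
# the given path fragment AND ended with an error (return value -1).
# Intended use (assumption): failed_calls rpmold < strace.txt
failed_calls() {
    # -F: treat the fragment as a fixed string, not a regex
    grep -F -- "$1" | grep -E '= -1 [A-Z]+'
}
```

On the working trace above this would print only the access() line for pubkeys, which fails with ENOENT; on a system where the rebuild fails, a failing unlink or rmdir would be the line to look for.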
What's the output of df -h
Also what's the output of "df -h -i" in addition to "df -h".

And the output of "mount|grep ^/dev".

(In reply to Dave Hodgins from comment #12)
> Also what's the output of "df -h -i" in addition to "df -h".

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G  4.0K   16G   1% /dev/shm
tmpfs            16G   11M   16G   1% /run
/dev/sdb1        48G   44G  2.3G  95% /
/dev/sdb2       1.5G  168K  1.5G   1% /boot/EFI
/dev/sda5        32G   23G  7.5G  76% /root.old
/dev/sdc2       867G  702G  121G  86% /home.bak
/dev/sdb4       175G  158G  8.8G  95% /home.test
/dev/sda3       101G   46G   55G  46% /media/windows
/dev/sda7       741G  634G   70G  91% /home
tmpfs            16G   80M   16G   1% /tmp
tmpfs           3.2G  240K  3.2G   1% /run/user/1000
/dev/sda1        39M  114K   39M   1% /run/media/pfortin/E3A2-CAFF
tmpfs           3.2G  116K  3.2G   1% /run/user/1001

$ df -h -i
Filesystem     Inodes IUsed IFree IUse% Mounted on
devtmpfs         4.0M   741  4.0M    1% /dev
tmpfs            4.0M     2  4.0M    1% /dev/shm
tmpfs            4.0M  1.5K  4.0M    1% /run
/dev/sdb1        3.1M  1.1M  2.0M   35% /
/dev/sdb2           0     0     0     - /boot/EFI
/dev/sda5        2.1M  502K  1.6M   25% /root.old
/dev/sdc2         56M  1.5M   54M    3% /home.bak
/dev/sdb4         12M  239K   11M    3% /home.test
/dev/sda3         55M   91K   55M    1% /media/windows
/dev/sda7         48M  2.3M   45M    5% /home
tmpfs            400K   169  400K    1% /tmp
tmpfs            801K   157  801K    1% /run/user/1000
/dev/sda1           0     0     0     - /run/media/pfortin/E3A2-CAFF
tmpfs            801K    83  801K    1% /run/user/1001

(In reply to Dave Hodgins from comment #13)
> And the output of "mount|grep ^/dev".

$ mount|grep ^/dev
/dev/sdb1 on / type ext4 (rw,noatime)
/dev/sdb2 on /boot/EFI type vfat (rw,relatime,fmask=0000,dmask=0000,allow_utime=0022,codepage=437,iocharset=utf8,shortname=mixed,utf8,errors=remount-ro)
/dev/sda5 on /root.old type ext4 (rw,noatime)
/dev/sdc2 on /home.bak type ext4 (rw,noatime)
/dev/sdb4 on /home.test type ext4 (rw,noatime)
/dev/sda3 on /media/windows type fuseblk (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096)
/dev/sda7 on /home type ext4 (rw,noatime)
/dev/sda1 on /run/media/pfortin/E3A2-CAFF type vfat (rw,nosuid,nodev,relatime,uid=1000,gid=1000,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,showexec,utf8,flush,errors=remount-ro,uhelper=udisks2)

rpm --rebuilddb silently ended this time. Is that a good sign?
Now re-running rpm -e on remaining 61 packages.

BTW, the 18 hour removal of some of the old kernels only changed the free space on / (sdb1) from 2.2G to 2.3G

Progress; down to 4 remaining packages... and removing 57 only took 18 minutes... that's a drastic difference:
18 hours to remove 247 -- avg 4.4 minutes
18 minutes for 57 -- avg 0.3 minutes

Got this from trying to remove the last 61 packages:
$ . /tmp/rmoldk
dkms.conf: Error! No 'DEST_MODULE_LOCATION' directive specified.
dkms.conf: Error! No 'PACKAGE_NAME' directive specified.
dkms.conf: Error! No 'PACKAGE_VERSION' directive specified.
Error! Bad conf file.
Your dkms.conf is not valid.
error: %preun(<PACKAGE>) scriptlet failed, exit status 4
error: <PACKAGE>: erase failed

where PACKAGE is each of:
virtualbox-kernel-4.9.56-desktop-1.mga6-5.1.30-1.mga6
virtualbox-kernel-4.14.56-desktop-1.mga6-5.2.14-6.mga6
virtualbox-kernel-4.14.56-server-1.mga6-5.2.14-6.mga6
xtables-addons-kernel-4.14.56-desktop-1.mga6-2.13-48.mga6

These 4 OLD packages are each listed twice; though I get only one of the above error for each.

Question: Any chance rpm looks through the entire filesystem rather than staying within the mga8 system?
I still have (possibly-)bootable mga5 and mga6 systems on other drives (3 drives in this laptop) as hinted to with:
/dev/sda5 on /root.old type ext4 (rw,noatime)

Yes, deleting old kernels can be tedious. I start with:
$ rpm -qa | grep kernel | sort
which gives a clear ordered list:
kernel-desktop-5.10.60-2.mga8-1-1.mga8
kernel-desktop-5.10.62-1.mga8-1-1.mga8
kernel-desktop-5.10.70-1.mga8-1-1.mga8
kernel-desktop-latest-5.10.70-1.mga8
...
then 'urpme' the older full package names, several at a time, always leaving at least the two latest. If it takes ages, this might be due to rare systems where re-generating Grub each time (as happens for kernel updates) takes ages; this does not affect most people.
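The "keep the two latest" selection described above could be sketched roughly as follows. This is only an illustration: the helper name `old_kernels` is made up, it handles only the kernel-desktop flavour shown in the listing, it relies on GNU `sort -V` and GNU `head` with a negative count, and it does not check which kernel is currently running.

```shell
# Read package names on stdin and print the versioned kernel-desktop
# packages that are older than the newest N (default 2).
# Intended use (assumption): rpm -qa | old_kernels 2
old_kernels() {
    keep="${1:-2}"
    # match versioned packages only, which skips kernel-desktop-latest
    grep -E '^kernel-desktop-[0-9]' |
        sort -V |              # version-aware sort, oldest first
        head -n "-${keep}"     # drop the newest $keep entries
}
```

Reviewing the printed names by hand before passing any of them to urpme seems prudent, since the sketch knows nothing about the booted kernel or about other kernel flavours.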
The end result is visible in /boot, where each kernel now includes for example:
config-5.10.60-desktop-2.mga8
initrd-5.10.60-desktop-2.mga8.img
symvers-5.10.60-desktop-2.mga8.xz
System.map-5.10.60-desktop-2.mga8
vmlinuz-5.10.60-desktop-2.mga8
but this list was shorter in the past.

CC: (none) => lewyssmith

It would not be looking on other file systems. For each of those packages, I'm not sure what is causing the problem. Rather than investigate them further, since it's likely one of the changes in dkms since mga6 that is causing it, I'd next try "rpm -e --noscripts $packagename" to remove them. Then manually delete each directory shown by "tree -ifa /var/lib/dkms|grep mga6".

Wow... rpm -e --noscripts finally got rid of those four last old packages and returned 6.5G
/dev/sdb1 48G 37G 8.8G 81% /
tree -ifa /var/lib/dkms|grep mga6 gave no output, so all looks fine now.
THANKS!! Hope this results in a better old kernel removal tool...
Thanks!

Resolution: (none) => FIXED

I have no idea why having those mga6 packages installed caused such problems, especially with the space. I'm guessing it's due to a combination of changes in dkms and rpm since mga6.

As the author of the script being used, I don't see any changes needed at this point. I'm not going to change it to try and handle mga6 packages, but will keep in mind their impact if others report a similar problem.

I recommend running the script more often and uninstalling packages from the prior release shortly after each upgrade. I've been doing that on my main personal use install (not one used normally for qa testing), which started as a Mageia 3 install.
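For anyone repeating the manual cleanup suggested earlier in this thread, the check for leftover dkms directories from an older release could be sketched with find instead of tree. The helper name `stale_dkms_dirs` is hypothetical, and it only lists directories; deleting them is left as a deliberate manual step.

```shell
# List the topmost directories under a dkms tree whose name contains
# a release tag such as mga6. Nothing is deleted.
# Intended use (assumption): stale_dkms_dirs /var/lib/dkms mga6
stale_dkms_dirs() {
    root="$1"
    tag="$2"
    # -prune stops descent into a match, so nested matches are not repeated
    find "$root" -type d -name "*${tag}*" -prune -print
}
```

Empty output corresponds to the "gave no output, so all looks fine" result reported above.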