| Summary: | drakboot crashed [ERROR: killing runaway process (process=update-grub2, pid=5978, args=, error=ALARM at /usr/lib/libDrakX/run_program.pm line 235.] | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | George Mitchell <george> |
| Component: | RPM Packages | Assignee: | Mageia tools maintainers <mageiatools> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | normal | ||
| Priority: | Normal | CC: | davidwhodgins, davidwhodgins, doc-bugs, kernel, mageiatools, marja11, zen25000 |
| Version: | 6 | ||
| Target Milestone: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Source RPM: | drakxtools-17.88.1-1.mga6, grub2 | CVE: | |
| Status comment: | |||
| Attachments: |
journal output at leading up to crash
Grub update history Journal Output leading up to and including the crash ... |
||
|
Description
George Mitchell
2018-02-06 03:29:27 CET
(In reply to George Mitchell from comment #0)
> The "drakboot" program crashed. Drakbug-17.88.1 caught it.
>
> I was attempting to update grub from Mageia Control Center
> after a routine kernel update failed and locked up my system.

Please attach the journalctl output (as root) from when you did the kernel update and the grub update attempt.

If you didn't reboot, then, as root, run:
journalctl -b > log.txt
If you rebooted once:
journalctl -b -1 > log.txt
If twice:
journalctl -b -2 > log.txt
etc.

Compress with xz if the file is too large. Thanks :-)
Keywords:
(none) =>
NEEDINFO
Created attachment 9964 [details]
journal output at leading up to crash
This is my journal output, as requested, leading up to the drakboot crash.
Created attachment 9965 [details]
Grub update history
I am adding a brief history of grub updates since this system was initially installed.
This problem may be related to bug 19953, which I reported on 12-15-2016 when I was still running what is now Mageia 6 as Cauldron.

Just as a general observation regarding the overall update-grub2 process, I would suggest that the current process of moving grub.cfg.new to grub.cfg.old as backup and then building grub.cfg.new in place means that if update-grub2 fails, the system becomes unbootable as a result. Because of that risk, it might be a better idea to move grub.cfg.new to grub.cfg.old and THEN build grub.cfg.tmp, only moving grub.cfg.tmp to grub.cfg.new AFTER the process has completed normally. That way, in case update-grub2 fails, the older intact grub.cfg.new remains to ensure the system is still bootable.

(In reply to George Mitchell from comment #5)
> Just as a general observation regarding the overall update-grub2 process, I
> would suggest that the current process of moving grub.cfg.new to
> grub.cfg.old as backup and then building grub.cfg.new in place means that if
> update-grub2 fails, the system becomes unbootable as a result. Because of
> that risk, it might be a better idea to move grub.cfg.new to grub.cfg.old
> and THEN build grub.cfg.tmp, only moving grub.cfg.tmp to grub.cfg.new
> AFTER the process has completed normally. That way, in case update-grub2
> fails, the older intact grub.cfg.new remains to ensure the system is still
> bootable.

I guess it was an upstream grub2 decision to do it the way it is done now. Barjac will know, so I am assigning this report to him.

Btw, after every kernel update I run a tiny script to check that the links in /boot have been updated and that a new /boot/grub2/grub.cfg has been written. Of course, since I use that script, a kernel update has never gone wrong, so I've never needed it ;-)
Assignee:
bugsquad =>
zen25000
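The suggestion in comment #5 (build the new configuration under a temporary name and only rename it into place once generation has finished) can be sketched roughly as follows. This is a minimal illustration in Python, not how grub2-mkconfig or drakboot actually work; the function name `write_config_atomically` and the `generate` callback are hypothetical.

```python
import os
import tempfile


def write_config_atomically(target_path, generate):
    """Replace target_path with generate()'s output only if generation succeeds.

    `generate` is a hypothetical callable returning the full config text.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temp file next to the target so the final rename stays on
    # one filesystem and can replace the target in a single step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cfg.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(generate())
            tmp.flush()
            os.fsync(tmp.fileno())
        # Only reached when generation and writing both completed.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Generation failed or was interrupted: drop the partial file and
        # leave the existing target untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern, a generator that is killed part-way (for example by a timeout) leaves the previous file in place rather than a truncated one.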
Marja Van Waes
2018-02-08 09:33:13 CET
CC:
(none) =>
mageiatools
I suspect that this is a combination of os-prober being slow on that massive drive and the timeout in drakboot, which kills over-long os-prober runs.

As far as grub.cfg generation goes, AFAIK it's determined by grub2-mkconfig upstream and I _think_ it backs up grub.cfg to grub.cfg.old and then re-writes grub.cfg. So in the case of failure, renaming grub.cfg.old to grub.cfg should get you running again. It would be possible to introduce an extra grub.tmp step but I'm not sure it's needed.

OT
My personal thoughts on your set-up with so many very old systems on such a large drive would be to use a small dedicated partition with its own install of grub2 with a manually generated grub.cfg to chain into each system. This would then allow you to not use os-prober at all, and no system would need anything other than its own kernel entries in its boot menu. I have attempted to automate this in a script which works fine for me; however, it uses os-prober to do this, so it is probably not a good idea in your case, unless you use it without the big drive to generate the initial configuration, and then re-add it and manually enter the systems on that drive later.
https://wiki.mageia.org/en/User_talk:Barjac
/OT

@marja
I think this one is beyond my capabilities, please re-assign :\
Assignee:
zen25000 =>
bugsquad
Thanks for having looked into this, Barry.

I'm wondering how many installed OS's we decided to, or want to, let our bootloader support, and whether, if that's more work than handling supported OS versions, we want grub2/drakboot to handle installs of OS versions that are no longer supported themselves.

Re-assigning to the mageiatools maintainers and CC'ing David Hodgins (because he has many OS'es installed) and docteam. If we close this report as invalid, then the reason for doing so should also be added to our documentation.
Assignee:
bugsquad =>
mageiatools
(In reply to Marja van Waes from comment #8)
> Thanks for having looked into this, Barry.
>
> I'm wondering how many installed OS's we decided to, or want to, let our
> bootloader support, and whether, if that's more work than handling supported
> OS versions, we want grub2/drakboot to handle installs of OS versions that
> are no longer supported themselves.
>
> Re-assigning to the mageiatools maintainers and CC'ing David Hodgins
> (because he has many OS'es installed) and docteam. If we close this report
> as invalid, then the reason for doing so should also be added to our
> documentation.

In my case most of these many OS's are just archived data and old backups. The problem does not actually seem to be driven by the number of OS's, but rather by the existence of the unformatted 4TB drive with many subvolumes. That is what is driving drakboot bonkers. That was something of an experiment on my part and one that I am not about to repeat. At some point I am going to be breaking that 4TB drive down into 1TB partitions and that should resolve the timeout problems. Right now we have 8TB drives on the market and I hate to even imagine how drakboot would handle an unpartitioned 8TB drive. So I suspect the problem really revolves around a combination of huge drives and lack of partitioning.

(In reply to Barry Jackson from comment #7)
> I suspect that this is a combination of os-prober being slow on that massive
> drive and the timeout in drakboot, which kills over-long os-prober
> runs.
>
> As far as grub.cfg generation goes, AFAIK it's determined by grub2-mkconfig
> upstream and I _think_ it backs up grub.cfg to grub.cfg.old and then
> re-writes grub.cfg. So in the case of failure, renaming grub.cfg.old to
> grub.cfg should get you running again. It would be possible to introduce an
> extra grub.tmp step but I'm not sure it's needed.
>
> OT
> My personal thoughts on your set-up with so many very old systems on such a
> large drive would be to use a small dedicated partition with its own
> install of grub2 with a manually generated grub.cfg to chain into each
> system. This would then allow you to not use os-prober at all, and no system
> would need anything other than its own kernel entries in its boot menu. I
> have attempted to automate this in a script which works fine for me; however,
> it uses os-prober to do this, so it is probably not a good idea in your case,
> unless you use it without the big drive to generate the initial
> configuration, and then re-add it and manually enter the systems on that
> drive later.
> https://wiki.mageia.org/en/User_talk:Barjac
> /OT
>
> @marja
> I think this one is beyond my capabilities, please re-assign :\

Barry, you are right on the mark here. The problem is not the number of systems, but rather the 4TB UNPARTITIONED drive. That is precisely what os-prober is choking on. In my case this drive is used only for archived data, including old OS'es. Mageia 3? How old is that? Certainly nothing that I would ever have a need to boot up. And I think the real question is whether Mageia wants or is able to support booting an OS on a 4TB unpartitioned drive, since any drive that size is most likely being used to back up or archive data and very unlikely to be used to actually run a system on.

I removed the 4TB drive and that does not fix the problem. That leaves me with a totally hosed system, because an attempt at a kernel upgrade after removing the large drive left me with a corrupted grub.cfg and no backup left. So now I have no valid grub.cfg to boot on. I tried building grub.cfg while denying drakboot access to all the old versions of Mageia, by denying it the ability to mount the partitions, and it still fails to build a valid grub.cfg.
That leaves me having to use a Super Grub2 disc, which magically assembles a viable grub screen with all the options in seconds, something which the Mageia software now seems to find impossible to do. Since the 4TB drive is now disconnected, that means the problem is that it doesn't like the old OS's, but it also demands to look at ALL the partitions or else. I am not sure what I can do with this mess. I am trying to run a business with this system and I am not happy with things. I have never ever had this sort of problem before with Mageia.

With the 4TB drive remaining disconnected, I removed around 20 old kernel versions from the /boot directory of the primary OS on the hardware in question, and that seems to have fixed the problem. I had not cleaned out the old kernel versions for some time; there were around 26 of them in the /boot directory and that is now reduced to around 6. Now drakboot is able to build a viable grub.cfg in a reasonably short period of time. So I am going to close out this bug if that is OK.

I am marking this resolved as explained in my previous comment.
Status:
NEW =>
RESOLVED
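For anyone wanting to check how many kernel images have piled up in /boot before running drakboot, a rough sketch along these lines may help. It assumes kernel images are regular files named `vmlinuz-<version>` and that convenience links such as `vmlinuz-desktop` are symlinks; both the naming pattern and the function name `installed_kernel_versions` are assumptions for illustration, not something drakboot itself provides.

```python
import os
import re

# Assumed naming convention for kernel images: vmlinuz-<version>
KERNEL_IMAGE = re.compile(r"^vmlinuz-(.+)$")


def installed_kernel_versions(boot_dir="/boot"):
    """Return the version strings of kernel images found in boot_dir."""
    versions = []
    for name in sorted(os.listdir(boot_dir)):
        path = os.path.join(boot_dir, name)
        match = KERNEL_IMAGE.match(name)
        # Skip symlinks so that convenience links are not counted as kernels.
        if match and os.path.isfile(path) and not os.path.islink(path):
            versions.append(match.group(1))
    return versions


if __name__ == "__main__":
    found = installed_kernel_versions()
    print(f"{len(found)} kernel image(s) in /boot")
    for version in found:
        print(" ", version)
```

The script only reports what it finds; removing old kernels is better left to the package manager so that the matching modules and initrd files are cleaned up as well.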
George Mitchell
2018-02-23 21:35:40 CET
Resolution:
WORKSFORME =>
(none)
Created attachment 10002 [details]
Journal Output leading up to and including the crash ...
I am reopening this because suddenly it is crashing again with the large hard drive removed and the old kernels cleaned out and I would appreciate it if someone can take a look at it and see if they can figure out what is going on. This time it looks like perhaps the kernel is experiencing a panic for some reason during the os-probing process. I am attaching journal data leading up to the crash.
(In reply to George Mitchell from comment #14)
> Created attachment 10002 [details]
> Journal Output leading up to and including the crash ...
>
> I am reopening this because suddenly it is crashing again with the large
> hard drive removed and the old kernels cleaned out and I would appreciate it
> if someone can take a look at it and see if they can figure out what is
> going on. This time it looks like perhaps the kernel is experiencing a
> panic for some reason during the os-probing process. I am attaching journal
> data leading up to the crash.

CC'ing kernel maintainers
CC:
(none) =>
kernel
I am going to close this one more time, hopefully forever. For the first time I was able to successfully run drakboot, after running `rpm -Va` and discovering a slew of Python scripts with bad md5 sums. I identified all the packages involved and reinstalled all of them, and suddenly a lot of my problems went away. My system log has been spewing out all sorts of KDE complaints for as long as I can remember; all that has stopped now as well and the log is finally running clean. Hopefully this is a permanent fix. Thank you so much for your efforts to help me. In all my years of using Linux I have never seen anything quite like this. The problem was never in drakboot; corrupted Python scripts were the cause from the beginning.
Resolution:
(none) =>
INVALID |
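As a follow-up to the final comment, the step of picking out files with bad checksums from `rpm -Va` output can be sketched as below. This is only an illustration: it assumes the usual verify output layout, where each line starts with a flag string whose third character is `5` when the file digest differs, followed by an optional one-letter file-type marker and the path. The function name `files_with_bad_digest` is hypothetical.

```python
def files_with_bad_digest(verify_output):
    """Return paths whose rpm verify flags report a digest mismatch."""
    bad = []
    for line in verify_output.splitlines():
        # The path is taken to start at the first "/" that follows a space.
        idx = line.find(" /")
        if idx == -1:
            continue
        flags = line.split()[0]
        # Third flag character is "5" when the file digest differs.
        if len(flags) >= 3 and flags[2] == "5":
            bad.append(line[idx + 1:])
    return bad


if __name__ == "__main__":
    # Made-up sample lines for illustration, not taken from this report.
    sample = (
        "S.5....T.  c /etc/example.conf\n"
        "S.5....T.    /usr/lib/python3.5/site-packages/example.py\n"
        ".......T.    /usr/share/doc/example/README\n"
        "missing     /usr/bin/example\n"
    )
    for path in files_with_bad_digest(sample):
        print(path)
```

Each reported path can then be passed to `rpm -qf` to find the owning package, which is the one to reinstall. Changed configuration files will usually show up too, so entries marked `c` are often expected rather than corruption.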