Bug 16010 - Mageia5 Final Lives & Classic on real EFI HW fail to install bootloader (ERROR: killing runaway process (process=update-grub2...)
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: High major
Target Milestone: ---
Assignee: Thomas Backlund
QA Contact:
URL: https://bugzilla.redhat.com/show_bug....
Whiteboard:
Keywords: NEEDINFO
Duplicates: 19047
Depends on:
Blocks: 416
Reported: 2015-05-22 09:23 CEST by Lewis Smith
Modified: 2016-11-17 14:17 CET
CC List: 7 users

See Also:
Source RPM: grub2, os-prober
CVE:
Status comment:


Attachments
Journal of failed KDE installation from the ISO USB boot menu (661.88 KB, text/plain)
2015-05-22 09:31 CEST, Lewis Smith
rediffed patches (16.64 KB, patch)
2015-05-22 11:17 CEST, Thierry Vignaud
Journals showing where bootloader generation runaway process killed. (98.58 KB, application/gzip)
2015-05-29 12:50 CEST, Vladimir Zawalinski
'ps' while abortive Classic bootloader configuration is running (3.34 KB, text/plain)
2015-05-29 18:18 CEST, Lewis Smith
"bug output at the end of abortive bootloader configuration Classic x64 real EFI hardware. (149.04 KB, application/x-xz)
2015-05-29 18:49 CEST, Lewis Smith
limit timeout to 2m and retry w/o os-prober if needed (894 bytes, patch)
2015-06-15 11:49 CEST, Thierry Vignaud

Description Lewis Smith 2015-05-22 09:23:24 CEST
Description of problem:
This happens whether installed directly from the boot menu, or the Live desktop icon; from both DVD or USB; both Gnome and KDE. These are the first issued Live ISOs with EFI booting directly from ISO USB.
Real EFI hardware.

Installation proceeds normally to the Bootloader dialogue at the end. 'Next' downloads 5 pkgs for EFI booting. The bootloader window then remains completely blank for 10 minutes, then crashes, and up pops a drakbug window:

The program "draklive-install" has crashed with the following message:
update-grub2 failed: at /usr/lib/libDrakX/any.pm line 615
Perl's trace:
drakbug::bug_handler() called from /usr/lib/libDrakX/any.pm:615
any::setupBootloader_grub2() called from /usr/lib/libDrakX/any.pm:228
any::setupBootloader() called from /usr/lib/libDrakX/any.pm:240
any::setupBootloaderUntilInstalled() called from
 /usr/sbin/draklive-install:331
main::setup_bootloader() called from /usr/sbin/draklive-install:70
main::install_live() called from /usr/sbin/draklive-install:42

Used theme: oxygen-gtk [Live]    [Adwaita if install direct from boot menu]

Version-Release number of selected component (if applicable):
Cauldron Mageia 5/Final round 3.

How reproducible:
Always

Comment 1 Lewis Smith 2015-05-22 09:31:30 CEST
Created attachment 6609
Journal of failed KDE installation from the ISO USB boot menu

Taken from virtual console after the crash with the drakbug window extant to a mounted formatted USB stick:
# journalctl > [path-to-USB-root-directory/journal]
Sorry it is so big, I would not know where to trim it.
Comment 2 Marja Van Waes 2015-05-22 09:55:27 CEST
(In reply to Lewis Smith from comment #1)
> Created attachment 6609
> Journal of failed KDE installation from the ISO USB boot menu
> 
> Taken from virtual console after the crash with the drakbug window extant to
> a mounted formatted USB stick:
> # journalctl > [path-to-USB-root-directory/journal]
> Sorry it is so big, I would not know where to trim it.

Thanks for the complete report, Lewis.

Half of the log is grub2 debug, so I wouldn't worry about its size ;-)


@ iso-testers

Can someone please try to reproduce (I can't, my EFI-laptop died a few days ago)

CC: (none) => marja11, qa-bugs, thierry.vignaud, zen25000
Assignee: bugsquad => tmb

Comment 3 Thierry Vignaud 2015-05-22 10:11:23 CEST
You got hit by 10m timeout.
@Barry: update-grub2 (really os-prober) shouldn't take that much time!!!
os-prober tests complete just after the timeout expires:

Mai 22 08:41:33 localhost draklive-install[2746]: running: update-grub2  with root /mnt/install
(...)
Mai 22 08:41:52 localhost logger[24017]: 50mounted-tests: debug: running subtest /usr/lib/os-probes/mounted/90linux-distro
Mai 22 08:51:33 localhost draklive-install[2746]: ERROR: killing runaway process (process=update-grub2, pid=20882, args=, error=ALARM at /usr/lib/libDrakX/run_program.pm line 241.
Mai 22 08:51:35 localhost logger[24017]: 90linux-distro: result: /dev/sda10:Korora release 20 (Peach):Fedora:linux

Component: Installer => RPM Packages
Summary: Mageia5 Final Round 3 Live DVDs on real EFI HW fail to install bootloader => Mageia5 Final Round 3 Live DVDs on real EFI HW fail to install bootloader (ERROR: killing runaway process (process=update-grub2...)
Source RPM: (none) => grub2, os-prober

Comment 4 Samuel Verschelde 2015-05-22 10:13:39 CEST
Blocker unless proved rare and having a workaround.

Priority: Normal => release_blocker

Comment 5 Barry Jackson 2015-05-22 11:11:08 CEST
@Lewis
Is there anything unusual about the installation that is on sda10? Filesystem types, bad blocks, anything odd?

What happens if you run os-prober (as root) from a Cauldron/Mga5 installation on that machine? Does it complete in a reasonable time?

@Thierry
No it usually takes about 15s for os-prober to complete on my machine with 7 systems spread across 3HD, (1 external USB) and 1 SSD.
Comment 6 Thierry Vignaud 2015-05-22 11:14:36 CEST
FC has (not yet applied) patches here:
https://bugzilla.redhat.com/show_bug.cgi?id=875356

URL: (none) => https://bugzilla.redhat.com/show_bug.cgi?id=875356

Comment 7 Thierry Vignaud 2015-05-22 11:17:10 CEST
Created attachment 6611
rediffed patches

This tarball contains the updated SPEC & patches for os-prober.
Existing patches were resynced with FC (mainly noise about line numbers + a couple of fixes).
New patches were rediffed, except the last 3 (the first of those being already applied, the 2 others needing more work to apply).

As os-prober runtime shrank from 45s to 10s on my test VM, I didn't bother putting more work into those last 2 patches.
Though the RHBZ comments say they reduce the runtime quite a lot more...
Comment 8 Thierry Vignaud 2015-05-22 11:40:34 CEST
Humm, there's no more output with those patches...
Comment 9 Thierry Vignaud 2015-05-22 11:48:36 CEST
Forget that, I'd unplugged the test USB disk :-(
Comment 10 Thierry Vignaud 2015-05-22 11:50:33 CEST
Obviously the gain was poor once the USB disk was plugged again :-(
Comment 11 Barry Jackson 2015-05-22 21:13:40 CEST
@Thierry
Your patched os-prober (emailed) crashes when a btrfs system is found due to use of debug lines in my btrfs patch. One of the patches you applied removes the debug command :\
If you feel that these patches will help this bug then I will have to remove the debug lines from that patch.
Comment 12 Lewis Smith 2015-05-22 22:56:27 CEST
(In reply to Barry Jackson from comment #5)
> @Lewis
> Is there anything unusual about the installation that is on sda10?
> Filesystem types, bad blocks, anything odd?
It is a working Korora system, ext4. I am not aware of problems with it. This is a Fedora derivative, and different from all the others in having loads of Secure Boot stuff in \EFI\ - and something at the level of that directory i.e. top level of the ESP. Doubt that this matters.

> What happens if you run os-prober (as root) from a Cauldron/Mga5
> installation on that machine? Does it complete in a reasonable time?
I take it it can be run on its own, then. *If* I still have a working M5 I will try it. Would Live or Classic origin make any difference?

> @Thierry
Comment 3:
> You got hit by 10m timeout
I guessed, but this is not legitimate...
Comment 5
> No it usually takes about 15s for os-prober to complete on my machine with 7
> systems spread across 3HD, (1 external USB) and 1 SSD.
FWIW I also have Manjaro Linux (Arch derivative), and every time they issue a new kernel, their system update process takes about 10 minutes at that point - twice (2 kernels)! - presumably regenerating Grub. They cannot explain this beyond blindly blaming os-prober.
All my other Linuxes, Mageia 4 included, re-make Grub quickly after new kernels.

Let us not forget that I have previously succeeded in installing at least once the M5 Classic on this box, a Live also. This problem is recent.
Comment 13 Lewis Smith 2015-05-22 23:20:20 CEST
(In reply to Barry Jackson from comment #5)
> What happens if you run os-prober (as root) from a Cauldron/Mga5
> installation on that machine? Does it complete in a reasonable time?
From an M5 RC Classic installation on the same box:
# time os-prober         [note the hyphen]
took about 8s adding the 3 figures from 'time'; if only the first matters, about 5s.
No valid M5 Live installation to compare.
Comment 14 Lewis Smith 2015-05-24 21:56:44 CEST
This *still* happens with Final round 4 Gnome Live DVD x64 dated 23 May - installation directly from *USB* boot menu (from DVD boot menu gives another bug 15999, sooner).

I think it occurs now also with Final round 3 CLASSIC installed from USB. That goes right to the end and I got a blank window for 10 minutes, followed by "An error has occurred, Grub 2 failed to install" - *twice*:
- when configuring Bootloader from the summary screen
- when continuing from the summary screen -> 'Installing bootloader'.
I could get no other output from this. Please advise how if you want bug output. I refrain from altering the bug title for the moment [not Live exclusive].
Comment 15 Lewis Smith 2015-05-26 10:08:05 CEST
(In reply to Barry Jackson from comment #5)

> Is there anything unusual about the installation that is on sda10?
> Filesystem types, bad blocks, anything odd?
Further to my Comment 12 :
This is a Manjaro (ex Arch) installation. I have just discovered on running it that on booting it complained that the 'dirty' bit was set, so your question was a good one.
Manjaro simply cleared the dirty bit & carried on OK. (I am using it). I will fsck it from another system, and re-try at least the Gnome Live installation.
Comment 16 Lewis Smith 2015-05-26 10:27:27 CEST
My previous Comment 15 is rubbish, ignore it: M5 is driving me ga-ga.

I have just run 'e2fsck -f' on both /dev/sda8 (Manjaro) and /dev/sda10 (Korora), which revealed no problems in any of the 5 passes. Plain fsck seems to simply check the 'clean' status.
Comment 17 Thierry Vignaud 2015-05-26 10:31:02 CEST
So ideally, os-prober should skip partitions that need fsck.
Ideally, it could also cache the info with the partition UUID as key...
Comment 18 Lewis Smith 2015-05-26 19:27:48 CEST
(In reply to Thierry Vignaud from comment #17)
> So ideally, os-prober should skip partitions that need fsck.
Not at all.

You missed the subtle point of my erroneous Comment 15 and correction Comment 16: Barry had questioned partition 10: *that* was pure. It was partition 8 that when I later used it said it was not clean - but the 'clean' bit apart, it was.
Not important for this problem, I think.
Comment 19 Lewis Smith 2015-05-27 22:43:28 CEST
Confirmation that this problem remains for me with the 27 May Gnome Live ISO as per the original Description. At the end of the installation, whether from DVD or USB. Real EFI hardware.
The download of the 5 booting-related pkgs does not *quite* seem to finish, but the windows in these latest Live ISOs are slightly bizarre, so I doubt that this is important.
Comment 20 Thierry Vignaud 2015-05-28 06:30:19 CEST
As a workaround, you could run "chroot /mnt rpm -e os-prober" on tty2
or killall os-prober
Comment 21 Vladimir Zawalinski 2015-05-28 07:40:55 CEST
I have just completed a live boot and then an install of mga5 64-bit 27/5/2015 (Gnome) in UEFI mode on an external USB disk, encountering no issues at all (Nvidia graphics).

Here is the output of os-prober, which took less than 10 seconds to run on a box having 3 discrete discs (2 SSD) and one isw-raid0 array.

[root@localhost vlad]# os-prober
/dev/sda2:openSUSE 13.2 (x86_64):SUSE:linux
/dev/sde2@/EFI/Microsoft/Boot/bootmgfw.efi:Windows Boot Manager:Windows:efi
/dev/sdg1:Mageia 5 (5):Mageia:linux
/dev/mapper/isw_cijgfgdgbd_Volume1p8:Mageia 5 (5):Mageia1:linux
[root@localhost vlad]# 

Therefore, this build can install and work uneventfully in a specific set of circumstances.  Can anything be inferred from this and is there some way I could  provoke this bug?
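
For anyone wanting to compare os-prober results across machines: each output line above appears to consist of four colon-separated fields (device, long name, short label, boot type). A minimal Python sketch, assuming that layout holds in general; the `ProbedOS` type and the `parse_os_prober` helper are hypothetical names and not part of os-prober itself:

```python
from typing import List, NamedTuple


class ProbedOS(NamedTuple):
    device: str      # e.g. /dev/sda2, or partition@path for EFI entries
    long_name: str
    label: str
    boot_type: str


def parse_os_prober(output: str) -> List[ProbedOS]:
    """Parse os-prober style lines, assuming four colon-separated fields."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so that only the last three colons count.
        parts = line.rsplit(":", 3)
        if len(parts) != 4:
            continue  # skip anything that does not match the assumed layout
        entries.append(ProbedOS(*parts))
    return entries


# Sample taken verbatim from the os-prober output quoted above.
sample = """/dev/sda2:openSUSE 13.2 (x86_64):SUSE:linux
/dev/sde2@/EFI/Microsoft/Boot/bootmgfw.efi:Windows Boot Manager:Windows:efi
/dev/sdg1:Mageia 5 (5):Mageia:linux
/dev/mapper/isw_cijgfgdgbd_Volume1p8:Mageia 5 (5):Mageia1:linux"""

for entry in parse_os_prober(sample):
    print(entry.device, "->", entry.long_name, f"({entry.boot_type})")
```

Splitting from the right keeps the sketch tolerant of colons in the device field, should any appear; lines that do not yield four fields are skipped rather than guessed at.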

CC: (none) => vzawalin1

Comment 22 Lewis Smith 2015-05-28 08:51:19 CEST
Confirm that this is still extant on the 27 May *Classic* ISO.
It probably crept into the Classic at the same time as for the Lives, but I did not go that way at that time. I have adjusted the title accordingly.
It is now 

Reply to Comment 20:
> As a workaround, you could run "chroot /mnt rpm -e os-prober" on tty2
> or killall os-prober
I am not interested in doing this just to get M5 installed; that is pointless. However, if it would help resolve the issue, I will.
Can I do this for a Live installation (quicker)? Is it to try after the error has occurred, or sooner? Is there any more information you want? I have not yet provided a journal from the Classic failure - if there is one.

TIA

Summary: Mageia5 Final Round 3 Live DVDs on real EFI HW fail to install bootloader (ERROR: killing runaway process (process=update-grub2...) => Mageia5 Final Lives & Classic on real EFI HW fail to install bootloader (ERROR: killing runaway process (process=update-grub2...)

Comment 23 Thierry Vignaud 2015-05-28 08:51:46 CEST
Well:
- having a slower machine (the report happened on a AMD E1-1200)
- having slow disk (eg: not SSD)
- maybe installing a Korora instance
  see "different from all the others in having loads of Secure Boot stuff in \EFI\ - and something at the level of that directory i.e. top level of the ESP."

Though " 50mounted-tests: error: umount error, retrying after 1 sec" that happens 6 times could be a hint...
Comment 24 Lewis Smith 2015-05-28 08:53:52 CEST
Confirm that this is still extant on the 27 May *Classic* ISO.
It probably crept into the Classic at the same time as for the Lives, but I did not go that way at that time. I have adjusted the title accordingly.
It is now a hard fault blocking me from installing M5 at all.

Reply to Comment 20:
> As a workaround, you could run "chroot /mnt rpm -e os-prober" on tty2
> or killall os-prober
I am not interested in doing this just to get M5 installed; that is pointless. However, if it would help resolve the issue, I will.
Can I do this for a Live installation (quicker)? Is it to try after the error has occurred, or sooner? Is there any more information you want? I have not yet provided a journal from the Classic failure - if there is one.

TIA
Comment 25 Thierry Vignaud 2015-05-28 08:58:13 CEST
(In reply to Lewis Smith from comment #22)
Yes you can do this for both Live & Classic installer

the "rpm -e..." command can be run at any time after the grub2 install & prior to the bootloader configuration.
The "killall..." command must be run while grub2 is configured.

There's no journal for Classic installer.
Basically:
- Classic installer:
  o writes /root/drakx/report.bug.xz at end of installation
    (before the end, one can look at /tmp/ddebug.log for logs and at
    /mnt/root/drakx/install.log for package installation logs)
  o will write report.bug on USB key on demand when running the "bug"
    command on tty2 (to be run after the error obviously)
- Live installer doesn't generate such a report, so one must look at
  syslogs, thus the need for running the journalctl command
Comment 26 Thierry Vignaud 2015-05-28 08:58:56 CEST
Why did you post your answer twice (comment #22 & #24)?
Comment 27 Lewis Smith 2015-05-28 09:08:41 CEST
(In reply to Thierry Vignaud from comment #23)
> Well:
> - having a slower machine (the report happened on a AMD E1-1200)
> - having slow disk (eg: not SSD)
> - maybe installing a Korora instance
>   see "different from all the others in having loads of Secure Boot stuff in
> \EFI\ - and something at the level of that directory i.e. top level of the
> ESP."
Not sure what you are getting at. The box is relatively recent, I don't know what 'slow' means either for the processor or the totally normal disc. This problem has arisen latish in the M5 pre-release saga; I had previously got a Live & Classic installed.
I do not see why the presence of Korora (Fedora) should suddenly matter. Its differences in the ESP should have nothing to do with os-prober unless it actually looks there; I thought it scanned the disc(s) partitions. os-prober runs in about 10s.

Sorry for the duplicate Comments 22/24, due to a clash.
Comment 28 Thierry Vignaud 2015-05-28 10:18:04 CEST
Well, recent != fast, E1-1200 is quite slow ; eg compare with my 7 year old CPU: http://cpuboss.com/cpus/Intel-Core2-Duo-E8400-vs-AMD-E1-1200#performance

And a HD is slower than an SSD (in case Vladimir has one).

As for running in 10s, obviously update-grub2 (really os-prober) took more than 10 minutes during install (see the logs in attachment #6609)...
Are you saying that it ran fast in the rebooted Mageia 5 (10s) and slowly during install?
With which OS does os-prober run in 10s?

Keywords: (none) => NEEDINFO

Comment 29 Vladimir Zawalinski 2015-05-28 11:06:00 CEST
I don't have a slower machine with UEFI firmware, so can only test on this one, about 6 months old.

While there are SSDs in my laptop, the installation was on an external rattling seagate 500G garden-variety SATA disc.

To test further, I first installed Fedora 21 from a Live DVD on that disk, deleted the MGA partitions, and reran the MGA5 installation, which again completed normally.

Fedora created its own ESP partition, no questions asked.

os-prober did not detect Fedora, so it may not have looked in the Fedora ESP; I don't know.

I don't think this proves anything.  Too bad.
Comment 30 Thierry Vignaud 2015-05-28 13:10:10 CEST
Lewis, could you look at what processes are run when it's stuck?
For the Classic installer, you will find a shell on tty2 (alt+ctrl+F2).
For live, you can just open a terminal.
Use "ps awx" & "pstree" to find out what's blocking.
Comment 31 Dave Hodgins 2015-05-28 23:22:22 CEST
Just a thought. Is the esp partition full?

CC: (none) => davidwhodgins

Comment 32 Lewis Smith 2015-05-29 11:56:37 CEST
Just for the record, I have experimented with all my installed systems, running
 #time os-prober
from each one in turn. Both M5s had to be accessed via the M4 grub menu - but worked despite their abortive bootloader installation! This is the 1st time I have tried this. The following shows the results:
sda2/4 Win8.1
sda8  Manjaro    10m finds all others, blocks on sda3=MS Reserved, NO filesystem
sda9  Suse 12.3  10m finds all others
sda10 Korora 20  7s finds all others but refuses M5C (sda12)
sda11 Mageia 4   7s finds all others except M5C (sda12)
sda12 Mageia 5 C 10m finds all others
sda13 Mageia 5 L 10m finds all others. Starts with 4 extra FS type lines.
sda14 LMDE       15s finds all others

All start their listing from sda10-sda14, then sda2/4-sda9. This is probably due to an alphabetically sorted results array. None threw an error.

It is clear that there are two basic os-probers. The fast one starts with "No volume groups found", and finds everything except the incomplete M5C (Classic) on sda12 - but LMDE includes that! The slow one does NOT have the initial volume groups msg, and finds everything. I suspect most of the time it is blocked on something - sda3? - but only outputs its list at the end of the period.

Re Comment 25 & Comment 30, I re-ran the Classic install and saved the ps/pstree output - but forgot to write them out to a USB stick (but the bug output goes straight there, so I have that); damn. Will re-do...

Re Comment 31, no, 16%.
Comment 33 Vladimir Zawalinski 2015-05-29 12:45:43 CEST
(In reply to Thierry Vignaud from comment #23)
> Well:
> - having a slower machine (the report happened on a AMD E1-1200)
> - having slow disk (eg: not SSD)
> - maybe installing a Korora instance
>   see "different from all the others in having loads of Secure Boot stuff in
> \EFI\ - and something at the level of that directory i.e. top level of the
> ESP."
> 
> Though " 50mounted-tests: error: umount error, retrying after 1 sec" that
> happens 6 times could be hint...

I installed Korora 21 on the slow USB2 disk. Tried installing mga5-gnome-live on the remaining space. Came across the problem reported in bug 16055. Aborted the installer, retaining the live session, and tried again. This time, installation aborted at the bootloader step: 'Killing runaway process' as per attachment.

Powered everything off. Booted to the live session, then powered on the USB disk, started the installer and everything completed normally. This may discount the presence of Korora as a direct factor.
Comment 34 Vladimir Zawalinski 2015-05-29 12:50:30 CEST
Created attachment 6665
Journals showing where bootloader generation runaway process killed.

Not sure how relevant this is to the os-prober discussion
Comment 35 Lewis Smith 2015-05-29 18:18:50 CEST
Created attachment 6666
'ps' while abortive Classic bootloader configuration is running

During bootloader configuration at the end of 27 May Classic ISO x64 installation on EFI hardware. Requested 'ps awx' not available, just plain 'ps'. 'pstree' follows.
Comment 36 Lewis Smith 2015-05-29 18:37:38 CEST
'pstree' during abortive bootloader configuration at the end of the Classic ISO x64 dated 27 May. Goes with previous 'ps' output. Bug report follows.
 $pstree
switch_root-+-dbus-daemon
            |-dbus-launch
            |-grub2-mount
            |-runinstall2-+-Xorg
            |             |-bash---pstree
            |             |-display_release
            |             |-drakx-matchbox-
            |             |-modprobe
            |             |-update-grub2---su---grub2-mkconfig---30_os-prober---30_os-prober-+-os-prober---os-prober-+-logger
            |             |                                                                  |                       `-os-prober---+
            |             |                                                                  |-paste
            |             |                                                                  `-tr
            |             `-update-menus
            |-switch_root
            `-systemd-udevd
This is difficult with word-wrap, + marks the branch to the next inner level.
The isolated os-prober branches from before logger;
Then paste & tr branch from before the first os-prober;
Then update-menus is at the same level as update-grub2, the last branch from before Xorg.
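
The chain in the pstree output above (update-grub2 down to os-prober and logger) can also be rebuilt from a plain pid/ppid listing when pstree is not at hand or is mangled by word-wrap. A minimal Python sketch, assuming input lines of the form `PID PPID COMMAND`; the `descendants` helper is hypothetical, and the PIDs in the sample are made up for illustration (only the command names come from the pstree above, in simplified form):

```python
from collections import defaultdict


def descendants(ps_text, root_name):
    """Return (depth, pid, command) for root_name and everything below it."""
    children = defaultdict(list)
    names = {}
    for line in ps_text.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        pid, ppid, comm = int(fields[0]), int(fields[1]), fields[2]
        names[pid] = comm
        children[ppid].append(pid)

    result = []
    seen = set()

    def walk(pid, depth):
        if pid in seen:
            return  # already listed under an ancestor with the same name
        seen.add(pid)
        result.append((depth, pid, names[pid]))
        for child in sorted(children[pid]):
            walk(child, depth + 1)

    for pid in sorted(names):
        if names[pid] == root_name:
            walk(pid, 0)
    return result


# Made-up PIDs; command names follow the pstree above (simplified).
sample = """100 1 runinstall2
200 100 update-grub2
201 200 su
202 201 grub2-mkconfig
203 202 30_os-prober
204 203 os-prober
205 204 logger
300 100 Xorg"""

for depth, pid, comm in descendants(sample, "update-grub2"):
    print("  " * depth + f"{comm} ({pid})")
```

The last entries printed are the processes that were still alive at the bottom of the chain, which is where to look for whatever is blocking.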
Comment 37 Lewis Smith 2015-05-29 18:49:44 CEST
Created attachment 6667
"bug output at the end of abortive bootloader configuration Classic x64 real EFI hardware.

Goes with previous two comments. ISO dated 27 May 2015.
Comment 38 Thierry Vignaud 2015-05-29 20:51:45 CEST
I should have told you to use /mnt/bin/ps else you use installer's ps which is more limited...
Comment 39 Lewis Smith 2015-05-30 08:50:53 CEST
(In reply to Thierry Vignaud from comment #38)
> I should have told you to use /mnt/bin/ps else you use installer's ps which
> is more limited...
Does this mean you want me to do it again - just for the '/mnt/bin/ps awx' O/P?

Most recently I have been dealing with this bug re Classic installer. But initially it was with a Live. Is there any difference for you between the two?
Comment 40 Thierry Vignaud 2015-05-30 19:08:28 CEST
Not for the bootloader setup.
They basically execute the same code for bootloader configuration, with some minor differences, but none for the actual bootloader installation.
So for this bug, both are identical.
Comment 41 Lewis Smith 2015-06-10 12:55:27 CEST
Good news: for the latest couple of Final rounds, Gnome Live & Classic, x64 on real EFI hardware, the installations have *not* crashed at the bootloader configuration stage. Something has changed.
However, that runs the entire 10 minutes with a blank window; plus another 10 minutes at the later bootloader installation stage. But it all runs to the end if you sit it out, and the resulting installed systems boot OK.
I have added this to Errata, and downgraded the bug criticality.

What next?
Close the bug because the crash has 'gone away'; and open another for the 2 x 10 minute waits?
Or
Change its title and leave it open for the 2 x 10-minute waits.

Priority: release_blocker => High

Comment 42 Thierry Vignaud 2015-06-10 13:02:14 CEST
That's the same issue: os-prober took too long.

If it runs just below 10m, you will just have waited ~10m

If it takes just a little more time, we abort update-grub2 as it took too long.
We could extend the timeout for update-grub2 to eg 15m but that's not the right solution, it should not take that long...
Samuel Verschelde 2015-06-10 14:20:08 CEST

Whiteboard: (none) => FOR_ERRATA

Comment 43 Thierry Vignaud 2015-06-10 14:22:36 CEST
I don't think that deserves an errata.
We won't put an errata for pathological cases.
Comment 44 Samuel Verschelde 2015-06-10 14:26:05 CEST
I thought that could happen to other users having a lot of partitions, but if it's only one or a rare case, let's not add one.

Whiteboard: FOR_ERRATA => (none)

Comment 45 Lewis Smith 2015-06-13 20:18:41 CEST
(In reply to Thierry Vignaud from comment #42)
> That's the same issue: os-prober took too long.
> If it runs just below 10m, you will just have waited ~10m
> If it takes just a little more time, we abort update-grub2 as it took too
> long.
> We could extend the timeout for update-grub2 to eg 15m but that's not the
> right solution, it should not take that long...
To fix this, I suggest adding 1m to the timeout, 10m -> 11m. It is clear from my Comment 32 that there are situations where os-prober takes 10m, however unjustified that is. By allowing just *over* 10m, it will always carry on successfully, rather than the current situation where if it does go over 10m the bootloader installation crashes. This is disastrous, especially as it happens *after* the main installation process.

This does not preclude finding out 'Why?'; but be comforted that Mageia 5 is not alone with this problem.
Comment 46 Thierry Vignaud 2015-06-14 19:49:02 CEST
That's not a solution:
If you add more partitions on your system, next time it may time out even with 11mn...
Comment 47 Lewis Smith 2015-06-15 10:23:51 CEST
How do we know that this ~10m time limit is related to the number of partitions?
And why is it that having just tried '# time os-prober':
- Mageia 4: ~6 secs
- Mageia 5: ~8.5 minutes
In spite of the latter being <10m, I am sure that the installer (Live & Classic) ran the full 10 minutes before continuing.
Comment 48 Thierry Vignaud 2015-06-15 10:54:23 CEST
Please...

1) read your journalctl logs, os-prober took more than 10mn (attachment #6609):

Mai 22 08:41:33 localhost draklive-install[2746]: running: update-grub2  with root /mnt/install
Mai 22 08:51:33 localhost draklive-install[2746]: ERROR: killing runaway process (process=update-grub2, pid=20882, args=, error=ALARM at /usr/lib/libDrakX/run_program.pm line 241.
Mai 22 08:52:52 localhost logger[5274]: linux-boot-prober: debug: linux detected by /usr/lib/linux-boot-probes/50mounted-tests

Then compute 8:52:52 - 8:41:33. That's nearly 11mn 20s!
We killed its parent (the update-grub2 process we ran) exactly 10mn after it started.


2) 10m is a generic fixed timeout applied when we run a command:
we don't want the installer to be stuck because of an external command.

See http://gitweb.mageia.org/software/drakx/tree/perl-install/run_program.pm#n190
Either the command returns normally before 10mn (and waitpid() tells us so) or the timer expires and boom.

The more partitions you have, the more work os-prober will have to do (more filesystems to check, ...)
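
The arithmetic in point 1 can be checked mechanically. A small Python sketch, using only the three timestamps quoted from the journal above; `elapsed` is a hypothetical helper and assumes both stamps fall on the same day:

```python
from datetime import datetime


def elapsed(start, end):
    """Return the gap between two HH:MM:SS stamps (same day) as 'Mm SSs'."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes}m {seconds:02d}s"


# update-grub2 started -> installer killed it on ALARM
print(elapsed("08:41:33", "08:51:33"))  # 10m 00s
# update-grub2 started -> linux-boot-prober still logging afterwards
print(elapsed("08:41:33", "08:52:52"))  # 11m 19s
```

So the kill came exactly at the 10-minute mark, while the probing itself was still producing log lines 1m 19s later.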
Comment 49 Thierry Vignaud 2015-06-15 11:30:52 CEST
Actually, I think we should reduce the timeout for such commands to eg 2m and, if it fails to complete in time, rerun it with GRUB_DISABLE_OS_PROBER (& remember to do so for the 2nd run)
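
The idea above (a short timeout, then one retry with os-prober disabled) could look roughly like the following. This is only a sketch in Python, not the patch attached to this bug and not the drakx code, which is Perl; `run_with_fallback` is a hypothetical helper, and the demo uses a stand-in child process with a made-up `SKIP_SLOW` variable rather than update-grub2:

```python
import os
import subprocess
import sys


def run_with_fallback(cmd, fallback_env, timeout_s):
    """Run cmd; if it exceeds timeout_s, rerun it once with fallback_env added.

    Returns (returncode, used_fallback).
    """
    try:
        done = subprocess.run(cmd, timeout=timeout_s)
        return done.returncode, False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child on timeout; grandchildren
        # (such as a prober that is still running) may survive and would
        # need separate handling.
        env = dict(os.environ, **fallback_env)
        done = subprocess.run(cmd, env=env)
        return done.returncode, True


# Stand-in for a slow command: sleeps 30 s unless SKIP_SLOW is set.
slow_unless_skipped = [
    sys.executable,
    "-c",
    "import os, time; time.sleep(0 if os.environ.get('SKIP_SLOW') else 30)",
]

print(run_with_fallback(slow_unless_skipped, {"SKIP_SLOW": "1"}, timeout_s=1))
```

For the real case one would presumably pass `["update-grub2"]`, `{"GRUB_DISABLE_OS_PROBER": "true"}` and a 120 s timeout. Whether update-grub2 honours that variable when it comes from the environment, rather than only from /etc/default/grub, is an assumption that would need checking, particularly since the pstree in this report shows it going through `su`.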
Comment 50 Thierry Vignaud 2015-06-15 11:49:57 CEST
Created attachment 6740
limit timeout to 2m and retry w/o os-prober if needed
Comment 51 Lewis Smith 2015-12-20 20:16:15 CET
As I seem to be the only complainant for the 2 * 10-minute bootloader installation waits both for installing Mageia 5 and any subsequent kernel updates to it, *and* Mageia 6 is emerging, I am inclined to grind my teeth and close this bug just to get it off the books.
What do others think?
Comment 52 Lewis Smith 2016-05-26 09:50:21 CEST
Is Bug 18538 the same as this one?
Comment 53 Barry Jackson 2016-05-26 15:15:53 CEST
(In reply to Lewis Smith from comment #52)
> Is Bug 18538 the same as this one?

Looks like it.
Comment 54 Thierry Vignaud 2016-05-26 15:48:32 CEST
Thomas, we may end up applying my patch (attachment #6740), re-running grub w/o os-prober if it takes too long...
Comment 55 Thierry Vignaud 2016-05-26 15:58:15 CEST
Barry, we could do like Debian, increase GRUB_DISK_CACHE_BITS to 11:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508834
They increased it from 8 to 11, ours is even lower at 6...
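
To put those three values side by side: assuming GRUB_DISK_CACHE_BITS is the base-2 logarithm of the disk cache block size counted in 512-byte units (an assumption here, not something stated in this report), the block sizes work out as below. `cache_block_bytes` is a hypothetical helper in Python:

```python
UNIT_BYTES = 512  # assumption: cache block size is counted in 512-byte units


def cache_block_bytes(cache_bits):
    """Size of one cache block, assuming it spans 2**cache_bits units."""
    return (1 << cache_bits) * UNIT_BYTES


for bits in (6, 8, 11):
    print(f"bits={bits}: {cache_block_bytes(bits) // 1024} KiB")
```

Under that assumption, going from 6 to 11 means each cached read covers 32 times more data (32 KiB versus 1024 KiB), which would matter when many partitions are read in small pieces.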
Comment 56 Thierry Vignaud 2016-05-26 16:08:19 CEST
Could you test grub2-2.02-0.git10457.2.mga6 from core/updates_testing once it finishes building and lands on your favorite mirror?
Comment 57 Lewis Smith 2016-05-26 20:35:21 CEST
(In reply to Thierry Vignaud from comment #56)
> Could you test grub2-2.02-0.git10457.2.mga6 from core/updates_testing once
> it finishes building and lands on your favorite mirror?

Hoping that running Mageia 6 and selecting Updates Testing repos will show me the package, I will certainly try it.
And for Mageia 5?
Comment 58 Barry Jackson 2016-05-26 21:41:12 CEST
(In reply to Lewis Smith from comment #57)

> And for Mageia 5?

Let's see if this works in Cauldron first :)
Comment 59 Lewis Smith 2016-05-30 10:28:14 CEST
Testing x64 real EFI hardware Mageia 6

 os-prober-1.71-6.mga6
 grub2-common-2.02-0.git10457.5.mga6
 grub2-efi-2.02-0.git10457.5.mga6
 grub2-mageia-theme-2.02-0.git10457.5.mga6
Note that the grub2 version is advanced from that cited to try in Comment 56.
Mageia 6 Classic install is on sda11.

# time os-prober
/dev/sda10:Korora release 20 (Peach):Fedora:linux
/dev/sda12:Mageia 5 (5):Mageia:linux
/dev/sda13:Mageia 5 (5):Mageia1:linux
/dev/sda14:LMDE MATE Edition (1):LinuxMint:linux
/dev/sda2@/EFI/Microsoft/Boot/bootmgfw.efi:Windows Boot Manager:Windows:efi
/dev/sda8:Manjaro Linux (16.06-rc1):ManjaroLinux:linux
/dev/sda9:openSUSE 12.3 (x86_64):SuSE:linux
real	6m20.970s

# time update-grub
Generating grub configuration file ...
Found theme: /boot/grub2/themes/maggy/theme.txt
Found linux image: /boot/vmlinuz-4.6.0-desktop-1.mga6
Found initrd image: /boot/initrd-4.6.0-desktop-1.mga6.img
Found linux image: /boot/vmlinuz-desktop
Found initrd image: /boot/initrd-desktop.img
Found Korora release 20 (Peach) on /dev/sda10
Found Mageia 5 (5) on /dev/sda12
Found Mageia 5 (5) on /dev/sda13
Found LMDE MATE Edition (1) on /dev/sda14
Found Windows Boot Manager on /dev/sda2@/EFI/Microsoft/Boot/bootmgfw.efi
Found Manjaro Linux (16.06-rc1) on /dev/sda8
Found openSUSE 12.3 (x86_64) on /dev/sda9
done
real	7m53.911s

Well, better than 10m.

Can we close this bug (M5) and transfer the discourse to Bug 18538 (M6); to where I am copying this Comment.
Comment 60 Thierry Vignaud 2016-05-30 15:07:53 CEST
See bug #18538, some needed speed-up patches were wrongly removed from os-prober.
Please test the new one.
Comment 61 Mageia Robot 2016-06-06 11:12:27 CEST
commit 8847eda6f7f8aaad07931290f1a37a5e44f7a426
Author: Thierry Vignaud <thierry.vignaud@...>
Date:   Mon Jun 6 11:12:04 2016 +0200

    enable to (un)install os-prober
    
    thus enabling to prevent slow boot (mga#16010, mga#18538)
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=8847eda6f7f8aaad07931290f1a37a5e44f7a426

 Bug links:
   Mageia
      https://bugs.mageia.org/16010
      https://bugs.mageia.org/18538
Thierry Vignaud 2016-06-08 16:37:03 CEST

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=18538

Thierry Vignaud 2016-07-25 14:31:40 CEST

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=19047

Thierry Vignaud 2016-07-25 14:34:18 CEST

Blocks: (none) => 416

Comment 62 Thierry Vignaud 2016-07-25 14:34:46 CEST
*** Bug 19047 has been marked as a duplicate of this bug. ***

See Also: https://bugs.mageia.org/show_bug.cgi?id=19047 => (none)
CC: (none) => westel

Comment 63 Samuel Verschelde 2016-10-10 17:34:27 CEST
Can we close this bug report and use bug 18538 for follow-ups?
Comment 64 Lewis Smith 2016-10-10 19:43:14 CEST
(In reply to Samuel Verschelde from comment #63)
> Can we close this bug report and use bug 18538 for follow-ups?
Quite happy to do so; indeed, I suggested as much in Comment 59. Do it if you wish; or tell me what categories to use to close it, and I will.

For the superseding bug, Ben has long suggested that the problem might be related to the number of partitions or installed systems. But my Comment 32 shows that this was not the only factor.
Comment 65 Lewis Smith 2016-11-17 14:17:03 CET
Closing RESOLVED FIXED because the title problem no longer exists. Originally bootloader installation crashed; now it can take a long time, but works. q.v.
 https://bugs.mageia.org/show_bug.cgi?id=15752
 https://bugs.mageia.org/show_bug.cgi?id=18538
which I suspect are duplicates of this residual problem & themselves.

Status: NEW => RESOLVED
Resolution: (none) => FIXED

