Running update-grub2 detects all my Linux kernels, and then hangs. As a workaround, I edited /etc/grub.d/30_os-prober to exit immediately.

The root cause is that os-prober itself simply hangs, and never returns until I kill -9 it. This is rather puzzling, since I have a very simple setup:

sda1 : Mageia root fs (ext4)
sda5 : swap
sda6 : empty ext4 partition for future use
sdb1 : /home (ext4)

The underlying process that is getting stuck is:
mount -o ro -t ext2 /dev/sda2 /var/lib/os-prober/mount
This is called by:
/usr/lib/os-probes/50mounted-tests.0007 /dev/sda2
and the underlying mount process cannot be terminated except by a reboot.

Marking this as critical, because it is preventing urpmi --auto-update from running to completion.
What happens when you manually run "mount -o ro -t ext2 /dev/sda2 /var/lib/os-prober/mount"?
Keywords: (none) => NEEDINFO
CC: (none) => thierry.vignaud
Assignee: bugsquad => zen25000
> mount -o ro -t ext2 /dev/sda2 /var/lib/os-prober/mount

The mount process hangs. It never returns, and cannot even be kill -9'd.

The odd thing is, why would anything try to mount /dev/sda2, which is merely the container for /dev/sda5 and sda6?

dmesg shows this:

[ 2036.520345] INFO: task mount:22332 blocked for more than 120 seconds.
[ 2036.520345] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2036.520346] mount D ffffffff8160c1c0 0 22332 22325 0x00000004
[ 2036.520348] ffff88042bbd3c68 0000000000000082 ffff88042bbd3c18 ffff88042f028de0
[ 2036.520350] ffff880416bd0000 ffff88042bbd3fd8 ffff88042bbd3fd8 ffff88042bbd3fd8
[ 2036.520352] ffff88042d972d20 ffff880416bd0000 ffff88042bbd3cc8 ffff880428000400
[ 2036.520354] Call Trace:
[ 2036.520355] [<ffffffff8144fe9f>] schedule+0x3f/0x60
[ 2036.520357] [<ffffffff8144ecea>] __mutex_lock_slowpath+0xca/0x140
[ 2036.520359] [<ffffffff8144e93a>] mutex_lock+0x2a/0x50
[ 2036.520360] [<ffffffff811601ef>] mount_bdev+0x7f/0x200
[ 2036.520362] [<ffffffff811d80b0>] ? ext2_error+0x130/0x130
[ 2036.520364] [<ffffffff811526bd>] ? __kmalloc_track_caller+0x13d/0x190
[ 2036.520365] [<ffffffff811d6a35>] ext2_mount+0x15/0x20
[ 2036.520367] [<ffffffff81161103>] mount_fs+0x43/0x1b0
[ 2036.520369] [<ffffffff81126710>] ? __alloc_percpu+0x10/0x20
[ 2036.520370] [<ffffffff8117ac92>] vfs_kern_mount+0x72/0x110
[ 2036.520372] [<ffffffff8117b4f4>] do_kern_mount+0x54/0x110
[ 2036.520373] [<ffffffff8117cce4>] do_mount+0x1a4/0x850
[ 2036.520375] [<ffffffff81120bab>] ? memdup_user+0x4b/0x90
[ 2036.520376] [<ffffffff81120c4b>] ? strndup_user+0x5b/0x80
[ 2036.520378] [<ffffffff8117d4d0>] sys_mount+0x90/0xe0
[ 2036.520379] [<ffffffff81458e79>] system_call_fastpath+0x16/0x1b
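The trace above shows mount in state D (uninterruptible sleep), which is why kill -9 has no effect on it. A minimal sketch for spotting such tasks, assuming a procps-style ps; the function name filter_d_state is made up for illustration:

```shell
# Hypothetical helper (not part of os-prober or procps).
# Reads "PID STAT COMMAND" lines on stdin and prints "PID COMMAND"
# for every task whose state starts with D (uninterruptible sleep).
filter_d_state() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Usage (assumes procps ps; the trailing "=" suppresses the headers):
#   ps -eo pid=,stat=,comm= | filter_d_state
```

A task listed here will generally stay there until the lock or I/O it is waiting for completes, so this only helps with diagnosis, not with getting rid of the process.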
Obviously os-prober should ignore (primary) extended partitions and only look at regular primary/logical partitions
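A rough sketch of the kind of check meant here, not what os-prober actually does: on an MBR disk the extended container carries partition type 0x05, 0x0f or 0x85. The function name is hypothetical, and it is an assumption that the installed lsblk has the PARTTYPE column and prints MBR types in the form 0x5.

```shell
# Hypothetical helper: succeed if the given MBR partition type ID
# denotes an extended (container) partition.
is_extended_type() {
    case "$1" in
        0x5|0x05|0xf|0x0f|0xF|0x0F|0x85) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (assumes lsblk supports the PARTTYPE column):
#   ptype=$(lsblk -dno PARTTYPE /dev/sda2)
#   is_extended_type "$ptype" && echo "skipping extended partition"
```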
Summary: update-grub2 hangs and never returns - caused by os-prober => update-grub2 hangs and never returns (os-prober stuck trying to mount extended partition)
Yes, that's true. If I run
/usr/lib/os-probes/50mounted-tests.0007 /dev/sda5 #or sda6
then it's fine: it returns 0 instantly.

However, I don't know if it's os-prober that is to blame here. I think the problem is that the mount call is locking up, rather than failing quickly on what should be an invalid request. It's invalid to try to mount /dev/sda2, and it's certainly invalid to mount it with "-t ext2".

Aside: there is something rather wrong with mount on this system. Experimenting a bit with my spare partition (empty /dev/sda6, ext4):

mount -t ext4 /dev/sda6 /spare #works
mount -t ext2 /dev/sda6 /spare #fails [1]
mount -o rw /dev/sda6 /spare #works
mount -o ro /dev/sda6 /spare #fails [2]

[1] is correct to fail, but the error message is wrong: "mount: /dev/sda6 is already mounted or /spare is busy"
[2] shouldn't fail, but it does. Furthermore, it gives the same error message as [1].

(it's kernel 3.3.8-desktop-2.mga2)
This is a kernel problem in 3.3.8.mga2. If I try all the tests above under 3.8.8.mga3, then everything works fine.

BUT... I think people are going to hit this bug when upgrading from Mga2: it will result in update-grub2 hanging, which in turn makes urpmi stall.

(Also, I can't run a later kernel than that at the moment, because the 3.8.8 kernel refuses to load the module for my graphics card, an intel i915!)
Thomas, see comment #4 & #5
CC: (none) => tmb
Blocks: (none) => 416
(In reply to Richard Neill from comment #5)
> This is a kernel problem in 3.3.8.mga2. If I try all the tests above under
> 3.8.8.mga3, then everything works fine.
>
> BUT... I think people are going to hit this bug when upgrading from Mga2: it
> will result in update-grub2 hanging, and in turn, making urpmi stall.

It is unlikely to hit upgraders as grub2 was not offered in Mga2.

> (Also, I can't run a later kernel than that at the moment, because the 3.8.8
> kernel refuses to load the module for my graphics card, an intel i915!)

Resetting assignee and qa to default as this appears not to be a grub2/os-prober fault.
Assignee: zen25000 => bugsquad
It _is_ also an os-prober issue. It should not try to mount _extended_ partitions. That's just pure nonsense.
CC: (none) => zen25000
(In reply to Richard Neill from comment #5)
> This is a kernel problem in 3.3.8.mga2. If I try all the tests above under
> 3.8.8.mga3, then everything works fine.
>
> BUT... I think people are going to hit this bug when upgrading from Mga2: it
> will result in update-grub2 hanging, and in turn, making urpmi stall.
>
> (Also, I can't run a later kernel than that at the moment, because the 3.8.8
> kernel refuses to load the module for my graphics card, an intel i915!)

I just pushed an updated os-prober to cauldron updates/testing, however I'm unsure if the new Fedora patches applied actually address this problem (one is related to extended partitions and btrfs).

Could you test os-prober-1.57-6.mga3 please?

Thanks,
Barry
I've just tried to get the latest version from the mirrors, but the latest one available (by urpmi --auto-select) is still 1.57-5.mga3. Am I missing something?
(In reply to Richard Neill from comment #10)
> I've just tried to get the latest version from the mirrors, but the latest
> one available (by urpmi --auto-select) is still 1.57-5.mga3. Am I missing
> something?

You need to enable the 'core updates testing' repo just to install it and then disable that repo again.
Thanks. I managed to install 1.57-6.

os-prober still hangs, though fortunately it's now killable with "Ctrl-\". syslog shows...

May 3 23:14:43 chocolate logger: os-prober: debug: running /usr/lib/os-probes/50mounted-tests on /dev/sda2
May 3 23:14:43 chocolate logger: os-prober: debug: running /usr/lib/os-probes/50mounted-tests.0007 on /dev/sda2

This is a problem in two parts: os-prober shouldn't probe an extended partition, and there is also something wrong with mount.
Summary:

1. In kernel 3.3.8-desktop-2.mga2, os-prober (1.57-6) shows the bug (probing an extended partition), and the kernel's mount call hangs.

2. In kernel 3.8.11-desktop-1.mga3, os-prober still tries to probe the extended partition (sda2), but does so safely, and the kernel doesn't hang when mounting. Fragment of syslog below.

3. [Annoyingly, I am still stuck with 3.3.8 because of another bug which prevents the intel graphics driver loading on newer kernels!]

------------
May 3 23:23:57 chocolate logger: os-prober: debug: running /usr/lib/os-probes/50mounted-tests on /dev/sda2
May 3 23:23:57 chocolate logger: os-prober: debug: running /usr/lib/os-probes/50mounted-tests.0007 on /dev/sda2
May 3 23:23:57 chocolate kernel: EXT2-fs (sda2): error: unable to read superblock
May 3 23:23:57 chocolate kernel: EXT4-fs (sda2): unable to read superblock
May 3 23:23:57 chocolate kernel: cramfs: wrong magic
May 3 23:23:57 chocolate kernel: EXT3-fs (sda2): error: unable to read superblock
May 3 23:23:57 chocolate kernel: REISERFS warning (device sda2): sh-2006 read_super_block: bread failed (dev sda2, block 8, size 1024)
May 3 23:23:57 chocolate kernel: REISERFS warning (device sda2): sh-2006 read_super_block: bread failed (dev sda2, block 64, size 1024)
May 3 23:23:57 chocolate kernel: REISERFS warning (device sda2): sh-2021 reiserfs_fill_super: can not find reiserfs on sda2
May 3 23:23:57 chocolate kernel: XFS (sda2): bad magic number
May 3 23:23:57 chocolate kernel: XFS (sda2): SB validate failed
May 3 23:23:57 chocolate kernel: FAT-fs (sda2): bogus number of reserved sectors
May 3 23:23:57 chocolate kernel: FAT-fs (sda2): Can't find a valid FAT filesystem
May 3 23:23:57 chocolate kernel: FAT-fs (sda2): bogus number of reserved sectors
May 3 23:23:57 chocolate kernel: FAT-fs (sda2): Can't find a valid FAT filesystem
May 3 23:23:57 chocolate kernel: MINIX-fs: unable to read superblock
May 3 23:23:57 chocolate kernel: attempt to access beyond end of device
May 3 23:23:57 chocolate kernel: sda2: rw=0, want=3, limit=2
May 3 23:23:57 chocolate kernel: hfs: unable to find HFS+ superblock
May 3 23:23:57 chocolate kernel: qnx4: wrong fsid in superblock.
May 3 23:23:57 chocolate kernel: You didn't specify the type of your ufs filesystem
May 3 23:23:57 chocolate kernel:
May 3 23:23:57 chocolate kernel: mount -t ufs -o ufstype=sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...
May 3 23:23:57 chocolate kernel:
May 3 23:23:57 chocolate kernel: >>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
May 3 23:23:57 chocolate kernel: hfs: can't find a HFS filesystem on dev sda2.
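The "limit=2" in the log fits with how Linux exposes an MBR extended partition: as a 2-sector stub rather than the full container, so every filesystem probe runs off the end of the device. A sketch of a size-based guard built on that, assuming sysfs is mounted at /sys. The function name and the optional second argument (an alternative sysfs root, handy for testing) are illustrative only, and a genuinely tiny partition would be a false positive.

```shell
# Hypothetical helper: succeed if the block device is at most 2 sectors
# long according to sysfs, as an extended-partition stub would be.
looks_like_extended_stub() {
    dev=${1##*/}                 # /dev/sda2 -> sda2
    sysroot=${2:-/sys}           # assumption: sysfs lives here
    size_file=$sysroot/class/block/$dev/size
    [ -r "$size_file" ] || return 1
    sectors=$(cat "$size_file")  # size in 512-byte sectors
    [ "$sectors" -le 2 ]
}

# Usage:
#   looks_like_extended_stub /dev/sda2 && echo "not worth probing"
```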
Thanks for testing - I was hoping those patches may have fixed it. :( So this is harmless with later kernels and so is not actually critical for Mga3, however it still *should* be fixed in os-prober. I will report upstream.
Keywords: NEEDINFO => (none)
Assignee: bugsquad => zen25000
(In reply to Richard Neill from comment #13)

Before I do, please try the latest unstable just to be sure this is not already fixed upstream:
http://mtf.no-ip.co.uk/pub/linux/barjac/distrib/cauldron/x86_64/media/extra/release/os-prober-1.58-1.mga3.x86_64.rpm
or for 32 bit:
http://mtf.no-ip.co.uk/pub/linux/barjac/distrib/cauldron/i586/media/extra/release/os-prober-1.58-1.mga3.i586.rpm

Thanks,
Barry
Thanks. Yes, this still affects the latest version 1.58-1, which still probes /dev/sda2. I agree - please file this upstream, but it's not a serious critical bug any more. [I remain mystified by the weird behaviour of "mount" under the older kernel 3.3.8, but that's not really relevant to this bug]
Bug report sent upstream - if it gets a number I will post it here.
This is not a release blocker so removing from tracker
Blocks: 416 => (none)
Severity: critical => normal
Summary: update-grub2 hangs and never returns (os-prober stuck trying to mount extended partition) => os-prober tries to mount extended partition
Status: NEW => ASSIGNED
This is now working, as of Mga4, so can be closed. On the same system as previously (which has only Linux present), running "/etc/grub.d/30_os-prober" now exits successfully, and prints nothing. (This is as expected).
Status: ASSIGNED => RESOLVED
Resolution: (none) => FIXED
It still tries to open extended partition in mga5. It should only look at logical partitions (the contents) not the extended one (the container).
Status: RESOLVED => REOPENED
Resolution: FIXED => (none)
Source RPM: os-prober-1.57-5.mga3.src.rpm => os-prober-1.65-6.mga5.src.rpm
I think that's b/c we're totally unsynced with RH which has fixed it: http://pkgs.fedoraproject.org/cgit/os-prober.git/commit/?id=1cc85085b1fb5ddf7e66548f2db7808397f23c11
URL: (none) => http://pkgs.fedoraproject.org/cgit/os-prober.git/commit/?id=1cc85085b1fb5ddf7e66548f2db7808397f23c11
Also we should exclude backup files generated by patch:
/usr/lib/linux-boot-probes/mounted/40grub2.0007
/usr/lib/linux-boot-probes/mounted/40grub2.0015
/usr/lib/os-probes/50mounted-tests.0007
/usr/lib/os-probes/50mounted-tests.0012
/usr/lib/os-probes/mounted/05efi.0013
/usr/lib/os-probes/mounted/20microsoft.0016
/usr/lib/os-probes/mounted/83haiku.0016
/usr/lib/os-probes/mounted/90linux-distro.0001
/usr/lib/os-probes/mounted/90linux-distro.0004
/usr/lib/os-probes/mounted/90linux-distro.0007
/usr/lib/os-probes/mounted/90linux-distro.0011
/usr/lib/os-probes/mounted/90linux-distro.0012
Actually, removing those files fixes that issue. It's a side effect of switching to %apply_patches (bug #15579). It also makes it faster...
Well, we still have a bogus error message but that's quite a lot better:
/dev/sdb3:Fedora release 13 (Goddard):Fedora:linux
rmdir: failed to remove ‘/var/lib/os-prober/mount’: Device or resource busy
URL: http://pkgs.fedoraproject.org/cgit/os-prober.git/commit/?id=1cc85085b1fb5ddf7e66548f2db7808397f23c11 => (none)
This should do it:
%exclude /usr/lib/os-probes/*.00??
%exclude /usr/lib/os-probes/mounted/*.00??
OK thanks, fixed the patch backup files in svn. Committed revision 819882.

Fedora are using 1.57, we are using 1.65, so not sure if any more of their patches are needed.

Shall I ask for freeze push now or is there anything else?
I have tested in UEFI and BIOS with no apparent regressions.

Index: SPECS/os-prober.spec
===================================================================
--- SPECS/os-prober.spec (revision 819739)
+++ SPECS/os-prober.spec (working copy)
@@ -1,7 +1,7 @@
 %define _libexecdir %{_exec_prefix}/lib
 Name: os-prober
 Version: 1.65
-Release: %mkrel 6
+Release: %mkrel 7
 Summary: Probes disks on the system for installed operating systems
 License: GPLv1 and GPLv2+
 Group: System/Boot and Init
@@ -73,6 +73,7 @@
 install -m 755 -p os-probes/mounted/powerpc/20macosx \
 %{buildroot}%{_libexecdir}/os-probes/mounted
 fi
+find %{buildroot}/usr/lib/os-probes -name "*.00??" -delete
 
 %files
 %doc README TODO debian/copyright debian/changelog COPYING-note.txt
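To see what the "*.00??" pattern used in the spec would catch before deleting anything, something like the following dry run could be used. This is only a sketch: the function name is made up, and it lists files rather than removing them.

```shell
# Hypothetical helper: list files under the given directory whose names
# match the patch-backup pattern "*.00??" (e.g. 50mounted-tests.0007).
# Nothing is deleted; output is sorted for stable comparison.
list_patch_backups() {
    find "$1" -type f -name '*.00??' | LC_ALL=C sort
}

# Usage on an installed system:
#   list_patch_backups /usr/lib/os-probes
#   list_patch_backups /usr/lib/linux-boot-probes
```

The pattern requires exactly two characters after ".00", so the real probe scripts such as 50mounted-tests or mounted/05efi are left alone.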
BTW I see no errors when running os-prober-1.65-7 in an EFI system with multiple BIOS HDs attached (two of which have extended partitions):

[root@localhost ~]# os-prober
/dev/sda3:Mageia 4 (4):Mageia:linux
/dev/sda5:Mageia 2 (2):Mageia1:linux
/dev/sda6:Mageia 3 (3):Mageia2:linux
/dev/sda7:Mageia 5 (5):Mageia3:linux
/dev/sda8:Mageia 5 (5):Mageia4:linux
/dev/sdb16:Mageia 5:Mageia5:linux
[root@localhost ~]#

In debug mode there are these for both of them, so it does try to run 50mounted-tests on the extended:

--------------snip--------------
os-prober: debug: running /usr/lib/os-probes/50mounted-tests on /dev/sda2
--------------snip--------------
os-prober: debug: running /usr/lib/os-probes/50mounted-tests on /dev/sdb2
os-prober: debug: /dev/sdb5: is active swap
os-prober: debug: /dev/sdb6: is active swap
os-prober: debug: running /usr/lib/os-probes/50mounted-tests on /dev/sdb7
50mounted-tests: debug: mounted as ext3 filesystem
50mounted-tests: debug: running subtest /usr/lib/os-probes/mounted/05efi
05efi: debug: /dev/sdb7 is ext3 partition: exiting
50mounted-tests: debug: running subtest /usr/lib/os-probes/mounted/10freedos
10freedos: debug: /dev/sdb7 is not a FAT partition: exiting
------------snip--------------

Note that it quietly fails to mount them. See further down where it says:

os-prober: debug: running /usr/lib/os-probes/50mounted-tests on /dev/sdb7
50mounted-tests: debug: mounted as ext3 filesystem

...where it has mounted sdb7 and reports the fs.

So is this an issue?
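For comparing runs like the one above, the colon-separated result lines can be split into fields. A small sketch, assuming each line has the four fields seen in the output here (device, long name, short label, type) and that the long name contains no colon; the function name is hypothetical:

```shell
# Hypothetical helper: read os-prober result lines on stdin and print
# "device<TAB>type<TAB>long name" for each non-empty line.
summarise_probe_output() {
    while IFS=: read -r dev longname label type; do
        [ -n "$dev" ] || continue
        printf '%s\t%s\t%s\n' "$dev" "$type" "$longname"
    done
}

# Usage:
#   os-prober | summarise_probe_output
```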
(In reply to Barry Jackson from comment #26)
I've a freeze push

(In reply to Barry Jackson from comment #27)
It's still better (quite a lot less dangerous commands run on extended partitions)
(In reply to Thierry Vignaud from comment #28)
> (In reply to Barry Jackson from comment #26)
> I've a freeze push
>
> (In reply to Barry Jackson from comment #27)
> It still better (quite a lot less dangerous commands run on extended
> partitions)

I had already committed it in #26.
The testing I referred to in #27 was with the fix I committed in #26.
Never mind, it won't do any harm to have both. I'll remove one next time it's updated. ;)
Closing then
Status: REOPENED => RESOLVED
Resolution: (none) => FIXED
Depends on: (none) => 15579