Install of the current kernel packages is blocking, and only responds to kill -9:

    virtualbox (6.0.12-2.mga8): Installing module. ............. .......
    Creating: target|kernel|dracut args|basicmodules
    remove-boot-splash: Format of /boot/initrd-5.3.5-desktop-1.mga8.img not recognized
    ^C^C^C^C^C^C^C^C^C^C^Cwarning: %posttrans(kernel-desktop-5.3.5-1.mga8-1-1.mga8.x86_64) scriptlet failed, signal 9
    ERROR: 'script' failed for x11-driver-video-intel-2.99.917-56.mga7.x86_64

Update: This has happened now 3 times on both a laptop and a desktop system, and in the last two times "kill -9" had no effect and a reboot was required. The freeze appears to happen during the install of different coreqs of the kernel, e.g. VirtualBox. "ps ax" shows the status of the hung urpmi as D.
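Editorial note: a process in state D is in uninterruptible sleep, so signals (including SIGKILL) are not acted on until the blocking kernel operation returns, which matches the "kill -9 had no effect" observation. The following is a minimal sketch for listing such processes. The function names `filter_dstate` and `list_dstate` are made up for illustration, and the column layout assumes the procps `ps -eo pid,stat,wchan:32,args` output format.

```sh
# Hypothetical helper: keep only rows whose STAT column (field 2) starts
# with "D" (uninterruptible sleep). Expects "pid stat wchan args..." rows
# on stdin; the header row is dropped because "STAT" does not start with D.
filter_dstate() {
    awk '$2 ~ /^D/ { print }'
}

# Hypothetical wrapper: list all D-state processes on the running system.
# The wchan column shows the kernel function the task is sleeping in,
# when the kernel exposes it.
list_dstate() {
    ps -eo pid,stat,wchan:32,args | filter_dstate
}
```

Running `list_dstate` a few times during a hang would show whether urpmi alone is stuck or whether other processes are piling up behind it.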
Can you please give some background information? e.g.
* Is this happening in a VB virtual machine, or a real hardware Mageia installation? You mention "both a laptop and a desktop system".
* If real hardware, can you say what graphics hardware you have? (since there is an error re an Intel video driver).
* You say "installing the kernel", and mention urpmi. So, is this happening when updating an existing system? Can you give the failing command?
* Can you give your current/previous kernel version; and that of the one failing to install?
* After the failed/aborted kernel install, can you still boot successfully to the previous one?
TIA

Assigning to the kernel team.
Assignee: bugsquad => kernel
See also bug 25542.
CC: (none) => lewyssmith
(In reply to Lewis Smith from comment #1)
> Can you please give some background information? e.g.
> * Is this happening in a VB virtual machine, or a real hardware Mageia
> installation? You mention "both a laptop and a desktop system".

Real hardware, during "urpmi --auto-update"

> * If real hardware, can you say what graphics hardware you have? (since
> there is an error re an Intel video driver).

The laptop has two cards:

Identification
    Vendor: Intel Corporation
    Description: UHD Graphics 620
    Media class: VGA compatible controller
Connection
    Bus: PCI Express
    PCI domain: 0
    Bus PCI #: 0
    PCI device #: 2
    PCI function #: 0
    PCI revision: 0x07
    Vendor ID: 0x8086
    Device ID: 0x5917
    Sub vendor ID: 0x1043
    Sub device ID: 0x163e
Misc
    Module: Card:Intel 810 and later

Identification
    Vendor: NVIDIA Corporation
    Description: GP108M [GeForce MX150]
    Media class: 3D controller
Connection
    Bus: PCI Express
    PCI domain: 0
    Bus PCI #: 1
    PCI device #: 0
    PCI function #: 0
    PCI revision: 0xa1
    Vendor ID: 0x10de
    Device ID: 0x1d10
    Sub vendor ID: 0x1043
    Sub device ID: 0x163e
Misc
    Module: Card:NVIDIA GeForce 635 series and later

As far as I know, it's using the Intel card. For the desktop, it's

Identification
    Vendor: Advanced Micro Devices, Inc. [AMD/ATI]
    Description: RS780D [Radeon HD 3300]
    Media class: VGA compatible controller
Connection
    Bus: PCI
    PCI domain: 0
    Bus PCI #: 1
    PCI device #: 5
    PCI function #: 0
    Vendor ID: 0x1002
    Device ID: 0x9614
    Sub vendor ID: 0x1565
    Sub device ID: 0x0217
Misc
    Module: Card:ATI Radeon HD 4870 and earlier

> * You say "installing the kernel", and mention urpmi. So, is this happening
> when updating an existing system? Can you give the failing command?

"urpmi --auto-update", as above.

> * Can you give your current/previous kernel version; and that of the one
> failing to install?

This appears to have happened with the last 3 kernels to hit cauldron. The current one is 5.3.4-desktop-1.mga8

> * After the failed/aborted kernel install, can you still boot successfully
> to the previous one?

In general, you can even boot to the current one, but there may be incompletely installed packages. In one case, every Konsole window that opened showed an error claiming that a lib64gdk library was not long enough, and I had to reinstall the rpm containing that library using --replacepkgs to correct this. However, see also bug#25542.
The kernel is not at fault; it only calls out to /sbin/installkernel, and the toolchain / utils take over... Could be grub2, os-prober, or some other thing hanging... What happens if you actually wait it out?
CC: (none) => tmb
Assignee: kernel => mageiatools
Source RPM: kernel => bootloader-utils, drakxtools-backend, grub(2)?
(In reply to Thomas Backlund from comment #4)
> What happens if you actually wait it out?

I left one occurrence hanging for several hours with no progress.
Ok, so something (maybe even the new rpm) is hanging, as iirc we normally should time out after ~10 minutes... So we'd need something like strace or a gdb backtrace to see what/where we get stuck. Of course, if you happen to have an older kernel (5.1 or 5.2 series) installed, it would be nice to see if the same hang happens when you install a new 5.3.5 kernel.
OK, I'll do the updating with strace and get a gdb backtrace if I get a hang.
I was just about to close this when I got a hit. I have the strace file, which is unfortunately about 1 GB in size. I'll attach the last 1000 lines, as well as the stdout lines leading up to the hang. I can't get a gdb backtrace, since gdb won't attach to a process being straced. The hang appears to happen in the midst of a syncfs() call.
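Editorial note: only one tracer can be attached to a process through ptrace at a time, which is why gdb cannot attach while strace is running. For a task blocked inside a syscall, the kernel-side location is often more informative anyway, and it can be read from /proc without ptrace. A minimal sketch follows, assuming a Linux /proc layout. The function name `dump_task_state` and its second argument are made up for illustration, and `/proc/PID/stack` is typically readable by root only and only on kernels built with stack trace support.

```sh
# Hypothetical helper: print what the kernel exposes about one task without
# using ptrace, so it can be used while strace is still attached.
# $1 = pid, $2 = proc root (defaults to /proc; overridable for testing).
dump_task_state() {
    pid=$1
    root=${2:-/proc}
    dir="$root/$pid"
    [ -d "$dir" ] || { echo "no such task: $pid" >&2; return 1; }
    # "State:" line, e.g. "D (disk sleep)" for uninterruptible sleep.
    grep '^State:' "$dir/status"
    # Kernel symbol the task is sleeping in (may be 0 or empty).
    printf 'wchan: %s\n' "$(cat "$dir/wchan" 2>/dev/null)"
    # Kernel stack of the task, if this user is allowed to read it.
    if [ -r "$dir/stack" ]; then
        cat "$dir/stack"
    else
        echo "stack: not readable (try as root)"
    fi
}
```

Usage would be `dump_task_state <pid of the hung urpmi>` as root. If SysRq is enabled, `echo w > /proc/sysrq-trigger` should additionally dump all blocked tasks to the kernel log, which may help when the whole desktop is already stuck.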
Created attachment 11321 [details] Last 1000 lines of strace
Created attachment 11322 [details] tail end of urpmi stdout
As before, the hung process does not respond to CTRL-C, kill, or kill -9.
The syncfs() appears to be holding some sort of lock, because although it was only the Konsole window running urpmi that was hung initially, other open Konsole windows hung as time passed. Eventually, everything including Plasma and X became unresponsive, requiring a magic key reboot.
CC: lewyssmith => (none)
Got another hit, same exact strace signature - openat() of "/" followed by syncfs() of that fd. This one had nothing to do with kernel, but was installing task-obsolete and nothing else.
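Editorial note: since the signature is an openat() of "/" followed by syncfs() on that descriptor, one way to probe for the problem outside urpmi might be to issue the same kind of call directly and check whether it returns. A minimal sketch follows. The function name `finishes_within` is made up for illustration, and it assumes a coreutils `sync` that supports `-f` (sync only the filesystem containing the given path).

```sh
# Hypothetical helper: run a command in the background and report whether
# it finished within a deadline. A plain `timeout` wrapper is not enough
# here, because a task stuck in uninterruptible sleep does not react to
# the signal and the wrapper would block waiting for it.
# $1 = deadline in seconds, remaining arguments = command to run.
finishes_within() {
    deadline=$1
    shift
    "$@" &
    pid=$!
    waited=0
    # kill -0 only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            echo "still running after ${deadline}s: pid $pid" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, `finishes_within 60 sync -f /` would return non-zero if the sync on the root filesystem has not completed after a minute, at which point the reported pid can be inspected while the shell itself stays usable.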
Summary: Kernel install freezes, requiring "kill -9" or reboot => urpmi activity causes occasional symptom hang in syncfs() on root filesystem
Source RPM: bootloader-utils, drakxtools-backend, grub(2)? => urpmi, rpm
Summary: urpmi activity causes occasional symptom hang in syncfs() on root filesystem => urpmi activity causes occasional system hang in syncfs() on root filesystem
After the latest hang, I tried to verify whether bug#25573 was still occurring, and when I tried to invoke drakboot from MCC it eventually timed out with:

    The "drakboot" program has crashed with the following error:

    update-grub2 failed: at /usr/lib/libDrakX/any.pm line 697.
    ...propagated at /usr/libexec/drakboot line 49.
    Perl's trace:
    drakbug::bug_handler() called from /usr/libexec/drakboot:49

    Used theme: oxygen-gtk

    To submit a bug report, click on the report button.
    This will open a web browser window on Bugzilla where you'll find a form to fill in. The information displayed above will be transferred to that server
    Things useful to attach to your report are the output of the following commands: 'lspcidrake -v', 'blkid'.
    You should also attach the following files: /etc/modprobe.conf, /etc/fstab, /boot/grub/menu.lst, /boot/grub/devices.map as well as /etc/lilo.conf.
I found another oddity in a subsequent update on the same system as the two above. The update for a tex package appeared to stall midway, and when I used "tail -f" on the strace file, it was reading and writing as fast as it could. That update never completed, but the strace activity continued. As in the syncfs() cases, various components in the system stopped responding, and I ended up rebooting. On a hunch, I did an "rpm --rebuilddb", which succeeded, and then restarted the update which also succeeded. I'd hold off pursuing this unless it happens again. It's possible that the rpm database was screwed up and causing infinite I/O loops.
Hello, I saw hangs with another application, konversation. When I looked at the activity, konversation was "waiting for disk". This occurred after installation of kernel-desktop-5.3.2-1.mga7-1-1.mga7. Each time this occurred and I tried to kill the unresponsive application, the whole desktop became unresponsive, except for mouse pointer movement. The magic keys were of no help. I am now using a previous kernel, which works fine.
CC: (none) => yves.brungard_mageia
It's happened again, but with a variation. This time the hang occurs in a different sort of sync:

    23451 getpid() = 23451
    23451 getpid() = 23451
    23451 getpid() = 23451
    23451 getpid() = 23451
    23451 getpid() = 23451
    23451 getpid() = 23451
    23451 getpid() = 23451
    23451 pread64(6, "\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0*\1\"\6\0\r\373\17\366\17\361\17"..., 4096, 4096) = 4096
    23451 pwrite64(6, "\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0*\1\"\6\0\r\373\17\366\17\361\17"..., 4096, 4096) = 4096
    23451 fdatasync(6
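Editorial note: in a trace like this, the interesting line is the one where a syscall was entered but never returned, i.e. a line with no " = result" part. With a log of about 1 GB it may be easier to extract such lines mechanically than to read the tail by eye. A rough sketch follows, assuming the log was written with `strace -f -o FILE` so that every line starts with the pid. The function name `pending_syscalls` is made up for illustration, and the pattern is a heuristic, not a full strace parser.

```sh
# Hypothetical helper: for each pid in an strace log, remember its most
# recent syscall line and print it if it has no ") = result" part, meaning
# the call had not returned when the log ended.
# Exit ("+++ ... +++") and signal ("--- ... ---") lines are skipped so
# they do not hide the last real syscall line of a pid.
# $1 = path to the strace log.
pending_syscalls() {
    awk '
        $2 !~ /^(\+\+\+|---)/ { last[$1] = $0 }
        END {
            for (pid in last)
                if (last[pid] !~ /\) += /)
                    print last[pid]
        }
    ' "$1"
}
```

Applied to the excerpt above, this would print only the `fdatasync(6` line for pid 23451.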
Yet another variation. In this case, the urpmi completes, both with a new prompt on the command line and the strace indicating "exited with rc 0". However, the Konsole window in which urpmi ran has stopped responding and will not receive the focus. The same is true for other Konsole windows ***on the same virtual desktop***, but not Konsole windows on other virtual desktops. I don't understand this. My only guess is that some kernel callback related to sync activity but running asynchronously to the client app request is hanging while holding some lock needed by other processes. What this should have to do with virtual desktops is anybody's guess. One other symptom I've seen when switching to a tty to initiate reboot is that the tty is receiving messages to the effect that journald-stop has timed out, so perhaps the hang is related to journald not responding.
Could be update-grub2 (via os-prober) doing something harmful on some partitions. Did you try disabling "Probe Foreign OS" in drakboot? That should fix the timeout while running update-grub2. AFAIC, this is not an urpmi/rpm bug; calling sync() is not a bug.
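Editorial note: on the GRUB side, os-prober is controlled by the `GRUB_DISABLE_OS_PROBER` setting in the GRUB defaults file. Whether the drakboot "Probe Foreign OS" checkbox maps to exactly this setting is an assumption here, not something confirmed in this report. A minimal sketch for checking the current value follows; the function name `os_prober_disabled` is made up for illustration.

```sh
# Hypothetical helper: succeed if os-prober is disabled in a GRUB defaults
# file. $1 = path to the file (defaults to /etc/default/grub).
os_prober_disabled() {
    file=${1:-/etc/default/grub}
    # Accept true with or without quotes; commented-out lines do not match
    # because the pattern is anchored to the start of the line.
    grep -Eq '^[[:space:]]*GRUB_DISABLE_OS_PROBER=["'\'']?true["'\'']?[[:space:]]*$' "$file"
}
```

For example, `os_prober_disabled && echo "os-prober is off"` could be run before and after changing the option in drakboot to see whether the setting was written.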
CC: (none) => thierry.vignaud
Keywords: (none) => NEEDINFO
Source RPM: urpmi, rpm => kernel, urpmi, rpm, grub2, os-prober
I'm closing this as RESOLVED. I have been running urpmi under strace on both systems with no hangs for a couple of months. Wherever this bug was, it has apparently been fixed.
Status: NEW => RESOLVED
Resolution: (none) => FIXED