Description of problem:
Resuming from suspend, I only see a mouse pointer on a black screen.
I can Ctrl-Alt-F2, log in as root and reboot.

Plasma, nouveau

Kernel versions:
Fail: 6.2.2-2, 6.2.6-1
OK: 6.1.12-2, 6.1.14-1

How reproducible: Every try

Excerpt from journal:
mar 17 21:13:51 localhost plasmashell[4088]: Aborting shell load: The activity manager daemon (kactivitymanagerd) is not running.
mar 17 21:13:51 localhost plasmashell[4088]: If this Plasma has been installed into a custom prefix, verify that its D-Bus services dir is known to the system for>
mar 17 21:13:51 localhost kernel: ACPI: \_SB_.PCI0.LPC_.EC__.HKEY: BCSG: evaluate failed
mar 17 21:13:51 localhost kernel: ------------[ cut here ]------------
mar 17 21:13:51 localhost kernel: irq 26 handler nvkm_intr+0x0/0x240 [nouveau] enabled interrupts
mar 17 21:13:51 localhost kernel: WARNING: CPU: 2 PID: 3547 at kernel/irq/handle.c:161 __handle_irq_event_percpu+0x153/0x1a0
mar 17 21:13:51 localhost kernel: Modules linked in: rfcomm ip6t_REJECT nf_reject_ipv6 xt_comment ip6table_mangle ip6table_nat ip6table_raw ip6table_filter ip6_ta>
mar 17 21:13:51 localhost kernel: btbcm btmtk videobuf2_common btrtl mei_wdt btintel mc kvm bluetooth snd_ctl_led irqbypass snd_hda_codec_conexant ecdh_generic e>
mar 17 21:13:51 localhost kernel: drm_kms_helper mxm_wmi video wmi drm cec dm_mirror dm_region_hash dm_log dm_mod
mar 17 21:13:51 localhost kernel: CPU: 2 PID: 3547 Comm: Xorg Not tainted 6.2.6-desktop-1.mga9 #1
mar 17 21:13:51 localhost kernel: Hardware name: LENOVO 4349A13/4349A13, BIOS 6MET92WW (1.52 ) 09/26/2012
mar 17 21:13:51 localhost kernel: RIP: 0010:__handle_irq_event_percpu+0x153/0x1a0

Tell me what info I should look for.
Still valid with kernel 6.2.7-desktop-1, with all other updates as of now.
CC: (none) => tmb
Assignee: tmb => kernel
Same with 6.2.8-desktop-1.mga9
Full update, no improvement.

Observation:
At the black screen with cursor, I can press Ctrl-Alt-Backspace,Backspace and the login dialogue appears again. I enter the password and continue, but the computer hangs: I can't Ctrl-Alt-F2 to another terminal, and neither Ctrl-Alt-Backspace,Backspace nor Ctrl-Alt-Del,Del works.

When I use XFCE instead of Plasma, the desktop appears, but Firefox is unresponsive. I can launch the notepad and enter some text, but soon the desktop is frozen.

Tried switching from SDDM to LightDM: slightly better.

Nothing obvious in the journal.

Next, I should try another graphics driver.
The driver used is nouveau; I cannot find another that works at all.
GPU: GT218M [NVS 3100M]
The proprietary nvidia driver is supposed to be the 340, which we do not have. I tried a couple of others but they failed. I also tried Xorg vesa, which failed too.

So I think this bug is about a regression in kernel 6.2 compatibility with the nouveau driver for this GPU.
I got the same under nouveau too. However, I get exactly the same problem using modesetting under qemu-5.2.0 (and with the native intel driver), so it is probably not limited to nouveau.

As for nvidia340, it is now in obsolete because it is EOL. However, with a bit of patience, and if you do not care about potential security problems, for testing it could be possible to rebuild the 340.xx driver locally (assuming it works with the current Xorg, whose API version has increased) from here:

https://svnweb.mageia.org/packages/obsolete/nvidia340/current/SPECS/

and then applying in sequence the following patchset, from the patch for kernel 5.11 up to kernel 6.2:

https://bugs.mageia.org/show_bug.cgi
https://svnweb.mageia.org/packages/obsolete/nvidia340/current/SPECS/nvidia340.spec?view=log

https://aur.archlinux.org/cgit/aur.git/tree/0005-kernel-5.11.patch?h=nvidia-340xx
https://aur.archlinux.org/cgit/aur.git/tree/0006-kernel-5.14.patch?h=nvidia-340xx
https://aur.archlinux.org/cgit/aur.git/tree/0007-kernel-5.15.patch?h=nvidia-340xx
https://aur.archlinux.org/cgit/aur.git/tree/0008-kernel-5.16.patch?h=nvidia-340xx
https://aur.archlinux.org/cgit/aur.git/tree/0009-kernel-5.17.patch?h=nvidia-340xx
https://aur.archlinux.org/cgit/aur.git/tree/0010-kernel-5.18.patch?h=nvidia-340xx
https://aur.archlinux.org/cgit/aur.git/tree/0011-kernel-6.0.patch?h=nvidia-340xx
https://aur.archlinux.org/cgit/aur.git/tree/0012-kernel-6.2.patch?h=nvidia-340xx
CC: (none) => ghibomgx
Some more testing:

§ Testing today, above and below, is with kernel 6.2.10-desktop-2.mga9

§ Same level of problem in IceWM as in XFCE: after resume it kind of works for a while but soon hangs.

§ I see artefacts in XFCE window frames (but that is an old problem), but not in Plasma nor IceWM.

§ SDDM flickers when I move the mouse. I think I have seen it before. LightDM does not show this problem.

§ After disabling hardware acceleration and the hardware mouse pointer, I could use IceWM after resume without problem (did not test very long, but definitely better). Resuming into XFCE, it hung pretty quickly.
Keywords: (none) => FOR_ERRATA9
Hardware: x86_64 => All
Summary: Kernel 6.2 breaks resuming from suspend on Thinkpad T510 (OK with 6.1 series) => Kernel 6.2 regression resuming from suspend on some systems
For simplest workaround in errata, is it a bad idea to use kernel 6.1 from mga8 backport? (I have not tried) Or could we get a 6.1 kernel as alternative in mga9? Or will we be able to fix whatever problem there is with kernel6.2/nouveau/x11/.. I have not tried switching to Wayland, could that be an idea?
We could try kernel-linus and if it shows the same problems we can try to do a report on upstream bugzilla.kernel.org.
Same problem using kernel-linus-6.2.10-1.mga9

Sidenote:
Another problem has shown up with Plasma: after the last days' updates it does not get to the desktop even with kernel 6.1.14 and a fresh user. Possibly worsened incompatibility with nouveau on this GPU? Or something broke due to the hard reboots from hang... Will do a fresh install with beta 2 ISOs when available.
Hm. Installed GNOME, and like Plasma it too fails to show the desktop, but it shows a dialog about it and logging out works. This is true for both GNOME and GNOME X11, on both kernel desktop 6.2.10-2 and 6.1.14-1. I don't think I ever tested GNOME on this machine, but... another thing to test with the next ISO.

XFCE and IceWM work like before.
(In reply to Morgan Leijström from comment #9)
> Same problem using kernel-linus-6.2.10-1.mga9
>
> Sidenote:
> Another problem have showed up with Plasma: after last days updates it do
> not get to desktop even with kernel 6.1.14 and a fresh user. Possibly
> worsened incompatibility with nouveau on this GPU? Or something broke due to
> the hard reboots from hang... Will do fresh install with beta 2 ISOs when
> available.

I've not noticed this kind of behaviour after the Plasma update, though the resume problem still seems to be there. There were also mesa (23.0.2) updates recently. It could also be that there are two problems: one with resume after suspend and the other with 3D. What we can also try is to disable hardware acceleration for 3D in the nouveau driver in xorg.conf, and rely only on the llvmpipe software driver. Natively it should offer an OpenGL compatibility level of 4.x, which is enough for most operations. It would be a lot slower, especially on a pretty old CPU, but at least it shouldn't hang.
As per my comment 6, it got slightly better by disabling hardware acceleration, but for IceWM only. I set it using drakx11; I don't know how to set it in xorg.conf. I read somewhere that modern systems rely less on xorg.conf, i.e. Plasma overrides things, but this is not my cup of tea.

Now I intended to try the old nv driver, but:
Bug 31788 - drakx11 offers xorg nv, then tell there is no package x11-driver-video-nv
Driver "xorg modesetting" works! :)

I can use the desktop after resume from suspend in both XFCE and IceWM.

$ uname -a
Linux localhost 6.2.10-desktop-2.mga9 #1 SMP PREEMPT_DYNAMIC Mon Apr 10 13:11:58 UTC 2023 x86_64 GNU/Linux

---

Here is the output from the last boot, including a few suspend/resume cycles (all *successful*):

$ sudo journalctl -b | grep nouveau
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: vgaarb: deactivate vga console
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: NVIDIA GT218 (0a8600b1)
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: bios: version 70.18.87.00.00
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: fb: 512 MiB DDR3
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: VRAM: 512 MiB
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: TMDS table version 2.0
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB version 4.0
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 00: 01800323 00010034
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 01: 02811300 00000000
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 02: 028223a6 0f220010
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 03: 02822362 00020010
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 04: 048333b6 0f220010
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 05: 04833372 00020010
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 06: 088443c6 0f220010
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 07: 08844382 00020010
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB conn 00: 00000040
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB conn 01: 00000100
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB conn 02: 00101246
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB conn 03: 00202346
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: DCB conn 04: 00410446
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
apr 13 10:39:13 localhost kernel: [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
apr 13 10:39:13 localhost kernel: fbcon: nouveaudrmfb (fb0) is primary device
apr 13 10:39:13 localhost kernel: nouveau 0000:01:00.0: [drm] fb0: nouveaudrmfb frame buffer device
apr 13 10:39:16 localhost sensord[1241]: Chip: nouveau-pci-0100
apr 13 10:39:27 localhost kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
apr 13 10:39:27 localhost kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
apr 13 10:39:27 localhost kernel: nouveau 0000:01:00.0: msvld: unable to load firmware data
apr 13 10:39:27 localhost kernel: nouveau 0000:01:00.0: msvld: init failed, -19
apr 13 10:45:41 localhost kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
apr 13 10:45:41 localhost kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
apr 13 10:45:41 localhost kernel: nouveau 0000:01:00.0: msvld: unable to load firmware data
apr 13 10:45:41 localhost kernel: nouveau 0000:01:00.0: msvld: init failed, -19
apr 13 10:51:23 localhost kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
apr 13 10:51:23 localhost kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
apr 13 10:51:23 localhost kernel: nouveau 0000:01:00.0: msvld: unable to load firmware data
apr 13 10:51:23 localhost kernel: nouveau 0000:01:00.0: msvld: init failed, -19
apr 13 10:59:39 localhost sensord[1241]: Chip: nouveau-pci-0100
apr 13 11:13:04 localhost kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
apr 13 11:13:04 localhost kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
apr 13 11:13:04 localhost kernel: nouveau 0000:01:00.0: msvld: unable to load firmware data
apr 13 11:13:04 localhost kernel: nouveau 0000:01:00.0: msvld: init failed, -19
apr 13 11:14:44 localhost kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
apr 13 11:14:44 localhost kernel: nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
apr 13 11:14:44 localhost kernel: nouveau 0000:01:00.0: msvld: unable to load firmware data
apr 13 11:14:44 localhost kernel: nouveau 0000:01:00.0: msvld: init failed, -19

One odd thing is that "unable to load firmware data" occurred at boot and the first two resumes, but not the two latest resumes.
"Chip: nouveau-pci-0100" occurred for boot and for one resume that did not get the "unable to load" lines.
But some resumes, both to IceWM and XFCE, produced no journal messages containing "nouveau".
Coincidence with some timing?

* But resume did always work *
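To make it easier to line up the firmware messages with individual boot/resume events, the journal excerpt can be grouped by timestamp. The following is only a minimal sketch: the function name `msvld_failures` and the regular expression are assumptions written for the line format shown in this comment, not part of any Mageia tool.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this comment, e.g.
# "apr 13 10:39:27 localhost kernel: nouveau 0000:01:00.0: msvld: init failed, -19"
LINE_RE = re.compile(r"^(\w{3} \d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: (.*)$")


def msvld_failures(journal_text):
    """Map each timestamp to the number of 'msvld: init failed' kernel lines."""
    hits = Counter()
    for line in journal_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and "msvld: init failed" in match.group(2):
            hits[match.group(1)] += 1
    return hits


sample = """\
apr 13 10:39:27 localhost kernel: nouveau 0000:01:00.0: msvld: unable to load firmware data
apr 13 10:39:27 localhost kernel: nouveau 0000:01:00.0: msvld: init failed, -19
apr 13 10:45:41 localhost kernel: nouveau 0000:01:00.0: msvld: init failed, -19
apr 13 10:59:39 localhost sensord[1241]: Chip: nouveau-pci-0100
"""

# One entry per event that logged the failure; sensord lines are ignored.
for stamp, count in sorted(msvld_failures(sample).items()):
    print(stamp, count)
```

Feeding it the output of `sudo journalctl -b | grep nouveau` would give one timestamp per event that logged the failure, which can then be compared with the times of the suspend/resume cycles.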
Status comment: (none) => Suggest trying xorg modesetting?
And now with xorg modesetting, Plasma and GNOME work, both X11 and Wayland; XFCE does not show any artefacts in window borders, and SDDM never flickers when moving the mouse.

To summarise: all DEs I tried work correctly, including resume from suspend.
- Except Plasma on Wayland, but for that I blame Plasma itself, as GNOME on Wayland resumes OK. (Plasma only shows a pointer on a black screen; exit to login using ctrl-alt-bksp.)
Summary: Kernel 6.2 regression resuming from suspend on some systems => Kernel 6.2 regression resuming from suspend on some systems - but works with xorg modesetting
https://wiki.mageia.org/en/Mageia_9_Errata#Nvidia
Keywords: FOR_ERRATA9 => IN_ERRATA9
I guess there is no feasible way to automate detecting the problem and setting modesetting when needed, and only when needed, either for install or for upgrade?

What to blame? nouveau, mesa, kernel... or the combination?
Status comment: Suggest trying xorg modesetting? => (none)
Trying to do other automatic switching could be more harmful at this point because we don't yet know the origin. IMHO the problem arose in the last month; previously, for instance with the Live ISO, I wasn't getting the problem in qemu5 (which uses the modesetting driver too, with no nouveau involved). So it could be two different problems: first, the suspend of X11 for inactivity (but at this point can it be only Plasma?) resulting in the black screen with just the mouse pointer after the resume (maybe problems with DPMS?), with apparently no other visible problems in the logs; and second, a certain instability with the nouveau driver, which was probably already there. Maybe the latest desktop upgrades just increased the minimal OpenGL capability required to run all the 3D effects, and nouveau lagged behind.

For using software 3D acceleration with llvmpipe and no other hardware acceleration, you may edit /etc/X11/xorg.conf and try:

Section "Device"
    ...
    Driver "nouveau"
    Option "NoAccel" "true"
EndSection

while to disable it for modesetting:

Section "Device"
    ...
    Driver "modesetting"
    Option "AccelMethod" "none"
EndSection

Then you can check the OpenGL renderer using

glxinfo -B | grep "OpenGL renderer"
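For anyone who wants to script the two variants instead of editing by hand, the "Device" section can be rendered from a driver name and an option table. This is a rough sketch only: `device_section` is a hypothetical helper, and the default identifier "device1" is an assumption (the snippets above elide that part with "..."). It prints text and does not touch /etc/X11/xorg.conf.

```python
def device_section(driver, options=None, identifier="device1"):
    """Render an xorg.conf "Device" section as text.

    options maps option names to values; a value of None renders a
    value-less option line such as: Option "DPMS"
    The identifier default is an assumption for illustration.
    """
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        f'    Driver "{driver}"',
    ]
    for name, value in (options or {}).items():
        if value is None:
            lines.append(f'    Option "{name}"')
        else:
            lines.append(f'    Option "{name}" "{value}"')
    lines.append("EndSection")
    return "\n".join(lines)


# The two variants discussed in this comment.
print(device_section("nouveau", {"NoAccel": "true"}))
print(device_section("modesetting", {"AccelMethod": "none"}))
```

Keeping the options in a dictionary makes it easy to diff what was changed between a working and a failing configuration.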
(In reply to Giuseppe Ghibò from comment #17)
> IMHO the problem arised in the last month,

In retrospect, the flicker in SDDM when moving the mouse, as well as the artefacts in XFCE window borders, have been there for several years already, but I don't remember if I saw them on this specific laptop. I just did not care much, there was more important stuff to chase, and I never tried modesetting before. But with modesetting those problems too are gone.

What I mean is that nouveau may always have had this problem; it just got worse in combination with e.g. mesa and kernel changes.

But this locking up a while after resume is new to me.

> So it could be two
> different problems, first the problem of suspend of X11 for inactivity (but
> at this point can be only Plasma?)

For a few days now, neither Plasma nor GNOME, in either Wayland or X11 mode, starts *at all* without modesetting, even at initial login.
And with modesetting everything is OK in all DEs I tried (except resuming Plasma Wayland).

I must admit I should read up on what modesetting actually is...

I might experiment with xorg.conf later, too much private and job backlog... But first, isn't there something nowadays that writes to xorg.conf automatically, and how do I turn that off?
Using only drakx11 to set it up, the well working modesetting choice gives this section in xorg.conf:

Section "Device"
    Identifier "device1"
    Driver "modesetting"
    Option "DPMS"
EndSection


And if selecting nouveau (resuming badly):

Section "Device"
    Identifier "device1"
    Driver "nouveau"
    Option "DPMS"
EndSection


$ glxinfo -B | grep "OpenGL renderer"
OpenGL renderer string: NVA8


Adding a line to disable hardware acceleration:

Section "Device"
    Identifier "device1"
    Driver "nouveau"
    Option "DPMS"
    Option "NoAccel" "true"
EndSection

then

$ glxinfo -B | grep "OpenGL renderer"
OpenGL renderer string: llvmpipe (LLVM 15.0.6, 128 bits)

And resuming works.
But performance is much slower than with modesetting.

Updated errata.
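The renderer check can also be done programmatically, for example in a test script that needs to know whether the session fell back to software rendering. A small sketch, assuming the `glxinfo -B` output format quoted in this comment; both function names are hypothetical.

```python
def opengl_renderer(glxinfo_output):
    """Return the text after 'OpenGL renderer string:', or None if absent."""
    for line in glxinfo_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("OpenGL renderer string:"):
            return stripped.split(":", 1)[1].strip()
    return None


def is_software_rendering(renderer):
    """Treat any renderer string mentioning llvmpipe as software rendering."""
    return renderer is not None and "llvmpipe" in renderer.lower()


# The two renderer strings reported in this comment.
hw = opengl_renderer("OpenGL renderer string: NVA8")
sw = opengl_renderer("OpenGL renderer string: llvmpipe (LLVM 15.0.6, 128 bits)")
print(hw, is_software_rendering(hw))
print(sw, is_software_rendering(sw))
```

The text passed in would typically be the captured output of `glxinfo -B` run inside the graphical session.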
(In reply to Morgan Leijström from comment #19)
> Using only drakx11 to set it up, the well working modesetting choice give
> this section in xorg.conf:
>
> Section "Device"
> Identifier "device1"
> Driver "modesetting"
> Option "DPMS"
> EndSection
>
>
> And if selecting nouveau (resuming badly):
>
> Section "Device"
> Identifier "device1"
> Driver "nouveau"
> Option "DPMS"
> EndSection
>
>
> $ glxinfo -B | grep "OpenGL renderer"
> OpenGL renderer string: NVA8
>
>
> Adding a line to disable hardware acceleration:
>
> Section "Device"
> Identifier "device1"
> Driver "nouveau"
> Option "DPMS"
> Option "NoAccel" "true"
> EndSection
>
> then
>
> $ glxinfo -B | grep "OpenGL renderer"
> OpenGL renderer string: llvmpipe (LLVM 15.0.6, 128 bits)
>
> And resuming works.
> But performance is much slower than modesetting.
>
> Updated errata.

There is also a new button option in XFdrake 1.37 to do this automatically. It doesn't completely resolve this bug (as the problems are probably in the kernel or in the latest xorg), but it might be helpful.
Thank you. Verified. I now changed errata to describe the GUI method.
A similar problem occurs with an external monitor on mga8 with a radeon GPU and kernel 6.2. This problem started with the 6.2 kernel. When the external monitor goes to sleep, it will not wake up.
CC: (none) => pavlikd
Any changes with Kernel 6.3?
CC: (none) => linux
Sorry for my mistake, but I meant the 6.1 kernel from backports on mga8.
Still need Xorg modesetting.
nouveau hangs on resume from suspend.
I did not try disabling hw accel this time.

Same system, full update incl. kernel 6.3.10-desktop-1.mga9
On my workstation the solution is to use the linus kernel:

On my P55 main board with nvidia GTX750, resume from suspend fails to produce a picture on my monitor unless I power cycle the monitor, when kernel desktop or server is used with the nvidia driver.

Workarounds:
1) manually power cycle the monitor after resume,
2) use the linus kernel,
3) use nouveau (slow) or modesetting (decent).
So, as usual 6.6.14-1 .desktop fails and 6.6.14-1 -linus works? We might try a sort of "bisect" until we find the offending patch. Here is a version of 6.6.14 with most of patches disabled: https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/06965290-kernel/ please check whether it works or fails (just install -desktop and -devel).
Great, I am on :)
First "bisect" result: kernel-desktop-6.6.14-1.s1.mga9-1-1.mga9.x86_64 fail
Summary: Kernel 6.2 regression resuming from suspend on some systems - but works with xorg modesetting => Kernel 6.2 regression resuming from suspend on some nvidia systems - works with free drivers or kernel linus
(In reply to Morgan Leijström from comment #29) > First "bisect" result: > kernel-desktop-6.6.14-1.s1.mga9-1-1.mga9.x86_64 fail Try 6.6.14-1.s2 here: https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/06971928-kernel/
6.6.14-1.s2: Success :) (test incl new microcode and dracut)
Try 6.6.14-1.s3 here: https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/06975523-kernel/
6.6.14-1.s3: Success :)
Try 6.6.14-1.s4 here: https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/06977876-kernel/
6.6.14-1.s4 : success :)
Try 6.6.14-1.s5 here: https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/06980096-kernel/
I find no rpm there, despite https://copr.fedorainfracloud.org/coprs/ghibo/mageia9-bonus/build/6980096 shows all succeeded. And I also waited ten minutes and looked again.
You triggered a COPR bug: the files are there but are not shown in the generated HTML page, which seems to be an older cached version. The individual URLs are:

https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/06980096-kernel/kernel-desktop-devel-6.6.14-1.s5.mga9-1-1.mga9.x86_64.rpm
https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/06980096-kernel/kernel-desktop-6.6.14-1.s5.mga9-1-1.mga9.x86_64.rpm
https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/06980096-kernel/kernel-6.6.14-1.s5.mga9.src.rpm
6.6.14-1.s5 : Fail.
Try 6.6.14-2.s1: https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/06982548-kernel/
6.6.14-2.s1: Success :)
Ok, we got it! The problem seems to come from this patch:

https://svnweb.mageia.org/packages/cauldron/kernel/current/SOURCES/block-floppy-disable-pnp-modalias.patch?view=markup

related to floppy autoloading. Apparently it was introduced in 2012 to fix bug https://bugs.mageia.org/show_bug.cgi?id=4696, though the mechanism by which it interferes now is pretty weird. We might try disabling it in a next build. BTW, do you have a floppy drive installed in your desktop?
(In reply to Giuseppe Ghibò from comment #42)
> Ok, we got it!

Great :)

> BTW, did you have a floppy reader installed in your desktop?

Yes.
Just inserted a 1.44 diskette and clicked on it in Dolphin. I can list and copy files from it. It seems I have no right to add or delete files; I have not tried to find out why.
So in 6.6.14-2.s1 we have all the patches we usually use except *only* that patch for bug 4696?

All tests in this bisect series were with nvidia-current, 535.154.05-1.mga9.
Before this test series I had confirmed the problem on our usual -desktop kernel using also 470 and 545.

Now as the next exercise I switch to nvidia-newfeature-545.29.06-2.mga9 with this 6.6.14-2.s1 kernel.
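For reference, the general idea of narrowing down a culprit patch by building kernels with subsets of the patch series can be sketched as a binary search. This is an illustration only and not necessarily the procedure used for the s1..s5 builds in this thread: it assumes a single culprit, that a kernel with no patches works, and that one with all patches fails. `first_bad_patch` and the patch names other than block-floppy-disable-pnp-modalias.patch are hypothetical.

```python
def first_bad_patch(patches, fails_with):
    """Return the first patch whose inclusion makes the kernel fail.

    patches: ordered list of patch names.
    fails_with: callable taking the list of applied patches and returning
    True if a kernel built with exactly those patches shows the problem
    (in practice: build, install, suspend, resume, report by hand).
    """
    lo, hi = 0, len(patches)  # invariant: patches[:lo] works, patches[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails_with(patches[:mid]):
            hi = mid
        else:
            lo = mid
    return patches[hi - 1]


# Hypothetical series; only the floppy patch name comes from this bug.
series = [
    "hypothetical-a.patch",
    "hypothetical-b.patch",
    "block-floppy-disable-pnp-modalias.patch",
    "hypothetical-c.patch",
]


def simulated_test(applied):
    # Stand-in for a manual suspend/resume test of a real build.
    return "block-floppy-disable-pnp-modalias.patch" in applied


print(first_bad_patch(series, simulated_test))
```

With n patches this needs about log2(n) test builds, which matters when each round means a kernel rebuild and a manual suspend/resume cycle.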
(In reply to Morgan Leijström from comment #44) > Now as next exercise I switch to nvidia-newfeature-545.29.06-2.mga9 with > this 6.6.14-2.s1 kernel. Bug 32579 Comment 23 - strikes again.
(In reply to Morgan Leijström from comment #44)
> So in 6.6.14-2.s1 we have all patches we usually use except *only* that
> patch for bug 4696 ?

Exactly.

>
> All tests in this bisect series was with nvidia-current, 535.154.05-1.mga9.
> Before this test series i have acknowledged the problem on our usual
> -desktop kernel using also 470 and 545.
>
> Now as next exercise I switch to nvidia-newfeature-545.29.06-2.mga9 with
> this 6.6.14-2.s1 kernel.
(In reply to Morgan Leijström from comment #45)
> (In reply to Morgan Leijström from comment #44)
> > Now as next exercise I switch to nvidia-newfeature-545.29.06-2.mga9 with
> > this 6.6.14-2.s1 kernel.
>
> Bug 32579 Comment 23 - strikes again.

I don't think the patch is responsible for this. Consider that 6.6.14-2.s1 is in "oldversionedscheme" (while 6.6.14-2 was in newversionedscheme), so I think the nvidia-newfeature problem is triggered by some condition where the driver temporarily goes unconfigured (e.g. missing devel package or something like that) and where drakx11 intervenes.
Summary: Kernel 6.2 regression resuming from suspend on some nvidia systems - works with free drivers or kernel linus => Kernel 6.2+ regression resuming from suspend on some nvidia systems - works with free drivers or kernel linus
Created attachment 14360 [details]
Journal from booting same kernel after switching nvidia 535 to 545

Sometimes it works, sometimes not; I cannot see a clear pattern.

This time I was running the experimental kernel-desktop-6.6.14-2.s1 (Bug 31695 Comment 44) and nvidia535, and used drakx11 to switch to nvidia-newfeature.

This bug then hit: graphics fail when trying to boot that kernel.
OK booting another; dkms-autorebuild works.

This was a while ago, and now I tried again and saved and attach this log.

Instead of showing the SDDM login, it boots to a black screen with a text(?) cursor at top left. I then issued Ctrl-Alt-F4 -> screen fully black. Blind, I logged in as root and issued reboot -> worked. I chose another kernel (linus) -> dkms autorebuild did its job, and now I ran
$ sudo journalctl -b-1 > ThisAttachement
(which I then compressed)
OOPS wrong bug.
Next step? Should we ask other testers to test kernel-desktop-6.6.14-2.s1 for regressions compared to normal -desktop kernel ? Or more theoretical thinking/analysing first?
Next step is to include the fix in a next kernel build. Indeed this was already done in kernel-6.6.16-1.mgaX, but the build does not complete yet, because it fails on the BS for the i586 arch due to VM memory exhaustion.
Oh no...

I have switched to kernel-linus-6.6.14-1.mga9.x86_64, which had no problem before when resuming from suspend. But this morning, the monitor needed a power cycle.

Changes to the system since it worked: mesa updated to testing, tainted. And I removed all i586 packages from the system, as logged in Bug 32826 Comment 12 to 14.

Weird. Will keep trying to see if it is consistent.

But apparently that kernel patch is not directly involved. Complicated issue.
What about 6.6.14-1.s2 (which is basically a 6.6.15)? Does the same occur?
1) Repeated the failure with kernel-linus-6.6.14-1
2) 6.6.14-1.s2, which worked before, also fails today.

Now trying the older kernel-linus-6.5.13-2.mga9, which worked before.
Will report after it has had a suspend - wait an hour - resume cycle.
kernel-linus-6.5.13-2.mga9, which worked before, also fails now.

Strange that it seems no one else but me has reported this issue.
Maybe it is very specific to the combination of kernel, drivers, etc., GPU and its implementation, and of course sleep handling in the monitor.

I suggest we do *not* change what patches we apply, unless there is another reason than the issue I am seeing.
Severity: critical => major
... I am even leaning on setting this as wontfix unless we get some bright ideas soon.
What exactly did you change between the last time it worked and now (packages, drivers, etc.)? journalctl could show that. Last time it was pretty easy to trigger, and in the end it was due to the floppy patch. Could it be that your floppy hardware is degrading in some way? How about simply trying to disconnect it from the motherboard?
Hah you are kind of correct. I had unplugged the diskette drive, because i was trying to see if other drives worked better to read old damaged setup backup diskettes from an old machine (some idiot had stored diskettes close to a transformer). Plugged in diskette drive, rebooted, suspend over night. Resumed now, all OK (linus-6.5.13-2) Will try 6.6.14-1.s2 next. Thanks for the idea!
Ok, so now, it's expected to:

- kernel-linus-6.5.13-2.mga9: working
- kernel-desktop-6.5.13-6.mga9: failing (newversionscheme, from updates)
- kernel-linus-6.6.14-1.mga9: working
- kernel-desktop-6.6.14-2.mga9: failing (newversionscheme, from updates)
- kernel-desktop-6.6.14-1.s2: working (oldversionscheme, copr kernel)
- kernel-linus-6.6.16-1.mga9: working (from updates_testing)
- kernel-desktop-6.6.16-1.mga9: working (newversionscheme, from updates_testing)
(Crazy that having a floppy drive connected or not makes the monitor wake up or not...!)

---

Yes, Comment 59 summarises some of the results for 6.5.13 and 6.6.14.

I will try the 6.6.16 linus and desktop after verifying 6.6.14-1.s2 still works.

Hmm... I also see linus 6.6.*14*-1.mga9 building again?

---

After testing the new kernels I will see if I can borrow another monitor, to test with both "bad" and good kernels.
- Manufacturers often interpret standards differently, plus standards are often changing...
(In reply to Morgan Leijström from comment #60)
> (Crazy that having a floppy drive connected or not makes the monitor wake up
> or not...!)

Yes...

>
> ---
>
> Yes, Comment 59 summarise some of the result of 6.5.13 and 6.6.14
>
> I will try the 6.6.16 linus and desktop after verifying 6.6.14-1.s2 still
> works.
>
> Hmm... I also see linus 6.6.*14*-1.mga9 building again ?

kernel-linus-6.6.14-1.mga9 was submitted by mistake and I couldn't interrupt the build. But it should be removed. Do not consider it, as it is actually the same version already in updates.
(with diskette drive attached...)

kernel-desktop-6.6.14-2.s1.mga9-1-1.mga9.x86: OK
kernel-desktop-6.6.16-1.mga9.x86_64: OK
kernel-linus-6.6.16-1.mga9.x86_64: OK

Will next try the latest -desktop when it is built.
Status comment: (none) => Deends on patch see c42 and absent diskette drive c58
kernel-desktop-6.6.16-3.mga9.x86_64: OK

--
Sidenote:
I started VirtualBox with an MSW7 guest, and that worked, but afterwards I see in the journal a bunch of lines with call trace, register dump etc. First line:

feb 14 21:48:25 svarten.tribun kernel: WARNING: CPU: 0 PID: 74553 at /var/lib/dkms/virtualbox/7.0.14-1.mga9/build/vboxdrv/r0drv/linux/memobj-r0drv-linux.c:564 rtR0MemObjLinuxApplyPageRange+0x67/0xa0 [vboxdrv]

I can get more lines of course, but this should be in a separate bug if so.

The second time I started VirtualBox and the guest there was no such bunch of lines, only:

feb 15 00:12:26 svarten.tribun kernel: vboxdrv: 000000003b9337fc VMMR0.r0
feb 15 00:12:27 svarten.tribun kernel: vboxdrv: 00000000145cb2e7 VBoxDDR0.r0

and those lines were also at the end of the previous session.

The kmod was built locally:

$ dkms status|grep 6.6.16-desktop-3
virtualbox, 7.0.14-1.mga9, 6.6.16-desktop-3.mga9, x86_64: installed
nvidia-newfeature, 545.29.06-2.mga9.nonfree, 6.6.16-desktop-3.mga9, x86_64: installed
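When testing many kernels it can help to check mechanically which dkms modules are installed for a given kernel. The sketch below parses the `dkms status` line format shown in this comment ("module, version, kernel, arch: state"); other dkms versions may print a different layout, and `parse_dkms_status` is a hypothetical helper, not part of dkms.

```python
def parse_dkms_status(text):
    """Parse 'module, version, kernel, arch: state' lines into dicts.

    Lines that do not match this four-field layout are skipped.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue
        left, state = line.rsplit(":", 1)
        fields = [field.strip() for field in left.split(",")]
        if len(fields) != 4:
            continue
        module, version, kernel, arch = fields
        records.append({
            "module": module,
            "version": version,
            "kernel": kernel,
            "arch": arch,
            "state": state.strip(),
        })
    return records


sample = """\
virtualbox, 7.0.14-1.mga9, 6.6.16-desktop-3.mga9, x86_64: installed
nvidia-newfeature, 545.29.06-2.mga9.nonfree, 6.6.16-desktop-3.mga9, x86_64: installed
"""

for record in parse_dkms_status(sample):
    print(record["module"], record["kernel"], record["state"])
```

Filtering the records on the "kernel" field gives the same result as the grep above, but in a form that a test script can assert on.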
Status comment: Deends on patch see c42 and absent diskette drive c58 => Depends on patch see c42, and absent diskette drive c58
Is that situation happening before or after a resume?
Verified now it happens after reboot, before suspend.
There is newer 6.6.14-mga9 in updates_testing. Did you get more info about the same VBox problem using dmesg? It is probably unrelated to this bug, and rather an upstream issue for virtualbox. It seems pretty similar to these reports:

https://forums.virtualbox.org/viewtopic.php?t=110706
https://www.virtualbox.org/ticket/21964
https://www.virtualbox.org/ticket/21952
(In reply to Giuseppe Ghibò from comment #67)
> There is newer 6.6.14

I suppose you mean 6.6.16-4: running it now, same result for VB.

> Probably is unrelated to this bug, but upstream for virtualbox.

Agreed, I just wanted to give feedback while testing here, as we have no bug open on 6.6.16 yet.

I see in the journal:
§ It happened the first time with 6.6.14-1, but I did not check the journal then, as the guest works.
§ It did not appear with 6.5.13-2

> Seems pretty similar to these reports:

Yes. All three are very similar to all those lines I see when viewing the journal.

I think we should open a bug for this.
- On VirtualBox 7.0.14 I suppose, and set it UPSTREAM, with the links you gave?
Created: Bug 32858 - UPSTREAM VirtualBox bug in vboxdrv module for kernel 6.6 host create warning in journal
kernel-desktop-6.6.17-1.mga9: OK resuming
From Bug 32922 Comment 28 and on:

kernel-desktop-6.6.18-1.mga9.x86_64: OK resuming
kernel-server-6.6.18-1.mga9.x86_64: strangely it may hard hang! :(
kernel-linus-6.6.18-1.mga9.x86_64: OK resuming

The vt switching problem is partly back :(

From the just referenced bug:
> § FAIL: vt switching Works before a suspend-resume cycle, but never after:
> vt switching ctrl-alt-F6 and back ctrl-alt-F2 fails: hard hang black screen
> with non moveable mouse cursor, even REISUB did not work, user lost work. I
> also note this problem do *not* exist with linus 6.6.18-1.
> I remember we had this problem before but did we not get it sorted?

Giuseppe replied: There was a patch/fix for that, and it is still included.
(In reply to Giuseppe Ghibò from comment Bug 32922 Comment 29)
> Did you get the same fail with 535.xx or newer 470.239.07

You must mean 470.239.06-1, which is what is in the testing repo.

> or it's specific of 550.54.14?

All three - now verified with 535.154.05, -desktop kernel.

Now trying kernel-server-6.6.18-1.mga9.x86_64 with nvidia 535.154.05:
It is not as distinctive, but maybe one time in five it hangs hard. I also experienced, maybe three times out of ten, that it hung hard on resume (not that it failed to wake up the monitor, like earlier -desktop kernels; it really needed the power button on the computer and did not even respond to REISUB).

-----------

(In reply to Giuseppe Ghibò from comment Bug 32922 Comment 33)
> You might try lstopo utility from hwloc package to detect if there is some weird configuration of PCIe slots, so maybe some PCIe line is shared between too many devices. Mostly these hard locks comes after the resume.
> If you swap the gfx card from one PCIe slot to another? cuda-z might also show a very different bandwidth when placed in one slot or one another.
OK tested: Running -desktop 6.6.18-1 and nvidia 550.54.14 - now including nvidia-cuda-opencl for the suggested testing

$ lstopo --of txt
(box diagram flattened to an indented listing)
Machine (16GB total)
  Package L#0
    NUMANode L#0 P#0 (16GB)
    L3 (8192KB)
      L2 (256KB), L1d (32KB), L1i (32KB), Core L#0: PU L#0 P#0, PU L#1 P#2
      L2 (256KB), L1d (32KB), L1i (32KB), Core L#1: PU L#2 P#1, PU L#3 P#3
  link 9,8: PCI 07:00.0: CoProc opencl0d0 (4 compute units, 970 MB)
  link 0,2: PCI 02:00.0: Net enp2s0
  link 0,2: PCI 03:00.0
            PCI 03:00.1: Block sr0 (1023 MB)
  PCI 00:1f.2: Block sdb (1863 GB), Block sda (465 GB)
  Block fd0 (0 MB)

CUDA-Z Report
=============
Version: 0.11.291 SVN 64 bit
Built Apr 28 2023 14:29:21
http://cuda-z.sf.net/
OS Version: Linux 6.6.18-desktop-1.mga9 #1 SMP PREEMPT_DYNAMIC Sat Feb 24 02:17:35 UTC 2024 x86_64
Driver Version: 550.54.14
Driver Dll Version: 12.40 (550.54.14)
Runtime Dll Version: 12.10

Core Information
----------------
  Name: NVIDIA GeForce GTX 750
  Compute Capability: 5.0 (Maxwell)
  Clock Rate: 1084.5 MHz
  PCI Location: 0:7:0
  Multiprocessors: 4 (512 Cores)
  Threads Per Multiproc.: 2048
  Warp Size: 32
  Regs Per Block: 65536
  Threads Per Block: 1024
  Threads Dimensions: 1024 x 1024 x 64
  Grid Dimensions: 2147483647 x 65535 x 65535
  Watchdog Enabled: Yes
  Integrated GPU: No
  Concurrent Kernels: Yes
  Compute Mode: Default
  Stream Priorities: Yes

Memory Information
------------------
  Total Global: 970.25 MiB
  Bus Width: 128 bits
  Clock Rate: 2505 MHz
  Error Correction: No
  L2 Cache Size: 2048 KiB
  Shared Per Block: 48 KiB
  Pitch: 2048 MiB
  Total Constant: 64 KiB
  Texture Alignment: 512 B
  Texture 1D Size: 65536
  Texture 2D Size: 65536 x 65536
  Texture 3D Size: 4096 x 4096 x 4096
  GPU Overlap: Yes
  Map Host Memory: Yes
  Unified Addressing: Yes
  Async Engine: 1 Yes, Unidirectional

Performance Information
-----------------------
Memory Copy
  Host Pinned to Device: 5835.06 MiB/s
  Host Pageable to Device: 4866.85 MiB/s
  Device to Host Pinned: 5037.22 MiB/s
  Device to Host Pageable: 4645.76 MiB/s
  Device to Device: 18.104 GiB/s
GPU Core Performance
  Single-precision Float: 88.5483 Gflop/s
  Double-precision Float: 16.529 Gflop/s
  64-bit Integer: 1151.24 Miop/s
  32-bit Integer: 8679.45 Miop/s
  24-bit Integer: 8230.53 Miop/s

Generated: Mon Mar 4 19:22:03 2024

Switching the graphics card from the blue slot (top, closest to the CPU) to the orange slot two slots down. The blue one is obviously the fast one, having all pins in the connector, while the orange one has most pins missing (fewer lanes).

[morgan@svarten ~]$ lstopo --of txt
(box diagram flattened to an indented listing)
Machine (16GB total)
  Package L#0
    NUMANode L#0 P#0 (16GB)
    L3 (8192KB)
      L2 (256KB), L1d (32KB), L1i (32KB), Core L#0: PU L#0 P#0, PU L#1 P#2
      L2 (256KB), L1d (32KB), L1i (32KB), Core L#1: PU L#2 P#1, PU L#3 P#3
  link 0,2: PCI 02:00.0: Net enp2s0
  link 0,2: PCI 03:00.0
            PCI 03:00.1: Block sr0 (1023 MB)
  link 1,0: PCI 04:00.0: CoProc opencl0d0 (4 compute units, 970 MB)
  PCI 00:1f.2: Block sdb (1863 GB), Block sda (465 GB)
  Block fd0 (0 MB)
Host: svarten.tribun
Date: mån 4 mar 2024 19:55:58

CUDA-Z Report
=============
(snipped out identical info here)
Memory Copy
  Host Pinned to Device: 736.296 MiB/s
  Host Pageable to Device: 732.628 MiB/s
  Device to Host Pinned: 788.843 MiB/s
  Device to Host Pageable: 779.851 MiB/s
  Device to Device: 30.8302 GiB/s

====================================================================
RESULT: (still on -desktop 6.6.18-1 and nvidia 550.54.14)

The reported "Device to Device" rate is higher. What two devices? Is it that, with fewer lanes to the main board, it could communicate faster with another GPU via a bridge, if I had one?

Good: neither vt switching nor resuming hangs.

Bad: Painfully sluggish, several times slower, about half the speed compared to nouveau, so it is not only limited by the lower bandwidth; something else must be operating *much* more inefficiently now.

Bonus quirk: Each time it resumes from suspend, the optical drive ejects.

Painfully slow, as said - I now switch back to the blue slot, and linus 6.6.18-1
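The slot comparison above can also be cross-checked without CUDA-Z by reading the negotiated PCIe link from sysfs. The following is only a sketch, not something run in this bug: it assumes PCI domain 0000 for the addresses 07:00.0 and 04:00.0 seen in the lstopo output, and the helper names read_link and is_downgraded are made up for illustration. The four sysfs attribute names are the standard Linux ones.

```python
from pathlib import Path

# Standard Linux sysfs location for PCI devices.
SYSFS_PCI = Path("/sys/bus/pci/devices")

LINK_ATTRS = (
    "current_link_speed",
    "current_link_width",
    "max_link_speed",
    "max_link_width",
)


def read_link(addr, base=SYSFS_PCI):
    """Return the PCIe link attributes of one device as a dict.

    addr is a full PCI address such as "0000:07:00.0" (domain 0000 is an
    assumption here). Attributes that do not exist are returned as None.
    """
    dev = Path(base) / addr
    info = {}
    for name in LINK_ATTRS:
        attr = dev / name
        info[name] = attr.read_text().strip() if attr.is_file() else None
    return info


def is_downgraded(info):
    """True if the negotiated width is below what the device supports.

    Returns None when either width is unknown.
    """
    cur, mx = info["current_link_width"], info["max_link_width"]
    if cur is None or mx is None:
        return None
    return int(cur) < int(mx)


if __name__ == "__main__":
    # Blue slot showed the GPU at 07:00.0, orange slot at 04:00.0.
    for addr in ("0000:07:00.0", "0000:04:00.0"):
        info = read_link(addr)
        print(addr, info, "downgraded:", is_downgraded(info))
```

If the orange slot really has fewer lanes wired, the card should report a current link width below its maximum there. Some GPUs lower the link speed while idle to save power, so the width is probably the more reliable value to compare.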
(In reply to Morgan Leijström from comment #72)
> (In reply to Giuseppe Ghibò from comment Bug 32922 Comment 29)
> > Did you get the same fail with 535.xx or newer 470.239.07
>
> You must mean 470.239.06-1, which is what is in testing repo
Yes, 470.239.06-1, not 470.239.07 (it was a typo).
*** Bug 32541 has been marked as a duplicate of this bug. ***
ORIENTATION
In this bug, the focus has shifted
From: the old GT218M [NVS 3100M], worked around by using Xorg modesetting
To: GM107 problems with nvidia drivers: vt switching hangs, and after resuming from suspend either a hard hang (-server 6.6.18) or the monitor not waking up (-desktop before 6.6.18)
Summary: Kernel 6.2+ regression resuming from suspend on some nvidia systems - works with free drivers or kernel linus => Kernel 6.2+ regression resuming from suspend & vt switching problems on some nvidia systems - works with free drivers or kernel linus
Device to Device means transfers within the gfx card itself. Device to Host is a transfer between main host RAM and the gfx card. You get worse performance in the 2nd slot.
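To put a number on the difference between the slots, the host/device copy rates from the two CUDA-Z reports above can be divided pairwise. This is just a small arithmetic sketch; the dictionary and function names are made up for illustration, and the values are copied from the reports.

```python
# Host <-> device copy rates in MiB/s, copied from the two CUDA-Z reports.
BLUE_SLOT = {
    "Host Pinned to Device": 5835.06,
    "Host Pageable to Device": 4866.85,
    "Device to Host Pinned": 5037.22,
    "Device to Host Pageable": 4645.76,
}
ORANGE_SLOT = {
    "Host Pinned to Device": 736.296,
    "Host Pageable to Device": 732.628,
    "Device to Host Pinned": 788.843,
    "Device to Host Pageable": 779.851,
}


def slowdown(fast, slow):
    """Return, per measurement, how many times lower the slow rate is."""
    return {name: fast[name] / slow[name] for name in fast}


if __name__ == "__main__":
    for name, factor in slowdown(BLUE_SLOT, ORANGE_SLOT).items():
        print(f"{name}: {factor:.1f}x lower in the orange slot")
```

All four host/device rates come out roughly six to eight times lower in the orange slot, while Device to Device does not cross the PCIe link at all, which fits the explanation above.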
> > or it's specific of 550.54.14?
>
> All tree - now verified with 535.154.05, -desktop kernel.

So it's not specific to 550. Did 6.6.20-1.mga9 (copr) have any chance to work better?
kernel-desktop-6.6.20-1.mga9-1-1.mga9.x86_64 with 550: OK so far, after three iterations each of vt switching and suspend. I will report tomorrow or later if I see any problem. BTW also updated to mesa (non tainted)
Ok, then it will be fixed in the next round (when it is ready, as it requires all the other stuff [linus+kmods]).
This is resolved :)
Resolution: (none) => FIXED
Status: NEW => RESOLVED