This is a bugfix release. Upstream release notes: https://www.nvidia.com/Download/driverResults.aspx/226768/en-us/

There is also ldetect-lst, which refreshes the pci-table for the newer driver.
I assume you meant to set this to QA.

Please provide packages list.

ldetect-lst is still in build queue.
Is it supposed to go in this bug or a separate one?

That said, I am already using 550. It ran OK on the host with all three 6.6.28 kernels when I tested the VirtualBox update, Bug 33273.

I updated using drakrpm, manually selecting all the nvidia 550 packages I want:
- dkms-nvidia-current-550.90.07-1.mga9.nonfree.x86_64
- nvidia-current-cuda-opencl-550.90.07-1.mga9.nonfree.x86_64
- nvidia-current-doc-html-550.90.07-1.mga9.nonfree.x86_64
- nvidia-current-utils-550.90.07-1.mga9.nonfree.x86_64
- x11-driver-video-nvidia-current-550.90.07-1.mga9.nonfree.x86_64

kmods got built for the running kernel, and then automatically during boot when switching kernels.
CC: (none) => fri
Assignee: bugsquad => qa-bugs
MGA9-64, AMD Ryzen 5 2600, Nvidia 1650 super, GNOME

The following 4 packages are going to be installed:

- dkms-nvidia-current-550.90.07-1.mga9.nonfree.x86_64
- nvidia-current-cuda-opencl-550.90.07-1.mga9.nonfree.x86_64
- nvidia-current-utils-550.90.07-1.mga9.nonfree.x86_64
- x11-driver-video-nvidia-current-550.90.07-1.mga9.nonfree.x86_64

1.1MB of additional disk space will be used.

---- rebooted

$ nvidia-smi
Wed Jun 19 20:55:04 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650 ...    Off |   00000000:07:00.0  On |                  N/A |
| 35%   37C    P8             11W /  100W |      96MiB /   4096MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      3189      G   /usr/libexec/Xorg                              46MiB |
|    0   N/A  N/A      3328      G   /usr/bin/gnome-shell                           43MiB |
+-----------------------------------------------------------------------------------------+

browser working
calc working

works for me
CC: (none) => brtians1
(In reply to Morgan Leijström from comment #1)
> I assume you meant to set this to QA.
>
> Please provide packages list.
>
> ldetect-lst is still in build queue.
> Is it supposed to go in this bug or a separate one?

ldetect-lst is supposed to go in this bug.
Keywords: (none) => advisory
SRPMS
ldetect-lst-0.6.58-1.mga9
nvidia-current-550.90.07-1.mga9.nonfree

RPMS in:

9/x86_64/nonfree/updates_testing
dkms-nvidia-current-550.90.07-1.mga9.nonfree
nvidia-current-all-550.90.07-1.mga9.nonfree
nvidia-current-cuda-opencl-550.90.07-1.mga9.nonfree
nvidia-current-devel-550.90.07-1.mga9.nonfree
nvidia-current-doc-html-550.90.07-1.mga9.nonfree
nvidia-current-lib32-550.90.07-1.mga9.nonfree
nvidia-current-utils-550.90.07-1.mga9.nonfree
x11-driver-video-nvidia-current-550.90.07-1.mga9.nonfree

9/x86_64/core-updates/testing
ldetect-lst-0.6.58-1.mga9
ldetect-lst-devel-0.6.58-1.mga9

9/i586/core-updates/testing
ldetect-lst-0.6.58-1.mga9
ldetect-lst-devel-0.6.58-1.mga9
Continuing testing from comment 1 after having been running nvidia470 for a couple of days.

Switching nvidia driver by using drakx11 on the running 6.6.28 server kernel: OK

Rebooted, used the system for some hours, suspended overnight.

After resuming from suspend, the desktop was very sluggish to respond when I switched between open applications.

Also the Plasma panel was unresponsive and not updating.

Switched to vt4 (Ctrl-alt-F4) and back and got a full hang: black screen with frozen mouse pointer. It did not even react to REISUB.

I have seen (and reported) this before, but not with our previous version.

In the journal I see nothing suspicious except
jun 23 09:56:00 svarten.tribun DiscoverNotifier[28204]: KCrash: Attempting to start /usr/libexec/DiscoverNotifier
jun 23 09:56:00 svarten.tribun DiscoverNotifier[28204]: KCrash: Application 'DiscoverNotifier' crashing...
- which of course is a result of the desktop crashing, not a cause.
(In reply to Morgan Leijström from comment #5)
> Continuing testing from comment 1 after having been running nvidia470 a
> couple days.
>
> Switching nvidia driver by using drakx11 on running 6.6.28 server kernel: OK
>
> Rebooted, used the system for some hours, suspend over night.
>
> After having resumed from suspend, desktop was very sluggish to respond to
> me switching between open applications.
>
> Also the Plasma panel was unresponsive and not updating.
>
> Switched to vt4 (Ctrl-alt-F4) and back and got full hang, black screen with
> frozen mouse pointer. Did not even react to REISUB.
>
> I have seen (and reported) this before, but not with our previous version.
>
> In journal I see nothing suspicious except
> jun 23 09:56:00 svarten.tribun DiscoverNotifier[28204]: KCrash: Attempting
> to start /usr/libexec/DiscoverNotifier
> jun 23 09:56:00 svarten.tribun DiscoverNotifier[28204]: KCrash: Application
> 'DiscoverNotifier' crashing...
> - which of course is a result of desktop crashed but not a cause.

So in your tests 550.90.07 doesn't show any problem and works as well as 550.78, but 470.256.02 seems worse than 470.239.06?
With 470, now in testing, I had a problem once, but only with kernel 6.6.34, which we will not release.

550.90.07, this bug, is the one I have so far had a problem with once, after resuming from suspend, and that was with our released server kernel 6.6.28.

I am now switching to desktop kernel 6.6.28, still with nvidia 550.90.07.
How exactly did you suspend to RAM?

a) from the Plasma menu: Power Session -> Sleep
b) using the command "systemctl suspend" (as root)
c) using "echo mem > /sys/power/state" (as root)

If you are using a) or b), do you see any difference with respect to your problem when using c) instead?
Always used a).

Now tried a few cycles of a) and of c), no problem.
The only difference I see is that with c) there is no login after resume.
Tried both with desktop kernel 6.6.28.

Then I switched to the server kernel flavour.
Made a few suspend-resume cycles, no problem.

Then tried after uninstalling nvidia-current-cuda-opencl, as it was not installed in either of the two failing cases: comment 5 and the other transient problem with nvidia470 on the non-release kernel.
Made a few cycles, no problem.

Then, suddenly, about 20 seconds after resuming OK, the screen transiently went black, then the app windows repainted, then the Plasma background (which was transiently black).
(Probably unrelated, but this was after suspend method c).)

In the journal:
jun 23 18:49:55 svarten.tribun kernel: QSGRenderThread[10628]: segfault at fe00000000d ip 00007f86093b7e5c sp 00007f85b7ffea80 error 4 in libQt5Quick.so.5.15.7[7f8609314000+2da000] likely on CPU 2 (core 0, socket 0)
jun 23 18:49:55 svarten.tribun kernel: Code: 89 f1 48 89 d6 66 0f 1f 84 00 00 00 00 00 48 8b 56 08 48 85 d2 74 1c 8b 41 04 83 7e 1c 01 0f 45 01 39 46 18 7f 28 48 8b 76 10 <48> 8b 56 08 48 85 d2 75 e4 0f b6 5e 20 84 db 74 09 c6 46 20 00 e8
---< a few normal post resume lines here >---
jun 23 18:49:56 svarten.tribun ksystemstats[66702]: Could not retrieve information for NVidia GPU "0000:07:00.0"

The system kept running OK anyway and still works OK while I write this - including the Plasma panel, launching apps, switching virtual desktops.

"QSGRenderThread" does not appear in the journal, which goes back to July 4, except for today.

So the problem seems to hit rarely and in a somewhat random way on my system.
The machine is pretty old, so it *may* be a hardware problem, and maybe the problem is more likely to hit when cold (after having slept), but generally hardware faults are more usual at temperature extremes...

I do not think the problem on my system should hold up this bug.
nvidia-current-cuda-opencl was pulled out of the deps long ago because it doesn't fit on the live ISOs, so it has to be installed manually.

Apart from the transition between 550 and 470, what is weird is the hang on VT switching with 550-xx, which I thought had been left behind with a fix long ago.

BTW, during the 550<->470 transition, next time try a full power down cycle, not just a reboot.

As for the machine hardware, I also have an old kernel series (in the lowlatency set) based on 5.10.x here. It uses the old version scheme, so it can be installed beside other kernels without always having to push the version forward; only *-desktop and *-desktop-devel are required for dkms building:

https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/07622029-kernel/

Sometimes with 5.10.x you don't get the same problems as with the latest series (it's just for splitting).

BTW, if 6.6.34 interferes during uploading/downloading, try adding it (package by package) to /etc/urpmi/skip.list.
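For anyone wanting to try the skip.list suggestion, here is a minimal sketch of adding an entry without duplicating it. As far as I know the file takes one expression per line, either a package name or a regular expression enclosed in slashes that is matched against the full package name. The helper name add_skip_pattern, the SKIP_LIST variable and the 6.6.34 pattern are illustrative assumptions; by default the sketch writes to a scratch file, not the real /etc/urpmi/skip.list (which needs root).

```shell
#!/bin/sh
# Sketch: add a skip pattern to a urpmi-style skip list, once only.
# SKIP_LIST defaults to a scratch file in the current directory; point it
# at /etc/urpmi/skip.list (as root) only after checking the result.
SKIP_LIST="${SKIP_LIST:-./skip.list.example}"

add_skip_pattern() {
    pattern=$1
    touch "$SKIP_LIST"
    # -x: match the whole line, -F: treat the pattern as a fixed string,
    # so the entry is appended only when that exact line is absent.
    if ! grep -qxF -e "$pattern" "$SKIP_LIST"; then
        printf '%s\n' "$pattern" >> "$SKIP_LIST"
    fi
}

# Hypothetical pattern: kernel packages whose full name contains 6.6.34.
add_skip_pattern '/^kernel-.*6\.6\.34/'
```

Review the scratch file before copying any line into the real skip list; a pattern that is too broad would also hold back kernels you do want.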
MGA9-64 Plasma, i5-7500, Quadro K620, server and desktop 6.6.28 kernel. Tested the 470 driver first, with no issues. Then, while in the desktop kernel, used MCC to switch to this nvidia-current. Rebooted into the server kernel, and the modules were built during the boot. No issues to report after the boot. The production install on the same hardware, using the desktop kernel, was updated from the previous nvidia-current a couple of days ago. There have been no issues to report so far.
CC: (none) => andrewsfarm
(In reply to Giuseppe Ghibò from comment #10)
> Apart transition from/to 550<->470, what is weird are the hang on VT
> switching on 550-xx that I thought were left behind with a fix long ago.

Yes, this is definitely a regression.
Tested again, and it hung when switching vt back to the Plasma desktop, even when no other problems had shown after a resume.

Others testing nvidia drivers, could you also test vt switching? I.e.

! Save your work first: close the mail program and anything else that has user files open !

ctrl-alt-F4 (gives fullscreen text terminal)
ctrl-alt-F1 (some log)
ctrl-alt-F2 (back to Plasma desktop - here the system hangs hard for me)

I believe some other desktop systems use login at -F2, desktop at -F3 or similar.

> BTW, during the transition 550<->470 next times try to do a power down
> cycle, not just reboot.

OK, will try that too, and vt switching on 470.
Later - got to work... And back up files before playing much...
Pity that the only system I have capable of nvidia drivers is my work computer...

> I've also an old series (in the lowlatency) based on 5.10.x series

I may try your low latency kernels like I have earlier, if I experience more problems. But generally I like to keep to Mageia standard for QA purposes.
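The VT round trip above can also be scripted, which is handy when driving the test from an ssh session so that a hang can be observed from outside. This is only a sketch: it assumes the chvt tool from the kbd package, root rights for the real run, and that the desktop sits on tty2 as on the reporter's system. DRY_RUN and DESKTOP_VT are made-up variable names, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the VT round trip: text VT, log VT, then back to the desktop.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 as root to run.
DRY_RUN="${DRY_RUN:-1}"
DESKTOP_VT="${DESKTOP_VT:-2}"   # assumption: Plasma session lives on tty2

switch_vt() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "chvt $1"
    else
        # Give the driver a few seconds to settle before the next switch.
        chvt "$1" && sleep 3
    fi
}

vt_round_trip() {
    for vt in 4 1 "$DESKTOP_VT"; do
        switch_vt "$vt" || return 1
    done
}

vt_round_trip
```

Save work first, exactly as for the manual key sequence; if the last switch hangs the machine, the script cannot recover it.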
Quadro K620 here, and the cuda package is NOT installed. If the default is to not have it, then we must test without it as well as with it. I have never really had any reason to miss it, that I know of. Asus Prime Q270M-C motherboard, latest UEFI firmware installed, with an i5-7500 processor, 48GB of RAM, two M.2 SSDs. Logitech K330 keyboard and M325 mouse, using the Unifying receiver. I don't remember ever using these commands before, so I have no idea what is *supposed* to happen with this hardware. ctrl-alt-F4 gives me the terminal. ctrl-alt-F1 takes me back to the desktop, with a notification from kwin that desktop effects were restarted due to a graphics reset. ctrl-alt-F2 from the desktop gives me the terminal again. ctrl-alt-F1 takes me back to the desktop. ctrl-alt-F3 from the desktop also gives me the terminal. Nothing seems to lock up.
Same hardware, another install, this time I installed the cuda package. No difference in the response to the commands. (I didn't expect one, but checked, anyway) I even logged in as "tom" and tried a command before issuing the ctrl-alt-F1 command, and it dropped me back to the desktop as if I had never left.
Thank you Thomas. Good to see it is only my system that makes this problem show so far. And also for earlier version which had this problem on my hardware, no one else reported that problem. Switching vt is nothing ordinary users do. If we do not see more testers soon I think we can send this out as well as the 470.
(In reply to Morgan Leijström from comment #12)
> I may try your low latency kernels like I have earlier, if I experience more
> problems. But generally I like to keep to Mageia standard for QA purposes.

The idea behind suggesting 5.10.x is just to have a further alternative for filtering out possible hardware degradation problems (e.g. overheating, bad memory, etc.), beyond buggy firmware, buggy BIOS, etc.

In your case, did you change the hardware topology recently? E.g. added a new device, or moved a card between slots, such as a USB xhci_hcd device, which might cause a lockup on suspend?
(In reply to Giuseppe Ghibò from comment #16)
> The idea of the 5.10.x

OK, will try later

> did you changed the hardware

No hardware at all changed for over a year.
No change in HW or SW since last comment.

Today I went to a customer, came back three hours later and resumed the computer, only to see a black screen.

This too is an old issue that I have not seen with the last couple of nvidia releases.

Sometimes when it happened earlier, it was only the screen that did not wake up, but now, like sometimes before, I had to REISUB to make it reboot back to life. It was not a clean shutdown though: no logging between suspend and shutdown, and it performed file system checks.

--

Now I have changed to nvidia470 using drakx11, and have not installed -cuda-opencl. Shut down, and started again.

Let's see how 470 works for a few days.
Later I will try the lowlatency kernel.
Remember to add module_blacklist=nouveau to the kernel command line in /etc/default/grub, then run update-grub.

What does cat /proc/acpi/wakeup return?
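As a rough sketch of what that edit amounts to: the function below appends a parameter to the GRUB_CMDLINE_LINUX_DEFAULT line of a grub defaults file if it is not there yet. The name add_kernel_param is made up for illustration, and the demo runs on a temporary file with a shortened command line, so nothing under /etc is touched. It assumes GNU sed (for -i) and a double-quoted value on a single line.

```shell
#!/bin/sh
# Sketch: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT if missing.
add_kernel_param() {
    file=$1
    param=$2
    # Nothing to do when the parameter is already on that line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}

# Demo on a scratch copy, never on the live /etc/default/grub.
demo=$(mktemp)
printf '%s\n' 'GRUB_CMDLINE_LINUX_DEFAULT="noiswmd nokmsboot audit=0"' > "$demo"
add_kernel_param "$demo" module_blacklist=nouveau
cat "$demo"
rm -f "$demo"
```

On a real system one would run it against a copy of /etc/default/grub, compare, copy the result back as root, and then regenerate the boot configuration as described above.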
Why should we need to manually add module_blacklist=nouveau to the /etc/default/grub, and update-grub?
(I have not added it)

$ cat /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="noiswmd nokmsboot resume=/dev/vg-mga/lv_swap audit=0 vga=794"
GRUB_DEFAULT=saved
GRUB_DISABLE_OS_PROBER=false
GRUB_DISABLE_RECOVERY=false
GRUB_DISABLE_SUBMENU=n
GRUB_DISTRIBUTOR=Mageia
GRUB_ENABLE_CRYPTODISK=y
GRUB_GFXMODE=1024x768x32
GRUB_GFXPAYLOAD_LINUX=text
GRUB_SAVEDEFAULT=true
GRUB_TERMINAL_OUTPUT=gfxterm
GRUB_THEME=/boot/grub2/themes/maggy/theme.txt
GRUB_TIMEOUT=5

$ cat /proc/acpi/wakeup
Device  S-state   Status      Sysfs node
P0P1    S4        *disabled
P0P3    S4        *disabled   pci:0000:00:03.0
P0P4    S4        *disabled
P0P5    S4        *disabled
P0P6    S4        *disabled
BR1E    S4        *disabled   pci:0000:00:1e.0
PS2K    S4        *enabled    pnp:00:05
                  *disabled   serio:serio0
PS2M    S4        *disabled
UAR1    S4        *disabled   pnp:00:06
EUSB    S4        *disabled
USB0    S4        *enabled    pci:0000:00:1d.0
USB1    S4        *disabled
USB2    S4        *disabled
USB3    S4        *disabled
USBE    S4        *disabled
USB4    S4        *enabled    pci:0000:00:1a.0
USB5    S4        *disabled
USB6    S4        *disabled
BR20    S4        *disabled   pci:0000:00:1c.0
BR21    S4        *disabled   pci:0000:00:1c.1
BR22    S4        *disabled   pci:0000:00:1c.2
BR23    S4        *disabled   pci:0000:00:1c.3
BR24    S4        *disabled   pci:0000:00:1c.4
BR25    S4        *disabled
BR26    S4        *disabled
BR27    S4        *disabled
SLPB    S4        *disabled
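To pick the wakeup-capable devices out of a listing like the one above, a small awk filter is enough. This is a sketch only: list_enabled_wakeup is a made-up helper name, and it assumes the four-column layout shown here, where continuation rows start with the status field.

```shell
#!/bin/sh
# Sketch: print the names of devices whose status column is "*enabled".
# Reads /proc/acpi/wakeup-style text on stdin.
list_enabled_wakeup() {
    # Only rows with a sleep state (S0..S5) in column 2 are device rows;
    # this skips the header and the indented continuation rows.
    awk '$2 ~ /^S[0-9]$/ && $3 == "*enabled" { print $1 }'
}

# On a machine that exposes the file, show the currently enabled devices.
if [ -r /proc/acpi/wakeup ]; then
    list_enabled_wakeup < /proc/acpi/wakeup
fi
```

For the listing above this yields PS2K, USB0 and USB4. As far as I know, writing a device name into /proc/acpi/wakeup as root toggles its state, which gives a quick way to rule out one wakeup source at a time.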
It's been a while since I dealt with this, but if I recall correctly that's what the nokmsboot is for.
This is the boot command line as taken from journal: jun 25 18:50:05 svarten.tribun kernel: Command line: BOOT_IMAGE=/vmlinuz-6.6.28-desktop-1.mga9 root=/dev/mapper/vg--mga-lv_root ro noiswmd nokmsboot resume=/dev/vg-mga/lv_swap audit=0 vga=794
(In reply to Giuseppe Ghibò from comment #10)
> https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/
> mageia-9-x86_64/07622029-kernel/

Which version of cpupower and lib64bpf1 to use for that kernel?
- Should I just keep the 6.6.28 versions?

So I only need three packages of that 5.10.219-2 ?
kernel-desktop, kernel-desktop-devel, kernel-userspace-headers
or not even -headers ?
(In reply to Morgan Leijström from comment #23)
> (In reply to Giuseppe Ghibò from comment #10)
> > https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/
> > mageia-9-x86_64/07622029-kernel/
>
> Which version of cpupower and lib64bpf1 to use for that kernel?
> - Should I just keep the 6.6.28 versions?

Yes, keep the 6.6.28 versions.

>
> So I only need three packages of that 5.10.219-2 ?

Only 2.

> kernel-desktop, kernel-desktop-devel, kernel-userspace-headers
> or not even -headers ?

Not even headers, only kernel-desktop and kernel-desktop-devel (-devel is for dkms building).
MGA9-64, Plasma, Ryzen 5600, Nvidia 1050

The following 4 packages are going to be installed:

- dkms-nvidia-current-550.90.07-1.mga9.nonfree.x86_64
- nvidia-current-cuda-opencl-550.90.07-1.mga9.nonfree.x86_64
- nvidia-current-utils-550.90.07-1.mga9.nonfree.x86_64
- x11-driver-video-nvidia-current-550.90.07-1.mga9.nonfree.x86_64

1.1MB of additional disk space will be used.

--rebooted

$ nvidia-smi
Wed Jun 26 20:06:08 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1050        Off |   00000000:05:00.0  On |                  N/A |
| 45%   32C    P0             N/A /   75W |     395MiB /   2048MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1941      G   /usr/libexec/Xorg                             122MiB |
|    0   N/A  N/A      2234      G   /usr/bin/kwalletd5                              1MiB |
|    0   N/A  N/A      2390      G   /usr/bin/ksmserver                              1MiB |
|    0   N/A  N/A      2392      G   /usr/bin/kded5                                  1MiB |
|    0   N/A  N/A      2393      G   /usr/bin/kwin_x11                              39MiB |
|    0   N/A  N/A      2472      G   /usr/bin/plasmashell                           21MiB |
|    0   N/A  N/A      2500      G   ...c/polkit-kde-authentication-agent-1          1MiB |
|    0   N/A  N/A      2503      G   /usr/libexec/xdg-desktop-portal-kde             1MiB |
|    0   N/A  N/A      2568      G   /usr/bin/nextcloud                              7MiB |
|    0   N/A  N/A      2570      G   /usr/bin/python3                                1MiB |
|    0   N/A  N/A      2579      G   /usr/libexec/kdeconnectd                        1MiB |
|    0   N/A  N/A      2585      G   /usr/bin/kaccess                                1MiB |
|    0   N/A  N/A      2591      G   /usr/bin/kalendarac                             1MiB |
|    0   N/A  N/A      2653      G   /usr/bin/akonadi_control                        1MiB |
|    0   N/A  N/A      2719      G   /usr/bin/akonadi_akonotes_resource              1MiB |
|    0   N/A  N/A      2720      G   /usr/bin/akonadi_archivemail_agent              1MiB |
|    0   N/A  N/A      2721      G   /usr/bin/akonadi_birthdays_resource             1MiB |
|    0   N/A  N/A      2722      G   /usr/bin/akonadi_contacts_resource              1MiB |
|    0   N/A  N/A      2723      G   .../bin/akonadi_followupreminder_agent          1MiB |
|    0   N/A  N/A      2724      G   /usr/bin/akonadi_ical_resource                  1MiB |
|    0   N/A  N/A      2733      G   /usr/bin/akonadi_indexing_agent                 1MiB |
|    0   N/A  N/A      2734      G   /usr/bin/akonadi_maildir_resource               1MiB |
|    0   N/A  N/A      2737      G   /usr/bin/akonadi_maildispatcher_agent           1MiB |
|    0   N/A  N/A      2738      G   /usr/bin/akonadi_mailfilter_agent               1MiB |
|    0   N/A  N/A      2740      G   /usr/bin/akonadi_mailmerge_agent                1MiB |
|    0   N/A  N/A      2743      G   /usr/bin/akonadi_migration_agent                1MiB |
|    0   N/A  N/A      2748      G   /usr/bin/akonadi_newmailnotifier_agent          1MiB |
|    0   N/A  N/A      2749      G   /usr/bin/akonadi_notes_agent                    1MiB |
|    0   N/A  N/A      2751      G   /usr/bin/akonadi_sendlater_agent                1MiB |
|    0   N/A  N/A      2753      G   /usr/bin/akonadi_unifiedmailbox_agent           1MiB |
|    0   N/A  N/A     13823      G   /usr/bin/firefox                              135MiB |
|    0   N/A  N/A     13949      G   /usr/lib/mozilla/kmozillahelper                 1MiB |
|    0   N/A  N/A     15764      G   /usr/libexec/baloorunner                        1MiB |
|    0   N/A  N/A     15770      G   /usr/bin/konsole                                1MiB |
+-----------------------------------------------------------------------------------------+

firefox working as expected including videos
libreoffice painting properly

working as expected.
1) I have been using nvidia470-470.256.02-1 (Bug 33317) again for a while with the 6.6.28 desktop kernel, and can say it does not show the problem that I see with nvidia-current in this bug. 470 never showed any problem on this system. Tested vt switching, several short suspend-resume cycles, and also suspend overnight: OK. I also briefly tested it OK on kernel 5.10.219-desktop-2.lowlatency.500hz.mga9, comment 24.

2) With the lowlatency kernel I have not had a hard hang, and fewer problems overall.
But once, after vt switching and then suspend-resume, the desktop did not come up after logging in, and after a while it shut down... ??

---

I see kernel 6.6.36 building. Ready to test?

---

Weird glitch

One weird thing, which is not to be taken further in this bug report (especially as it happened when running a non-official kernel): I was running that lowlatency kernel and nvidia470, and used drakx11 to switch to nvidia-current-550.90.07-1. Then, when shutting down the system, it dropped to a debug shell. Some interesting lines:

dm_log dm_mod [last unloaded: vboxdrv]
...
/shutdown: line 154: 312843 Killed $ACTION -f -n
...
dracut warning: poweroff failed!

I have since rebooted twice, no problem. Hmmm
I'm a bit lost. I've been using this for several days with kernel 6.6.28 with zero problems. Brian tested with nothing out of the way to report. Morgan, it is your issues that confuse me. If I'm reading correctly, your system works OK with the 470 driver, but has an intermittent problem with 550.90. Have you been able to determine if the problem is being caused by a glitch in your particular piece of aging hardware, or by the driver? Meaning, should we continue to hold this back, or let it go?
IMO let it go.

The problem has shown before, then was gone for a couple of versions, and now it is back. 470 had the problem too some versions ago.

So I do not think it is *aging* hardware, but a rarely showing bug in the nvidia driver/chip *combination*, maybe in conjunction with kernel, cpu, other hardware...

Pity we are so few testing compared to the myriad of possible chip / other hardware / kernel combinations.
Whiteboard: (none) => MGA9-64-OK
Keywords: (none) => validated_update
CC: (none) => sysadmin-bugs
(In reply to Morgan Leijström from comment #26)
> ---
>
> I see kernel 6.6.36 building. Ready to test?
>
> ---

6.6.36-2 should be the good one. kmod-virtualbox and xtables-addons still have to be rebuilt once the build has finished.
(In reply to Thomas Andrews from comment #27)
> I've been using this for several days with kernel 6.6.28
> with zero problems. Brian tested with nothing out of the way to report.

I do not see any reports other than mine on vt switching or suspend-resume.
And no report (including from myself) on hibernation.

These are serious problems when they occur, as work may not have been saved.

Apart from the tests mentioned, this nvidia-current performs well on my system "svarten" too.

I have never seen any report on the forum, nor from other testers, so my system may have some kind of unusually bad luck of design in this respect.

---

Also nvidia-newfeature 555.52.04-1.mga9 (fresh in testing repo today) has problems with kernel 6.6.28, both desktop and linus flavours tested.

---

No hard hang with any of the three nvidia drivers with 5.10.219-desktop-2.lowlatency.500hz.mga9. Got a repaint problem once after resume, but it fixed itself after a window focus change.

---

With kernel 5.10.219-desktop-2.lowlatency.500hz.mga9 the last tests *always* drop to a debug shell (Comment 26, last part) when trying to reboot or shut off.
Easy to REISUB from there.
Does it need the correct version of cpupower or another package?

---

(In reply to Giuseppe Ghibò from comment #29)
> (In reply to Morgan Leijström from comment #26)
> > I see kernel 6.6.36 building. Ready to test?
>
> 6.6.36-2 should be the good one. It still have to rebuild the
> kmod-virtualbox and xtables-addons when build finished.

Anyway, it is good to test that the local vbox kmod build is working.

Next, I will try desktop kernel 6.6.36-2 with nvidia-current-550.90.07-1.
(In reply to Morgan Leijström from comment #30)
> (In reply to Thomas Andrews from comment #27)
> > I've been using this for several days with kernel 6.6.28
> > with zero problems. Brian tested with nothing out of the way to report.
>
> I do not see any other reports than mine on vt switching nor suspend-resume.
> And no report (including myself) on hibernation.
>
> These are serious problems when they occur as work may not have been saved.
>
> Other than mentioned tests this nvidia-current perform well on my system
> "svarten" as well.
>
> I have never seen any report on forum, nor other testers, so my system may
> have some kind of unusually bad luck of design in this respect.
>
> ---
>
> Also nvidia-newfeature 555.52.04-1.mga9 (fresh in testing repo today) have
> problems with kernel 6.6.28, both desktop and linus flavours tested.
>
> ---
>
> No hard hang with either of the three nvidia drivers with
> 5.10.219-desktop-2.lowlatency.500hz.mga9. Got repaint problem once after
> resume but fixed itself after window focus change.
>

There are also a 5.10.220-2.ll and a 6.1.95-2.ll from the same source.

> ---
>
> kernel 5.10.219-desktop-2.lowlatency.500hz.mga9 last tests *always* drop to
> debug shell (Comment 26 last part) when trying reboot or shut off.
> Easy to REISUB from there.
> Do it need correct version of cpupower or other package?
>

No, keep the one bundled with the latest 6.6.36-2.mga, if you installed that one.

IMO it is not the aging of the hardware; it is probably more related to the motherboard (maybe by hw design or buggy firmware) than to the gfx card. Could it also be that some particular BIOS setting gets lost during a reset, power outage, etc.? In the past we had bisected the combination and found the floppy controller responsible (or co-responsible) for problems with resume from suspend, but now there are probably others.
If the device or bus causing this can be identified, you could try unbinding that PCI device from its driver with a simple command like:

echo "<pci-id>" > /sys/bus/pci/drivers/<driver>/unbind

Other options would be to debug further over an RS232 or a crossed USB active cable (which is not easy to find or to build yourself), or alternatively to get another used socket 1156 motherboard, in the 20-30E price range.
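A sketch of how the driver name for such an unbind could be looked up, assuming the usual sysfs layout where /sys/bus/pci/devices/<pci-id>/driver is a symlink to the bound driver. The function only prints the command and never writes to sysfs. pci_unbind_cmd and SYSFS_ROOT are made-up names, the latter only there so the lookup can be tried against a fake tree, and the address 0000:00:1d.0 in the comment is just the USB0 entry from the wakeup listing used as an example.

```shell
#!/bin/sh
# Sketch: print (not run) the unbind command for one PCI device.
# SYSFS_ROOT is a hypothetical override for trying this on a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

pci_unbind_cmd() {
    dev=$1    # full PCI address, e.g. 0000:00:1d.0
    link="$SYSFS_ROOT/bus/pci/devices/$dev/driver"
    if [ ! -e "$link" ]; then
        echo "no driver bound to $dev" >&2
        return 1
    fi
    # The symlink target's last path component is the driver name.
    driver=$(basename "$(readlink -f "$link")")
    echo "echo $dev > $SYSFS_ROOT/bus/pci/drivers/$driver/unbind"
}
```

Calling pci_unbind_cmd with an address prints the line to run as root. Printing first makes it easy to double-check that the device is not the disk controller or the GPU before unbinding it.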
(In reply to Morgan Leijström from comment #30)
> (In reply to Thomas Andrews from comment #27)
> > I've been using this for several days with kernel 6.6.28
> > with zero problems. Brian tested with nothing out of the way to report.
>
> I do not see any other reports than mine on vt switching nor suspend-resume.
> And no report (including myself) on hibernation.
>
> These are serious problems when they occur as work may not have been saved.
>
> Other than mentioned tests this nvidia-current perform well on my system
> "svarten" as well.
>
> I have never seen any report on forum, nor other testers, so my system may
> have some kind of unusually bad luck of design in this respect.
>

Suspend/hibernate/resume in Mageia has never been consistent for me. I never use them with my desktops, so don't think of it, but I have used them, or tried to, with my laptops. With varying results - it has worked in the past but not for a while - and none of them have nvidia gpus.

My belief is that each motherboard/firmware is enough different that one-size-fits-all solutions just don't work.
We have to consider that several different nvidia series have passed through your motherboard since mga9:

- 535.154.05
- 550.54.14
- 550.67
- 550.76
- 550.78
- 550.90.07

- 470.199.02
- 470.239.06
- 470.256.02

So for each version, beyond all the fixes we could have added on our side, there was something of a "compatibility" matrix (and for the upstream fixes too). But problems of this kind affect other distros as well.

So, to summarize what Thomas said, 470.256.02 seems to be the most stable series on your motherboard with respect to suspend/resume, with any kernel. 555.54.02 would probably show the same problems as 550.90.xx.

Do you get the same problems with hibernation instead of suspend?

When the system does not resume the video correctly, is the host still accessible via ethernet/ssh? If not, does it at least answer pings?
(In reply to Thomas Andrews from comment #27) > I've been using this for several days with kernel 6.6.28 > with zero problems. Brian tested with nothing out of the way to report. > >I do not see any other reports than mine on vt switching nor suspend-resume. >And no report (including myself) on hibernation. Against my better judgement, I booted into my test install with the server kernel, Plasma, Asus Prime Q270M-C motherboard, and nvidia Quadro K620, opened a few apps, and put it into hibernation using the "hibernate" selection of the logout menu. The LED showed hard drive access, then it shut down. I waited a few minutes, then hit the desktop's power button. The power LED lit, and that's it. No signal to the monitor, no hard drive activity, no POST. Nothing. I gave it a couple of minutes, no difference. I shut down by holding the power button, waited again, tried again, same result. Panic tried to ensue. Was my best hardware now a doorstop? I fought it back. Not knowing what else to try, I removed power from all the hardware by using the switch on the power strip. (When all else fails...) I waited until all LEDs went out, card reader, monitor, etc. Then waited another 30 seconds, restored power, and hit the desktop's power switch again. The POST appeared, then rEFInd, with the test install selected. Enter brought up the desktop, with all applications restored to their former states. WHEW! Reaching deep within for courage, I repeated the test, with different apps open. It acted exactly the same. If I used hibernate from the Plasma logout, I had to remove all power from the system to get it back. BUT IT DOES COME BACK. Morgan, hibernation works with the 550.90 driver and the server kernel on my hardware, if the user can avoid panicking.
@Thomas, good that you tested. I know the angst.

That you had to remove power mechanically is a sign that the machine did not power off completely when entering hibernation. I have a similar problem with my Thinkpad T510: I have to hold down the power button after the disk lamp has stopped and the power lamp starts flashing, indicating a kernel panic. At power on it resumes correctly, so it is just the power off that fails. Why, I do not know how to investigate, as by then it has shut off the display and logging... That machine does not use nvidia drivers - too old in the nvidia world.

I think we should test whether hibernation works with the free drivers before trying nvidia as an added complication.

- But I am inclined to try that dare too now. I am thinking it is my main board that has some nonstandard quirk and should not be used for this testing, but as we are too few testing, I go on... I do not have much more time and motivation for this though.

nvidia 555.54.01 is the worst so far regarding suspend-resume - it always fails with kernels 6.6.28 and .34 (not with 5.10.219-desktop-2.lowlatency). It is also the only failure that returns with a text screen - looking like the system journal, but it is not saved as such. Odd.

Hm, now I see a new minor version, nvidia 555.54.*02*, got built.

Yes, 470 has always seemed more reliable than 5xx on my "svarten".

Last testing: kernel-desktop-6.6.36-2.mga9 + nvidia-current 550.90.07-1.mga9
I had altered two settings in BIOS related to suspend (forgot the names...) but I do not see a difference, I think. With this kernel and driver, it is back to the situation I experienced months ago: it succeeds in resuming after a short suspend, but if I wait hours it resumes to a black screen. I issued the "REI" part of REISUB, and the sddm login appeared and I could log in, but had no network. Hm, maybe this is the strange monitor state where I also historically could have power cycled the monitor.

It is a mystery that this combination fails to resume after hour(s?) of sleep and not after a minute of sleep. Due to a timeout into some undocumented deep sleep in the monitor or graphics card? A time/date jump in the system? Chip temperature? - no, it works from a cold start... Too many parameters for effective testing.

Anyway, just now I went into BIOS and disabled the floppy driver, to see if that goes better. Will see next morning. 6.6.36-2 + 550.90.07-1
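On the question of how to investigate a machine that does not power off fully after hibernating: one thing that could be recorded alongside each test is which sleep and hibernation modes the kernel is set to use. The following is only a minimal sketch, assuming a Linux system that exposes the standard /sys/power/state, /sys/power/disk and /sys/power/mem_sleep files; the function names are made up for illustration and nobody in this thread has run it. In /sys/power/disk and /sys/power/mem_sleep the currently selected mode is the one shown in square brackets.

```python
from pathlib import Path


def parse_bracketed(text):
    """Split a sysfs option list where the active entry is in [brackets].

    Returns (all_options, selected); selected is None when no entry
    is bracketed (as is normal for /sys/power/state).
    """
    options, selected = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        options.append(token)
    return options, selected


def read_power_settings(root="/sys/power"):
    """Read the sleep-related sysfs files; a missing file maps to None."""
    result = {}
    for name in ("state", "disk", "mem_sleep"):
        try:
            result[name] = parse_bracketed((Path(root) / name).read_text())
        except OSError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, value in read_power_settings().items():
        if value is None:
            print(f"{name}: not available")
        else:
            options, selected = value
            print(f"{name}: options={options} selected={selected}")
```

Noting the selected entries next to the kernel and driver versions in each report would at least remove one unknown when comparing results between machines.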
Follow-up to the paragraph "Last testing" in Comment 35: I let the system sleep during a coffee break... (actually I went eating red currants directly from my plant :) ) ...and now when I got back, the computer woke up to a black screen. I power cycled the monitor and could log in to the restored desktop OK. So no hang, and I recognise the situation from half a year or so ago. Now with desktop kernel 6.6.36-2 + nvidia 550.90.07-1.
Hibernation test fully OK on my "svarten". Desktop kernel 6.6.36-2 + nvidia 550.90.07-1.
* Versions not in this bug, but tests relevant for comparison. *

6.6.36-desktop-2.mga9 + nvidia 555.54.02
Suspend-resume goes to a text screen which does not change whatever I press.
Notable successive lines in that text:

note: irq/33-nvidia[6910] exited with irqs disabled
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: irq/33-nvidia/6910/0x00000000

Issued REISUB; not until the "B" did anything visibly happen: black, reboot.

---

(In reply to Morgan Leijström from comment #35)
> nvidia 555.54.01 is worst so far regarding suspend-resume - it always fail
> with kernels 6.6.28 and .34 (not with 5.10.219-desktop-2.lowlatency). It is
> also the only fail that returns with a text screen - looking like system
> journal but it is not saved as such. odd.

Now I was running nvidia 555.54.02 + 5.10.219-desktop-2.lowlatency, and resuming after lunch went to a black screen. Power cycling the monitor does not help; I had to do a full REISUB.
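If the text from such a screen can be captured somewhere (a transcribed photo, for example), the three lines quoted above can be searched for mechanically when comparing runs. This is only a minimal sketch: the patterns are copied from the lines in the comment above, while the labels and the function name are hypothetical and not part of any existing tool.

```python
import re

# Patterns taken from the three lines quoted in the comment above.
# The labels on the left are arbitrary names chosen for this sketch.
PATTERNS = {
    "irqs_disabled": re.compile(r"note: (\S+)\[(\d+)\] exited with irqs disabled"),
    "recursive_fault": re.compile(r"Fixing recursive fault but reboot is needed!"),
    "sched_atomic": re.compile(r"BUG: scheduling while atomic: (\S+)"),
}


def find_fault_lines(text):
    """Return (label, line) pairs for every line matching a known pattern."""
    hits = []
    for line in text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
    return hits


# The sample is exactly the text reported in the comment, nothing added.
SAMPLE = """\
note: irq/33-nvidia[6910] exited with irqs disabled
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: irq/33-nvidia/6910/0x00000000
"""

if __name__ == "__main__":
    for label, line in find_fault_lines(SAMPLE):
        print(f"{label}: {line}")
```

The same function could be pointed at saved journal output to check whether a later driver or kernel combination still produces these lines.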
I think the hibernate/suspend discussion deserves a bug of its own, rather than here. This update has already been validated. This ongoing problem has possible sources other than nvidia video drivers. My AMD-based Pavilion, for example, will suspend by closing the lid and resume like a champ, but try to hibernate and it reboots to a new session every time. With no nvidia hardware, nvidia drivers can't be the issue. My Intel-based Probook 6550b has also had problems in this area off and on over the years, some more serious than others. The latest one generated intermittent boot failures that didn't go away until I re-installed Mageia. No nvidia there, either.
An update for this issue has been pushed to the Mageia Updates repository. https://advisories.mageia.org/MGAA-2024-0154.html
Resolution: (none) => FIXED
Status: NEW => RESOLVED
The suspend-resume issue I see on my system "svarten" seems to be bound to the nvidia driver. For some versions it is more or less mitigated by using a non-official kernel. But using the free drivers nouveau or modesetting, there is no problem with any kernel. This I have now verified also with kernel-desktop-6.6.36-2.mga9. Currently there is also no problem when using the latest nvidia470, or the previous 470 and -current.

So the problem that shows on my system is reintroduced by the version released in this bug. I have now also tested that it fails with kernel-desktop-6.1.95-2.lowlatency.mga9-1-1.

The nvidia newfeature 555.54.02 also fails as reported, now also tested with linus 6.6.281, desktop 6.6.36-2, desktop 6.1.95-2.lowlatency.

The reason I do not oppose 550.90.07-1 being released is that we have released several versions before, including the mga9 release version, with this problem, and I have not seen any other user complaining. It seems so rare that I do not even think it should be in Errata until we see at least one more user seeing this.

Still, we should fight it, as we know that very few of the users who experience problems report them, so we do not know how many are affected.

The problem I see is, as far as Mageia is concerned, most closely tied to the nvidia driver. Possibly a combination of nvidia chip, implementation, main board, and kernel. But I see no better place to handle this in our Bugzilla than under the nvidia drivers. So next up is probably reporting in a coming nvidia-newfeature update bug.

For other suspend/hibernation problems, yes, separate bugs, probably assigned to the kernel/driver maintainers. Like I had for a specific laptop in Bug 22804. That did not itself help, but some years later the problem is gone. Similarly for another laptop in Bug 32122. I know I have also commented on other laptops and maybe some stationary machines.

The general feeling is that support for suspend and hibernation has substantially improved in the last year, after previously having regressed.
Updated https://wiki.mageia.org/en/Setup_the_graphical_server#Known_Nvidia_issues which is linked from Errata.
The problem of my system svarten hanging on resume from suspend with this nvidia-current-550.90.07-1 seems to be resolved by kernels 6.6.37-1 :)