Description of problem:
When the user selects the nvidia driver:
1) kernel-desktop-devel should be installed for at least the running kernel (maybe for all installed kernels, so kernels can be switched and dkms autorebuild can work).
2) drakx11 should tell the user if the dkms module build fails.

---
Example / how to reproduce:
I had swapped my nvidia card for a radeon. I removed the nvidia packages, and also kernel-desktop-devel, as I thought I would not need it. However, today I swapped the nvidia card back in to test whether the nvidia470 module gets built with kernel 6.4.16. I used MCC to select "latest legacy", continued with the defaults, and it told me I had to reboot. I did notice it was too quick for the module build to have happened, but this is QA, so let's see what happens...

First, I hit a separate issue: Bug 32351 - Shutdown do not happen after message it reboots due to display driver change

Then it booted on nouveau. I installed kernel-desktop-devel and again selected "latest legacy" in MCC, but it did not build the module, maybe because there was no change in my request. It apparently did set the system to use nvidia470 anyway. It is inconsistent, though, that drakx11 at launch showed "latest legacy" and not nouveau; it used to show what is in use. Maybe a side effect of Bug 32351?

Rebooted, and the module was automatically built and used.
This needs to be fixed.

I guess this is no problem if Mageia is installed on a system with an nvidia GPU and the user selects the proprietary driver during install. But say the user installs Mageia with a non-nvidia card and switches to nvidia later: then there is no kernel-devel package. Or he switches to nouveau or modesetting and cleans out unneeded packages, or deletes the devel package for lack of knowledge.

drakx11 needs to make sure the -devel package for the running kernel is installed. (Optionally it should, after asking, install devel packages for all installed kernels if they are not installed already, so kmods can be built by dkms autorebuild when booting them.)

Also, drakx11 should check that the module got built OK, and if not, switch to modesetting (or tell the user to do so, or just report and make sure every configuration is as it was before drakx11 was run).

Example from today: I removed some kernels and kernel-devel packages, installed a kernel, then used drakx11 to test this again.

Output from remove-old-kernels and dkms status (Swedish locale: "Kärnor" = kernels, "Behåll" = keep):

System: Mageia release 9 (Official) for x86_64 | Kärnor i /boot/:5 | AUTO:0 | BEHÅLL:6
==> kernel-desktop
1 : Behåll : U : kernel-desktop-6.5.13-1.mga9-1-1.mga9.x86_64 tor 14 dec 2023 15:21:45
2 : Behåll : : kernel-desktop-6.5.13-2.mga9-1-1.mga9.x86_64 tor 7 dec 2023 22:05:12
==> kernel-desktop-devel
1 : Behåll : : kernel-desktop-devel-6.5.13-2.mga9-1-1.mga9.x86_64 tor 7 dec 2023 22:05:16
2 : Behåll : : kernel-desktop-devel-6.4.16-3.mga9.x86_64 ons 29 nov 2023 12:24:09
3 : Behåll : : kernel-desktop-devel-6.5.11-5.mga9.x86_64 mån 27 nov 2023 12:00:43
==> kernel-linus
1 : Behåll : : kernel-linus-6.5.13-1.mga9.x86_64 tor 14 dec 2023 15:21:50
2 : Behåll : : kernel-linus-6.5.11-2.mga9.x86_64 tor 14 dec 2023 14:58:23
3 : Behåll : : kernel-linus-6.4.9-1.mga9.x86_64 mån 27 nov 2023 12:09:14
==> kernel-linus-devel
1 : Behåll : : kernel-linus-devel-6.5.13-1.mga9.x86_64 tor 14 dec 2023 15:21:48
2 : Behåll : : kernel-linus-devel-6.5.11-2.mga9.x86_64 mån 27 nov 2023 11:29:57

[morgan@svarten ~]$ dkms status
virtualbox, 7.0.12-2.mga9, 6.5.13-desktop-2.mga9, x86_64: installed
virtualbox, 7.0.12-2.mga9, 6.4.9-1.mga9, x86_64: installed
virtualbox, 7.0.12-2.mga9, 6.5.13-1.mga9, x86_64: installed
virtualbox, 7.0.12-2.mga9, 6.5.11-2.mga9, x86_64: installed
virtualbox, 7.0.12-2.mga9, 6.5.13-desktop-1.mga9, x86_64: installed-binary from 6.5.13-desktop-1.mga9

The system is running the modesetting driver. I executed drakx11 to install nvidia-current, and it seemed to run with no issues, but in the journal I see it did not install the -devel package for the running kernel, and it did not even try to build the module.

[morgan@svarten ~]$ dkms status
virtualbox, 7.0.12-2.mga9, 6.5.13-desktop-2.mga9, x86_64: installed
virtualbox, 7.0.12-2.mga9, 6.4.9-1.mga9, x86_64: installed
virtualbox, 7.0.12-2.mga9, 6.5.13-1.mga9, x86_64: installed
virtualbox, 7.0.12-2.mga9, 6.5.11-2.mga9, x86_64: installed
nvidia-current, 535.146.02-1.mga9.nonfree: added
virtualbox, 7.0.12-2.mga9, 6.5.13-desktop-1.mga9, x86_64: installed-binary from 6.5.13-desktop-1.mga9
[morgan@svarten ~]$

What does "added" mean in the line "nvidia-current, 535.146.02-1.mga9.nonfree: added"?

I manually installed kernel-desktop-devel-6.5.13-1.mga9-1-1.mga9.x86_64 and rebooted... and dkms built the module and it works. But a normal user would have been lost.

If the journal is wanted, note to self: drakx11 dec 14 16:48:40
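To illustrate the "check that the module really got built" idea, here is a rough Python sketch that parses "dkms status" output in the line shapes pasted above and reports whether a module is installed for a given kernel. It is only a sketch: drakx11 itself is Perl, the names DkmsEntry, parse_dkms_status and is_installed_for are made up for this example, and the regex assumes the comma-separated format shown in this report (other dkms versions may print a different layout).

```python
import re
from typing import NamedTuple, Optional


class DkmsEntry(NamedTuple):
    module: str
    version: str
    kernel: Optional[str]  # None when the line carries no kernel, e.g. ": added"
    state: str


# Assumed line shapes, copied from the output quoted in this report:
#   "virtualbox, 7.0.12-2.mga9, 6.4.9-1.mga9, x86_64: installed"
#   "nvidia-current, 535.146.02-1.mga9.nonfree: added"
_LINE = re.compile(
    r"^(?P<module>[^,]+), (?P<version>[^,:]+)"
    r"(?:, (?P<kernel>[^,]+), (?P<arch>[^:]+))?: (?P<state>\S+)"
)


def parse_dkms_status(text: str) -> list[DkmsEntry]:
    """Turn 'dkms status' text into entries; lines that do not match are skipped."""
    entries = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            entries.append(
                DkmsEntry(m["module"], m["version"], m["kernel"], m["state"])
            )
    return entries


def is_installed_for(entries: list[DkmsEntry], module: str, kernel: str) -> bool:
    """True if some entry says the module is installed (or installed-binary) for kernel."""
    return any(
        e.module == module and e.kernel == kernel and e.state.startswith("installed")
        for e in entries
    )
```

Fed the second listing above, is_installed_for(entries, "nvidia-current", "6.5.13-desktop-1.mga9") is false, because the only nvidia-current line has no kernel and the state "added". A caller could use that result to warn the user instead of asking for a reboot.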
Priority: Normal => High
CC: (none) => ghibomgx
Severity: normal => major
Summary: drakx11 do not check if nvidia module really got built => drakx11 do not for nvidia check whether kernel-devel is installed, nor if nvidia module really got built
https://wiki.mageia.org/en/Mageia_9_Errata#Nvidia
Keywords: (none) => IN_ERRATA9
(In reply to Morgan Leijström from comment #1)
> drakx11 need to make sure the -devel package for the running kernel is
> installed. (Optionally it should install devel packages for all installed
> kernels, if they are not installed already, after asking, so kmods can be
> built when booting then, by dkms autorebuild)

So one of the various weaknesses is this: the kernel is updated, but the corresponding kernel*devel is not correctly retrieved and installed, so at next reboot the module is not built, drakx11 finds something wrong and deconfigures nvidia to nouveau, but incompletely, which results in a black screen? Is this what is happening?

Adding all these checks to drakx11 is IMHO too complex a task (unless someone with deeper knowledge of it would write the code...). To make things more robust, we need to proceed in stages anyway. The first is to update dkms to the latest version. To do that we need to port the current Mageia-specific patches (they add some features required by the distro) to the latest series, which is currently dkms-3.0.12. There was an attempt by tmb, and also by myself 3 years ago, but it was not completed:
https://svnweb.mageia.org/packages/cauldron/dkms/branches/WIP/current/SPECS/dkms.spec?view=log
and in the meanwhile dkms bumped from 2.8 to 3.0.12... Once the port is completed, it would also be good if we could send the Mageia-specific changes upstream, so as not to have to redo things for every newer dkms series.

As an alternative, I could write some little external script, a sort of "nvidia-sanitize", that would check all the nvidia modules built or missed (even digging into ramdisks) and check that there is a matching module for each kernel. The problem is that even with this sort of script, things wouldn't be completely foolproof, because it requires the knowledge to switch to a console and run it.
Also, once the system has been deconfigured to some other card, the script couldn't reconfigure X11 to nvidia.
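As a very rough idea of the core of such a "nvidia-sanitize" check, the Python sketch below lists the kernels under a modules directory that have no nvidia module file. Everything in it is an assumption made for illustration: the function name kernels_missing_module, the "dkms" and "dkms-binary" subdirectory names where dkms-built modules are expected, and the set of compression suffixes. It does not look into ramdisks and does not touch the X11 configuration.

```python
import os

# Assumed file endings for (possibly compressed) kernel modules.
MODULE_SUFFIXES = (".ko", ".ko.xz", ".ko.zst", ".ko.gz")


def kernels_missing_module(
    modules_root: str = "/lib/modules",
    prefix: str = "nvidia",
    subdirs: tuple = ("dkms", "dkms-binary"),  # assumed dkms install locations
) -> list:
    """Return kernel directory names that contain no module starting with prefix.

    Only the given subdirs are searched, so in-tree modules whose names also
    start with "nvidia" (such as nvidiafb) do not count as the proprietary driver.
    """
    missing = []
    for kernel in sorted(os.listdir(modules_root)):
        kernel_dir = os.path.join(modules_root, kernel)
        if not os.path.isdir(kernel_dir):
            continue
        found = any(
            name.startswith(prefix) and name.endswith(MODULE_SUFFIXES)
            # os.walk yields nothing for a subdir that does not exist
            for sub in subdirs
            for _dirpath, _dirnames, files in os.walk(os.path.join(kernel_dir, sub))
            for name in files
        )
        if not found:
            missing.append(kernel)
    return missing
```

Calling kernels_missing_module() with the defaults would return the kernels for which a rebuild (or a missing -devel package) should be investigated; an empty list means every kernel directory has a matching module file.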
There may be many reasons why a correct version of kernel-devel is missing.

The easiest to think of is that the system has no kernel-devel installed at all, i.e. a system with neither an nvidia GPU nor anything else that requires it, e.g. VirtualBox. Then the owner buys and plugs in an nvidia card and wants to use the proprietary driver.

So after the user has answered yes to the question about using the proprietary driver, it is fundamental that drakx11 makes sure the -devel package for the running kernel (at least) is installed before proceeding.

Minimal fix: I believe it is not much work to make drakx11 just check for the -devel package and show a popup asking the user to install that package and then try again (after which drakx11 exits automatically). It seems to get every other package right. (Hm, I have not tried with missing dkms packages.)

Optionally, add a check of the output of "dkms status" afterwards, as a catch for some problems.

Related, but not the same: I also hit Bug 32579 - drakx11 switching nvidia driver, next boot black screen. again, but have no time to dig.

A similar issue hit a user on upgrade, not related to drakx11 I presume: https://forums.mageia.org/en/viewtopic.php?t=15169
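The minimal check could be as simple as looking for the kernel build tree that dkms compiles against. The Python sketch below is an illustration only (drakx11 is Perl, and the name build_tree_present is made up). It assumes that an installed -devel package makes /lib/modules/<release>/build resolve to a directory containing a top-level Makefile, and that a missing package leaves that path absent or a dangling symlink.

```python
import platform
from pathlib import Path


def build_tree_present(release: str, modules_root: str = "/lib/modules") -> bool:
    """True if <modules_root>/<release>/build/Makefile exists.

    is_file() follows symlinks, so a dangling "build" link (the assumed
    situation when the -devel package is not installed) gives False.
    """
    return (Path(modules_root) / release / "build" / "Makefile").is_file()


def running_kernel_release() -> str:
    """Release string of the running kernel, the same value 'uname -r' prints."""
    return platform.release()
```

A caller would run build_tree_present(running_kernel_release()) right after the user chooses the proprietary driver, and show the "please install the -devel package and try again" popup when it returns false.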
Bug 32626 - drakrpm want to install conflicting kernel-devel packages makes this harder to automate, but with the minimal fix I suggested in the previous comment we offload that to the user, with help in the errata.
CC'ing our Perl gurus for help.
CC: (none) => marja11, perl
Source RPM: (none) => drakx-kbd-mouse-x11
Additional test functionality needed: when selecting nvidia-newfeature, drakx11 should also check whether it is available, and if not, say so with a popup (as it could also do if the correct kernel devel is missing), make no changes, and return to the drakx11 start. From Bug 32565 Comment 52
Summary: drakx11 do not for nvidia check whether kernel-devel is installed, nor if nvidia module really got built => drakx11 do not for nvidia check whether kernel-devel is installed, newfeature exist, nor if nvidia module really got built