Bug 32579 - switching nvidia driver, next boot fail graphical mode for the kernel that was running
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 9
Hardware: All Linux
Priority: Normal, Severity: critical
Target Milestone: ---
Assignee: Mageia tools maintainers
Reported: 2023-11-28 16:31 CET by Morgan Leijström
Modified: 2024-04-22 13:56 CEST

Source RPM: dkms-2.0.19-46.mga9


Attachments
Here is the good boot another kernel, dkms-autorebuild works (22.06 KB, application/x-xz)
2024-02-10 15:14 CET, Morgan Leijström
Journal of booting kernel-server 6.6.28 after switching from nvidia-current to nvidia470 - screen black instead of graphics, text vt also black, i REISUB in part to force it down.txt (229.25 KB, text/plain)
2024-04-21 20:59 CEST, Morgan Leijström
Terminal log of drakx11, with checking dkms before and after (27.65 KB, text/plain)
2024-04-21 21:13 CEST, Morgan Leijström
Journal of booting kernel-server 6.6.28 after switching from nvidia-current to nvidia470 - screen black instead of graphics, clicked power button made it shut down cleanly.txt (251.97 KB, text/plain)
2024-04-21 21:47 CEST, Morgan Leijström
Journal booting with modules_blacklist=nouveau. (179.18 KB, text/plain)
2024-04-22 09:49 CEST, Morgan Leijström
Journal booting with module_blacklist=nouveau (173.69 KB, text/plain)
2024-04-22 13:56 CEST, Morgan Leijström

Description Morgan Leijström 2023-11-28 16:31:28 CET Comment hidden (obsolete)
Comment 1 Morgan Leijström 2023-11-28 17:59:41 CET
Scrap Comment 0 - I mixed up which boot was which, at b-1 i manually booted to maintenance.  To investigate further...

Description of problem:
A user who wants to test another driver gets a black screen on the next boot, and an ordinary user may not know how to get out of it.

Version-Release number of selected component (if applicable):
drakx-kbd-mouse-x11-1.38-1.mga9

How reproducible:
Always

Steps to Reproduce:
1. Use nvidia-current
2. In MCC select to change driver to nvidia newfeature
3. Reboot
 -> boot ends with black screen instead of DM login.

Easiest way out for common user:
0. Shut down (Ctrl-Alt-Del, Del seemed not to work;
   I ended up using Alt-PrtScr, R,E,I,S,U,B)
1. In boot menu select another kernel
   (and be patient, dkms autorebuild works)
2. Uninstall the problematic kernel
3. Install it again (or another)
4. Reboot to it
   (and be patient, dkms autorebuild works)

--------

Picking up from https://bugs.mageia.org/show_bug.cgi?id=32565#c16 :

While running drakx11, I watched "journalctl -f" and saw that it built the module.
Yes all installed kernels have their -devel- packages installed.
And autorebuild works.
It is just that drakx11 gets something wrong for the running kernel.

---

This seems to hit every time I change the nvidia proprietary driver (legacy / current / newfeature) using drakx11, so I guess anybody can easily replicate it.

I have not tried to fix anything yet.
Anything else you would like me to check?
(give clear simple instructions)

Summary: drakx11 switching nvidia driver makes boot command line wrong for running kernel, boots to black screen. => drakx11 switching nvidia driver, next boot black screen.
Assignee: bugsquad => mageiatools
CC: (none) => ghibomgx

Comment 2 Morgan Leijström 2023-11-28 18:01:18 CET
I think this needs to be fixed before shipping nvidia newfeature,
so users do not get hit by this when they want to try it.

Blocks: (none) => 32565

Comment 3 Giuseppe Ghibò 2023-11-28 20:39:29 CET
(In reply to Morgan Leijström from comment #2)

> I think this needs to be fixed before shipping nvidia newfeature,
> so users do not get hit by this when they want to try it.

OK, let's hold the newfeature driver for a while. On the other hand, it is in nonfree/updates_testing.

BTW, have you checked that x11-driver-video-nvidia-newfeature is also installed? Can you also compare the installed nvidia packages between one driver and the other?
I remember there was a change in the nvidia drivers in the late stage of mga9 where we split off a dependency into something not required, to make room on the ISO. I wonder whether some package is being missed that way.
Comment 4 Morgan Leijström 2023-11-28 22:53:18 CET
Some more input: 
(but a bit vague - home sick but try to do things here and there...)

I updated virtualbox, and then tried the kernel that earlier booted to a black screen, to verify that the VB kmod gets built, and it did (nvidia was not rebuilt; it was already built). And then it works (!): no black screen now. Strange. But I may also be mixing things up.

In the journal I see that some boots were without 'nokmsboot' on the kernel boot command line, and I believe, but am not 100% certain, that those were the failed boots mentioned. (I have also tested Xorg modesetting, but I think that was earlier.) I guess I can dig deeper and see what was happening in that session, but I have to leave the computer for now.
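
Whether a given boot had 'nokmsboot' can be checked from the kernel command line that each boot logs. A minimal shell sketch, assuming a persistent journal and that the kernel logs a line containing "Command line:" early in each boot; the helper name has_param is made up for illustration and is not part of any Mageia tool:

```shell
#!/bin/sh
# has_param CMDLINE PARAM
# Succeeds if PARAM appears as a whole, space-separated word in CMDLINE.
# (hypothetical helper, for illustration only)
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage on the running kernel:
#   has_param "$(cat /proc/cmdline)" nokmsboot && echo present || echo missing
#
# Usage on the previous boot, taking the logged command line from the journal:
#   journalctl -k -b -1 | grep -m1 'Command line:'
```

Matching on whole words avoids a false hit on a longer parameter that merely contains the string.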

Now i changed from nvidia current to newfeature, but the problem did NOT hit.
Confusing!

I believe all needed packages are in.

[morgan@svarten ~]$ rpm -qa | grep nvidia
lib64nvidia-egl-wayland1-1.1.11-1.mga9
nvidia-cuda-toolkit-12.2.2-1.mga9.nonfree
dkms-nvidia-newfeature-545.29.06-1.mga9.nonfree
nvidia-newfeature-doc-html-545.29.06-1.mga9.nonfree
nvidia-newfeature-utils-545.29.06-1.mga9.nonfree
x11-driver-video-nvidia-newfeature-545.29.06-1.mga9.nonfree
Comment 5 Morgan Leijström 2023-12-18 14:27:08 CET
Apparently not the fault of drakx11:

I was running kernel-desktop-6.5.13-6,
using nvidia-current 535.146.02-1 (both from testing)

then issued 

$ LC_ALL=C sudo urpmi x11-driver-video-nvidia-newfeature

And let it go.  nonfree testing repo enabled, so it got this testing version.

Everything seemed to proceed OK, including the dkms build, but on the next boot with that kernel I got a black screen instead of sddm, and no response: I could not switch to another tty (Ctrl-Alt-F4). I issued Ctrl-Alt-Del, Del, waited, then REISUB, and did not see anything on screen until the computer booted again.

Now I booted with another kernel, desktop-6.5.13-5: dkms-autorebuild operated and i got to desktop and am writing this.

So what is failing when changing from one driver to another for the running kernel,
while dkms-autorebuild succeeds?

Now:

$ dkms status
virtualbox, 7.0.12-2.mga9, 6.5.13-2.mga9, x86_64: installed 
virtualbox, 7.0.12-2.mga9, 6.5.13-desktop-5.mga9, x86_64: installed 
virtualbox, 7.0.12-2.mga9, 6.5.13-desktop-6.mga9, x86_64: installed 
nvidia-newfeature, 545.29.06-1.mga9.nonfree, 6.5.13-desktop-5.mga9, x86_64: installed
nvidia-newfeature, 545.29.06-1.mga9.nonfree, 6.5.13-desktop-6.mga9, x86_64: installed
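
To compare what dkms reports against the running kernel, the listing above can be filtered per module. A small sketch, assuming the comma-separated "module, version, kernel, arch: state" layout shown above (other dkms versions may print a different layout); kernels_with_module is a hypothetical helper name:

```shell
#!/bin/sh
# kernels_with_module MODULE
# Reads `dkms status` output on stdin and prints the kernel versions
# for which MODULE is reported as installed.
# (hypothetical helper; assumes the four-field layout shown in this comment)
kernels_with_module() {
    awk -F', *' -v mod="$1" '$1 == mod && $4 ~ /: *installed/ { print $3 }'
}

# Usage: is the running kernel in the list?
#   dkms status | kernels_with_module nvidia-newfeature | grep -Fx "$(uname -r)"
```

This only shows dkms's own bookkeeping. As this comment shows, a kernel can be listed as installed and still fail to load the modules at boot.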

___Looking a bit in journal:

Kernel boot command line is identical for booting  6.5.13-desktop-5 and  6.5.13-desktop-6, (except for vmlinuz version -5/-6)


I believe those who know this can read something interesting in the following line from the failed boot:

dec 18 12:10:19 svarten.tribun (udev-worker)[5588]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \  -f 4); do /usr/bin/test -c /dev/nvidia${i} || /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c 195 ${i}; done'' failed with exit code 1.


___The failing boot has this in the journal:
dec 18 12:10:18 svarten.tribun kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 239
dec 18 12:10:18 svarten.tribun kernel: nvidia 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
dec 18 12:10:18 svarten.tribun kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  545.29.06  Thu Nov 16 01:59:08 UTC 2023
dec 18 12:10:19 svarten.tribun (udev-worker)[5588]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep > 
dec 18 12:10:19 svarten.tribun kernel: nvidia_modeset: Unknown symbol __acpi_video_get_backlight_type (err -2)
dec 18 12:10:19 svarten.tribun (udev-worker)[5006]: nvidia: Process '/sbin/modprobe nvidia-modeset' failed with exit code 1.
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_kms_helper_poll_fini (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_kms_helper_poll_disable (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_kms_helper_poll_init (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_disable_plane (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_shutdown (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_helper_hpd_irq_event (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_check (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol __drm_atomic_helper_plane_destroy_state (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_connector_destroy_state (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_plane_reset (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_helper_mode_fill_fb_struct (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_set_config (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_fbdev_generic_setup (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol __drm_atomic_helper_crtc_duplicate_state (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_connector_duplicate_state (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol __drm_atomic_helper_plane_duplicate_state (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_crtc_reset (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol __drm_atomic_helper_crtc_destroy_state (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_kms_helper_hotplug_event (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol nvKmsKapiGetFunctionsTable (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_helper_probe_single_connector_modes (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_swap_state (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_page_flip (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_connector_reset (err -2)
dec 18 12:10:19 svarten.tribun kernel: nvidia_drm: Unknown symbol drm_atomic_helper_update_plane (err -2)
dec 18 12:10:19 svarten.tribun (udev-worker)[5006]: nvidia: Process '/sbin/modprobe nvidia-drm' failed with exit code 1.
dec 18 12:10:19 svarten.tribun kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting ta>
dec 18 12:10:19 svarten.tribun kernel: nvidia-uvm: Loaded the UVM driver, major device number 237.
dec 18 12:10:19 svarten.tribun kernel: nvidia_modeset: Unknown symbol __acpi_video_get_backlight_type (err -2)
dec 18 12:10:19 svarten.tribun acpid[1359]: client connected from 5346[0:0]
dec 18 12:10:19 svarten.tribun acpid[1359]: 1 client rule loaded
dec 18 12:10:20 svarten.tribun kernel: nvidia_modeset: Unknown symbol __acpi_video_get_backlight_type (err -2)
dec 18 12:10:20 svarten.tribun boinc[5228]: Authorization required, but no authorization protocol specified
dec 18 12:10:20 svarten.tribun boinc[5228]: 18-Dec-2023 12:10:20 [climateprediction.net] Sending scheduler request: To fetch work.
dec 18 12:10:20 svarten.tribun boinc[5228]: 18-Dec-2023 12:10:20 [climateprediction.net] Requesting new tasks for CPU
dec 18 12:10:20 svarten.tribun sddm-helper[5657]: pam_unix(sddm-greeter:session): session opened for user sddm(uid=975) by (uid=0)
dec 18 12:10:20 svarten.tribun systemd-logind[1492]: New session c1 of user sddm.
dec 18 12:10:20 svarten.tribun systemd[1]: Created slice user-975.slice.
dec 18 12:10:20 svarten.tribun systemd[1]: Starting user-runtime-dir@975.service...
dec 18 12:10:20 svarten.tribun systemd[1]: Finished user-runtime-dir@975.service.
dec 18 12:10:20 svarten.tribun systemd[1]: Starting user@975.service...
dec 18 12:10:21 svarten.tribun systemd[5662]: Queued start job for default target default.target.
dec 18 12:10:21 svarten.tribun systemd[5662]: Created slice app.slice.
dec 18 12:10:21 svarten.tribun systemd[5662]: Created slice session.slice.
dec 18 12:10:21 svarten.tribun systemd[5662]: Reached target paths.target.
dec 18 12:10:21 svarten.tribun systemd[5662]: Reached target timers.target.
dec 18 12:10:21 svarten.tribun systemd[5662]: Starting dbus.socket...
dec 18 12:10:21 svarten.tribun systemd[5662]: Listening on pipewire-pulse.socket.
dec 18 12:10:21 svarten.tribun systemd[5662]: Listening on pipewire.socket.
dec 18 12:10:21 svarten.tribun systemd[5662]: Listening on dbus.socket.
dec 18 12:10:21 svarten.tribun systemd[5662]: Reached target sockets.target.
dec 18 12:10:21 svarten.tribun systemd[5662]: Reached target basic.target.
dec 18 12:10:21 svarten.tribun systemd[1]: Started user@975.service.
dec 18 12:10:21 svarten.tribun systemd[1]: Started session-c1.scope.
dec 18 12:10:21 svarten.tribun systemd[5662]: Started pipewire.service.
dec 18 12:10:21 svarten.tribun systemd[5662]: Started wireplumber.service.
dec 18 12:10:21 svarten.tribun systemd[5662]: Started pipewire-pulse.service.
dec 18 12:10:21 svarten.tribun systemd[5662]: Reached target default.target.
dec 18 12:10:21 svarten.tribun systemd[5662]: Startup finished in 301ms.
dec 18 12:10:21 svarten.tribun sddm-helper[5657]: Starting X11 session: "" "/usr/bin/sddm-greeter --socket /tmp/sddm-:0-tyyDjS --theme /usr/share/sd>

And I see it also started the tty4 session I requested (but the monitor was still black), and executed my REISUB correctly.

No further errors reported.

But the screen remained black until the machine rebooted and showed the BIOS messages.


___The working boot has this:
dec 18 12:16:21 svarten.tribun kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 239
dec 18 12:16:21 svarten.tribun kernel: nvidia 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
dec 18 12:16:21 svarten.tribun kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  545.29.06  Thu Nov 16 01:59:08 UTC 2023
dec 18 12:16:22 svarten.tribun kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  545.29.06  Thu Nov 16 01:47:29 >
dec 18 12:16:23 svarten.tribun acpid[8239]: client connected from 12254[0:0]
dec 18 12:16:23 svarten.tribun acpid[8239]: 1 client rule loaded
dec 18 12:16:23 svarten.tribun kernel: [drm] [nvidia-drm] [GPU ID 0x00000700] Loading driver
dec 18 12:16:23 svarten.tribun kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:07:00.0 on minor 0
dec 18 12:16:24 svarten.tribun sddm-helper[12552]: pam_unix(sddm-greeter:session): session opened for user sddm(uid=975) by (uid=0)
dec 18 12:16:24 svarten.tribun systemd[1]: Created slice user-975.slice.
dec 18 12:16:24 svarten.tribun systemd[1]: Starting user-runtime-dir@975.service...
dec 18 12:16:24 svarten.tribun systemd-logind[8365]: New session c1 of user sddm.
dec 18 12:16:24 svarten.tribun systemd[1]: Finished user-runtime-dir@975.service.
etc like above...
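
The difference between the two boots is the run of "Unknown symbol ... (err -2)" lines (error 2 is ENOENT). To condense such a journal into one line per affected module, a filter along these lines could be used. This is only a sketch that assumes the journal line format shown above; unknown_symbols is a made-up helper name:

```shell
#!/bin/sh
# unknown_symbols
# Reads journal text on stdin and prints "<module> <count>" for every module
# that logged "Unknown symbol" errors, where <count> is the number of
# distinct unresolved symbols. (hypothetical helper, for illustration only)
unknown_symbols() {
    sed -n 's/.* kernel: \([^:]*\): Unknown symbol \([^ ]*\) (err .*/\1 \2/p' |
        sort -u |
        awk '{ n[$1]++ } END { for (m in n) print m, n[m] }' |
        sort
}

# Usage on the previous boot:
#   journalctl -k -b -1 | unknown_symbols
```

An empty result for a boot means no module reported unresolved symbols, as in the working boot above.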



Not knowing better, I point finger at dkms.
Last packager kekepower... that is Jani?

CC: (none) => jani.valimaa
Summary: drakx11 switching nvidia driver, next boot black screen. => switching nvidia driver, next boot fail graphical mode for the kernel that was running
Source RPM: drakx-kbd-mouse-x11-1.38-1.mga9 => dkms-2.0.19-46.mga9

Comment 6 Morgan Leijström 2023-12-18 16:03:19 CET
Interesting:
As I thought I had experienced this unexpected remedy before, I tested again:
Installed the virtualbox kernel module package for the kernel with the nvidia problem:
virtualbox-kernel-desktop-latest-7.0.12-40.mga9.x86_64
- And now that kernel boots OK

So somehow that makes something get corrected.


$ uname -a
Linux svarten.tribun 6.5.13-desktop-6.mga9 #1 SMP PREEMPT_DYNAMIC Sun Dec 17 22:42:25 UTC 2023 x86_64 GNU/Linux

$ dkms status
virtualbox, 7.0.12-2.mga9, 6.5.13-2.mga9, x86_64: installed 
virtualbox, 7.0.12-2.mga9, 6.5.13-desktop-5.mga9, x86_64: installed 
virtualbox, 7.0.12-2.mga9, 6.5.13-desktop-6.mga9, x86_64: installed 
nvidia-newfeature, 545.29.06-1.mga9.nonfree, 6.5.13-desktop-5.mga9, x86_64: installed 
nvidia-newfeature, 545.29.06-1.mga9.nonfree, 6.5.13-desktop-6.mga9, x86_64: installed 
virtualbox, 7.0.12-2.mga9, 6.5.13-desktop-6.mga9, x86_64: installed-binary from 6.5.13-desktop-6.mga9

$ rpm -qa | grep "6.5.13"
kernel-linus-6.5.13-2.mga9
kernel-linus-devel-6.5.13-2.mga9
kernel-desktop-6.5.13-5.mga9-1-1.mga9
kernel-desktop-devel-6.5.13-5.mga9-1-1.mga9
cpupower-6.5.13-6.mga9
kernel-desktop-6.5.13-6.mga9
kernel-desktop-devel-6.5.13-6.mga9
kernel-desktop-devel-latest-6.5.13-6.mga9
kernel-desktop-latest-6.5.13-6.mga9
lib64bpf1-6.5.13-6.mga9
kernel-userspace-headers-6.5.13-6.mga9
virtualbox-kernel-6.5.13-desktop-6.mga9-7.0.12-40.mga9
Comment 7 Morgan Leijström 2023-12-18 16:24:25 CET
(In reply to Giuseppe Ghibò from Bug 32565 comment #22)
> Try to run this command (as root) to see if there are multiple modules:
> 
> find /var /usr/lib/modules -type f -name '*modeset*ko*' -exec sh -c "modinfo
> {} | head -9; echo \"=========\"" \;

Now I have more or less accidentally fixed the last hiccup per comment 6, but anyway:


$ sudo find /var /usr/lib/modules -type f -name '*modeset*ko*' -exec sh -c "modinfo {} | head -9; echo \"=========\"" \;
[sudo] password for morgan: 
filename:       /var/lib/dkms/nvidia-newfeature/545.29.06-1.mga9.nonfree/6.5.13-desktop-5.mga9/x86_64/module/nvidia-newfeature-modeset.ko.xz
version:        545.29.06
supported:      external
license:        NVIDIA
srcversion:     142B5DD68E774FEF417016F
depends:        nvidia,video
retpoline:      Y
name:           nvidia_modeset
vermagic:       6.5.13-desktop-5.mga9 SMP preempt mod_unload 
=========
filename:       /var/lib/dkms/nvidia-newfeature/545.29.06-1.mga9.nonfree/6.5.13-desktop-6.mga9/x86_64/module/nvidia-newfeature-modeset.ko.xz
version:        545.29.06
supported:      external
license:        NVIDIA
srcversion:     142B5DD68E774FEF417016F
depends:        nvidia,video
retpoline:      Y
name:           nvidia_modeset
vermagic:       6.5.13-desktop-6.mga9 SMP preempt mod_unload 
=========
filename:       /usr/lib/modules/6.5.13-desktop-5.mga9/dkms/drivers/char/drm/nvidia-newfeature-modeset.ko.xz
version:        545.29.06
supported:      external
license:        NVIDIA
srcversion:     142B5DD68E774FEF417016F
depends:        nvidia,video
retpoline:      Y
name:           nvidia_modeset
vermagic:       6.5.13-desktop-5.mga9 SMP preempt mod_unload 
=========
filename:       /usr/lib/modules/6.5.13-desktop-6.mga9/dkms/drivers/char/drm/nvidia-newfeature-modeset.ko.xz
version:        545.29.06
supported:      external
license:        NVIDIA
srcversion:     142B5DD68E774FEF417016F
depends:        nvidia,video
retpoline:      Y
name:           nvidia_modeset
vermagic:       6.5.13-desktop-6.mga9 SMP preempt mod_unload 
=========
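
In the listing above, every module's vermagic matches the kernel directory it sits in. That comparison can also be done mechanically on the same output. A minimal sketch, assuming the "filename:" and "vermagic:" lines look as above and that paths contain no spaces; check_vermagic is a hypothetical helper name:

```shell
#!/bin/sh
# check_vermagic
# Reads the modinfo listing produced by the find command above on stdin.
# For each module it prints "ok" if the kernel version from vermagic occurs
# as a directory component of the file path, otherwise "MISMATCH".
# (hypothetical helper, for illustration only)
check_vermagic() {
    awk '
        $1 == "filename:" { file = $2 }
        $1 == "vermagic:" {
            result = (index(file, "/" $2 "/") > 0) ? "ok" : "MISMATCH"
            print result, $2, file
        }
    '
}

# Usage (as root), reusing the find command from this comment:
#   find /var /usr/lib/modules -type f -name '*modeset*ko*' \
#       -exec modinfo {} \; | check_vermagic
```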

I will see if I can trigger the fault again... switching to nvidia470...
will report back later.
Comment 8 Morgan Leijström 2023-12-18 17:12:39 CET
Sure enough.

Continuing from comment 7, i issued
 urpmi x11-driver-video-nvidia470
And let it run.
Then rebooted to same kernel as was running, 6.5.13-desktop-6, black screen.
REISUB, and selected to boot 6.5.13-desktop-5, dkms-autorebuild did its magic and here I am:

[morgan@svarten ~]$ sudo find /var /usr/lib/modules -type f -name '*modeset*ko*' -exec sh -c "modinfo {} | head -9; echo \"=========\"" \;
[sudo] password for morgan: 
filename:       /var/lib/dkms/nvidia470/470.223.02-1.mga9.nonfree/6.5.13-desktop-5.mga9/x86_64/module/nvidia-modeset.ko.xz
version:        470.223.02
supported:      external
license:        NVIDIA
srcversion:     A2A1167B9A71C47CC50AD79
depends:        nvidia,video
retpoline:      Y
name:           nvidia_modeset
vermagic:       6.5.13-desktop-5.mga9 SMP preempt mod_unload 
=========
filename:       /var/lib/dkms/nvidia470/470.223.02-1.mga9.nonfree/6.5.13-desktop-6.mga9/x86_64/module/nvidia-modeset.ko.xz
version:        470.223.02
supported:      external
license:        NVIDIA
srcversion:     A2A1167B9A71C47CC50AD79
depends:        nvidia,video
retpoline:      Y
name:           nvidia_modeset
vermagic:       6.5.13-desktop-6.mga9 SMP preempt mod_unload 
=========
filename:       /usr/lib/modules/6.5.13-desktop-5.mga9/dkms/drivers/char/drm/nvidia-modeset.ko.xz
version:        470.223.02
supported:      external
license:        NVIDIA
srcversion:     A2A1167B9A71C47CC50AD79
depends:        nvidia,video
retpoline:      Y
name:           nvidia_modeset
vermagic:       6.5.13-desktop-5.mga9 SMP preempt mod_unload 
=========
filename:       /usr/lib/modules/6.5.13-desktop-6.mga9/dkms/drivers/char/drm/nvidia-modeset.ko.xz
version:        470.223.02
supported:      external
license:        NVIDIA
srcversion:     A2A1167B9A71C47CC50AD79
depends:        nvidia,video
retpoline:      Y
name:           nvidia_modeset
vermagic:       6.5.13-desktop-6.mga9 SMP preempt mod_unload 
=========


What to check now?
Jani Välimaa 2023-12-18 17:31:30 CET

CC: jani.valimaa => (none)

Comment 9 Morgan Leijström 2023-12-18 17:38:04 CET
Stig, from dkms update request bug 31835, it seems you know dkms.

CC: (none) => smelror

Comment 10 Giuseppe Ghibò 2023-12-18 17:53:51 CET
(In reply to Morgan Leijström from comment #8)
> Sure enough.
> 
> Continuing from comment 7, i issued
>  urpmi x11-driver-video-nvidia470
> And let it run.
> Then rebooted to same kernel as was running, 6.5.13-desktop-6, black screen.
> REISUB, and selected to boot 6.5.13-desktop-5, dkms-autorebuild did its
> magic and here I am:
> 
> [morgan@svarten ~]$ sudo find /var /usr/lib/modules -type f -name
> '*modeset*ko*' -exec sh -c "modinfo {} | head -9; echo \"=========\"" \;
> [sudo] password for morgan: 
> filename:      
> /var/lib/dkms/nvidia470/470.223.02-1.mga9.nonfree/6.5.13-desktop-5.mga9/
> x86_64/module/nvidia-modeset.ko.xz
> version:        470.223.02
> supported:      external
> license:        NVIDIA
> srcversion:     A2A1167B9A71C47CC50AD79
> depends:        nvidia,video
> retpoline:      Y
> name:           nvidia_modeset
> vermagic:       6.5.13-desktop-5.mga9 SMP preempt mod_unload 
> =========
> filename:      
> /var/lib/dkms/nvidia470/470.223.02-1.mga9.nonfree/6.5.13-desktop-6.mga9/
> x86_64/module/nvidia-modeset.ko.xz
> version:        470.223.02
> supported:      external
> license:        NVIDIA
> srcversion:     A2A1167B9A71C47CC50AD79
> depends:        nvidia,video
> retpoline:      Y
> name:           nvidia_modeset
> vermagic:       6.5.13-desktop-6.mga9 SMP preempt mod_unload 
> =========
> filename:      
> /usr/lib/modules/6.5.13-desktop-5.mga9/dkms/drivers/char/drm/nvidia-modeset.
> ko.xz
> version:        470.223.02
> supported:      external
> license:        NVIDIA
> srcversion:     A2A1167B9A71C47CC50AD79
> depends:        nvidia,video
> retpoline:      Y
> name:           nvidia_modeset
> vermagic:       6.5.13-desktop-5.mga9 SMP preempt mod_unload 
> =========
> filename:      
> /usr/lib/modules/6.5.13-desktop-6.mga9/dkms/drivers/char/drm/nvidia-modeset.
> ko.xz
> version:        470.223.02
> supported:      external
> license:        NVIDIA
> srcversion:     A2A1167B9A71C47CC50AD79
> depends:        nvidia,video
> retpoline:      Y
> name:           nvidia_modeset
> vermagic:       6.5.13-desktop-6.mga9 SMP preempt mod_unload 
> =========
> 
> 
> What to check now?

Check that:

- /etc/modprobe.d/display-driver.conf exists and points (dereferencing) to a valid file in /etc/nvidia-<driver>/modprobe.conf

- /etc/X11/xorg.conf still contains nvidia as Device and not something else

- that the nouveau module is not loaded (lsmod |grep nouveau)

What I wonder is whether something is missing, or whether it is the old problem of sddm starting too fast, before all modules are loaded (which pops up sometimes).
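
The three checks listed above can be scripted roughly as follows. This is a sketch only: the helper names are invented for illustration, readlink -f is assumed to be the GNU coreutils one, and the paths in the usage comments are the ones named in the list.

```shell
#!/bin/sh
# resolve_link PATH
# Prints the final target of a symlink chain; fails if that target is not
# an existing regular file. (hypothetical helper)
resolve_link() {
    target=$(readlink -f -- "$1") && [ -f "$target" ] && printf '%s\n' "$target"
}

# xorg_drivers FILE
# Prints the value of every Driver line in an xorg.conf-style file.
# Input device sections may add further drivers to the output. (hypothetical helper)
xorg_drivers() {
    sed -n 's/^[[:space:]]*Driver[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# module_loaded NAME
# Succeeds if NAME is the first field of a line in the lsmod-style listing
# read from stdin. (hypothetical helper)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage:
#   resolve_link /etc/modprobe.d/display-driver.conf || echo "dangling link"
#   xorg_drivers /etc/X11/xorg.conf
#   lsmod | module_loaded nouveau && echo "nouveau is loaded"
```

Matching on the exact module name keeps module_loaded from reporting other modules that merely list nouveau in their "Used by" column.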
Comment 11 Morgan Leijström 2023-12-18 19:38:41 CET
Please do not full-quote when not needed.
In this case you could just reference "comment 8".
Less scrolling in this whole long bug.

---

$ ll /etc/modprobe.d/display-driver.conf
lrwxrwxrwx 1 root root 37 dec 14 16:49 /etc/modprobe.d/display-driver.conf -> /etc/alternatives/display-driver.conf

$ ll /etc/alternatives/display-driver.conf
lrwxrwxrwx 1 root root 28 dec 18 15:29 /etc/alternatives/display-driver.conf -> /etc/nvidia470/modprobe.conf

$ ll /etc/nvidia470/modprobe.conf
-rw-r--r-- 1 root root 377 nov  4 13:15 /etc/nvidia470/modprobe.conf

---

In  /etc/X11/xorg.conf:

Section "Device"
    Identifier "device1"
    VendorName "NVIDIA Corporation"
    BoardName "NVIDIA GeForce 745 series and later"
    Driver "nvidia"
    Option "DPMS"
    Option "DynamicTwinView" "false"
    Option "AddARGBGLXVisuals"
EndSection

---

lsmod |grep nouveau
 returns nothing

---

All above is with 6.5.13-desktop-5 booted and nvidia working.
I doubt xorg.conf or display-driver.conf would differ depending on what I choose to boot in GRUB, though.

When booting 6.5.13-desktop-6 i have no display to work with...
but there is the journal.

What does the following line mean? (Seen for boots where the display goes black.)

dec 18 14:56:48 svarten.tribun (udev-worker)[5662]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \  -f 4); do /usr/bin/test -c /dev/nvidia${i} || /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c 195 ${i}; done'' failed with exit code 1.

Similar line in Comment 5, dec 18 12:10:19.
Comment 12 Morgan Leijström 2024-01-23 18:42:14 CET
Was using nvidia470, backport_testing kernel 6.6.11-desktop-1.mga9
used MCC -> drakx11 to switch to nvidia-current updates_testing
Rebooted and this bug did *not* hit this time.
Comment 13 Thomas Andrews 2024-01-24 16:39:48 CET
Well, that could have been a disaster.

MGA9-64 Plasma, Asus Q270M-C motherboard, i5-7500, 48GB DDR4 RAM, wired Internet, nvidia Quadro K620 graphics. My research shows this graphics card is supposed to be covered by any of the three nvidia proprietary drivers we offer at the moment.

This is using a test install, recently created with the netinstall iso. There is but one kernel installed, desktop 6.5.13-6. Nvidia-current was installed at the time of creation, and was functioning perfectly. (Note I say "was")

Attempting to act as an innocent, inexperienced user, I decided to try the "newfeature" driver to see what all the fuss was about. I went to MCC, and selected that driver to be installed. I OKed the options, and went on my way. Drakx11 did NOT ask to download any drivers at all, or to remove anything. 

I was told I had to reboot to make the changes, so I did. I got a plymouth screen that just sat there. I saw no notice about modules building or installing. Pressing esc, I was greeted with line after line saying this:

"The NVIDIA probe routine was not called for 1 device(s)"

Eventually it gave up and tried to finish the boot, failing dramatically. I tried again, checking the boot options in refind, and finding "nokmsboot" was not there. I added it, and removed splash quiet, and proceeded. This one didn't give me the probe routine message, but it failed anyway.

I booted into "failsafe" mode, and tried to use drakx11 to switch to the nouveau driver. Failure. Tried nvidia-modesetting. Failure. Tried to switch back to nvidia-current. Failure again. 

At this point I couldn't think of any other options, except for possibly doing an "upgrade" install with the netinstall iso. I stopped short of doing that, and booted into my main production install on the same hardware, which fortunately is still intact. 

There is something seriously wrong here, to say the least.

CC: (none) => andrewsfarm

Comment 14 Morgan Leijström 2024-01-24 17:00:58 CET
As a start, one idea to get it up is to install at least one extra kernel.
(I suggest last linus - works best with my nvidia)

For internet access, boot to run level 3 (aka multi-user.target)
by appending " 3" without those quotes to the kernel parameters.
Comment 15 Len Lawrence 2024-01-24 20:08:13 CET
Pre-conditions met.  Running nvidia-current : NVIDIA 535.154.05.
Kernel 6.5.13-desktop-6.mga9
Used qarepo to download the RPMs.  MageiaUpdate was unable to install any of them.  This is a second try.  Lost original notes.
This had already been run according to Giuseppe's suggestion in comment 6:
# update-alternatives --set gl_conf /etc/nvidia-newfeature/ld.so.conf

$ dkms status shows:
nvidia-current, 535.154.05-1.mga9.nonfree, 6.5.13-server-6.mga9, x86_64: installed 
nvidia-current, 535.154.05-1.mga9.nonfree, 6.5.13-desktop-6.mga9, x86_64: installed 

Tried forcing it with urpmi in the localrepo directory.
$ sudo urpmi *.rpm
The following packages have to be removed for others to be upgraded:
dkms-nvidia-current-535.154.05-1.mga9.nonfree.x86_64
 (due to conflicts with dkms-nvidia-newfeature)
nvidia-current-doc-html-535.154.05-1.mga9.nonfree.x86_64
 (due to conflicts with nvidia-newfeature-all-545.29.06-1.mga9.nonfree.x86_64)
nvidia-current-utils-535.154.05-1.mga9.nonfree.x86_64
 (due to conflicts with nvidia-newfeature-utils)
x11-driver-video-nvidia-current-535.154.05-1.mga9.nonfree.x86_64
 (due to conflicts with x11-driver-video-nvidia-newfeature) (y/N) 
<Must have answered N>
# urpmi *.rpm
A requested package cannot be installed:
nvidia-newfeature-lib32-545.29.06-1.mga9.nonfree.x86_64 (due to unsatisfied libglx0)
Continue installation anyway? (Y/n) n
# urpmi *.rpm
A requested package cannot be installed:
nvidia-newfeature-lib32-545.29.06-1.mga9.nonfree.x86_64 (due to unsatisfied libglx0)
Continue installation anyway? (Y/n) n
# urpmi libglx0
No package named libglx0

Logging out just now to see what the state is.

CC: (none) => tarazed25

Comment 16 Len Lawrence 2024-01-24 20:12:50 CET
Logging out just now to see what state the system is in.
Logged back in.
Have to agree with Thomas - there is something odd going on.
Comment 17 Len Lawrence 2024-01-24 21:05:00 CET
Confusion reigns.  I seem to have posted these notes on the wrong bug - switching between this bug and bug 35625 every couple of minutes has not helped.  It so happens that Morgan's list in comment 4 did the trick for 35625 but there were still problems.  Sorry about that - shall transfer all this to the correct bug.
Comment 18 Len Lawrence 2024-01-25 00:09:53 CET
Well it seems that I cannot reproduce this bug.
The only problem seems to be that MageiaUpdate doesn't think it is paid enough.

Started with nvidia-current and installed the newfeature driver using Morgan's list as detailed in bug 32565 comment 32.  Used drakx11 to select a new driver (wrong before - it does list the newfeature driver).  Everything ran smoothly, with a kernel mod rebuild at reboot.  The new driver was installed and sddm accepted the login with the desktop kernel.
$ glmark2 -b refract
=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      NVIDIA Corporation
    GL_RENDERER:    NVIDIA GeForce GTX 1080 Ti/PCIe/SSE2
    GL_VERSION:     4.6.0 NVIDIA 545.29.06
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[refract] <default>: FPS: 5738 FrameTime: 0.174 ms
=======================================================
                                  glmark2 Score: 5737 
=======================================================

$ grep -i nvidia /etc/X11/xorg.conf
    BoardName "NVIDIA Driver: New Feature"
    Driver "nvidia"
$ sudo lsmod | grep nouveau
$ inxi -G
Graphics:
  Device-1: NVIDIA GP102 [GeForce GTX 1080 Ti] driver: nvidia v: 545.29.06
  Display: x11 server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9 driver: X:
    loaded: modesetting,nvidia,v4l gpu: nvidia resolution: 2560x1440~60Hz
  API: OpenGL v: 4.6.0 NVIDIA 545.29.06 renderer: NVIDIA GeForce GTX 1080

Tried the linus kernel with good results.  No black screen.
Switched to the 6.5.13-server-6.mga9 kernel, saw the newfeature driver being built and installed.  Straight to the desktop; glmark2 => 5803.  Desktop running normally.
Comment 19 Thomas Andrews 2024-01-25 04:00:11 CET
My problem in comment 13 came from not having the newfeature rpms in an active repo. I had misunderstood what was going on. The result was so messed up that I couldn't boot from a new install until I had formatted my /home partition. (I REALLY hope none of our users have fallen into that trap...)

Once I got my system back, and had an enabled qarepo repo with the newfeature rpms in it, with a single kernel installed I was able to switch back and forth between drivers using MCC with no problems.

Tomorrow, once I have restored my /home to the way I want it, I'll install a second kernel and see what happens.
Comment 20 Thomas Andrews 2024-01-25 17:56:02 CET
Through all of the following, I maintained the newfeature rpms in an enabled qarepo repo. (That's important.) Also, all reboots were done with "splash quiet" removed from the kernel options so I could better observe the process. BTW, I am using rEFInd as my bootloader.

I started today with newfeature installed with the desktop kernel. From there I installed kernel-linus and kernel-server. The newfeature module was built and installed on each as part of the installation. I booted into each, with no problems, ending in kernel-linus.

Using XFdrake from the command line so I could read terminal messages, I told kernel-linus to switch to the 470 driver. It did so, building and installing the module in the active kernel as well as deleting the newfeature rpms and the newfeature modules from the other two kernels. Rebooting into kernel-linus was uneventful. The new modules were built and installed as I booted into each of the other kernels in turn.

Then from kernel-server I told XFdrake to go back to the newfeature driver. It did so, building and installing on the active kernel, removing 470 from the others. Rebooting into each went as before. The previously active kernel didn't need to build the modules, but the others did.

I finished from the desktop kernel, switching to nvidia-current, in preparation to test the new update that's waiting. Again, no problems with the switch.

All has worked as expected. As mentioned before, the install was trashed on my original test when I selected the newfeature driver without having the rpms in an active repo. 

XFdrake really should be fixed so that can't happen, either by refusing to change from the current driver or by defaulting to nouveau if there is no proprietary newfeature driver present. But, the situation would be avoided if we were to go ahead and push the newfeature driver from bug 32565, because the trigger would be eliminated.
Comment 21 Morgan Leijström 2024-01-25 21:12:13 CET
(In reply to Thomas Andrews from comment #20)
> Mentioned before, the install was trashed on my
> original test when I selected the newfeature driver without having the rpms
> in an active repo. 

I hit that too, Bug 32352 - drakx11 do not for nvidia check whether kernel-devel is installed, nor if nvidia module really got built

This bug here, 32579, is *with* the relevant repos enabled. That is verified by the fact that after a failed boot the system can successfully boot to desktop using another kernel, autobuilding the driver during the boot process. So every package needed IS installed, and THIS bug still sometimes strikes.
Comment 22 Morgan Leijström 2024-01-25 21:55:31 CET
(In reply to Giuseppe Ghibò from bug 32565 comment #37)
> Apparently with this is going unconfigured sometimes during the swap from
> newfeature <-> current with higher chances when the machine is faster.
> 
> I've an idea, see with upcoming nvidia-newfeature-545.29.06-2.mga9.

Worked here :)
By luck and coincidence or a fix I have no idea ;)

Using kernel linus 6.5.13-2.mga9
Before switching: nvidia-current from updates testing. (Was switched to OK from nvidia470 by drakx11 in comment 12, then used other kernels autorebuild OK.)  

nonfree updates_testing repo enabled
(- And yes drakx11 looks there too even if not set as updates repo.)

Launched drakx11, selected Nvidia Newfeature, and let it go.

Reboot same kernel to desktop no problem, nvidia-newfeature-545.29.06 in use :)

Will report on the driver itself in its bug 32565 tomorrow after some testing.
Comment 23 Morgan Leijström 2024-02-03 22:24:53 CET
Sigh

This bug hit now when using experimental kernel 6.6.14-2.s1.
Bug 31695 Comment 44
Was in Plasma desktop, running on nvidia-current-535.154.05-1.mga9, launched drakx11, selected newfeature, let it go.

Upon reboot it ended up in black screen.
Blind, I switched to tty4, logged in as root and issued "reboot". It rebooted; in GRUB I selected kernel 6.6.14-1.s4, and it booted up OK (dkms autorebuild working during boot).
Comment 24 Morgan Leijström 2024-02-10 14:19:19 CET
See Attachment 14360 [details] https://bugs.mageia.org/attachment.cgi?id=14360
by mistake entered in Bug 31695 

"Journal from booting same kernel after switcing nvidia 535 to 545"

Continue from Comment 23

Sometimes it works, sometimes not; I cannot see a clear pattern.
This time I was running experimental kernel-desktop-6.6.14-2.s1 (Bug 31695 Comment 44) and nvidia535, and used drakx11 to switch to nvidia-newfeature.
This bug then hit: graphics fails when booting that kernel.
Booting another kernel is OK; dkms-autorebuild works.

This was a while ago; now I tried again, and saved and attached this log.

Instead of showing SDDM login, it boots to black screen with text(?) cursor top left.

I then issued Ctrl-Alt-F4 -> screen fully black.
Blind, I logged in as root and issued reboot -> worked.
I chose another kernel (linus) -> dkms autorebuild did its job, and now I
  $ sudo journalctl -b-1 > ThisAttachement  (which I then compressed)
Comment 25 Morgan Leijström 2024-02-10 15:14:44 CET
Created attachment 14361 [details]
Here is the good boot another kernel, dkms-autorebuild works
Comment 26 Morgan Leijström 2024-02-10 15:16:00 CET
When the problem occurs, it seems to load the proprietary module OK, but then erroneously tries to use nouveau?
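
One way to check that suspicion after a bad boot is to look at which GPU modules actually ended up loaded. The following is only a minimal sketch: the helper name gpu_modules_state is hypothetical, and it assumes lsmod-style input in which every proprietary module name begins with "nvidia".

```shell
# Classify the GPU kernel modules found in lsmod-style input on stdin.
# Prints one of: both, nvidia, nouveau, none.
# Assumption: all proprietary modules have names starting with "nvidia".
gpu_modules_state() {
    awk '
        $1 ~ /^nvidia/  { nv = 1 }      # nvidia, nvidia_drm, nvidia470_modeset, ...
        $1 == "nouveau" { nouv = 1 }
        END {
            s = "none"
            if (nouv)       s = "nouveau"
            if (nv)         s = "nvidia"
            if (nv && nouv) s = "both"
            print s
        }'
}

# Usage on a live system, from a text console:
#   lsmod | gpu_modules_state
```

A result of "both" or "nouveau" on a boot that was supposed to use the proprietary driver would support the idea that nouveau is getting in the way.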
Comment 27 Morgan Leijström 2024-02-11 15:47:20 CET
Had to use kernel-desktop-6.6.14-2.s1 for testing, Bug 31695 Comment 54
So I ran urpme and then urpmi on this kernel, and as expected that workaround works.
Comment 28 Morgan Leijström 2024-03-03 11:22:21 CET
I can confirm it hit again on my usual workstation/test machine svarten, changing from newfeature to 470, both in updates testing, kernel desktop 6.6.18-1.

The way out when the screen goes black is as usual: blind, hit Ctrl-Alt-F6, log in as root and issue reboot; in GRUB select another kernel; then remove and reinstall the problematic one.
Comment 29 Thomas Andrews 2024-03-03 17:30:03 CET
And, I still can't reproduce the issue. 

After re-reading the above comments, I wondered if my usual practice of enabling auto-login was making the difference, so I disabled that while using newfeature and kernel-desktop. Then I used '*470*' in qarepo to get the rpms for switching to the driver candidate. 

I ran drakx11 from the terminal (I was asked for the user's password. Seemed strange. Shouldn't it be root's password?) and told it to switch to the 470 driver. When it was finished I reviewed what was in the terminal, and it all looked as it should.

I rebooted into the same desktop kernel (I use rEFInd. Could that make a difference?). There was a brief - 2-3 seconds or less - black screen after plymouth, but then the sddm login screen came up, ready to go. I checked after logging in, and it's running the 470.239.06-1 driver.

BTW, I'm using an Asus Prime Q270M-C motherboard, i5-7500 CPU, Quadro K620 GPU, booting from an NVME SSD. Hardware from a little over 6 years ago, so not exactly high end by today's standards, but doesn't feel like a slouch, either.
Comment 30 Morgan Leijström 2024-03-03 17:52:28 CET
How irritating.  I wonder what makes this fail often (not always) on my system.

I'll ask on the qa and dev lists if more people can try.


(In reply to Thomas Andrews from comment #29)
> And, I still can't reproduce the issue. 
> 
> After re-reading the above comments, I wondered if my usual practice of
> enabling auto-login was making the difference,

Probably not; when it fails here it is before any kind of graphic display beyond grub.

> so I disabled that while
> using newfeature and kernel-desktop. Then I used '*470* in qarepo to get the
> rpms for switching to the driver candidate. 

That is a difference.  I try to always do it the user way here by *only* using drakx11 to switch. With the nonfree updates_testing repo enabled, drakrpm fetches the testing version directly.

> I ran drakx11 from the terminal (I was asked for the user's password. Seemed
> strange. Shouldn't it be root's password?)

If the user is a member of the wheel group, it asks for the user's password.
I have it set up like that.

> and told it to switch to the 470
> driver. When it was finished I reviewed what was in the terminal, and it all
> looked as it should.
> 
> I rebooted into the same desktop kernel (I use rEFInd. Could that make a
> difference?). There was a brief - 2-3 seconds or less - black screen after
> plymouth, but then the sddm login screen came up, ready to go. I checked
> after logging in, and it's running the 470.239.06-1 driver.

I get a few seconds black screen with blinking text cursor (underline style) top left, then all black.


> BTW, I'm using an Asus Prime Q270M-C motherboard, i5-7500 CPU, Quadro K620
> GPU, booting from an NVME SSD. Hardware from a little over 6 years ago, so
> not exactly high end by today's standards, but doesn't feel like a slouch,
> either.

My mobo is older;
$ inxi --machine
Machine:
  Type: Desktop Mobo: ASRock model: P55 Pro serial: <superuser required>
    BIOS: American Megatrends v: P2.60 date: 08/20/2010

CPU: Intel Core i7-870
GPU: NVIDIA GM107 [GeForce GTX 750]
SATA2 SSD using LVM and LUKS
Comment 31 Thomas Andrews 2024-03-03 19:27:22 CET
(In reply to Morgan Leijström from comment #30)
> 
> (In reply to Thomas Andrews from comment #29)
> 
> > so I disabled that while
> > using newfeature and kernel-desktop. Then I used '*470* in qarepo to get the
> > rpms for switching to the driver candidate. 
> 
> That is a difference.  I try to always do the user way here by *only* using
> drakx11 to switch. Having the nonfree updates_testing repo enabled, drakrpm
> fetch the testing version directly.
> 
Except that users don't normally have any of the updates_testing repos enabled. They use the updates repos, where only the packages that have passed our tests can be found. 

After a few bouts with problems caused by leftover packages residing in updates_testing repos, I tend to use qarepo to just get the packages I want to test. Nonfree testing isn't as bad in that regard as core is, but the possibility still exists. I still use drakx11 to do the actual switch.

> > I ran drakx11 from the terminal (I was asked for the user's password. Seemed
> > strange. Shouldn't it be root's password?)
> 
> If user is a member of the wheel group, it ask users password.
> I have it like that.
> 
But I don't. I only put a user in the wheel group when I test sudo. I don't care for sudo for general use, so I have no need to put my user in the wheel group. Checking authentication of Mageia tools in MCC, drakx11 is set to "default" just like all the rest of those settings. There's probably a reason that's the default, lost in the annals of time. I can change it if I want.
Comment 32 Morgan Leijström 2024-03-03 19:36:17 CET
(In reply to Thomas Andrews from comment #31)

> Except that users don't normally have any of the updates_testing repos

Of course. But that is a way to make drakx11 install the packages more like it does for users.

Anyway, this bug shows up occasionally regardless of whether the driver is in testing or released.

For testing this bug, drivers from normal updates can be used.

It is just that I repeatedly hit this bug when I QA test nvidia :(
Comment 33 Morgan Leijström 2024-03-03 19:36:33 CET
And now no problem on the same system, running kernel linus 6.6.18-1, using drakx11 to switch from 470.239.06 to 550.54.14.

Go figure...

It seems this bug joins other bugs that show up intermittently even on the same systems, like Bug 32805 and Bug 32185 ...
Thomas Andrews 2024-03-05 15:53:38 CET

Blocks: 32565 => (none)

Comment 34 Morgan Leijström 2024-04-21 20:59:36 CEST Comment hidden (obsolete)
Comment 35 Morgan Leijström 2024-04-21 21:13:22 CEST
Created attachment 14506 [details]
Terminal log of drakx11, with checking dkms before and after

This is a copy of the Konsole terminal from the run of drakx11

There is one thing that *may* be relevant, but probably not:
At shutting down, it sat a minute at terminating tasks, something about X11.
I believe this often happens when switching driver while in the desktop; it looks like a remnant of

Bug 30727 - After updating nvidia-current, Plasma can not log out, reboot, etc until reboot.

After a minute I got tired (as I do in those cases), 
did not want to wait for the timeout, and issued Alt+SysRq+{R, E}  
(only that first part of REISUB)
- and it seemed to shut down cleanly.
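
The before/after dkms check mentioned above can be sketched as a small filter over `dkms status` output. This is only an illustration: the helper name dkms_installed_for is hypothetical, and it assumes the four-field status format shown in comment 16 ("module, version, kernel, arch: state") and that `uname -r` prints the kernel release in the same form dkms does.

```shell
# Print "module version" for every dkms module reported as installed
# for one kernel release.
# stdin: output of `dkms status`
# $1:    kernel release to look for, e.g. "$(uname -r)"
# Assumed line format (as in comment 16):
#   nvidia-current, 535.154.05-1.mga9.nonfree, 6.5.13-server-6.mga9, x86_64: installed
dkms_installed_for() {
    awk -F', ' -v k="$1" '
        {
            # $4 looks like "x86_64: installed"; keep only the state part
            split($4, a, ": ")
        }
        $3 == k && a[2] ~ /^installed/ { print $1 " " $2 }'
}

# Usage, run before and after switching driver in drakx11:
#   dkms status | dkms_installed_for "$(uname -r)"
```

If the filter prints nothing for the running kernel after the switch, the module was not registered as installed for that kernel, which is worth noting before rebooting into it.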
Comment 36 Morgan Leijström 2024-04-21 21:47:53 CEST
Created attachment 14507 [details]
Journal of booting kernel-server 6.6.28 after switching from nvidia-current to nvidia470 - screen black instead of graphics, clicked power button made it shut down cleanly.txt

Attachment 14505 is obsolete: 0 => 1

Comment 37 Morgan Leijström 2024-04-21 21:51:11 CEST
I have verified that both the linus and desktop 6.6.28-1 kernels run nicely with nvidia470 built automatically when booting them.

I will leave it as it is for a while in case someone wants to see what has gone wrong with the setup for the server kernel.
Comment 38 Thomas Andrews 2024-04-21 22:20:40 CEST
Once again I can't reproduce the issue. I used drakx11 to switch to nvidia470 while in the 6.6.28 server kernel, then rebooted easily back into the same kernel. 

But there is a difference between our systems that eluded me until now: Reviewing the hardware differences in comment 30, it suddenly struck me that your system is probably a legacy system, using grub for the bootloader, while mine is UEFI and I use rEFInd. I don't know if that would make the difference, but I would think it's worth thinking about.
Comment 39 Giuseppe Ghibò 2024-04-21 22:53:22 CEST
This is pretty hard. The problem when switching driver from within X11 is "probably" that it removes kernel modules from disk while they are in use, and also can't unload every module from memory because they are in use (to keep X up). That's what probably causes the lockup. A cleaner way would be to do the switch while in console mode (i.e. init level 3/non graphical.target), and see if that's a bit better. But then in console mode drakx11 would show the other old cursor bug...

I tried to have all the modules (including the submodules, like nvidia-modeset, nvidia-uvm) also renamed as "versioned" (e.g. nvidia470-modeset, nvidia470-uvm, etc.) and not just the main one, but it didn't help and sometimes made things worse.

For the 2nd problem, probably during the switch there is some "frame" where it goes unconfigured, and during that probably some fallback intervenes.

You might try to look when it happens: 

- /usr/sbin/update-alternatives --display gl_conf and if it's needed to set again with update-alternatives --set gl_conf /etc/nvidia470/ld.so.conf (or update-alternatives --set gl_conf /etc/nvidia-current/ld.so.conf).

- boot adding modules_blacklist=nouveau to kernel cmdline
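
Because a misspelled kernel parameter does not block the module, it can help to confirm that the blacklist really reached the kernel command line. Note that the parameter is spelled module_blacklist (no "s"), as corrected in comment 42, and that it takes a comma-separated list of module names. A minimal sketch, with the helper name has_nouveau_blacklist being hypothetical:

```shell
# Succeed (exit 0) if the kernel command line given on stdin contains
# module_blacklist=... with nouveau among the comma-separated modules.
has_nouveau_blacklist() {
    tr ' ' '\n' | grep -Eq '^module_blacklist=([^,]+,)*nouveau(,|$)'
}

# Usage on the booted system:
#   if has_nouveau_blacklist < /proc/cmdline; then
#       echo "nouveau is blacklisted for this boot"
#   else
#       echo "nouveau is NOT blacklisted (check the spelling of the parameter)"
#   fi
```

The check is anchored on the exact parameter name, so a command line containing modules_blacklist=nouveau is reported as not blacklisted.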
Comment 40 Morgan Leijström 2024-04-22 09:31:48 CEST
(In reply to Thomas Andrews from comment #38)
> Once again I can't reproduce the issue. I used drakx11 to switch to
> nvidia470 while in the 6.6.28 server kernel, then rebooted easily back into
> the same kernel. 

Lucky you :)

What is the output of 
 /usr/sbin/update-alternatives --display gl_conf
on your machine?

 
> But there is a difference between our systems that eluded me until now:
> Reviewing the hardware differences in comment 30, it suddenly struck me that
> your system is probably a legacy system, using grub for the bootloader,
> while mine is UEFI and I use rEFInd. I don't know if that would make the
> difference, but I would think it's worth thinking about.

Yes mine is using legacy boot, Grub2

Another difference may be that on this system I have a separate ext4 /boot partition, and everything else (/, swap, swap2 *), /home) is ext4 partitions in an LVM, using a LUKS encrypted partition on an SSD.  There is also a big rust drive for big data like ISOs, backups, ...

*) I have two swap partitions as a means to be able to delete one and increase another partition if ever needed, and years ago I reported problems with diskdrake when manipulating swap in LVM.

(In reply to Giuseppe Ghibò from comment #39)
...
> For the 2nd problem, probably during the switch there is some "frame" where
> it goes unconfigured, and during that probably some fallback intervenes.
> 
> You might try to look when it happens: 
> 
> - /usr/sbin/update-alternatives --display gl_conf

While now running on desktop kernel:
[morgan@svarten ~]$ LC_ALL=C /usr/sbin/update-alternatives --display gl_conf
gl_conf - status is manual.
 link currently points to /etc/nvidia470/ld.so.conf
/etc/ld.so.conf.d/GL/standard.conf - priority 500
 follower nvidia-settings.xinit: (null)
 follower display-driver.conf: (null)
/etc/nvidia470/ld.so.conf - priority 9700
 follower nvidia-settings.xinit: /etc/nvidia470/nvidia-settings.xinit
 follower display-driver.conf: /etc/nvidia470/modprobe.conf
Current `best' version is /etc/nvidia470/ld.so.conf.


> - boot adding modules_blacklist=nouveau to kernel cmdline

I will next try to boot the server flavour with that.
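
For comparing machines, the relevant part of the update-alternatives output above is the "link currently points to" line. A minimal sketch that extracts it and compares it with the driver that was just selected; the helper names are hypothetical, and the /etc/<driver>/ld.so.conf layout is assumed from the paths shown in this bug:

```shell
# Print the current gl_conf link target.
# stdin: output of `LC_ALL=C /usr/sbin/update-alternatives --display gl_conf`
gl_conf_target() {
    sed -n 's/^[[:space:]]*link currently points to[[:space:]]*//p'
}

# Succeed if gl_conf points at the ld.so.conf of the given driver.
# $1: driver directory name under /etc, e.g. nvidia470 or nvidia-current
gl_conf_matches() {
    [ "$(gl_conf_target)" = "/etc/$1/ld.so.conf" ]
}

# Usage:
#   LC_ALL=C /usr/sbin/update-alternatives --display gl_conf | gl_conf_matches nvidia470 \
#       || echo "gl_conf does not point at nvidia470"
```

LC_ALL=C is kept in the usage line so that the matched text does not depend on the locale.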
Comment 41 Morgan Leijström 2024-04-22 09:49:55 CEST Comment hidden (obsolete)
Comment 42 Morgan Leijström 2024-04-22 13:56:57 CEST
Created attachment 14510 [details]
Journal booting with module_blacklist=nouveau

Now with the correct command module_blacklist=nouveau (not module*s*_)

Still boots to black screen, power button short click shuts down cleanly.

Attachment 14509 is obsolete: 0 => 1

