Description of problem:
Last night I updated the kernel to 5.10.25-1.mga7 and nvidia to 460.67-1.mga7, and the system no longer starts. All I get is a black screen with some messages (unfortunately I did not record them). Booting the old kernel 5.10.20-2.mga7 with the new nvidia-current-460.67-1.mga7 works fine. I have already removed and reinstalled the nvidia modules from the new kernel tree, and it did not help.

Version-Release number of selected component (if applicable):
kernel-desktop-5.10.25-1.mga7
About the messages: they are the same as those one sees when booting kernel-desktop-5.10.20-2.mga7, which works fine. So all I can report is that the SDDM screen does not show up; it is a no go.
Thank you for the report, and sorry about the inconvenience. It is fortunate that you have found that using the older kernel 5.10.20-2.mga7 with the new nvidia-current-460.67-1.mga7 works for you. Assigning this to the kernel team.
Assignee: bugsquad => kernel
There are lines in dmesg that may belong to this problem:

[ 23.554664] NVRM: API mismatch: the client has the version 460.67, but
NVRM: this kernel module has the version 460.56. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 23.554940] NVRM: API mismatch: the client has the version 460.67, but
NVRM: this kernel module has the version 460.56. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 23.555245] NVRM: API mismatch: the client has the version 460.67, but
NVRM: this kernel module has the version 460.56. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 23.555575] NVRM: API mismatch: the client has the version 460.67, but
NVRM: this kernel module has the version 460.56. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
Which nvidia packages do you have installed? On the only system I have that has an nvidia GPU, I have ...

[dave@x8t ~]$ rpm -qa|grep nvidia
dkms-nvidia-current-460.67-1.mga8.nonfree
nvidia-current-doc-html-460.67-1.mga8.nonfree
nvidia-cuda-toolkit-11.2.0-8.mga8.nonfree
nvidia-current-utils-460.67-1.mga8.nonfree
x11-driver-video-nvidia-current-460.67-1.mga8.nonfree
lib64nvidia-egl-wayland1-1.1.5-3.mga8
nvidia-current-cuda-opencl-460.67-1.mga8.nonfree
[dave@x8t ~]$ uname -r
5.10.25-desktop-1.mga8
CC: (none) => davidwhodgins
I had this same issue when I upgraded earlier this week. Solved by booting into the old kernel and then reinstalling (remove/add) the 5.10.25 kernel packages (desktop and dev). Am happily running on the latest packages now:

[gallaghg@Wolverine ~]$ uname -a
Linux Wolverine 5.10.25-desktop-1.mga7 #1 SMP Sat Mar 20 17:16:25 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[gallaghg@Wolverine ~]$ dmesg|grep -i nvid
[ 3.246716] nvidia: loading out-of-tree module taints kernel.
[ 3.246725] nvidia: module license 'NVIDIA' taints kernel.
[ 3.271293] nvidia-nvlink: Nvlink Core is being initialized, major device number 245
[ 3.271644] nvidia 0000:08:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 3.471437] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 460.67 Thu Mar 11 00:11:45 UTC 2021

Not sure why the setup failed initially, however.
CC: (none) => guy.gallagher
(In reply to Adelson Oliveira from comment #3)
> There are lines in dmesg that may belong to this problem:
>
> [ 23.554664] NVRM: API mismatch: the client has the version 460.67, but
> NVRM: this kernel module has the version 460.56. Please
> NVRM: make sure that this kernel module and all NVIDIA driver
> NVRM: components have the same version.
> [ 23.554940] NVRM: API mismatch: the client has the version 460.67, but
> NVRM: this kernel module has the version 460.56. Please
> NVRM: make sure that this kernel module and all NVIDIA driver
> NVRM: components have the same version.
> [ 23.555245] NVRM: API mismatch: the client has the version 460.67, but
> NVRM: this kernel module has the version 460.56. Please
> NVRM: make sure that this kernel module and all NVIDIA driver
> NVRM: components have the same version.
> [ 23.555575] NVRM: API mismatch: the client has the version 460.67, but
> NVRM: this kernel module has the version 460.56. Please
> NVRM: make sure that this kernel module and all NVIDIA driver
> NVRM: components have the same version.

This is a transaction ordering issue that sometimes happens when the kernel and the nvidia drivers get updated at the same time.

The new kernel got installed and the old dkms-nvidia-current 460.56 rebuilt its module. Then the kernel posttrans created the initrd, adding the "old" nvidia module. In the next transaction the nvidia driver updated itself from 460.56 to 460.67, causing a kernel vs userspace mismatch.

When this happens, if you get as far as a command prompt, you should be able to resolve it with a simple "dracut -f" to get the newest nvidia driver into the initrd, then reboot.
In the worst case you might need to re-trigger the dkms build before creating the initrd with:

/usr/sbin/dkms_autoinstaller start
dracut -f

or, if you want to do it while running an older kernel:

/usr/sbin/dkms_autoinstaller start 5.10.25-desktop-1.mga8
dracut -f /boot/initrd-5.10.25-desktop-1.mga8.img 5.10.25-desktop-1.mga8

(just change "5.10.25-desktop-1.mga8" to match the kernel you want to trigger the build for)
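To tell whether a machine is in the state described above, one option is to look for the NVRM "API mismatch" message in the kernel log and pull out the two version numbers. The following is only a sketch: the helper names nvrm_mismatch and check_nvrm are made up for illustration and are not part of dkms or dracut, and the pattern assumes the message wording shown in comment #3.

```shell
#!/bin/sh
# Sketch only: nvrm_mismatch and check_nvrm are hypothetical helper names.

# Read kernel log text on stdin and print "<client> <module>" if an
# NVRM API mismatch message is present; print nothing otherwise.
nvrm_mismatch() {
    # The message spans several log lines, so join them before matching.
    tr '\n' ' ' |
        sed -n 's/.*API mismatch: the client has the version \([0-9.]*\), but.*this kernel module has the version \([0-9.]*\)\..*/\1 \2/p'
}

# Read kernel log text on stdin, report the result, return 1 on mismatch.
check_nvrm() {
    versions=$(nvrm_mismatch)
    if [ -n "$versions" ]; then
        echo "mismatch: userspace ${versions% *}, kernel module ${versions#* }"
        echo "suggested fix from this thread: dracut -f, then reboot"
        return 1
    fi
    echo "no NVRM API mismatch found"
}

# Usage (as root, on the affected kernel):
#   dmesg | check_nvrm
```

The functions only read and report; running dracut -f is left to the user, since it rewrites the initrd.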
FWIW no problems here, mga7-64, SDDM, Plasma

[morgan@svarten ~]$ uname -a
Linux svarten.tribun 5.10.25-desktop-1.mga7 #1 SMP Sat Mar 20 17:16:25 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[morgan@svarten ~]$ rpm -qa|grep nvidia
dkms-nvidia-current-460.67-1.mga7.nonfree
x11-driver-video-nvidia-current-460.67-1.mga7.nonfree
nvidia-current-cuda-opencl-460.67-1.mga7.nonfree
nvidia-cuda-toolkit-10.1.168-1.2.mga7.nonfree
nvidia-current-utils-460.67-1.mga7.nonfree
nvidia-current-doc-html-460.67-1.mga7.nonfree
$ sudo journalctl -b | grep NVRM
mar 25 23:44:19 svarten.tribun kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 460.67 Thu Mar 11 00:11:45 UTC 2021
CC: (none) => fri
Yes, I had already noticed that this problem seemed related to updating the kernel and nvidia too close in time, but I could not confirm the cause-effect relationship. It is good to know!

Problem solved with

dracut -f

in recovery mode.

Just to report, the option

dkms_autoinstaller start 5.10.25-desktop-1.mga7

in a session with the older kernel didn't work. The output says that the development package for this kernel is not installed, although it is in fact installed.

Thanks for the information and the solution!

Should I mark this as solved, or is that done by the managers of the bugzilla?
(In reply to Adelson Oliveira from comment #8)
> Yes, I've already noticed that this problem seemed related to updating
> kernel and nvidia too close in time but I could not confirm this
> cause-effect relationship. It is good to know!
>
> Problem solved with
>
> dracut -f
>
> in recovery mode.
>
> Just to report, the option
>
> dkms_autoinstaller start 5.10.25-desktop-1.mga7
>
> in a session with the older kernel didn't work. The output is that the
> development package for this kernel is not installed although it is
> installed in fact.
>
> Thanks for the information and the solution!
>
> Should I mark this as solved or is this made by managers of the bugzilla?

No need. Thanks for reporting that this is fixed.
CC: (none) => ouaurelien
Status: NEW => RESOLVED
Resolution: (none) => FIXED
(In reply to Thomas Backlund from comment #6)
>
> This is a transaction ordering issue that happends some times when kernel
> and nvidia drivers gets updated at the same time...

That's sad.

Do we have a bug report for that?
(In reply to Morgan Leijström from comment #10)
> (In reply to Thomas Backlund from comment #6)
> >
> > This is a transaction ordering issue that happends some times when kernel
> > and nvidia drivers gets updated at the same time...
>
> Thats sad.
>
> Do we have a bug report for that?

There is nothing that can be fixed...
I think I've found the problem. There are two sets of installed nvidia modules, one in:

/var/lib/dkms/nvidia-current/460.67-1.mga8.nonfree/$(uname -r)/x86_64/module/nvidia-current.ko.xz

and one in:

/usr/lib/modules/$(uname -r)/dkms/drivers/char/drm/nvidia-current.ko.xz

The first one is generated by the command:

/usr/sbin/dkms --rpm_safe_upgrade build -m nvidia-current -v 460.67-1.mga8.nonfree

and the second one is generated by the command:

/usr/sbin/dkms --rpm_safe_upgrade install -m nvidia-current -v 460.67-1.mga8.nonfree --force

but *if and only if* the first command is successful; otherwise the second command is skipped. Both commands are in the dkms-nvidia-current %postinstall scriptlets.

I guess that for some reason (machine hang, non-zero exit code, etc.) the second command was not executed, and you are left with an incomplete installation and two mismatching module sets.

If you do:

modinfo /var/lib/dkms/nvidia-current/460.67-1.mga8.nonfree/$(uname -r)/x86_64/module/nvidia-current.ko.xz | grep ^version

and

modinfo /usr/lib/modules/$(uname -r)/dkms/drivers/char/drm/nvidia-current.ko.xz | grep ^version

you'll probably get 460.67 in the first case and 460.56 in the second. In that case, completing the "interrupted" installation stage with:

/usr/sbin/dkms --rpm_safe_upgrade install -m nvidia-current -v 460.67-1.mga8.nonfree --force -k $(uname -r)

should fix the problem. However, this is a manual fix. I'll dig to see if something can be done to make things more robust (or less weak).
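The two modinfo checks above can be wrapped into one small script. This is a sketch under assumptions: the function names same_version and check_modules are invented here, the two paths and the default package version 460.67-1.mga8.nonfree are copied from this comment and may differ on other systems, and modinfo is assumed to be able to read .ko.xz files directly.

```shell
#!/bin/sh
# Sketch only: same_version and check_modules are hypothetical names.

# Succeed only when both version strings are non-empty and identical.
same_version() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Compare the dkms-built module with the one installed for a kernel.
#   $1: kernel release (default: the running kernel)
#   $2: dkms package version (default taken from this comment; adjust as needed)
check_modules() {
    kver=${1:-$(uname -r)}
    pkgver=${2:-460.67-1.mga8.nonfree}
    built=/var/lib/dkms/nvidia-current/$pkgver/$kver/x86_64/module/nvidia-current.ko.xz
    installed=/usr/lib/modules/$kver/dkms/drivers/char/drm/nvidia-current.ko.xz
    # modinfo -F prints only the requested field; errors leave the value empty.
    v_built=$(modinfo -F version "$built" 2>/dev/null)
    v_inst=$(modinfo -F version "$installed" 2>/dev/null)
    if same_version "$v_built" "$v_inst"; then
        echo "OK: both modules report version $v_built"
    else
        echo "MISMATCH: built='$v_built' installed='$v_inst'"
        return 1
    fi
}

# Usage:
#   check_modules                          # running kernel
#   check_modules 5.10.25-desktop-1.mga8   # another installed kernel
```

An empty value on either side is treated as a mismatch, so a missing module file is reported rather than silently passing.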
CC: (none) => ghibomgx
Well, as I reported above, I did

# dracut -f

in recovery mode and now SDDM works fine.

Anyway, I've tried both modinfo commands as suggested by Giuseppe Ghibò and got only 460.67. That is not surprising, since dracut already solved the problem ...

Thanks anyway