Bug 28035

Summary: Nvidia drivers x11-driver-video-nvidia-current-455.55-1.mga8.nonfree, bad udev rules at boot
Product: Mageia Reporter: Aurelien Oudelet <ouaurelien>
Component: RPM Packages Assignee: Kernel and Drivers maintainers <kernel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: Normal CC: ghibomgx
Version: Cauldron   
Target Milestone: Mageia 8   
Hardware: All   
OS: Linux   
Whiteboard:
Source RPM: x11-driver-video-nvidia-current-455.55-1.mga8.nonfree CVE:
Status comment:
Attachments: packages installed or updated since january 4th 2021.

Description Aurelien Oudelet 2021-01-07 11:28:01 CET
Created attachment 12190 [details]
packages installed or updated since january 4th 2021.

Bad udev warning since Kernel 5.10.4-4.mga8 (also with kernel 5.10.5-1.mga8)

systemd-udevd[554]: nvidia: Process '/usr/bin/bash -c '/usr/bin/test -c /dev/nvidiactl || /usr/bin/mknod -Z -m 666 /dev/nvidiactl c 195 255'' failed with exit code 1.
janv. 05 18:01:09 mageia.local systemd-udevd[539]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \  -f 4); do /usr/bin/test -c /dev/nvidia${i} || /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c 195 ${i}; done'' failed with exit code 1.

Nothing has changed with the nvidia driver since beta2.
Adding rpm -qa --last output since January 4th 2021.
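
For anyone wanting to reproduce the attached list, here is a minimal sketch of one way to filter packages by install date. filter_since is a hypothetical helper name, not an rpm feature; it assumes input lines of the form "<install time in epoch seconds> <package>".

```shell
# filter_since CUTOFF: read "<epoch> <package>" lines on stdin and print
# the packages whose install time is at or after CUTOFF (epoch seconds).
# Hypothetical helper for illustration only.
filter_since() {
    awk -v cutoff="$1" '$1 + 0 >= cutoff + 0 { print $2 }'
}
```

It could be fed with: rpm -qa --queryformat '%{INSTALLTIME} %{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' | filter_since "$(date -d 2021-01-04 +%s)" (date -d assumes GNU date).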

Meanwhile, the nvidia drivers seem to work fine with Plasma.
Compositor and 3D effects run well.

Assigning to Kernel and Drivers team.
Comment 1 Giuseppe Ghibò 2021-01-07 11:56:25 CET
Which version of systemd are you using? 246.9-2, or the retired 247-1 that showed up for a while in updates_testing?
Comment 2 Aurelien Oudelet 2021-01-07 11:59:04 CET
@Giuseppe:
systemd 246.9-2
Comment 3 Giuseppe Ghibò 2021-01-07 12:07:41 CET
Can you trace this back in the journal to an earlier systemd upgrade?

Furthermore, does the same occur with nvidia 460.27.04 in nonfree/updates_testing? 460.27.04 was a beta driver, which is why it has not yet been pushed to nonfree/release, but it's pretty close to the upcoming 460.xx. Updates_testing has just been cleaned, but you can still find it here:

http://ftp.belnet.be/mirror/mageia/distrib/cauldron/x86_64/media/nonfree/updates_testing/
Comment 4 Aurelien Oudelet 2021-01-07 12:58:05 CET
(In reply to Giuseppe Ghibò from comment #3)
> Can you track down in journal to some older systemd upgrade?
> 
> Furthermore does the same occur with nvidia 460.27.04 in
> nonfree/updates_testing? 460.27.04 was a beta driver, that's why has not yet
> been pushed to nonfree/release, but it's pretty close to the next upcoming
> 460.xx. Updates_testing has been cleaned right now, but here you can still
> find it:
> 
> http://ftp.belnet.be/mirror/mageia/distrib/cauldron/x86_64/media/nonfree/
> updates_testing/

Yeah you're right:
rpm -qa --last | grep systemd
systemd-246.9-2.mga8.x86_64                   lun. 04 janv. 2021 10:18:17
lib64systemd0-246.9-2.mga8.x86_64             lun. 04 janv. 2021 10:18:10

So this is related to it.

Udev errors appear right after applying this systemd update.
Comment 5 Aurelien Oudelet 2021-01-07 13:04:38 CET
[RPM][9742]: erase dkms-nvidia-current-455.45.01-1.mga8.nonfree.x86_64: success
janv. 07 13:03:57 mageia.local [RPM][9742]: erase nvidia-current-doc-html-455.45.01-1.mga8.nonfree.x86_64: success
janv. 07 13:03:57 mageia.local [RPM][9742]: erase nvidia-current-utils-455.45.01-1.mga8.nonfree.x86_64: success
janv. 07 13:03:57 mageia.local [RPM][9742]: install x11-driver-video-nvidia-current-460.27.04-1.mga8.nonfree.x86_64: success
janv. 07 13:03:58 mageia.local systemd[1]: Started /usr/bin/systemctl start man-db-cache-update.
janv. 07 13:03:58 mageia.local systemd[1]: Starting man-db-cache-update.service...
janv. 07 13:03:58 mageia.local [RPM][9742]: install nvidia-current-utils-460.27.04-1.mga8.nonfree.x86_64: success
janv. 07 13:03:58 mageia.local systemd[1]: Started /usr/bin/systemctl start man-db-cache-update.
janv. 07 13:03:58 mageia.local [RPM][9742]: install dkms-nvidia-current-460.27.04-1.mga8.nonfree.x86_64: success
janv. 07 13:03:58 mageia.local [RPM][9742]: install nvidia-current-doc-html-460.27.04-1.mga8.nonfree.x86_64: success
janv. 07 13:03:58 mageia.local [RPM][9742]: install nvidia-current-cuda-opencl-460.27.04-1.mga8.nonfree.x86_64: success
janv. 07 13:03:58 mageia.local [RPM][9742]: install x11-driver-video-nvidia-current-460.27.04-1.mga8.nonfree.x86_64: success
janv. 07 13:03:58 mageia.local [RPM][9742]: Transaction ID 5ff6f810 finished: 0

Reboot
Comment 6 Aurelien Oudelet 2021-01-07 13:09:35 CET
Still one udev error line with 460.27.04

$ journalctl -b | grep nvidia

janv. 07 13:05:54 mageia.local kernel: Command line: BOOT_IMAGE=/vmlinuz-5.10.5-desktop-1.mga8 root=UUID=7c97e985-1ddb-4058-a85d-611fdaa4e144 ro nouveau.modeset=0 nvidia.modeset=1 noiswmd resume=UUID=235f257b-6f19-4537-af65-4655fb448221 audit=0
janv. 07 13:05:54 mageia.local kernel: nvidia-gpu 0000:01:00.3: enabling device (0000 -> 0002)
janv. 07 13:05:55 mageia.local kernel: nvidia: loading out-of-tree module taints kernel.
janv. 07 13:05:55 mageia.local kernel: nvidia: module license 'NVIDIA' taints kernel.
janv. 07 13:05:55 mageia.local kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 242
janv. 07 13:05:55 mageia.local kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem

janv. 07 13:05:55 mageia.local systemd-udevd[562]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \  -f 4); do /usr/bin/test -c /dev/nvidia${i} || /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c 195 ${i}; done'' failed with exit code 1.

janv. 07 13:05:56 mageia.local kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  460.27.04  Fri Dec 11 23:24:19 UTC 2020
janv. 07 13:05:57 mageia.local kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
janv. 07 13:05:57 mageia.local kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
janv. 07 13:05:57 mageia.local kernel: nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
janv. 07 13:05:57 mageia.local kernel: nvidia-uvm: Loaded the UVM driver, major device number 239.
janv. 07 13:06:06 mageia.local dkms-autorebuild.sh[819]: nvidia-current (460.27.04-1.mga8.nonfree): Already installed on this kernel.
Comment 7 Thomas Backlund 2021-01-07 13:50:07 CET
so what part of the for loop:
'/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \  -f 4); do /usr/bin/test -c /dev/nvidia${i} || /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c 195 ${i}; done'' failed with exit code 1.

is failing ?


Someone with nvidia hw needs to debug this.

I've gotten rid of all my nVidia hw, as I don't want to rely on the proprietary drivers anymore now that there is a good AMD alternative with open source support even before hardware launch day :)
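
A minimal sketch of how the rule could be taken apart by hand to see which stage returns non-zero. run_step and extract_minors are hypothetical helper names, not part of udev or the nvidia packages; extract_minors simply repeats the grep/cut stage of the rule on stdin, and the layout of the information file is an assumption here.

```shell
# run_step LABEL CMD [ARGS...]: run one command, print its exit status,
# and return that status. Hypothetical helper for illustration only.
run_step() {
    label=$1
    shift
    "$@"
    rc=$?
    printf '%s: exit %d\n' "$label" "$rc"
    return "$rc"
}

# extract_minors: the same "grep Minor | cut" stage as in the udev rule,
# reading the information file contents from stdin instead of /proc.
extract_minors() {
    grep Minor | cut -d ' ' -f 4
}
```

As root, each stage can then be checked separately, e.g. run_step "nvidiactl node" /usr/bin/test -c /dev/nvidiactl, and cat /proc/driver/nvidia/gpus/*/information | extract_minors to see which minor numbers the loop would iterate over.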
Comment 8 Giuseppe Ghibò 2021-01-07 14:33:03 CET
What changed between systemd-246-1.mga8 and systemd-246-2.mga8? Did it introduce restrictions on the use of certain characters, e.g. $ ' \ ; ?
Comment 9 Thomas Backlund 2021-01-07 14:46:55 CET
(In reply to Aurelien Oudelet from comment #4)

> 
> Yeah you're right:
> rpm -qa --last | grep systemd
> systemd-246.9-2.mga8.x86_64                   lun. 04 janv. 2021 10:18:17
> lib64systemd0-246.9-2.mga8.x86_64             lun. 04 janv. 2021 10:18:10
> 
> So this is related to him.
> 
> Udev errors appear right after applying this systemd update.

I think this is a red herring.

If you mean it showed up in logs directly after installing this, then that's because systemd respawns its daemons on update, so udev would re-trigger the rules...

The only change between -1.mga8 and -2.mga8 is the temporary removal of the log warning "SysV service '%s' lacks a native systemd unit file.", to avoid needless bug reports, as we won't fix all of them before Mageia 8 is released.
Comment 10 Thomas Backlund 2021-01-07 14:59:10 CET
(In reply to Aurelien Oudelet from comment #0)
> Created attachment 12190 [details]
> packages installed or updated since january 4th 2021.
> 
> Bad udev warning since Kernel 5.10.4-4.mga8 (also with kernel 5.10.5-1.mga8)
>

Can you confirm that it started with 5.10.4-4, and that booting 5.10.4-3 silences the warning?
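
To compare boots quickly, something along these lines might help. It is only a sketch: count_udev_failures is a hypothetical helper name, and the pattern is based on the log lines quoted in this report.

```shell
# count_udev_failures: count journal lines on stdin where systemd-udevd
# reports a failed nvidia rule. Always prints a number, 0 when none match.
# Hypothetical helper for illustration only.
count_udev_failures() {
    grep -c 'systemd-udevd\[[0-9]*]: nvidia: Process .* failed with exit code' || true
}
```

For the current boot: journalctl -b | count_udev_failures. For the previous boot: journalctl -b -1 | count_udev_failures. A result of 0 on 5.10.4-3 and a non-zero result on 5.10.4-4 would confirm the regression window.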
Comment 11 Aurelien Oudelet 2021-01-07 15:32:11 CET
Will do test tonight, after work.
Comment 12 Thomas Backlund 2021-01-07 17:06:55 CET
And if 5.10.4-3 works, does 5.10.5-1.1 from:
https://tmb.nu/Mageia/Cauldron/bugs/28035/

work too ?
Comment 13 Aurelien Oudelet 2021-01-07 21:49:28 CET
Booting with 5.10.4-3.mga8: no udev warnings.

janv. 07 21:36:41 mageia.local kernel: Kernel command line: BOOT_IMAGE=/vmlinuz-5.10.4-desktop-3.mga8 root=UUID=7c97e985-1ddb-4058-a85d-611fdaa4e144 ro nouveau.modeset=0 nvidia.modeset=1 splash quiet noiswmd resume=UUID=235f257b-6f19-4537-af65-4655fb448221 audit=0 vga=791
janv. 07 21:36:41 mageia.local kernel: nvidia-gpu 0000:01:00.3: enabling device (0000 -> 0002)
janv. 07 21:36:45 mageia.local dkms-autorebuild.sh[779]: nvidia-current (460.27.04-1.mga8.nonfree): Installing module.
janv. 07 21:36:45 mageia.local dkms-autorebuild.sh[779]: dkms build -m nvidia-current -v 460.27.04-1.mga8.nonfree -k 5.10.4-desktop-3.mga8 -a x86_64 -q --no-clean-kernel
janv. 07 21:37:43 mageia.local dkms-autorebuild.sh[779]: dkms install -m nvidia-current -v 460.27.04-1.mga8.nonfree -k 5.10.4-desktop-3.mga8 -a x86_64 -q
janv. 07 21:37:51 mageia.local kernel: nvidia: loading out-of-tree module taints kernel.
janv. 07 21:37:51 mageia.local kernel: nvidia: module license 'NVIDIA' taints kernel.
janv. 07 21:37:51 mageia.local kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 242
janv. 07 21:37:51 mageia.local kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
janv. 07 21:37:52 mageia.local kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  460.27.04  Fri Dec 11 23:24:19 UTC 2020
janv. 07 21:37:53 mageia.local kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
janv. 07 21:37:53 mageia.local kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
janv. 07 21:37:54 mageia.local kernel: nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
janv. 07 21:37:54 mageia.local kernel: nvidia-uvm: Loaded the UVM driver, major device number 239.
janv. 07 21:43:29 mageia.local org.freedesktop.FileManager1[10490]: nvidia-current (460.27.04-1.mga8.nonfree): Installing module.

Booting with 5.10.5-1.1.mga8 from Comment 12:
janv. 07 21:45:41 mageia.local kernel: Kernel command line: BOOT_IMAGE=/vmlinuz-5.10.5-desktop-1.1.mga8 root=UUID=7c97e985-1ddb-4058-a85d-611fdaa4e144 ro nouveau.modeset=0 nvidia.modeset=1 splash quiet noiswmd resume=UUID=235f257b-6f19-4537-af65-4655fb448221 audit=0 vga=791
janv. 07 21:45:42 mageia.local kernel: nvidia-gpu 0000:01:00.3: enabling device (0000 -> 0002)
janv. 07 21:45:43 mageia.local kernel: nvidia: loading out-of-tree module taints kernel.
janv. 07 21:45:43 mageia.local kernel: nvidia: module license 'NVIDIA' taints kernel.
janv. 07 21:45:43 mageia.local kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 242
janv. 07 21:45:43 mageia.local kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
janv. 07 21:45:43 mageia.local systemd-udevd[565]: nvidia: Process '/usr/bin/bash -c '/usr/bin/test -c /dev/nvidiactl || /usr/bin/mknod -Z -m 666 /dev/nvidiactl c 195 255'' failed with exit code 1.
janv. 07 21:45:43 mageia.local systemd-udevd[570]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \  -f 4); do /usr/bin/test -c /dev/nvidia${i} || /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c 195 ${i}; done'' failed with exit code 1.
janv. 07 21:45:43 mageia.local kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  460.27.04  Fri Dec 11 23:24:19 UTC 2020
janv. 07 21:45:44 mageia.local kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
janv. 07 21:45:44 mageia.local kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
janv. 07 21:45:45 mageia.local kernel: nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
janv. 07 21:45:45 mageia.local kernel: nvidia-uvm: Loaded the UVM driver, major device number 239.
janv. 07 21:45:48 mageia.local dkms-autorebuild.sh[833]: nvidia-current (460.27.04-1.mga8.nonfree): Already installed on this kernel.
Comment 14 Aurelien Oudelet 2021-01-07 21:53:04 CET
Also, rerunning the commands from the udev warnings by hand as root produces nothing in the system journal.

This is strange.
Comment 15 Thomas Backlund 2021-01-07 22:23:16 CET
Another test...

Make a backup of the initrd for 5.10.4-3.mga8, then recreate it and reboot...
Does it still work?
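
A rough sketch of that backup-and-regenerate step as a dry run that only prints the commands. The helper names are hypothetical, and the /boot/initrd-<kernel version>.img naming is assumed from the files on the reporter's system.

```shell
# initrd_path KVER: print the initrd path for a kernel version,
# assuming the /boot/initrd-<kver>.img naming. Hypothetical helper.
initrd_path() {
    printf '/boot/initrd-%s.img\n' "$1"
}

# rebuild_commands KVER: print (do not run) the backup and regenerate
# commands for that kernel's initrd, so they can be reviewed first.
rebuild_commands() {
    kver=$1
    img=$(initrd_path "$kver")
    printf 'cp -a %s %s.back\n' "$img" "$img"
    printf 'dracut --force %s %s\n' "$img" "$kver"
}
```

Running rebuild_commands 5.10.4-desktop-3.mga8 prints the two commands to execute as root. Passing the image and kernel version explicitly to dracut lets the initrd be rebuilt while a different kernel is booted.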
Comment 16 Aurelien Oudelet 2021-01-07 22:37:05 CET
Booted on 5.10.4-3.mga8.

# cp /boot/initrd-5.10.4-desktop-3.mga8.img /boot/initrd-5.10.4-desktop-3.mga8.img.back

# dracut --force

initrd regenerated. (see timestamps updated with ll)
Rebooted with it.

$ journalctl -b | grep nvidia
janv. 07 22:32:36 mageia.local kernel: Kernel command line: BOOT_IMAGE=/vmlinuz-5.10.4-desktop-3.mga8 root=UUID=7c97e985-1ddb-4058-a85d-611fdaa4e144 ro nouveau.modeset=0 nvidia.modeset=1 splash quiet noiswmd resume=UUID=235f257b-6f19-4537-af65-4655fb448221 audit=0 vga=791
janv. 07 22:32:36 mageia.local kernel: nvidia-gpu 0000:01:00.3: enabling device (0000 -> 0002)
janv. 07 22:32:37 mageia.local kernel: nvidia: loading out-of-tree module taints kernel.
janv. 07 22:32:37 mageia.local kernel: nvidia: module license 'NVIDIA' taints kernel.
janv. 07 22:32:37 mageia.local kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 242
janv. 07 22:32:37 mageia.local kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
janv. 07 22:32:38 mageia.local kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  455.45.01  Thu Nov  5 22:55:44 UTC 2020
janv. 07 22:32:39 mageia.local kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
janv. 07 22:32:39 mageia.local kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
janv. 07 22:32:40 mageia.local kernel: nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
janv. 07 22:32:40 mageia.local kernel: nvidia-uvm: Loaded the UVM driver, major device number 240.
janv. 07 22:32:44 mageia.local dkms-autorebuild.sh[827]: nvidia-current (455.45.01-1.mga8.nonfree): Already installed on this kernel.

No errors.
Comment 17 Thomas Backlund 2021-01-07 23:01:47 CET
ok,
so that confirms it's the kernel.

a new test, does 5.10.5-1.2 from:
https://tmb.nu/Mageia/Cauldron/bugs/28035/ 

work ?
Comment 18 Aurelien Oudelet 2021-01-07 23:38:06 CET
(In reply to Thomas Backlund from comment #17)
> ok,
> so that confirms it's the kernel.
> 
> a new test, does 5.10.5-1.2 from:
> https://tmb.nu/Mageia/Cauldron/bugs/28035/ 
> 
> work ?

This works! (Note that I also downgraded to the nvidia-current that is in the nonfree release repo.)

janv. 07 23:34:42 mageia.local kernel: Kernel command line: BOOT_IMAGE=/vmlinuz-5.10.5-desktop-1.2.mga8 root=UUID=7c97e985-1ddb-4058-a85d-611fdaa4e144 ro nouveau.modeset=0 nvidia.modeset=1 splash quiet noiswmd resume=UUID=235f257b-6f19-4537-af65-4655fb448221 audit=0 vga=791
janv. 07 23:34:42 mageia.local kernel: nvidia-gpu 0000:01:00.3: enabling device (0000 -> 0002)
janv. 07 23:34:44 mageia.local kernel: nvidia: loading out-of-tree module taints kernel.
janv. 07 23:34:44 mageia.local kernel: nvidia: module license 'NVIDIA' taints kernel.
janv. 07 23:34:44 mageia.local kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 242
janv. 07 23:34:44 mageia.local kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
janv. 07 23:34:44 mageia.local kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  455.45.01  Thu Nov  5 22:55:44 UTC 2020
janv. 07 23:34:45 mageia.local kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
janv. 07 23:34:45 mageia.local kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
janv. 07 23:34:46 mageia.local kernel: nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
janv. 07 23:34:46 mageia.local kernel: nvidia-uvm: Loaded the UVM driver, major device number 240.
janv. 07 23:34:50 mageia.local dkms-autorebuild.sh[818]: nvidia-current (455.45.01-1.mga8.nonfree): Already installed on this kernel.


Should I uninstall 5.10.5-1.2 from: https://tmb.nu/Mageia/Cauldron/bugs/28035/ right after so you can push an update to BS?
Comment 19 Aurelien Oudelet 2021-01-07 23:40:44 CET
Note to Giuseppe, from:
https://www.nvidia.com/en-us/drivers/unix/

Linux x86_64/AMD64/EM64T
Latest Production Branch Version: 460.32.03
Latest New Feature Branch Version: 455.45.01

I really wonder what they mean by "Production Branch Version". Is it the new LTS?
Comment 20 Giuseppe Ghibò 2021-01-07 23:42:18 CET
Yes, it's the new LTS. I updated it in SVN 5 minutes ago. I was just writing the freeze push request documentation...
Comment 21 Giuseppe Ghibò 2021-01-07 23:50:07 CET
Well, there is also a new 390.141...
Comment 22 Thomas Backlund 2021-01-08 00:11:50 CET
(In reply to Aurelien Oudelet from comment #18)
> (In reply to Thomas Backlund from comment #17)
> > ok,
> > so that confirms it's the kernel.
> > 
> > a new test, does 5.10.5-1.2 from:
> > https://tmb.nu/Mageia/Cauldron/bugs/28035/ 
> > 
> > work ?
> 
> This works ! (Note that I also downgraded to the nvidia-current that is in
> nonfree release repo).

Great.

So a "fix" for a kernel behaviour dating back to the 2.6 series, concerning udev module loading timings and requested in a systemd/udev bug report, is what gives us grief :)


> Should I uninstall 5.10.5-1.2 from:
> https://tmb.nu/Mageia/Cauldron/bugs/28035/ right after so you can push an
> update to BS?

No need, you can keep running that for now.
The next kernel that will land in cauldron is 5.10.6-1
Comment 23 Aurelien Oudelet 2021-01-11 10:48:59 CET
Kernel-5.10.6-1.mga8 + nvidia-current-460.32.03-1.mga8.nonfree

Fixed.

Resolution: (none) => FIXED
Status: NEW => RESOLVED