Bug 25890 - dkms failed to build nvidia module 430.64 when upgrading to kernel 5.4.2
Summary: dkms failed to build nvidia module 430.64 when upgrading to kernel 5.4.2
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 7
Hardware: x86_64 Linux
Priority: Normal
Severity: normal
Target Milestone: ---
Assignee: QA Team
QA Contact:
URL:
Whiteboard: MGA7-64-OK
Keywords: advisory, validated_update
Depends on:
Blocks:
 
Reported: 2019-12-17 15:55 CET by laurent murphy91
Modified: 2019-12-31 17:52 CET

See Also:
Source RPM:
CVE:
Status comment:


Attachments
Hack to make it build (978 bytes, patch)
2019-12-28 22:54 CET, Thomas Backlund

Description laurent murphy91 2019-12-17 15:55:09 CET
Description of problem:

After upgrading to the latest kernel, 5.4.2, the nvidia module fails to compile, so I had to switch back to kernel 5.3.13.

Version-Release number of selected component (if applicable):

nvidia-430.64
kernel-desktop-5.4.2

How reproducible:

dkms build -m nvidia-current -v 430.64-1.mga7.nonfree -k 5.4.2-desktop-1.mga7 -a x86_64 --no-clean-kernel

Steps to Reproduce:
in /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/make.log

DKMS make.log for nvidia-current-430.64-1.mga7.nonfree for kernel 5.4.2-desktop-1.mga7 (x86_64)
Tue 17 Dec 2019 03:25:05 PM CET
make[1]: Entering directory '/usr/src/kernel-5.4.2-desktop-1.mga7'
  SYMLINK /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-kernel.o
  SYMLINK /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia-modeset/nv-modeset-kernel.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-frontend.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-instance.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-acpi.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-chrdev.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-cray.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-dma.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-gvi.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-i2c.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-mempool.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-mmap.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-p2p.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-pat.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-procfs.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-usermap.o
/var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-mmap.c: In function ‘nv_encode_caching’:
/var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-mmap.c:338:16: warning: this statement may fall through [-Wimplicit-fallthrough=]
             if (NV_ALLOW_CACHING(memory_type))
                ^
/var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-mmap.c:340:9: note: here
         default:
         ^~~~~~~
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-vm.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-vtophys.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/os-interface.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/os-mlock.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/os-pci.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/os-registry.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/os-usermap.o
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-modeset-interface.o
/var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-vm.c: In function ‘nv_set_memory_array_type’:
/var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-vm.c:66:13: error: implicit declaration of function ‘set_memory_array_uc’; did you mean ‘set_pages_array_uc’? [-Werror=implicit-function-declaration]
             set_memory_array_uc(pages, num_pages);
             ^~~~~~~~~~~~~~~~~~~
             set_pages_array_uc
/var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-vm.c:69:13: error: implicit declaration of function ‘set_memory_array_wb’; did you mean ‘set_pages_array_wb’? [-Werror=implicit-function-declaration]
             set_memory_array_wb(pages, num_pages);
             ^~~~~~~~~~~~~~~~~~~
             set_pages_array_wb
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-pci-table.o
cc1: some warnings being treated as errors
  CC [M]  /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-kthread-q.o
make[2]: *** [scripts/Makefile.build:266: /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/nvidia/nv-vm.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:1644: /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build] Error 2
make[1]: Leaving directory '/usr/src/kernel-5.4.2-desktop-1.mga7'
make: *** [Makefile:81: modules] Error 2



If you could provide nvidia 440.44, it would be great, as only nvidia 440 provides CUDA 10.2 and GTX 1650 Super support.
Thanks
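
For a long log like the one above, a small shell helper can pull out just the compiler errors. This is only a minimal sketch: the function name show_build_errors is hypothetical, and it assumes gcc's usual "file:line:col: error:" message format.

```shell
# Print every compiler error line from a make.log, prefixed with its line
# number in the log. Warnings and "CC [M]" progress lines are skipped.
# Usage (path taken from this report):
#   show_build_errors /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/make.log
show_build_errors() {
    # -n adds the log line number; the pattern matches "file:line:col: error:"
    grep -n -E ':[0-9]+:[0-9]+: error:' "$1"
}
```

On the log in this report it should list the two implicit-declaration errors in nv-vm.c and nothing else.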
Comment 1 Lewis Smith 2019-12-17 19:49:02 CET
Thank you for the report.

Another report about a kernel update and nVidia, but I do not see a duplicate.
Assigning to kernel/drivers team.

Assignee: bugsquad => kernel

Comment 2 Thomas Backlund 2019-12-17 20:11:00 CET
I can't reproduce this on either the desktop or the server kernel...
My workstation relies on the nvidia-current package, so I pretty much ensure it builds / works (for me :) )

Anyway, I need the full /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/make.log

CC: (none) => tmb

Comment 3 laurent murphy91 2019-12-17 20:41:00 CET
(In reply to Thomas Backlund from comment #2)
> I can't reproduce this on either the desktop or the server kernel...
> My workstation relies on the nvidia-current package so I pretty much ensure
> it builds / works (for me :) ) 
> 
> Anyway, I need the full
> /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build/make.log

I've posted the full make.log

Currently installed rpm packages:
- dkms-nvidia-current 430.64
- nvidia-current-cuda-opencl 430.64
- nvidia-current-devel 430.64
- nvidia-current-utils 430.64
- nvidia-current-kernel-4.1.5-desktop-2.mga5 352.79
- x11-driver-video-nvidia-current 430.64
...
- kernel-desktop-latest 5.4.2
- kernel-desktop-devel-latest 5.4.2

ll /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree
total 12
drwxr-xr-x 3 root root 4096 nov.  21 22:04 5.3.11-desktop-1.mga7/
drwxr-xr-x 3 root root 4096 déc.   3 13:06 5.3.13-desktop-2.mga7/
drwxr-xr-x 8 root root 4096 déc.  17 15:25 build/
lrwxrwxrwx 1 root root   45 nov.  21 22:04 source -> /usr/src/nvidia-current-430.64-1.mga7.nonfree/

There is no 5.4.2-desktop-2.mga7 directory ???
Comment 4 laurent murphy91 2019-12-17 21:33:28 CET
If it can help...
switch back to kernel 5.3.13
cd /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build
make -> no problem

Edit Makefile to change :
KERNEL_UNAME ?= $(shell uname -r)
to
KERNEL_UNAME := 5.4.2-desktop-1.mga7
make clean
make -> ok

I have also installed package kernel-source-latest....
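
Since the Makefile assigns KERNEL_UNAME with "?=", the same effect can probably be had without editing the file, because a variable given on the make command line overrides the Makefile's own assignment. A minimal sketch, assuming it is run from the build directory; the function name build_for_kernel and the MAKE fallback variable are hypothetical:

```shell
# Build the module against a given kernel version without touching the
# Makefile. This only compiles; it does not install anything.
# Usage: build_for_kernel 5.4.2-desktop-1.mga7
build_for_kernel() {
    kver=$1
    # Command-line variables take precedence over "?=" assignments in the Makefile.
    "${MAKE:-make}" KERNEL_UNAME="$kver" clean &&
    "${MAKE:-make}" KERNEL_UNAME="$kver"
}
```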
Comment 5 laurent murphy91 2019-12-17 22:04:29 CET
(In reply to laurent murphy91 from comment #4)
> If it can help...
> switch back to kernel 5.3.13
> cd /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build
> make -> no problem
> 
> Edit Makefile to change :
> KERNEL_UNAME ?= $(shell uname -r)
> to
> KERNEL_UNAME := 5.4.2-desktop-1.mga7
> make clean
> make -> ok
> 
> I have also installed package kernel-source-latest....

Reboot to 5.4.2 and no display....

more tests in 5.4.2 :
cd /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build
make clean
make -> ok

dkms build -m nvidia-current -v 430.64-1.mga7.nonfree -k 5.4.2-desktop-1.mga7 -a x86_64 --no-clean-kernel -> failed !

So compilation works if I run make manually, but fails when launched
from dkms.
Each time I reboot into 5.4.2, I get a splash screen with 'Building nvidia-current', the compilation fails, and there is no display.

What is the difference between make and dkms?
If I can bypass dkms on reboot, I'm quite sure it will work.

I need your help ! Thanks !
Comment 6 Thomas Backlund 2019-12-17 23:10:15 CET

(In reply to laurent murphy91 from comment #5)
> (In reply to laurent murphy91 from comment #4)
> > If it can help...
> > switch back to kernel 5.3.13
> > cd /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build
> > make -> no problem
> > 
> > Edit Makefile to change :
> > KERNEL_UNAME ?= $(shell uname -r)
> > to
> > KERNEL_UNAME := 5.4.2-desktop-1.mga7
> > make clean
> > make -> ok
> > 
> > I have also installed package kernel-source-latest....


No need to install kernel-source; it won't help (and it is not used for module builds).

> 
> Reboot to 5.4.2 and no display....
> 

That's expected, as dkms also needs to install the built drivers and so on...

> more tests in 5.4.2 :
> cd /var/lib/dkms/nvidia-current/430.64-1.mga7.nonfree/build
> make clean
> make -> ok
> 
> dkms build -m nvidia-current -v 430.64-1.mga7.nonfree -k
> 5.4.2-desktop-1.mga7 -a x86_64 --no-clean-kernel -> failed !
> 
> So compilation works if i run manually the make but fail if launch
> from dkms.
> Each time I reboot to 5.4.2, I have a splash screen with 'Building
> nvidia-current', compilation failed and no display.

So we need to find out why the nvidia conftest thinks it found support for set_memory_array_uc/wb.

What is the output of ls -l /lib/modules/5.4.2-desktop-1.mga7/

What's the output of "dkms status" ?

Do you have anything special added on kernel command line ?

Have you patched / rebuilt kernel in any way ?
Are you using any bumblebee or primus packages ?


Also, please revert the Makefile change, then try:

/usr/sbin/dkms_autoinstaller start 5.4.2-desktop-1.mga7
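
As a rough cross-check of what the installed kernel headers actually offer, one can search them for the function names from the error message. This is a minimal sketch, not how conftest itself decides: a plain text search can be fooled by comments or macros. The function name headers_declare is hypothetical, and the header directory in the usage line is an assumption about where the devel package puts the x86 headers.

```shell
# Succeed if SYMBOL occurs as a whole word in any file under DIR.
# Usage (directory is an assumption):
#   headers_declare set_memory_array_uc /lib/modules/5.4.2-desktop-1.mga7/build/arch/x86/include
headers_declare() {
    symbol=$1
    dir=$2
    # -r: recurse, -q: exit status only, -w: whole-word match so that
    # set_pages_array_uc does not count as set_memory_array_uc
    grep -r -q -w -- "$symbol" "$dir"
}
```

If the check fails for set_memory_array_uc but succeeds for set_pages_array_uc, that would agree with the compiler's "did you mean" hint in the make.log.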
Comment 7 laurent murphy91 2019-12-17 23:22:17 CET
(In reply to Thomas Backlund from comment #6)
> 


> 
> So we need to find out why nvidia conftest thinks it found support for
> set_memory_array_uc/wb

It failed on nv-vm.c...

> 
> What is the output of ls -l /lib/modules/5.4.2-desktop-1.mga7/

ls -l /lib/modules/5.4.2-desktop-1.mga7/
total 5540
lrwxrwxrwx  1 root root      36 déc.  17 15:03 build -> /usr/src/kernel-5.4.2-desktop-1.mga7
drwxr-xr-x 14 root root    4096 déc.  17 15:03 kernel
-rw-r--r--  1 root root 1289748 déc.   6 00:30 modules.alias
-rw-r--r--  1 root root 1244478 déc.   6 00:30 modules.alias.bin
-rw-r--r--  1 root root    6333 déc.   5 21:37 modules.builtin
-rw-r--r--  1 root root    8454 déc.   6 00:30 modules.builtin.bin
-rw-r--r--  1 root root   46498 déc.   5 21:37 modules.builtin.modinfo
-rw-r--r--  1 root root  605059 déc.   6 00:30 modules.dep
-rw-r--r--  1 root root  820354 déc.   6 00:30 modules.dep.bin
-rw-r--r--  1 root root  189690 déc.   6 00:30 modules.description
-rw-r--r--  1 root root     448 déc.   6 00:30 modules.devname
-rw-r--r--  1 root root  189968 déc.   5 21:37 modules.order
-rw-r--r--  1 root root     699 déc.   6 00:30 modules.softdep
-rw-r--r--  1 root root  553421 déc.   6 00:30 modules.symbols
-rw-r--r--  1 root root  683349 déc.   6 00:30 modules.symbols.bin
lrwxrwxrwx  1 root root      36 déc.  17 15:03 source -> /usr/src/kernel-5.4.2-desktop-1.mga7


> 
> Whats the output of "dkms status" ?


# dkms status
nvidia-current, 430.64-1.mga7.nonfree, 5.3.13-desktop-2.mga7, x86_64: installed 
nvidia-current, 430.64-1.mga7.nonfree, 5.3.11-desktop-1.mga7, x86_64: installed 

but it hangs after....

> 
> Do you have anything special added on kernel command line ?

no
> 
> Have you patched / rebuilt kernel in any way ?
> Are you using any bumblebee or primus packages ?

no. I have installed virtualbox
> 
> 
> Also, please revert the Makefile change, then try:
> 
> /usr/sbin/dkms_autoinstaller start 5.4.2-desktop-1.mga7

[root@localhost build]# /usr/sbin/dkms_autoinstaller start 5.4.2-desktop-1.mga7

nvidia-current (430.64-1.mga7.nonfree): Installing module.
..........(bad exit status: 10)
  Build failed.  Installation skipped.
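
To confirm from a script whether a module is installed for a given kernel, the "dkms status" output can be filtered. A minimal sketch, assuming the comma-separated line format shown in the status output above; the function name dkms_installed_for is hypothetical.

```shell
# Read "dkms status" output on stdin and succeed if MODULE is listed as
# installed for KERNEL. Lines marked "installed-binary" also count.
# Usage: dkms status | dkms_installed_for nvidia-current 5.4.2-desktop-1.mga7
dkms_installed_for() {
    module=$1
    kernel=$2
    # Fields: module, module version, kernel version, "arch: state"
    awk -F', ' -v m="$module" -v k="$kernel" '
        $1 == m && $3 == k && $4 ~ /: installed/ { found = 1 }
        END { exit !found }
    '
}
```

With the two status lines above, the check succeeds for 5.3.13-desktop-2.mga7 and fails for 5.4.2-desktop-1.mga7.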
Comment 8 Thomas Backlund 2019-12-17 23:36:28 CET
(In reply to laurent murphy91 from comment #7)
> 
> # dkms status
> nvidia-current, 430.64-1.mga7.nonfree, 5.3.13-desktop-2.mga7, x86_64:
> installed 
> nvidia-current, 430.64-1.mga7.nonfree, 5.3.11-desktop-1.mga7, x86_64:
> installed 
> 
> but it hangs after....
> 

Ok, this seems bad... does it really hang or does it eventually finish ? (wait up to 30 minutes or so)

Also, do you have a lot of kernels, prebuilt kmods  and dkms packages installed ?

rpm -qa |grep kernel |sort

rpm -qa |grep dkms

Also you can try to re-install affected packages in case something has corrupted them, like:

urpmi --replacepkgs dkms kernel-desktop-devel-5.4.2-1.mga7 dkms-nvidia-current
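
To get a quick number for how many kernels are installed, the rpm list can be counted instead of read by eye. A minimal sketch; the function name count_kernel_packages is hypothetical, and the pattern assumes only the desktop and server flavours matter (other flavours would need adding).

```shell
# Count installed kernel image packages from "rpm -qa" output on stdin.
# The -devel, -latest and -source packages are not counted.
# Usage: rpm -qa | count_kernel_packages
count_kernel_packages() {
    # A digit right after the flavour name marks a versioned kernel package.
    grep -c -E '^kernel-(desktop|server)-[0-9]'
}
```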
Comment 9 laurent murphy91 2019-12-18 17:49:08 CET
(In reply to Thomas Backlund from comment #8)

> Ok, this seems bad... does it really hang or does it eventually finish ?
> (wait up to 30 minutes or so)

I had 176 kernels installed (dating back to mga5 and mga6)... Once the cleanup was done, dkms status shows:
[root@localhost domisse]# dkms status
nvidia-current, 430.64-1.mga7.nonfree, 5.3.13-desktop-2.mga7, x86_64: installed 
nvidia-current, 430.64-1.mga7.nonfree, 5.3.11-desktop-1.mga7, x86_64: installed 
virtualbox, 6.0.12-1.mga7, 5.3.2-desktop-1.mga7, x86_64: installed-binary from 5.3.2-desktop-1.mga7
virtualbox, 6.0.12-1.mga7, 5.2.16-desktop-2.mga7, x86_64: installed-binary from 5.2.16-desktop-2.mga7
virtualbox, 6.0.14-1.mga7, 5.3.6-desktop-2.mga7, x86_64: installed-binary from 5.3.6-desktop-2.mga7
virtualbox, 6.0.14-1.mga7, 5.3.7-desktop-4.mga7, x86_64: installed-binary from 5.3.7-desktop-4.mga7


> 
> rpm -qa |grep kernel |sort
> 

I have removed all v4 kernels.

kernel-desktop-5.2.16-2.mga7-1-1.mga7
kernel-desktop-5.3.11-1.mga7-1-1.mga7
kernel-desktop-5.3.13-2.mga7-1-1.mga7
kernel-desktop-5.3.2-1.mga7-1-1.mga7
kernel-desktop-5.3.6-2.mga7-1-1.mga7
kernel-desktop-5.3.7-4.mga7-1-1.mga7
kernel-desktop-5.4.2-1.mga7-1-1.mga7
kernel-desktop-devel-5.2.16-2.mga7-1-1.mga7
kernel-desktop-devel-5.3.11-1.mga7-1-1.mga7
kernel-desktop-devel-5.3.13-2.mga7-1-1.mga7
kernel-desktop-devel-5.3.2-1.mga7-1-1.mga7
kernel-desktop-devel-5.3.6-2.mga7-1-1.mga7
kernel-desktop-devel-5.3.7-4.mga7-1-1.mga7
kernel-desktop-devel-5.4.2-1.mga7-1-1.mga7
kernel-desktop-devel-latest-5.4.2-1.mga7
kernel-desktop-latest-5.4.2-1.mga7
kernel-firmware-20190603-1.mga7
kernel-firmware-nonfree-20190926-1.mga7.nonfree
kernel-source-5.4.2-1.mga7-1-1.mga7
kernel-source-latest-5.4.2-1.mga7
kernel-userspace-headers-5.4.2-1.mga7
virtualbox-kernel-5.2.16-desktop-2.mga7-6.0.12-1.mga7
virtualbox-kernel-5.3.2-desktop-1.mga7-6.0.12-2.mga7
virtualbox-kernel-5.3.6-desktop-2.mga7-6.0.14-1.mga7
virtualbox-kernel-5.3.7-desktop-4.mga7-6.0.14-4.mga7



> rpm -qa |grep dkms
> 

dkms-nvidia-current-430.64-1.mga7.nonfree
dkms-2.0.19-40.mga7
dkms-minimal-2.0.19-40.mga7


> Also you can try to re-install affected packages in case something has
> corrupted them, like:
> 
> urpmi --replacepkgs dkms kernel-desktop-devel-5.4.2-1.mga7
> dkms-nvidia-current

Still the same issue... When booting into 5.4.2, dkms tries to build the nvidia modules and fails... As this is a compilation error, I'm quite sure it could be fixed!

Any hints?
Comment 10 laurent murphy91 2019-12-28 19:29:51 CET
I've tried the latest kernel, 5.4.6-desktop-2: same problem.
dkms fails to build the nvidia-current module, with the same make.log (see the contents in the previous posts).
If I compile using 'make', nvidia-current builds successfully, but
the module is removed at reboot when dkms tries to compile it.
So I'm stuck with kernel 5.3.13.

Maybe an upgrade to nvidia 440.44 could fix this issue ?
Comment 11 Thomas Backlund 2019-12-28 22:54:14 CET
Created attachment 11434 [details]
Hack to make it build


Can you try to apply the attached hack?

It simply removes the conftest code that detects a function that the newer kernels don't actually support...
Comment 12 laurent murphy91 2019-12-29 22:13:09 CET
(In reply to Thomas Backlund from comment #11)
> Created attachment 11434 [details]
> Hack to make it build
> 
> 
> Can you try to apply the attached hack?
> 
> It simply removes the conftest code that detects a function that the newer
> kernels don't actually support...

dkms failure. Same make.log report.
If I try to compile with 'make', it also fails; I need to revert to the original conftest.sh file.
Comment 13 Thomas Backlund 2019-12-29 23:22:21 CET
Yeah, sorry, I forgot a few changes in that hack...

Never mind that, I just pushed a:

nvidia-current-430.64-2.mga7.nonfree

to mga7 nonfree updates_testing with a better way to fix the mis-detection, please try it out...
Comment 14 laurent murphy91 2019-12-30 15:44:49 CET
(In reply to Thomas Backlund from comment #13)
> Yeah, sorry, I forgot a few changes in that hack...
> 
> Never mind that, I just pushed a:
> 
> nvidia-current-430.64-2.mga7.nonfree
> 
> to mga7 nonfree updates_testing with a better way to fix the mis-detection,
> please try it out...


Works great :)  I just had to reboot twice. Kernel 5.4.6 running :)
Would the next step be to build nvidia driver 440.44? Only 440.x provides
CUDA 10.2 and GTX 1650 Super support.
Thanks for your help :)

PS: Found another bug when trying to select nonfree updates_testing. I first used 'Mageia Control Center'->'Software management'->'Configure media sources'.
I was able to tick 'Enabled' to select 'Nonfree Updates Testing', but there was no way to tick 'Updates'.
Using 'drakrpm-edit-media --expert', the 'Updates' field was tickable.
Comment 15 Thomas Backlund 2019-12-31 17:29:06 CET
Great.

As it still works here too, we'll release it as an official update

Assignee: kernel => qa-bugs
Whiteboard: (none) => MGA7-64-OK
Keywords: (none) => advisory, validated_update
CC: (none) => sysadmin-bugs

Comment 16 Mageia Robot 2019-12-31 17:52:31 CET
An update for this issue has been pushed to the Mageia Updates repository.

https://advisories.mageia.org/MGAA-2019-0249.html

Resolution: (none) => FIXED
Status: NEW => RESOLVED

