Bug 20604 - Mageia-6-rc Cauldron update installing new kernel can cause loss of nvidia proprietary driver (e.g. 340)
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard:
Keywords: 6RC
Depends on:
Blocks:
 
Reported: 2017-03-30 18:18 CEST by Maurice Batey
Modified: 2019-02-21 18:37 CET

See Also:
Source RPM: kernel-4.9.19-1.mga6, dkms-nvidia340-340.102-2.mga6.nonfree
CVE:
Status comment:


Attachments
Log of Cauldron update causing loss of nvidia proprietary driver (16.04 KB, text/plain)
2017-03-30 18:21 CEST, Maurice Batey
Log of Cauldron update 31/3/2017 (8.42 KB, text/plain)
2017-03-31 12:26 CEST, Maurice Batey
file:///var/lib/dkms/nvidia-current/375.39-6.mga6.nonfree/4.9.14-desktop-1.mga6/x86_64/log/make.log (20.38 KB, text/plain)
2017-04-04 14:22 CEST, Morgan Leijström

Description Maurice Batey 2017-03-30 18:18:56 CEST
Description of problem:


Version-Release number of selected component (if applicable):

   urpmi ?

How reproducible:

Steps to Reproduce:

1. With VirtualBox installed on Plasma Mageia-6-rc (nvidia proprietary driver in use), perform a Cauldron update that brings in a new kernel.
   (See log of such an update attached herewith)

2. Reboot

3. Reboot shows "Switched to nouveau as proprietary nvidia driver not found"

4. Try using MCC/Hardware/Graphics to accept the offer of the proprietary driver, but on reboot 'nouveau' is still in use.
Comment 1 Maurice Batey 2017-03-30 18:21:54 CEST
Created attachment 9166
Log of Cauldron update causing loss of nvidia proprietary driver
Maurice Batey 2017-03-30 18:22:48 CEST

Keywords: (none) => 6RC

Marja Van Waes 2017-03-30 20:43:37 CEST

CC: (none) => marja11
Assignee: bugsquad => kernel
Source RPM: urpmi ? => kernel-4.9.19-1.mga6, dkms-nvidia340-340.102-2.mga6.nonfree

Comment 2 Daniel Osmari 2017-03-31 03:26:44 CEST
I had this problem too, but nvidia340-340.102-5.mga6.nonfree.src.rpm fixed it.

CC: (none) => danielosmari

Comment 3 Maurice Batey 2017-03-31 12:26:20 CEST
Created attachment 9169
Log of Cauldron update 31/3/2017

I have now attached the log of this morning's Cauldron update, showing that kernel 4.9.19-2 has been installed, and also the nvidia 340.102-5 driver.

Nevertheless, after reboot the nouveau driver was still in use, so I used the MCC/Hardware/Graphics option to accept the proprietary driver.
  I then checked the nVidia desktop icon, which confirmed that I was not using the nvidia driver and told me to execute "nvidia-xconfig" as root, which I did before rebooting.
  After reboot the proprietary driver IS being used...

But I do hope that after the next Cauldron update it will not be back to 'nouveau' again...
Comment 4 Thomas Backlund 2017-03-31 12:34:12 CEST
(In reply to Maurice Batey from comment #3)
> Created attachment 9169 [details]
> Log of Cauldron update 31/3/2017
> 
> Now attached log of Cauldron update this morning, showing that the kernel
> 4.9.19-2 has been installed, and also the nvidia 340.102-5 driver.
> 
> Nevertheless after reboot the nouveau driver was still in use, so used

Yeah, we don't auto-switch to the proprietary driver just because it's there;
it must be manually selected.


> 
> But I do hope that after the next Cauldron update it will not be back to
> 'nouveau' again...

Yeah, I don't expect any more fixes to the proprietary drivers, especially the 340 series or 304 series...

nvidia-current might get updated for more hw support, but that's all.

CC: (none) => tmb

Comment 5 Maurice Batey 2017-03-31 12:42:10 CEST
> we don't auto-switch to proprietary driver just because it's there,
> it must be manually selected.

  So users have to jump through hoops to get back to the driver they originally selected?

The driver is around because the user selected it earlier, so sounds odd to dump him back into nouveau...
Comment 6 Morgan Leijström 2017-03-31 13:57:27 CEST
The user may want to switch because Plasma has trouble with the proprietary driver, and some applications have trouble with nouveau, and it seems that is never going to change...  https://bugs.kde.org/show_bug.cgi?id=344326

The current state here is that two of my laptops using 340 upgraded OK, but my workstation using nvidia latest switched to nouveau when I upgraded the kernel this morning...

CC: (none) => fri

Comment 7 Thomas Backlund 2017-03-31 13:59:17 CEST
(In reply to Maurice Batey from comment #5)
> > we don't auto-switch to proprietary driver just because it's there,
> > it must be manually selected.
> 
>   So users have to jump through hoops to get back to the driver they
> originally selected?
> 
> The driver is around because the user selected it earlier, so sounds odd to
> dump him back into nouveau...

This only happened because I screwed up the fixes for the classical installer.

And since the system realized during boot that there was no working nvidia driver, it switched back to the fallback driver, in order to at least try to give you a working display.

The reason for not auto-switching back to nvidia is that there is no easy way to make sure the change was not intentional, so we leave the choice to the user.

So this should normally not happen.
Comment 8 Maurice Batey 2017-03-31 18:26:29 CEST
> And since the system during boot realized there is no working nvidia driver, 
> it switched back to the fallback driver, 

   But before the reboot the Cauldron update had installed the nvidia 340 driver (as can be seen in the 2nd attached update log) so there *was* a working driver.

> The part about not autoswitch back to nvidia is that there is no easy way to 
> make sure it was not an intentional change, so we leave the choice to the user.

  How about asking the user which driver he wants?!
(Or provide a system proprietary/nouveau flag.)

No need for guesswork!
Comment 9 Thomas Backlund 2017-03-31 18:35:01 CEST
(In reply to Maurice Batey from comment #8)
> > And since the system during boot realized there is no working nvidia driver, 
> > it switched back to the fallback driver, 
> 
>    But before the reboot the Cauldron update had installed the nvidia 340
> driver (as can be seen in the 2nd attached update log) so there *was* a
> working driver.
>


You are missing the point....

The dkms hooks of the updated nvidia driver broke, so the module was not built -> no working nvidia driver. That's why the system switched to the nouveau driver.
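
A minimal sketch of how one could check for that condition by hand, assuming a built module ends up as a file named nvidia*.ko* somewhere below /lib/modules/<kernel>. The exact file name and subdirectory depend on the driver package and are an assumption here; the function name and its optional second argument are hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if any nvidia kernel module file exists for the given kernel.
# Usage: nvidia_module_present KERNEL_VERSION [MODULES_ROOT]
# MODULES_ROOT is a hypothetical override (handy for testing); default is /lib/modules.
nvidia_module_present() {
    kernel=$1
    root=${2:-/lib/modules}
    # No module directory at all means no module for this kernel.
    [ -d "$root/$kernel" ] || return 1
    # Match names such as nvidia.ko, nvidia.ko.xz or nvidia-current.ko.xz
    # anywhere below the kernel's module directory (file naming is an assumption).
    [ -n "$(find "$root/$kernel" -type f -name 'nvidia*.ko*' 2>/dev/null | head -n 1)" ]
}
```

For the running kernel this could be called as: nvidia_module_present "$(uname -r)" || echo "no nvidia module found for this kernel"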
Comment 10 Maurice Batey 2017-03-31 20:22:44 CEST
Yes, I really do understand that, Thomas, regarding the first of the attached Cauldron update logs.

My later remarks were w.r.t. the 2nd attached Cauldron update log (31/3/17), which shows the 340 driver was installed. That was later confirmed when the nvidia app told me I was not using the proprietary driver, even after I had used MCC/Hardware/Graphics to accept it.

So I'm happy to be back with the 340 driver, and now understand the hoops that have to be jumped through to get it.

The concern I did express was about the current policy:

  "We don't auto-switch to proprietary driver just because it's there,
it must be manually selected."

P.S.
As a matter of interest, how does one request reverting to nouveau?
Comment 11 Morgan Leijström 2017-03-31 21:44:55 CEST
> As a matter of interest, how does one request reverting to nouveau?

Select Xorg,nouveau using drakx11 or MCC>Hardware
Comment 12 Morgan Leijström 2017-04-01 09:45:58 CEST
Same as lately: now with kernel 4.9.20-1, after selecting nvidia-latest in MCC, on the next reboot it says it cannot find the driver and falls back to nouveau.

What should I look for in the logs?
Comment 13 Maurice Batey 2017-04-01 10:24:03 CEST
> like lately, now with kernel 4.9.20-1 after selecting nvidia-latest in MCC, on
> next reboot it say it do not find it and fall back to nouveau

> What should i look for in the logs?

I don't understand the question!

The logs I have attached are of urpmi Cauldron updates, not of boot sequences.
Comment 14 Maurice Batey 2017-04-01 10:26:24 CEST
> How about asking the user which driver he wants?!

For example, when the boot sequence reports that the nvidia proprietary driver cannot be found, so using 'nouveau', how about instead asking which driver the user wants?
Comment 15 Morgan Leijström 2017-04-01 18:33:40 CEST
(In reply to Maurice Batey from comment #13)
> > What should i look for in the logs?
> 
> I don't understand the question!

I meant to ask a developer what I should look for (and report), and in which log on *my* machine, to find out why it fails to use nvidia-latest :)
Comment 16 Morgan Leijström 2017-04-02 16:03:54 CEST
I still have nvidia if I boot kernel 4.9.14-1.
When I boot 4.9.19 or 4.9.20 it switches to nouveau during boot.
(And if I then select nvidia using MCC I get bug 20153 on the logout attempt.)

On kernel 4.9.14 I can select nouveau, and then nvidia, with no problem.

Is something wrong with the later kernels, or does nvidia need an update for them?
Comment 17 Morgan Leijström 2017-04-02 16:09:49 CEST
I forgot to be more specific.
On my two laptops using 340 there seems to be no problem.
The problem I have is on my desktop using nvidia-current for
Card:NVIDIA GeForce 420 series and later
Description: GM107 [GeForce GTX 750]

BTW I also have nvidia-current-cuda-opencl installed and working (with kernel 4.9.14), used successfully by BOINC.
Comment 18 Morgan Leijström 2017-04-02 16:42:04 CEST
I wrote
"On kernel 4.9.14 I can select nouveau, and then nvidia, with no problem."
Wrong: on reboot and afterwards it can only use nouveau, and additionally logout fails (bug 20153) after I select nvidia in MCC. (Strangely, bug 20153 did not show on the logout after selecting first nouveau and then nvidia; it had to fail on boot first, and then after I selected it in MCC the logout failed.)

   To sum it up:

A month or so ago it installed nvidia-current OK (incl. -cuda-opencl) on this machine with kernel 4.9.14.

Now if I try nvidia-current on any of the 4.9.14, 4.9.19 or 4.9.20 kernels, it falls back to nouveau on boot (and I also see bug 20153).
Comment 19 Morgan Leijström 2017-04-03 09:58:59 CEST
Now I am totally confused: I rebooted today and realised nvidia works on 4.9.14.
Comment 20 Morgan Leijström 2017-04-04 09:52:42 CEST
Still a problem with the new Nvidia driver today:
First I uninstalled nvidia-current and the latest kernel (4.9.20-1),
then installed the latest kernel and rebooted,
then installed nvidia-current-375.39-6.mga6
   (and also the cuda driver and runtime packages).

On logout bug 20153 shows, and after the next reboot:

# journalctl -ab | grep nvidia
apr 04 09:29:31 svarten service_harddrake[1219]: switch X.org driver from 'nv.+' to 'nouveau' (Den patentskyddade kärn-drivrutinen hittades inte för 'nvidia' X.org drivrutinen)
apr 04 09:29:31 svarten service_harddrake[1219]: removed files/directories /etc/ld.so.conf.d/nvidia.conf
apr 04 09:29:31 svarten service_harddrake[1219]: removed files/directories /etc/ld.so.conf.d/nvidia_legacy.conf


That Swedish sentence means: "(The patent protected kernel driver was not found for the 'nvidia' X.org driver)"



From previous boot (when it got installed):
[root@svarten morgan]# journalctl -ab -1 | grep nvidia
apr 04 09:25:14 svarten [RPM][4172]: install dkms-nvidia-current-375.39-6.mga6.nonfree.x86_64: success
apr 04 09:25:37 svarten [RPM][4172]: install nvidia-cuda-toolkit-8.0.61-1.mga6.nonfree.x86_64: success
apr 04 09:25:42 svarten [RPM][4172]: install x11-driver-video-nvidia-current-375.39-6.mga6.nonfree.x86_64: success
apr 04 09:25:43 svarten [RPM][4172]: install nvidia-current-doc-html-375.39-6.mga6.nonfree.x86_64: success
apr 04 09:25:47 svarten [RPM][4172]: install nvidia-current-cuda-opencl-375.39-6.mga6.nonfree.x86_64: success
apr 04 09:25:47 svarten [RPM][4172]: install x11-driver-video-nvidia-current-375.39-6.mga6.nonfree.x86_64: success
apr 04 09:25:54 svarten [RPM][4172]: install dkms-nvidia-current-375.39-6.mga6.nonfree.x86_64: success
apr 04 09:25:54 svarten [RPM][4172]: install nvidia-cuda-toolkit-8.0.61-1.mga6.nonfree.x86_64: success
apr 04 09:25:54 svarten [RPM][4172]: install x11-driver-video-nvidia-current-375.39-6.mga6.nonfree.x86_64: success
apr 04 09:25:54 svarten [RPM][4172]: install nvidia-current-doc-html-375.39-6.mga6.nonfree.x86_64: success
apr 04 09:25:54 svarten [RPM][4172]: install nvidia-current-cuda-opencl-375.39-6.mga6.nonfree.x86_64: success
                                         x11-driver-video-nvidia-current
apr 04 09:27:08 svarten drakx11[15221]: those kernel module packages can be installed: dkms-nvidia-current dkms-nvidia-current
                                         x11-driver-video-nouveau x11-driver-video-nvidia-current dkms-nvidia-current dkms-nvidia-current
apr 04 09:27:10 svarten drakx11[15221]: removed files/directories /etc/ld.so.conf.d/nvidia.conf
apr 04 09:27:10 svarten drakx11[15221]: removed files/directories /etc/ld.so.conf.d/nvidia_legacy.conf
apr 04 09:27:10 svarten drakx11[15221]: running: update-alternatives --set gl_conf /etc/nvidia-current/ld.so.conf
apr 04 09:27:10 svarten drakx11[15221]: ldconfig will be run because the GL library was switched to /etc/nvidia-current/ld.so.conf
apr 04 09:27:10 svarten drakx11[15221]: workaround buggy fglrx/nvidia driver: make dm restart xserver (#29550, #38297)
apr 04 09:28:30 svarten kernel: nvidia: module license 'NVIDIA' taints kernel.
apr 04 09:28:30 svarten kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 244
                                NVRM: nouveau, rivafb, nvidiafb or rivatv 
apr 04 09:28:30 svarten kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 244
apr 04 09:28:32 svarten kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 244
                                NVRM: nouveau, rivafb, nvidiafb or rivatv 
apr 04 09:28:32 svarten kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 244


Any idea what more I can check to see what is failing?
Comment 21 Morgan Leijström 2017-04-04 11:07:12 CEST
For kernel 4.9.14, installing nvidia works
(I first uninstalled it and rebooted),

including cuda, but still with the bug and workaround in https://bugs.mageia.org/show_bug.cgi?id=14462#c20

Bug 20153 shows after I have installed nvidia and then run XFdrake to select the proprietary driver, but on the next boot 4.9.14 uses it successfully.
Comment 22 Thomas Backlund 2017-04-04 12:45:51 CEST
Was "nokmsboot" added to the kernel command line?

I also want to see the dkms make.log.
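
A rough sketch of the first check, assuming the running kernel's command line is read from /proc/cmdline. The function name and the optional file argument are hypothetical and only there so the check can be tried against a saved copy of the command line.

```shell
#!/bin/sh
# Sketch: succeed if OPTION appears as a whole word on the kernel command line.
# Usage: cmdline_has_option OPTION [FILE]
# FILE is a hypothetical override for testing; default is /proc/cmdline.
cmdline_has_option() {
    option=$1
    file=${2:-/proc/cmdline}
    # Put each space-separated word on its own line, then look for an
    # exact, fixed-string match of the whole line.
    tr ' ' '\n' < "$file" | grep -qxF -- "$option"
}
```

Example: cmdline_has_option nokmsboot && echo "nokmsboot is set". Going by the attachment name in this report, the make.log sits under /var/lib/dkms/<module>/<version>/<kernel>/x86_64/log/.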
Comment 23 Morgan Leijström 2017-04-04 14:22:02 CEST
Created attachment 9185
file:///var/lib/dkms/nvidia-current/375.39-6.mga6.nonfree/4.9.14-desktop-1.mga6/x86_64/log/make.log
Comment 24 Morgan Leijström 2017-04-04 14:26:46 CEST
Yes, nokmsboot is on the command line of all kernels.

Added the logfile as an attachment.

Other possibly relevant info:
/boot is a separate ext4 partition
other partitions incl / are in LVM in LUKS
Comment 25 Morgan Leijström 2017-05-11 18:47:36 CEST
Still no go here on my workstation with nvidia-current 375.66-3 and kernel 4.9.27.
During the next boot a dialog (this is new) says it needs to reboot because of the changed driver, and I am back at nouveau.
Comment 26 Morgan Leijström 2017-05-12 09:15:10 CEST
Could this share a common cause with bug 20796?

__ CHECKING

# systemctl status dkms-autorebuild.service
● dkms-autorebuild.service - run dkms_autoinstaller on every boot to rebuild dkms modules for newly booted kernels
   Loaded: loaded (/usr/lib/systemd/system/dkms-autorebuild.service; disabled; vendor preset: enabled)
   Active: inactive (dead)

# dkms status
virtualbox, 5.1.22-1.mga6, 4.9.14-desktop-1.mga6, x86_64: installed 
nvidia-current, 375.66-3.mga6.nonfree, 4.9.14-desktop-1.mga6, x86_64: installed 
virtualbox, 5.1.22-1.mga6, 4.9.25-desktop-1.mga6, x86_64: installed-binary from 4.9.25-desktop-1.mga6
virtualbox, 5.1.22-1.mga6, 4.9.27-desktop-1.mga6, x86_64: installed-binary from 4.9.27-desktop-1.mga6
virtualbox, 5.1.16-1.mga6, 4.9.14-desktop-1.mga6, x86_64: installed-binary from 4.9.14-desktop-1.mga6

And yes, it only builds nvidia for kernel 4.9.14; there is no make.log for other kernels:
# ls /var/lib/dkms/nvidia-current/
375.66-3.mga6.nonfree/  kernel-4.9.14-desktop-1.mga6-x86_64@

___ TRYING

# systemctl enable dkms-autorebuild.service
Created symlink /etc/systemd/system/dkms_autoinstaller.service → /usr/lib/systemd/system/dkms-autorebuild.service.
Created symlink /etc/systemd/system/basic.target.wants/dkms-autorebuild.service → /usr/lib/systemd/system/dkms-autorebuild.service.

and running drakx11, rebooting to try...
Comment 27 Morgan Leijström 2017-05-12 11:08:36 CEST
Did not work, but now I realised I have no kernel-devel later than 4.9.14
 (I may have forgotten to reinstall -latest after I earlier tried uninstalling/reinstalling kernels here).

After reinstalling -latest and cleaning out some kernels, I again chose the proprietary driver with drakx11 (in the GUI), then executed dracut -f and rebooted. During boot it paused in text mode at
 A start job is running for run dkms...      etc

# dkms status
virtualbox, 5.1.22-1.mga6, 4.9.14-desktop-1.mga6, x86_64: installed 
nvidia-current, 375.66-3.mga6.nonfree, 4.9.14-desktop-1.mga6, x86_64: installed 
nvidia-current, 375.66-3.mga6.nonfree, 4.9.27-desktop-1.mga6, x86_64: installed 
virtualbox, 5.1.22-1.mga6, 4.9.27-desktop-1.mga6, x86_64: installed-binary from 4.9.27-desktop-1.mga6
virtualbox, 5.1.16-1.mga6, 4.9.14-desktop-1.mga6, x86_64: installed-binary from 4.9.14-desktop-1.mga6

And now it works, including CUDA in Boinc :)
I have no idea what originally broke it.
Bottom line: Works for me now, and I will keep my eyes open at the next kernel update.
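
For the next kernel update, a small sketch that lists the kernels for which a given dkms module is reported as installed. It assumes the comma-separated `dkms status` line format shown in this report (module, version, kernel, arch: state); other dkms versions may print a different format. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: read `dkms status` text on stdin and print, one per line, the kernels
# for which MODULE is reported as installed (including "installed-binary").
# Assumes lines of the form "module, version, kernel, arch: state".
# Usage: dkms status | dkms_kernels_with_module MODULE
dkms_kernels_with_module() {
    awk -F', ' -v m="$1" '
        # Field 1 is the module name, field 3 the kernel, field 4 "arch: state".
        $1 == m && $4 ~ /: installed/ { print $3 }
    ' | sort -u
}
```

Example: dkms status | dkms_kernels_with_module nvidia-current. If the output does not include the version printed by uname -r, the module has not been built for the running kernel.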
Comment 28 Morgan Leijström 2017-05-12 11:21:38 CEST
If nobody else has problems we can close this.
Comment 29 John Choate 2017-06-01 03:09:27 CEST
As of May 31, 2017, I still need to re-install the nvidia module NEARLY EVERY &@$! TIME I BOOT THE COMPUTER. This is regardless of whether the kernel has been updated.
With the final release of 6 looming, I hope this issue gets resolved before the final ISOs are released.

CC: (none) => jdchoate

Comment 30 Morgan Leijström 2017-06-01 08:25:41 CEST
That is frustrating.
John: do you have the possibility to make a fresh install (e.g. on a spare HD) to see if it works OK then?
The idea is to sort out if there is some specific difficulty with your hardware mix, or something else.
Helge Hielscher 2017-06-02 01:34:08 CEST

CC: (none) => hhielscher

Comment 31 Maurice Batey 2019-02-21 18:37:55 CET
Well, although recent events indicate that the nouveau v. proprietary problem still occasionally raises its ugly head, we have all seen it often enough to know how to get on with life, so Closing - but I feel sorry for the nVidia-user newcomers...

Status: NEW => RESOLVED
Resolution: (none) => FIXED

