Bug 20796 - proprietary nvidia graphics driver cannot be installed during or after an upgrade
Summary: proprietary nvidia graphics driver cannot be installed during or after an upg...
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Release (media or process)
Version: Cauldron
Hardware: x86_64 Linux
Priority: release_blocker major
Target Milestone: Mageia 6
Assignee: ISO building group
QA Contact:
URL:
Whiteboard:
Keywords: 6RC
Depends on:
Blocks:
 
Reported: 2017-05-07 13:10 CEST by Len Lawrence
Modified: 2017-09-02 23:19 CEST

See Also:
Source RPM:
CVE:
Status comment: dkms-autorebuild doesn't get triggered on upgrades (new in mga6)


Attachments
Boot journal from attempt to install nvidia driver after upgrade (42.52 KB, application/octet-stream)
2017-05-07 13:13 CEST, Len Lawrence
Boot journal from mga5 to cauldron upgrade on nvidia system (46.67 KB, application/octet-stream)
2017-05-09 12:55 CEST, Len Lawrence
Log file from /root/drakx after upgrade (242.74 KB, application/octet-stream)
2017-05-09 12:56 CEST, Len Lawrence
Boot journal from Y500 upgrade after first boot. (23.98 KB, application/octet-stream)
2017-05-09 21:15 CEST, Len Lawrence
Log file from /root/drakx after upgrade (228.19 KB, application/octet-stream)
2017-05-09 21:19 CEST, Len Lawrence

Description Len Lawrence 2017-05-07 13:10:37 CEST
Description of problem:
Upgrading any or mixed desktops from mga5.1 to 6RC results in a system which cannot use the proprietary nvidia driver.  Various tactics always fail and the system defaults to the nouveau fb driver.
1) Upgrade with proprietary driver specified during the install -> failure
There is a long wait after FUSE control message then "switching to nouveau driver - nvidia driver cannot be found" (not verbatim).
2) Run drakx11 from the desktop to install nvidia driver -> failure
Loads nouveau - no sign of dkms-autoinstaller working.
3) Upgrade the upgrade and install nvidia driver at the summary stage -> failure
4) Full installation on same partition - specify proprietary driver -> success
dkms-autoinstaller runs during the boot sequence.
All tests were performed with the iso written to a USB pendrive by isodumper.
System was UEFI with GTX770.

I do not know what sort of diagnostics to provide but have obtained a boot journal for one of the tests.

Version-Release number of selected component (if applicable):
Mageia-6-rc-x86_64-DVD

How reproducible:
Consistent between different hardware and desktop environments.

Steps to Reproduce:
1. Upgrade an nvidia desktop installation from mga5.1 to 6RC
2. Specify proprietary graphics driver
3. Watch boot messages for references to nouveau or lack of dkms-autoinstaller activity.
4. At the desktop use lsmod to check which graphics driver is in use.
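
Step 4 can be scripted. The following is a minimal sketch and not part of the original report: `gpu_driver_from_lsmod` is a hypothetical helper name, and it only looks for the module names `nvidia` and `nouveau` in the first column of `lsmod` output.

```shell
# Classify the graphics driver from `lsmod` output read on stdin.
# Prints "nvidia", "nouveau" or "none".
gpu_driver_from_lsmod() {
    awk '
        $1 == "nvidia"  { found_nvidia = 1 }
        $1 == "nouveau" { found_nouveau = 1 }
        END {
            if (found_nvidia)       print "nvidia"
            else if (found_nouveau) print "nouveau"
            else                    print "none"
        }'
}
```

On a live system it would be used as `lsmod | gpu_driver_from_lsmod`.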
Comment 1 Len Lawrence 2017-05-07 13:13:52 CEST
Created attachment 9275
Boot journal from attempt to install nvidia driver after upgrade

CC: (none) => tarazed25

Comment 2 Len Lawrence 2017-05-08 15:46:44 CEST
Looks like this bug is still live with the latest 64-bit classic iso, 2017-05-07.
Len Lawrence 2017-05-08 19:45:45 CEST

Priority: Normal => release_blocker

Comment 3 Len Lawrence 2017-05-08 21:53:48 CEST
Both UEFI and CSM systems are affected - nouveau driver is forced to be used.
Comment 4 Martin Whitaker 2017-05-09 09:49:37 CEST
The only thing I can see in the log is

  May 07 10:49:48 localhost urpmi[2393]: called with: dkms-nvidia

This looks wrong - there is no dkms-nvidia package. For your H/W I think it should be dkms-nvidia-current.

Do you still have your various trial installations? The journal for the first boot after the upgrade and the ddebug.log from trial 1 (upgrade with proprietary driver) might be more revealing.

If you have time, another test would be to do a clean install with the free driver, then run drakx11 to switch to the proprietary driver. Does that also fail? Does drakx11 output any helpful messages if you run it from the command line?

CC: (none) => mageia

Comment 5 Len Lawrence 2017-05-09 10:18:45 CEST
Have just spent five days rationalizing system partitions, creating another 30 or so, which include several upgrade candidates.  I have probably already run the test you suggest but shall do it again.  All results so far have been negative for nvidia and upgrades.  No problem with straight installs.

Shall keep and label the boot journals.  Don't remember where ddebug.log lives but shall look for it.  Had not noticed any helpful messages when running drakx11 from the command line - shall pay more attention.
Comment 6 Len Lawrence 2017-05-09 10:20:11 CEST
Found ddebug.log in /root/drakx.
Comment 7 Len Lawrence 2017-05-09 12:45:10 CEST
Ran an upgrade test on an Aorus X5 laptop with twin nvidia cards.  UEFI boot.
Base system was Mate+Xfce with nvidia kernel module.
The upgrade went smoothly, but the reboot took a while and then showed a message saying the system had to be rebooted due to a display driver change.  Rebooted to Mate, running under the nouveau driver.  Saved boot journal and ddebug.log.  Installed nvidia using drakx11 at the command line.  No messages of note:
"Too late to run INIT something or other".
Reboot ended up at good luck message.  Ran drakx11 from a console and rebooted.
At the desktop found that the display driver had switched back to nouveau.
Comment 8 Len Lawrence 2017-05-09 12:55:17 CEST
Created attachment 9285
Boot journal from mga5 to cauldron upgrade on nvidia system
Comment 9 Len Lawrence 2017-05-09 12:56:57 CEST
Created attachment 9286
Log file from /root/drakx after upgrade
Comment 10 Len Lawrence 2017-05-09 17:29:42 CEST
OK, speaking from a standpoint of extreme ignorance of packaging and installation scripts, could this problem be related to the fact that the first reboot after an upgrade should be treated as a First boot from the point of view of nvidia graphics so that dkms-auto-installer or whatever is triggered to rebuild the nvidia kmod?  Perusing the log files and after watching the boot messages roll by I could see no sign of any such activity.

If this idea has any merit then the problem would be a new one since the era of pre-built proprietary modules.

However please ignore this message if it is a load of old cobblers (tag?).
Comment 11 Len Lawrence 2017-05-09 21:13:18 CEST
Another upgrade, single DE on Lenovo Y500 running nvidia graphics.
The installation completed without fuss but on reboot the boot process paused to say that a reboot was needed because of a display driver change; cancel in 30 seconds to abort.  This is the stage where nvidia is abandoned in favour of nouveau so to try to retain journal information I cancelled and let the boot complete.  Mate desktop with nouveau graphics.  Captured boot journal and ddebug.log.  Hopefully this journal contains more information than the previous one.
Comment 12 Len Lawrence 2017-05-09 21:15:27 CEST
Created attachment 9290
Boot journal from Y500 upgrade after first boot.
Comment 13 Len Lawrence 2017-05-09 21:19:05 CEST
Created attachment 9291
Log file from /root/drakx after upgrade
Comment 14 Marja Van Waes 2017-05-09 22:51:41 CEST
After a failed attempt to read a log and understand what's going on, now blindly assigning to the isobuilders group. Please reassign if that is wrong.

Assignee: bugsquad => isobuild
CC: (none) => marja11

Marja Van Waes 2017-05-09 22:52:13 CEST

Keywords: (none) => 6RC
Summary: 6RC proprietary nvidia graphics driver cannot be installed during or after an upgrade => proprietary nvidia graphics driver cannot be installed during or after an upgrade

Comment 15 Martin Whitaker 2017-05-09 23:06:41 CEST
@Marja, :-)

@Len, yes that second log is more informative. There should be some trace of the dkms-autorebuild service attempting to build and install the proprietary driver. Please run these two commands (as root) on the upgraded system and post the output:

  systemctl status dkms-autorebuild.service

  dkms status

I'll try to reproduce this on my nvidia system.
Comment 16 Len Lawrence 2017-05-09 23:47:06 CEST
$ sudo systemctl status dkms-autorebuild.service
● dkms-autorebuild.service - run dkms_autoinstaller on every boot to rebuild dkms modules for newly booted kernels
   Loaded: loaded (/usr/lib/systemd/system/dkms-autorebuild.service; disabled; vendor preset: enabled)
   Active: inactive (dead)

Looks like it needs to be enabled.

$ sudo dkms status
nvidia-current, 375.66-1.mga6.nonfree, 4.9.27-desktop-1.mga6, x86_64: installed
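
A status line like the one above can be checked mechanically. This is a rough sketch only: `dkms_module_installed` is a hypothetical helper, and the comma/colon field layout is assumed from the single line shown here; other dkms versions may format their output differently.

```shell
# Read `dkms status` output on stdin; succeed if the given module is
# reported as "installed" for the given kernel version.
dkms_module_installed() {
    module=$1
    kernel=$2
    # Fields are split on "," or ":" followed by optional spaces:
    # module, module version, kernel, arch, state.
    awk -F'[,:] *' -v m="$module" -v k="$kernel" '
        $1 == m && $3 == k && $NF == "installed" { ok = 1 }
        END { exit !ok }'
}
```

For example, `dkms status | dkms_module_installed nvidia-current "$(uname -r)"` would test the running kernel.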
Comment 17 Len Lawrence 2017-05-09 23:54:17 CEST
$ systemctl status dkms-autorebuild.service
● dkms-autorebuild.service - run dkms_autoinstaller on every boot to rebuild dkm
   Loaded: loaded (/usr/lib/systemd/system/dkms-autorebuild.service; enabled; ve
   Active: active (exited) since Tue 2017-05-09 22:48:56 BST; 4min 49s ago
  Process: 812 ExecStart=/usr/sbin/dkms-autorebuild.sh (code=exited, status=0/SU
 Main PID: 812 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/dkms-autorebuild.service
Comment 18 Len Lawrence 2017-05-10 00:00:31 CEST
Ran drakx11 to install the proprietary driver, rebooted and up it came.
# lsmod | grep nvidia
nvidia_modeset        790528  2
nvidia              12304384  48 nvidia_modeset
Comment 19 Charles Edwards 2017-05-10 01:25:53 CEST
I just did a 5 to 6 upgrade and can confirm dkms-autorebuild Is Not being run when the upgrade completes and Mga6 is booted.

I was able to fix mine using a different method and in my case I was using the nvidia340 driver.

CC: (none) => cae

Comment 20 Charles Edwards 2017-05-10 23:04:53 CEST
Have now found that this same bug affects current cauldron systems.

The dkms-nvidia module is not being auto-built for any kernel during boot.
Comment 21 Rémi Verschelde 2017-05-12 08:37:47 CEST
For the reference, it's likely a regression from http://svnweb.mageia.org/packages/cauldron/dkms/current/SOURCES/dkms-no-autoload-during-install.patch?view=markup&pathrev=1099399 which aims at fixing bug 20368.
Rémi Verschelde 2017-05-12 08:38:08 CEST

Status comment: (none) => Recent regression, dkms-autorebuild doesn't get triggered

Comment 22 Martin Whitaker 2017-05-12 09:45:23 CEST
(In reply to Rémi Verschelde from comment #21)
> For the reference, it's likely a regression from
> http://svnweb.mageia.org/packages/cauldron/dkms/current/SOURCES/dkms-no-
> autoload-during-install.patch?view=markup&pathrev=1099399 which aims at
> fixing bug 20368.

Unlikely. Copying from dev@ml:

The root cause of bug 20796 is that the dkms-autorebuild service is not enabled on a mga5->mga6 upgrade. In the dkms spec file we have

  %post
  %_post_service %{name}-autorebuild

but from reading the Wiki, it appears that %_post_service will only enable a service on an install, not on an upgrade.

The dkms-autorebuild service didn't exist in mga5, so isn't already enabled.

The change to dkms can't cause a regression if dkms never gets run ;-)
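
To illustrate the install-versus-upgrade distinction: RPM passes the number of installed instances of the package to a %post scriptlet as $1, which is 1 on a fresh install and 2 on an upgrade. The sketch below only models the behaviour described above. It is not the actual expansion of %_post_service, which is not shown in this report, and `post_scriptlet_model` is a hypothetical name.

```shell
# Model of a %post scriptlet that enables a service on first install only.
# $1: instance count as RPM would pass it (1 = install, 2 or more = upgrade)
# $2: service name
post_scriptlet_model() {
    instances=$1
    service=$2
    if [ "$instances" -eq 1 ]; then
        echo "enable $service"
    else
        echo "skip $service (upgrade: enablement left unchanged)"
    fi
}
```

With a count of 2, as on a mga5->mga6 upgrade, the model never reaches the enable branch, which matches the symptom reported here for a service that did not exist in mga5.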
Comment 23 Martin Whitaker 2017-05-12 09:48:29 CEST
@Charles, was the system you saw this problem on originally an upgrade from mga5? I'm wondering if the dkms-autorebuild service was ever enabled. According to tmb, a dkms rebuild should be automatically triggered when a new kernel is installed, so the dkms-autorebuild service should not normally be needed.
Rémi Verschelde 2017-05-12 10:25:33 CEST

Target Milestone: --- => Mageia 6
Status comment: Recent regression, dkms-autorebuild doesn't get triggered => dkms-autorebuild doesn't get triggered on upgrades (new in mga6)

Comment 24 Morgan Leijström 2017-05-12 17:20:38 CEST
Thanks for the hint on dkms-autorebuild.service !

On my year-old rolling-update cauldron workstation, I noticed a while ago that nvidia started to fail for kernels > 4.9.14.  Not until today, when I read this bug, did I check and realise that dkms-autorebuild.service was not active.

I enabled that and it works again.

(I also reinstalled kernel-devel (+ -latest), which I think I uninstalled myself when cleaning and reinstalling kernels earlier and then forgot to reinstall, and ran dracut -f. You may also want to check if you have kernel-devel installed.)

https://bugs.mageia.org/show_bug.cgi?id=20604#c26

CC: (none) => fri

Comment 25 Martin Whitaker 2017-05-13 18:18:57 CEST
A fix for enabling the dkms-autorebuild service on upgrade has been pushed, so should be on the next ISO build. I've tested it myself by patching the last ISO, and it fixed the problem on my nvidia test system.

I haven't forced the dkms-autorebuild service to be re-enabled on every update, as some users may have chosen to disable it (indeed, that was the reason it was split out of the mageia-everytime service). So current cauldron users who upgraded from a mga5 system will need to enable it manually.
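
The manual step could look like the following. It is a minimal sketch, not taken from the fix itself: `enable_dkms_autorebuild` is a hypothetical wrapper, and the `SYSTEMCTL` variable is an assumption added so that the function can be dry-run with a stand-in command. Only `systemctl is-enabled` and `systemctl enable` are real systemd subcommands.

```shell
# Enable dkms-autorebuild.service unless it is already enabled.
# SYSTEMCTL may name a replacement command for testing; it defaults to
# the real systemctl.
enable_dkms_autorebuild() {
    systemctl_cmd=${SYSTEMCTL:-systemctl}
    if "$systemctl_cmd" is-enabled --quiet dkms-autorebuild.service; then
        echo "dkms-autorebuild.service is already enabled"
    else
        "$systemctl_cmd" enable dkms-autorebuild.service
    fi
}
```

Run as root, the call is simply `enable_dkms_autorebuild`. According to the unit description quoted in comment 16, the rebuild then happens on the next boot.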

I'm going to close this as fixed, but please reopen if there is still a problem with the next ISOs.

Status: NEW => RESOLVED
Resolution: (none) => FIXED

Comment 26 Chris Denice 2017-09-01 11:49:01 CEST
Hi guys,
am I correct to say that this bug is not fixed for all mga6 freezes coming from Cauldron? dkms.autobuild will stay disabled for those.

I am wondering if this fixes the mga5->mga6 online upgrade too?

thanks

CC: (none) => eatdirt

Comment 27 Martin Whitaker 2017-09-02 10:19:19 CEST
(In reply to Chris Denice from comment #26)
> am I correct to say that this bug is not fixed for all mga6 freezes coming
> from Cauldron? dkms.autobuild will stay disabled for those.

The bug occurred if you updated from a version of the dkms package which didn't include the dkms-autorebuild service to one that did. Looking at the change log, the dkms-autorebuild service was added in November 2015. So if your cauldron install was done before then, I would expect you to see this bug. As noted in comment 25, the fix is not retroactive.

> I am wondering if this fixes the mga5->mga6 online upgrade too?

It should do. The fix is in the dkms package, not in the installer.
Comment 28 Chris Denice 2017-09-02 23:19:38 CEST
Ok, thanks! Yes my cauldron is even older than 2015 :) I guess we're good, Cauldron users should indeed be able to enable a service ;)

Thanks.
