Bug 21263 - Endless reboot loop after switching from nouveau to nvidia proprietary driver
Summary: Endless reboot loop after switching from nouveau to nvidia proprietary driver
Status: REOPENED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 6
Hardware: All Linux
Priority: High major
Target Milestone: Mageia 6
Assignee: Mageia tools maintainers
QA Contact:
URL:
Whiteboard:
Keywords: IN_ERRATA6, PATCH
Depends on:
Blocks: 21246 21340
 
Reported: 2017-07-17 16:04 CEST by Dmytro Palamarchuk
Modified: 2017-09-05 03:55 CEST

See Also:
Source RPM: drakxtools-17.88-1.mga6.src.rpm
CVE:
Status comment:


Attachments
Proposed fix (3.58 KB, text/plain)
2017-07-23 18:08 CEST, Martin Whitaker

Description Dmytro Palamarchuk 2017-07-17 16:04:04 CEST
Description of problem:

Reboot loop after switching from nouveau to the nvidia proprietary driver. Architectures: x86 and x86_64.

Version-Release number of selected component (if applicable): x11-driver-video-nvidia-current.mga6 

How reproducible:

Install a fresh copy of Mageia 6 from the Live Plasma Desktop, choosing the option with the free video driver (nouveau in the case of NVidia). After installation, switch from nouveau to the nvidia proprietary driver and reboot. Press the Esc key to see the boot process. A message box then appears: "Your display driver is changed and need to reboot. Press Yes to reboot immediately". After the reboot the same message box appears again, and so on: an endless reboot cycle with the message box.

Note1: If I remember correctly, the nvidia proprietary driver doesn't support KMS (kernel mode setting), which means that during the boot process the Linux kernel messages are shown at a lower resolution than the flat panel's native resolution. After switching from nouveau to the nvidia proprietary driver, the Linux kernel messages are still shown at the native flat panel resolution every time. I think the nouveau module is not added to the blacklist after the switch.

Note2: I don't know if other nvidia drivers are affected by this bug.
       x11-driver-video-nvidia304
       x11-driver-video-nvidia340 

Note3: My NVidia card GeForce GTX 650 

Steps to Reproduce:
1. Install a fresh copy of Mageia 6 from the Live Plasma Desktop, choosing the option with the free video driver (nouveau in the case of NVidia).
2. After installation, switch from nouveau to the nvidia proprietary driver. Reboot.
3. Press the Esc key to see the boot process. A message box then appears: "Your display driver is changed and need to reboot. Press Yes to reboot immediately".
Comment 1 Marja van Waes 2017-07-17 23:32:38 CEST
Assigning to the kernel and drivers maintainers.

CC: (none) => marja11
Assignee: bugsquad => kernel

Comment 2 Martin Whitaker 2017-07-18 01:02:04 CEST
This is a combination of two problems:

  - drakx11 fails to add the "nokmsboot" kernel boot option to the grub2
    configuration when switching to the proprietary driver
  - service_harddrake tries to correct this, but doesn't actually update
    the grub2 configuration, which leads to the endless loop (bug 21250)

These two problems likely have the same underlying cause. The initial suspect was a regression caused by the fix for bug 18783, but I now think there's more to it than just that.

The workaround is to manually add the nokmsboot option. See the write-up in the errata.
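
As a rough way to verify the workaround after a reboot, one can check whether the running kernel was actually booted with the option. The sketch below is only an illustration: the function name has_boot_option is made up here, and it assumes the kernel command line is readable from /proc/cmdline (a different file can be passed as the second argument).

```shell
# Sketch only: report whether a given option is present on the kernel
# command line. "has_boot_option" is a hypothetical helper name.
# Usage: has_boot_option OPTION [CMDLINE_FILE]
has_boot_option() {
    # Put each whitespace-separated word on its own line, then look for
    # an exact, whole-line match of the requested option.
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -Fxq -- "$1"
}
```

For example, `has_boot_option nokmsboot && echo present` prints "present" only when the option reached the kernel, which is exactly what fails to happen in the loop described here.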

Source RPM: x11-driver-video-nvidia-current.mga6 => drakxtools-17.88-1.mga6.src.rpm
CC: (none) => mageia
Summary: Cycle reboot after switching from nouveua to nvidia proprietary driver x86 x86_64 => Endless reboot loop after switching from nouveau to nvidia proprietary driver
Keywords: (none) => IN_ERRATA6
Assignee: kernel => mageiatools

Comment 3 Charles Edwards 2017-07-18 01:29:08 CEST
When I know that I am going to switch to the nvidia driver I prefer booting
with modprobe.blacklist=nouveau added to the appends and booting the system to init 3.

At init 3 I use XFdrake to change the driver to nvidia.
If the module builds successfully, I can then use 'service dm start' or 'startx' to start X WITHOUT rebooting, and the nvidia modules are happily loaded and used.

I also manually verify that 'nokmsboot' is present in /etc/default/grub.
If not I add it and run 'update_grub2'.
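
The manual check described above could be scripted roughly as follows. This is a sketch under assumptions, not what drakx11 or service_harddrake do: the function name ensure_nokmsboot is hypothetical, and it assumes the GRUB_CMDLINE_LINUX_DEFAULT value is on a single line and enclosed in double quotes.

```shell
# Sketch only: add "nokmsboot" to GRUB_CMDLINE_LINUX_DEFAULT if missing.
# "ensure_nokmsboot" is a hypothetical helper name.
# Usage: ensure_nokmsboot FILE
ensure_nokmsboot() {
    file=$1
    # Give up if the variable is not defined in the expected form.
    grep -q '^GRUB_CMDLINE_LINUX_DEFAULT="' "$file" || return 1
    # Nothing to do if the option is already there as a whole word.
    if grep -Eq '^GRUB_CMDLINE_LINUX_DEFAULT=.*[" ]nokmsboot[" ]' "$file"; then
        return 0
    fi
    # Insert the option just before the closing double quote, writing to
    # a temporary file first so the original is only replaced on success.
    sed 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 nokmsboot"/' "$file" \
        > "$file.tmp" && mv "$file.tmp" "$file"
}
```

It would be run as root against /etc/default/grub, after which the grub2 configuration still has to be regenerated as described above.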

CC: (none) => cae

Comment 4 Dmytro Palamarchuk 2017-07-18 10:33:16 CEST
I think it would be good to add the workaround to the errata until the bug is resolved.
Comment 5 Martin Whitaker 2017-07-18 22:07:59 CEST
(In reply to Dmytro Palamarchuk from comment #4)
> I think it would be good to add the workaround to the errata until the bug
> is resolved.

It's there already.
Rémi Verschelde 2017-07-19 10:39:14 CEST

Target Milestone: --- => Mageia 6
Priority: Normal => High

Comment 6 Martin Whitaker 2017-07-23 18:08:29 CEST
Created attachment 9511
Proposed fix

This patch, combined with attachment 9116 from bug 18783, fixes this bug. I've also tested that it doesn't introduce any regressions in the Live installer or when switching from grub to grub2.
Martin Whitaker 2017-07-23 18:09:04 CEST

Keywords: (none) => PATCH

Rémi Verschelde 2017-07-24 14:12:04 CEST

CC: (none) => thierry.vignaud
Blocks: (none) => 21340

Martin Whitaker 2017-07-25 01:16:23 CEST

Blocks: (none) => 21246

Comment 7 Dmytro Palamarchuk 2017-07-25 16:15:19 CEST
OK, I'm closing the issue.

I can't test it right now, because I don't have any machine with an Nvidia card other than my home PC with its fresh Mageia 6 installation.

Resolution: (none) => FIXED
Status: NEW => RESOLVED

Comment 8 Rémi Verschelde 2017-07-25 16:18:10 CEST
Please don't close it, it's still a bug that we want to fix (and there's a proposed fix in comment 6).

Resolution: FIXED => (none)
Status: RESOLVED => REOPENED

Trou Du Cul Merdeux 2017-07-26 20:52:56 CEST

CC: (none) => trouducul
Severity: major => critical

Samuel Verschelde 2017-07-26 21:35:56 CEST

CC: trouducul => (none)
Severity: critical => major

Florian Hubold 2017-07-27 12:03:08 CEST

CC: (none) => doktor5000

Comment 9 Dmytro Palamarchuk 2017-07-27 17:47:11 CEST
I apologize for closing it. I understood it to be fixed when I first read comment 6.
Rémi Verschelde 2017-08-03 16:35:02 CEST

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=21250

Comment 10 Thierry Vignaud 2017-08-03 17:35:50 CEST
Martin's fix looks OK as usual.
However I wouldn't push "Care is taken to not overwrite the default_append value if it is defined. This allows default_append to be an empty string, if
that is ever required." to mga5, that changes our decade old behaviour.
Actually, I wouldn't change it in cauldron too w/o discussing.
I'm not sure there's a valid case where it could be empty.
Comment 11 Martin Whitaker 2017-08-04 21:31:23 CEST
(In reply to Thierry Vignaud from comment #10)
> However I wouldn't push "Care is taken to not overwrite the default_append
> value if it is defined. This allows default_append to be an empty string, if
> that is ever required." to mga5, that changes our decade old behaviour.
> Actually, I wouldn't change it in cauldron too w/o discussing.
> I'm not sure there's a valid case where it could be empty.

Well, this is to avoid overriding manual changes made by the user to the /etc/default/grub file - just in case someone has a valid reason for wanting GRUB_CMDLINE_LINUX_DEFAULT to be an empty string. I don't feel that strongly about it though, so feel free to change that (and wait for someone to complain...)

P.S. I forgot to say, perl_checker needs to be taught to recognise the // operator. I hacked my local copy to get it to run.
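
The distinction discussed in this exchange (replace the value only when it is undefined, versus also replacing it when it is empty) has a close analogue in POSIX shell parameter expansion. The snippet below is only an illustration of that difference, not the code from the patch; the variable names and the default value are made up.

```shell
# Sketch only: "defined-or" versus "empty-or" defaults in POSIX shell.
#   ${var-default}   uses the default only when var is unset
#                    (comparable to Perl's // operator)
#   ${var:-default}  uses the default when var is unset OR empty
default_append="splash quiet"    # hypothetical default value

user_value=""                    # the user deliberately chose an empty string
keep_if_defined=${user_value-$default_append}
replace_if_empty=${user_value:-$default_append}

printf 'defined-or: [%s]\n' "$keep_if_defined"    # prints: defined-or: []
printf 'empty-or:   [%s]\n' "$replace_if_empty"   # prints: empty-or:   [splash quiet]
```

With the first form a manually emptied setting survives; with the second it is silently replaced by the default, which corresponds to the long-standing behaviour mentioned in comment 10.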
Comment 12 macxi 2017-08-14 22:35:04 CEST
In Mageia 6 I added "nokmsboot" by editing the "Append" options in the "Set up boot system" section of the MCC.
After reboot, the warning appears: "The system must be rebooted due to changing the video driver".
After 30 seconds, the system reboots and Mageia 6 opens, but with the nouveau driver.

/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT=" splash quiet noiswmd nokmsboot resume=UUID=885a30f8-76cb-482a-9eab-154662d80d9e audit=0"

lspci -v|less:
01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Elitegroup Computer Systems Device 2015
        Flags: bus master, fast devsel, latency 0, IRQ 27
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        I/O ports at af00 [size=128]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb, nouveau
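
For reference, the relevant line can be pulled out of output like the above with a small filter. This is a sketch, assuming the layout shown here (device header lines start in the first column, detail lines are indented); the function name vga_driver_in_use is made up.

```shell
# Sketch only: print the "Kernel driver in use" value for every VGA
# controller found in lspci -v style output read from standard input.
# "vga_driver_in_use" is a hypothetical helper name.
vga_driver_in_use() {
    awk '
        # A line starting in column 1 begins a new device block.
        /^[^ \t]/ { in_vga = ($0 ~ /VGA compatible controller/) }
        # Inside a VGA block, print only the driver name.
        in_vga && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use:[ \t]*/, "")
            print
        }
    '
}
```

Used as `lspci -v | vga_driver_in_use`, it would print nouveau for the output quoted in this comment.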

CC: (none) => terraagua

Comment 13 macxi 2017-08-22 23:04:21 CEST
Today I tried to set up the Nvidia driver again after upgrading Mageia 6 to kernel-desktop-4.9.43-1.mga6-1-1.mga6.x86_64.
After reconfiguring and rebooting the computer, I confirmed with the command "lspci -v | less" that the "nvidia" driver is installed and working properly, replacing the nouveau driver.

[mageia6@local]$ lspci -v|less

01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Elitegroup Computer Systems Device 2015
        Flags: bus master, fast devsel, latency 0, IRQ 29
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        I/O ports at af00 [size=128]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia340
Comment 14 macxi 2017-08-23 02:18:40 CEST
I made two other installations of Mageia 6 that did not install the nvidia driver, just the nouveau driver.

I checked that both installations had the "nokmsboot" option added in the /etc/default/grub file.

These two installations of Mageia 6 have one thing in common: their boot menus are not saved in the MBR.
Comment 15 Bruno Cornec 2017-08-24 02:20:12 CEST
Not sure if it helps, but in my case, the link for the nvidia driver wasn't correctly created. I had to do:
ln -sf /usr/lib64/nvidia340/xorg/nvidia_drv.so /usr/lib64/xorg/modules/drivers/ 
to have the X server launching
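
Whether that link is in place can be checked before recreating it. The helper below is a sketch with a made-up name (driver_link_ok); the paths in the usage note are the ones from this comment and will differ for other driver series or on 32-bit systems.

```shell
# Sketch only: succeed if LINK is a symbolic link pointing at TARGET.
# "driver_link_ok" is a hypothetical helper name.
# Usage: driver_link_ok TARGET LINK
driver_link_ok() {
    # -L tests for a symlink; readlink prints the path stored in it.
    [ -L "$2" ] && [ "$(readlink "$2")" = "$1" ]
}
```

For the case described here the call would be `driver_link_ok /usr/lib64/nvidia340/xorg/nvidia_drv.so /usr/lib64/xorg/modules/drivers/nvidia_drv.so`.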

CC: (none) => bruno

Comment 16 Hoyt Duff 2017-09-05 03:55:15 CEST
Sorry to be late to the party, but when using drakx11 to change to the nvidia driver on a multi-head hardware configuration, drakx11 changes only the first instance of the driver and leaves the remainder as "nouveau". Also, even if it identifies that more than one head is present when changing the xorg.conf file to the nvidia driver, it fails to enable the "TwinView" option, leaving it turned off. I always need to enable TwinView manually and manually change the second head's driver from nouveau to nvidia. If I don't do that, I get dropped to recovery mode. It's really not a big deal once you are aware that it needs to be changed manually, but maybe this could be addressed in the errata.

I'm running two cards and three monitors now, but I can revert to one card and two monitors if you need any information from me. For three heads, neither drakx11 nor nvidia-settings is up to the configuration task. 8(
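
To spot a leftover "nouveau" entry of the kind described above, the Device sections of xorg.conf can be listed with their drivers. This is a sketch only: the function name list_xorg_drivers is made up, and it assumes each Identifier is a single quoted word without spaces.

```shell
# Sketch only: print "identifier: driver" for each Device section of an
# xorg.conf style file. "list_xorg_drivers" is a hypothetical helper name.
# Usage: list_xorg_drivers FILE
list_xorg_drivers() {
    awk '
        { key = tolower($1) }
        # Start of a section: remember whether it is a Device section.
        key == "section" {
            in_dev = (tolower($2) == "\"device\""); id = "?"; drv = "?"; next
        }
        # End of a section: report it if it was a Device section.
        key == "endsection" { if (in_dev) print id ": " drv; in_dev = 0; next }
        # Inside a Device section, record the identifier and driver names.
        in_dev && key == "identifier" { id = $2; gsub(/"/, "", id) }
        in_dev && key == "driver"     { drv = $2; gsub(/"/, "", drv) }
    ' "$1"
}
```

Running `list_xorg_drivers /etc/X11/xorg.conf` on a multi-head setup would show at a glance which device entries still need to be changed by hand.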

CC: (none) => hoyt

