Bug 21263 - Endless reboot loop after switching from nouveau to nvidia proprietary driver
Summary: Endless reboot loop after switching from nouveau to nvidia proprietary driver
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 6
Hardware: All Linux
Priority: High major
Target Milestone: Mageia 6
Assignee: Mageia tools maintainers
QA Contact:
URL:
Whiteboard:
Keywords: IN_ERRATA6, PATCH
Depends on:
Blocks: 21246 21340
Reported: 2017-07-17 16:04 CEST by Dmytro Palamarchuk
Modified: 2018-06-11 23:10 CEST

See Also:
Source RPM: drakxtools-17.88-1.mga6.src.rpm
CVE:
Status comment:


Attachments
Proposed fix (3.58 KB, text/plain)
2017-07-23 18:08 CEST, Martin Whitaker
Proposed fix v2 (3.02 KB, text/plain)
2017-12-08 10:13 CET, Martin Whitaker
This patch from bug 18783 is also needed... (1.56 KB, text/plain)
2017-12-08 23:11 CET, Martin Whitaker

Description Dmytro Palamarchuk 2017-07-17 16:04:04 CEST
Description of problem:

Reboot loop after switching from nouveau to the nvidia proprietary driver. Architectures: x86 and x86_64.

Version-Release number of selected component (if applicable): x11-driver-video-nvidia-current.mga6 

How reproducible:

Install a fresh copy of Mageia 6 from the Live Plasma Desktop, choosing the option with the free video driver (nouveau in the case of NVidia). After installation, switch from nouveau to the nvidia proprietary driver and reboot. Press the Esc key to see the boot process. A message box then appears: "Your display driver is changed and need to reboot. Press Yes to reboot immediately". After the reboot the same message box appears again, and again after the next reboot: an endless reboot loop with the message box.

Note1: If I remember correctly, the nvidia proprietary driver doesn't support KMS, which means that during the boot process the Linux kernel messages are shown at a lower resolution than the native resolution of the flat panel. After switching from nouveau to the nvidia proprietary driver, however, the kernel messages are always shown at the native flat panel resolution. I think the nouveau module is not added to the blacklist after the switch.

Note2: I don't know if other nvidia drivers are affected by this bug.
       x11-driver-video-nvidia304
       x11-driver-video-nvidia340 

Note3: My NVidia card GeForce GTX 650 

Steps to Reproduce:
1. Install a fresh copy of Mageia 6 from the Live Plasma Desktop, choosing the option with the free video driver (nouveau in the case of NVidia).
2. After installation, switch from nouveau to the nvidia proprietary driver. Reboot.
3. Press the Esc key to see the boot process. A message box then appears: "Your display driver is changed and need to reboot. Press Yes to reboot immediately".
Comment 1 Marja Van Waes 2017-07-17 23:32:38 CEST
Assigning to the kernel and drivers maintainers.

CC: (none) => marja11
Assignee: bugsquad => kernel

Comment 2 Martin Whitaker 2017-07-18 01:02:04 CEST
This is a combination of two problems:

  - drakx11 fails to add the "nokmsboot" kernel boot option to the grub2
    configuration when switching to the proprietary driver
  - service_harddrake tries to correct this, but doesn't actually update
    the grub2 configuration, which leads to the endless loop (bug 21250)

These two problems likely have the same underlying cause. The initial suspect was a regression caused by the fix for bug 18783, but I now think there's more to it than just that.

The workaround is to manually add the nokmsboot option. See the write-up in the errata.
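
As a rough illustration of that manual workaround (not the errata text itself), the sketch below appends nokmsboot to the GRUB_CMDLINE_LINUX_DEFAULT line of a grub defaults file when it is missing. The helper name add_nokmsboot is made up for this example, and the paths in the usage comment are assumptions about a default Mageia 6 grub2 setup.

```shell
# add_nokmsboot FILE  (hypothetical helper, illustration only)
# Appends nokmsboot to GRUB_CMDLINE_LINUX_DEFAULT="..." if it is not there yet.
add_nokmsboot() {
    file=$1
    # Nothing to do if the option is already on that line.
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*nokmsboot' "$file"; then
        return 0
    fi
    # Insert the option just before the closing double quote of the value.
    sed 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 nokmsboot"/' "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# Possible use as root (assumed paths, back up first):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_nokmsboot /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg   # regenerate the grub2 configuration
```

Editing the defaults file alone changes nothing at boot; the grub2 configuration has to be regenerated afterwards, which is the step service_harddrake was not performing.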

Source RPM: x11-driver-video-nvidia-current.mga6 => drakxtools-17.88-1.mga6.src.rpm
CC: (none) => mageia
Summary: Cycle reboot after switching from nouveua to nvidia proprietary driver x86 x86_64 => Endless reboot loop after switching from nouveau to nvidia proprietary driver
Keywords: (none) => IN_ERRATA6
Assignee: kernel => mageiatools

Comment 3 Charles Edwards 2017-07-18 01:29:08 CEST
When I know that I am going to switch to the nvidia driver I prefer booting
with modprobe.blacklist=nouveau added to the appends and booting the system to init 3.

At init 3 I use XFdrake to change the driver to nvidia.
After (and if) the module builds successfully I can use 'service dm start' or 'startx' to start X WITHOUT rebooting and the nvidia modules are happily loaded and used.

I also manually verify that 'nokmsboot' is present in /etc/default/grub.
If not I add it and run 'update_grub2'.
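
The manual check can be scripted along these lines. This is only a sketch: has_boot_option is a hypothetical helper name, and it tests a command-line string for a whole-word match so that it works equally on the running kernel's command line or on a value copied out of /etc/default/grub.

```shell
# has_boot_option OPTION CMDLINE  (hypothetical helper, illustration only)
# Succeeds if OPTION appears as a whole word in the kernel command line string.
has_boot_option() {
    option=$1
    cmdline=$2
    case " $cmdline " in
        *" $option "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use against the running kernel:
#   has_boot_option nokmsboot "$(cat /proc/cmdline)" || echo "nokmsboot missing"
#   has_boot_option modprobe.blacklist=nouveau "$(cat /proc/cmdline)" || echo "nouveau not blacklisted"
```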

CC: (none) => cae

Comment 4 Dmytro Palamarchuk 2017-07-18 10:33:16 CEST
I think it would be good to add the workaround to the errata until the bug is resolved.
Comment 5 Martin Whitaker 2017-07-18 22:07:59 CEST
(In reply to Dmytro Palamarchuk from comment #4)
> I think it would be good to add the workaround to the errata until the bug
> is resolved.

It's there already.
Rémi Verschelde 2017-07-19 10:39:14 CEST

Target Milestone: --- => Mageia 6
Priority: Normal => High

Comment 6 Martin Whitaker 2017-07-23 18:08:29 CEST
Created attachment 9511
Proposed fix

This patch, combined with attachment 9116 from bug 18783, fixes this bug. I've also tested that it doesn't introduce any regressions in the Live installer or when switching from grub to grub2.
Martin Whitaker 2017-07-23 18:09:04 CEST

Keywords: (none) => PATCH

Rémi Verschelde 2017-07-24 14:12:04 CEST

Blocks: (none) => 21340
CC: (none) => thierry.vignaud

Martin Whitaker 2017-07-25 01:16:23 CEST

Blocks: (none) => 21246

Comment 7 Dmytro Palamarchuk 2017-07-25 16:15:19 CEST
OK, I'm closing the issue.

I can't test the fix at the moment, because the only machine I have with an Nvidia card is my home PC with its fresh Mageia 6 installation.

Resolution: (none) => FIXED
Status: NEW => RESOLVED

Comment 8 Rémi Verschelde 2017-07-25 16:18:10 CEST
Please don't close it, it's still a bug that we want to fix (and there's a proposed fix in comment 6).

Resolution: FIXED => (none)
Status: RESOLVED => REOPENED

Trou Du Cul Merdeux 2017-07-26 20:52:56 CEST

Severity: major => critical
CC: (none) => trouducul

Samuel Verschelde 2017-07-26 21:35:56 CEST

Severity: critical => major
CC: trouducul => (none)

Florian Hubold 2017-07-27 12:03:08 CEST

CC: (none) => doktor5000

Comment 9 Dmytro Palamarchuk 2017-07-27 17:47:11 CEST
I apologize for closing it. I understood it to be fixed when I first read comment 6.
Rémi Verschelde 2017-08-03 16:35:02 CEST

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=21250

Comment 10 Thierry Vignaud 2017-08-03 17:35:50 CEST
Martin's fix looks OK as usual.
However I wouldn't push "Care is taken to not overwrite the default_append value if it is defined. This allows default_append to be an empty string, if
that is ever required." to mga5, that changes our decade old behaviour.
Actually, I wouldn't change it in cauldron too w/o discussing.
I'm not sure there's a valid case where it could be empty.
Comment 11 Martin Whitaker 2017-08-04 21:31:23 CEST
(In reply to Thierry Vignaud from comment #10)
> However I wouldn't push "Care is taken to not overwrite the default_append
> value if it is defined. This allows default_append to be an empty string, if
> that is ever required." to mga5, that changes our decade old behaviour.
> Actually, I wouldn't change it in cauldron too w/o discussing.
> I'm not sure there's a valid case where it could be empty.

Well, this is to avoid overriding manual changes made by the user to the /etc/default/grub file - just in case someone has a valid reason for wanting GRUB_CMDLINE_LINUX_DEFAULT to be an empty string. I don't feel that strongly about it though, so feel free to change that (and wait for someone to complain...)

P.S. I forgot to say, perl_checker needs to be taught to recognise the // operator. I hacked my local copy to get it to run.
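
The difference under discussion (fall back to the default only when the value is undefined, versus also when it is an empty string) can be shown with a shell analogy. This is not the drakxtools Perl code: `${var-default}` behaves roughly like Perl's `//` and `${var:-default}` roughly like `||`, and the default_append value below is a made-up placeholder.

```shell
default_append="splash quiet"    # placeholder default, for illustration only

# The user has explicitly set the value to an empty string.
user_append=""
printf '[%s]\n' "${user_append-$default_append}"     # unset-only test: prints []
printf '[%s]\n' "${user_append:-$default_append}"    # unset-or-empty test: prints [splash quiet]

# The value is not set at all: both forms fall back to the default.
unset user_append
printf '[%s]\n' "${user_append-$default_append}"     # prints [splash quiet]
printf '[%s]\n' "${user_append:-$default_append}"    # prints [splash quiet]
```

The first form corresponds to preserving a deliberately empty GRUB_CMDLINE_LINUX_DEFAULT; the second corresponds to the long-standing behaviour of replacing an empty value with the default.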
Comment 12 macxi 2017-08-14 22:35:04 CEST
In Mageia 6 I added "nokmsboot" by editing the "Append" options in the "Set up boot system" section of the MCC.
After reboot, the warning appears: "The system must be rebooted due to changing the video driver".
After 30 seconds, the system reboots and Mageia 6 opens, but with the nouveau driver.

/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT=" splash quiet noiswmd nokmsboot resume=UUID=885a30f8-76cb-482a-9eab-154662d80d9e audit=0"

lspci -v|less:
01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Elitegroup Computer Systems Device 2015
        Flags: bus master, fast devsel, latency 0, IRQ 27
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        I/O ports at af00 [size=128]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb, nouveau

CC: (none) => terraagua

Comment 13 macxi 2017-08-22 23:04:21 CEST
Today I tried to set up the Nvidia driver again after upgrading Mageia 6 to kernel-desktop-4.9.43-1.mga6-1-1.mga6.x86_64.
After reconfiguring and rebooting the computer, I confirmed with the command "lspci -v | less" that the "nvidia" driver is installed and working properly, replacing the nouveau driver.

[mageia6@local]$ lspci -v|less

01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Elitegroup Computer Systems Device 2015
        Flags: bus master, fast devsel, latency 0, IRQ 29
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        I/O ports at af00 [size=128]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia340
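
For anyone comparing the two listings, the relevant line can be pulled out of the lspci output mechanically. The sketch below uses a hypothetical helper named driver_in_use that reads `lspci -v` output on standard input and prints the kernel driver bound to the first VGA device; it assumes the output layout shown in this report.

```shell
# driver_in_use  (hypothetical helper, illustration only)
# Reads `lspci -v` output on stdin and prints the kernel driver bound to the
# first "VGA compatible controller" entry.
driver_in_use() {
    awk '
        /VGA compatible controller/       { in_vga = 1; next }   # start of a VGA device block
        /^[^ \t]/                         { in_vga = 0 }         # any other device header ends it
        in_vga && /Kernel driver in use:/ { print $NF; exit }
    '
}

# Possible use:
#   lspci -v | driver_in_use
```

With the output from comment 12 this would print nouveau, and with the output above it would print nvidia.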
Comment 14 macxi 2017-08-23 02:18:40 CEST
I made two other installations of Mageia 6 that did not install the nvidia driver, just the nouveau driver.

I checked that both installations had the "nokmsboot" option added in the /etc/default/grub file.

These two installations of Mageia 6 have one feature in common: their boot menus are not saved in the MBR.
Comment 15 Bruno Cornec 2017-08-24 02:20:12 CEST
Not sure if it helps, but in my case, the link for the nvidia driver wasn't correctly created. I had to do:
ln -sf /usr/lib64/nvidia340/xorg/nvidia_drv.so /usr/lib64/xorg/modules/drivers/ 
to have the X server launching
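
A quick way to see whether that link exists and resolves is sketched below. link_ok is a hypothetical helper name, and the path in the usage comment is simply the link that the ln command above would create.

```shell
# link_ok PATH  (hypothetical helper, illustration only)
# Succeeds only if PATH is a symbolic link whose target exists.
link_ok() {
    [ -L "$1" ] && [ -e "$1" ]
}

# Possible use:
#   link_ok /usr/lib64/xorg/modules/drivers/nvidia_drv.so || echo "driver link missing or dangling"
```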

CC: (none) => bruno

Comment 16 Hoyt Duff 2017-09-05 03:55:15 CEST
Sorry to be late to the party, but when using drakx11 to change to the nvidia driver on a multi-head hardware configuration, drakx11 changes only the first instance of the driver, leaving the remainder as "nouveau". Also, even if it detects that more than one head is present when changing the xorg.conf file to the nvidia driver, it fails to enable the "TwinView" option, leaving it turned off. I always need to enable TwinView manually and manually change the second head's driver from nouveau to nvidia. If I don't do that, I get dropped to recovery mode. It's really not a big deal once you are aware that it needs to be changed manually, but maybe this could be addressed in the errata.

I'm running two cards and three monitors now, but I can revert to one card and two monitors if you need any information from me. For three heads, neither drakx11 nor nvidia-settings is up to the configuration task. 8(

CC: (none) => hoyt

Comment 17 Martin Whitaker 2017-12-08 10:13:02 CET
Created attachment 9822
Proposed fix v2

Although I think it would be better not to override user choices, in the interest of getting this fix released, I've updated the patch to override an empty string with the default append value, so preserving the existing behaviour.

Attachment 9511 is obsolete: 0 => 1

Comment 18 Martin Whitaker 2017-12-08 23:11:49 CET
Created attachment 9825
This patch from bug 18783 is also needed...

Attaching it here, as I forgot it when I tested my revised patch.
Comment 19 Mageia Robot 2018-01-09 22:18:18 CET
commit 6fb73fba36f9fcf88251f3bd3ad8ee67950c8541
Author: Martin Whitaker <mageia@...>
Date:   Sun Jul 23 16:18:49 2017 +0100

    Combine bootloader perImageAppend and default_append keys.
    
    This allows changes to the append options to propagate to the grub2
    configuration file, thus fixing mga#21263 and mga#21250.
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=6fb73fba36f9fcf88251f3bd3ad8ee67950c8541

 Bug links:
   Mageia
      https://bugs.mageia.org/21263
      https://bugs.mageia.org/21250
Comment 20 Maurice Batey 2018-04-26 18:15:57 CEST
> Dmytro Palamarchuk 2017-07-17 16:04:04 CEST
> Description of problem:
> Cycle reboot after switching from nouveau to nvidia proprietary driver. 

Have just been hit with this same situation on UEFI-installed Plasma 64-bit Mageia-6 (fully updated), on PC with ASUS Z270-K motherboard.

PC has 2 video cards: Intel HD630, and nVidia GT730.

Wanting to check out the nVidia, I plugged the monitor's HDMI cable into the PC's nVidia socket, and Mageia-6 (booting via rEFInd*) came up using the 'nouveau' driver.
  Assuming 'nouveau' would not produce better results than Intel HD630, I then finessed installation of the proprietary nVidia driver, and rebooted.
  But rebooting caused the booting loop described in this bug report.

To get back to a usable Mageia-6 I reconnected the monitor to the PC's Intel HD630 HDMI socket, after which Mageia-6 has behaved normally.
  N.B. Windows10 comes up normally whichever graphics socket is used.

Bug report 19058 suggests there is no inherent Linux problem with the nVidia GT730 driver.

  * The /boot/refind_linux.conf file contains:

"Boot with standard options"  "root=/dev/sda4 ro splash quiet noiswmd resume=/dev/sda5"
"Boot to single-user mode"    "root=/dev/sda4 ro splash quiet noiswmd resume=/dev/sda5"
"Boot with minimal options"   "ro root=/dev/sda4"

Can the 'nokmsboot' solution somehow be applied here?

CC: (none) => maurice

Comment 21 Martin Whitaker 2018-04-27 23:35:34 CEST
(In reply to Maurice Batey from comment #20)
>   * The /boot/refind_linux.conf file contains:
> 
> "Boot with standard options"  "root=/dev/sda4 ro splash quiet noiswmd
> resume=/dev/sda5"
> "Boot to single-user mode"    "root=/dev/sda4 ro splash quiet noiswmd
> resume=/dev/sda5"
> "Boot with minimal options"   "ro root=/dev/sda4"
> 
> Can the 'nokmsboot' solution somehow be applied here?

Yes, you need to add the nokmsboot option to all three lines if you want to use the proprietary nvidia driver. You can test this before modifying the file by pressing F2 after selecting the Mageia icon in rEFInd (which brings up the options menu), then pressing F2 again to edit the boot command line.
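
If editing the file by hand feels error-prone, something like the following sketch could make the change. add_nokmsboot_refind is a hypothetical helper name; it assumes each line of refind_linux.conf consists of two double-quoted fields, as in the file quoted above, and appends nokmsboot at the end of the options field of any line that does not already contain it.

```shell
# add_nokmsboot_refind FILE  (hypothetical helper, illustration only)
# Appends nokmsboot to the options string (second quoted field) of every
# line that does not already mention it.
add_nokmsboot_refind() {
    file=$1
    sed '/nokmsboot/!s/^\("[^"]*"[[:space:]]*"[^"]*\)"[[:space:]]*$/\1 nokmsboot"/' \
        "$file" > "$file.new" && mv "$file.new" "$file"
}

# Possible use as root (back up first):
#   cp /boot/refind_linux.conf /boot/refind_linux.conf.bak
#   add_nokmsboot_refind /boot/refind_linux.conf
```

Trying the option from the rEFInd options menu first, as described above, remains the safer way to confirm it works before the file is changed.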
Comment 22 Maurice Batey 2018-04-28 13:15:11 CEST
Thank you, Martin!
  Not 100% sure of the correct syntax, but I assume one just inserts the string "NOKMSBOOT" (or "nokmsboot"?) into the list at the end of each line, e.g.

"Boot with standard options"  "root=/dev/sda4 ro splash quiet noiswmd nokmsboot resume=/dev/sda5"
"Boot to single-user mode"    "root=/dev/sda4 ro splash quiet noiswmd nokmsboot resume=/dev/sda5"
"Boot with minimal options"   "ro root=/dev/sda4 nokmsboot"

N.B. I recently discovered the rEFInd script 'mkrlconf' for generating the /boot/refind_linux.conf file, but perhaps it does not know about 'nokmsboot'.
Comment 23 Maurice Batey 2018-05-07 20:21:33 CEST
Yes, that did the trick - many thanks, Martin!
  Now booting with nVidia proprietary driver (390.42).
Martin Whitaker 2018-05-29 23:16:10 CEST

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=23097

Comment 24 Martin Whitaker 2018-06-11 23:10:21 CEST
Fix released in drakxtools-17.88.2-1.mga6.src.rpm.

Resolution: (none) => FIXED
Status: REOPENED => RESOLVED

