Bug 21263 - Endless reboot loop after switching from nouveau to nvidia proprietary driver
Summary: Endless reboot loop after switching from nouveau to nvidia proprietary driver
Status: REOPENED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 6
Hardware: All Linux
Priority: High major
Target Milestone: Mageia 6
Assignee: Mageia tools maintainers
QA Contact:
URL:
Whiteboard:
Keywords: IN_ERRATA6, PATCH
Depends on:
Blocks: 21246 21340
 
Reported: 2017-07-17 16:04 CEST by Dmytro Palamarchuk
Modified: 2017-08-14 22:35 CEST

See Also:
Source RPM: drakxtools-17.88-1.mga6.src.rpm
CVE:
Status comment:


Attachments
Proposed fix (3.58 KB, text/plain)
2017-07-23 18:08 CEST, Martin Whitaker

Description Dmytro Palamarchuk 2017-07-17 16:04:04 CEST
Description of problem:

Endless reboot loop after switching from the nouveau driver to the nvidia proprietary driver. Affects both x86 and x86_64.

Version-Release number of selected component (if applicable): x11-driver-video-nvidia-current.mga6 

How reproducible:

Install a fresh copy of Mageia 6 from the Live Plasma Desktop image and choose the option with the free video driver (nouveau in the case of NVIDIA). After installation, switch from nouveau to the nvidia proprietary driver and reboot. Press the Esc key to see the boot process. A message box then appears: "Your display driver is changed and need to reboot. Press Yes to reboot immediately". After the reboot the same message box appears again, then again after the next reboot, and so on: an endless reboot loop with the message box.

Note1: If I remember correctly, the nvidia proprietary driver doesn't support KMS (kernel mode setting), which means that during boot the Linux kernel messages should be shown at a lower resolution than the flat panel's native resolution. After switching from nouveau to the nvidia proprietary driver, however, the kernel messages are still always shown at the panel's native resolution. I think the nouveau module is not added to the blacklist after the switch.

Note2: I don't know if other nvidia drivers are affected by this bug.
       x11-driver-video-nvidia304
       x11-driver-video-nvidia340 

Note3: My NVIDIA card is a GeForce GTX 650.

Steps to Reproduce:
1. Install a fresh copy of Mageia 6 from the Live Plasma Desktop image: choose the option with the free video driver (nouveau in the case of NVIDIA).
2. After installation, switch from nouveau to the nvidia proprietary driver. Reboot.
3. Press the Esc key to see the boot process. A message box then appears: "Your display driver is changed and need to reboot. Press Yes to reboot immediately".
Comment 1 Marja van Waes 2017-07-17 23:32:38 CEST
Assigning to the kernel and drivers maintainers.

CC: (none) => marja11
Assignee: bugsquad => kernel

Comment 2 Martin Whitaker 2017-07-18 01:02:04 CEST
This is a combination of two problems:

  - drakx11 fails to add the "nokmsboot" kernel boot option to the grub2
    configuration when switching to the proprietary driver
  - service_harddrake tries to correct this, but doesn't actually update
    the grub2 configuration, which leads to the endless loop (bug 21250)

These two problems likely have the same underlying cause. The initial suspect was a regression caused by the fix for bug 18783, but I now think there's more to it than just that.

The workaround is to manually add the nokmsboot option. See the write-up in the errata.
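
For anyone applying the workaround by hand, the edit to /etc/default/grub can be sketched as follows. This is only an illustration in Python, not the code used by drakx11 or service_harddrake. The helper name `add_kernel_option` is hypothetical, and it assumes GRUB_CMDLINE_LINUX_DEFAULT is a double-quoted value on a single line. The function works on the file contents as a string and does not touch the disk; the grub2 configuration still has to be regenerated after the file is changed.

```python
import re

def add_kernel_option(grub_default_text, option="nokmsboot"):
    """Return the text with `option` present in GRUB_CMDLINE_LINUX_DEFAULT.

    Hypothetical helper: handles only a double-quoted, single-line value.
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")[ \t]*$', re.MULTILINE
    )

    def repl(match):
        options = match.group(2).split()
        if option not in options:  # keep the edit idempotent
            options.append(option)
        return match.group(1) + " ".join(options) + match.group(3)

    new_text, count = pattern.subn(repl, grub_default_text)
    if count == 0:
        # No such line yet: append one instead of silently doing nothing.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += 'GRUB_CMDLINE_LINUX_DEFAULT="%s"\n' % option
    return new_text
```

Running it twice on the same text gives the same result, so it is safe to apply even if the option is already there.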

Source RPM: x11-driver-video-nvidia-current.mga6 => drakxtools-17.88-1.mga6.src.rpm
CC: (none) => mageia
Summary: Cycle reboot after switching from nouveua to nvidia proprietary driver x86 x86_64 => Endless reboot loop after switching from nouveau to nvidia proprietary driver
Keywords: (none) => IN_ERRATA6
Assignee: kernel => mageiatools

Comment 3 Charles Edwards 2017-07-18 01:29:08 CEST
When I know that I am going to switch to the nvidia driver I prefer booting
with modprobe.blacklist=nouveau added to the appends and booting the system to init 3.

At init 3 I use XFdrake to change the driver to nvidia.
Once the module has built successfully, I can use 'service dm start' or 'startx' to start X WITHOUT rebooting, and the nvidia modules are happily loaded and used.

I also manually verify that 'nokmsboot' is present in /etc/default/grub.
If not I add it and run 'update_grub2'.
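
To double-check that the running kernel was really booted with the blacklist option, the kernel command line can be inspected. The Python sketch below is an assumption-laden illustration: the function names are made up, and it relies only on the fact that modprobe.blacklist takes a comma-separated list of module names.

```python
def kernel_options(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    options = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        options[name] = value if sep else None
    return options

def is_blacklisted(cmdline, module="nouveau"):
    """True if `module` is listed in the modprobe.blacklist= option."""
    value = kernel_options(cmdline).get("modprobe.blacklist")
    return value is not None and module in value.split(",")
```

On a running system the string to pass in would typically be the contents of /proc/cmdline, for example `is_blacklisted(open("/proc/cmdline").read())`.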

CC: (none) => cae

Comment 4 Dmytro Palamarchuk 2017-07-18 10:33:16 CEST
I think it would be good to add the workaround to the errata until the bug is resolved.
Comment 5 Martin Whitaker 2017-07-18 22:07:59 CEST
(In reply to Dmytro Palamarchuk from comment #4)
> I think it would be good to add the workaround to the errata until the bug
> is resolved.

It's there already.
Rémi Verschelde 2017-07-19 10:39:14 CEST

Target Milestone: --- => Mageia 6
Priority: Normal => High

Comment 6 Martin Whitaker 2017-07-23 18:08:29 CEST
Created attachment 9511
Proposed fix

This patch, combined with attachment 9116 from bug 18783, fixes this bug. I've also tested that it doesn't introduce any regressions in the Live installer or when switching from grub to grub2.
Martin Whitaker 2017-07-23 18:09:04 CEST

Keywords: (none) => PATCH

Rémi Verschelde 2017-07-24 14:12:04 CEST

Blocks: (none) => 21340
CC: (none) => thierry.vignaud

Martin Whitaker 2017-07-25 01:16:23 CEST

Blocks: (none) => 21246

Comment 7 Dmytro Palamarchuk 2017-07-25 16:15:19 CEST
OK, I am closing the issue.

I can't test this right now, because I don't have any machine with an NVIDIA card other than my home PC, which has a fresh Mageia 6 installation.

Resolution: (none) => FIXED
Status: NEW => RESOLVED

Comment 8 Rémi Verschelde 2017-07-25 16:18:10 CEST
Please don't close it, it's still a bug that we want to fix (and there's a proposed fix in comment 6).

Resolution: FIXED => (none)
Status: RESOLVED => REOPENED

Trou Du Cul Merdeux 2017-07-26 20:52:56 CEST

CC: (none) => trouducul
Severity: major => critical

Samuel Verschelde 2017-07-26 21:35:56 CEST

CC: trouducul => (none)
Severity: critical => major

Florian Hubold 2017-07-27 12:03:08 CEST

CC: (none) => doktor5000

Comment 9 Dmytro Palamarchuk 2017-07-27 17:47:11 CEST
I apologize for closing the bug. I understood it to be fixed when I first read comment 6.
Rémi Verschelde 2017-08-03 16:35:02 CEST

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=21250

Comment 10 Thierry Vignaud 2017-08-03 17:35:50 CEST
Martin's fix looks OK as usual.
However I wouldn't push "Care is taken to not overwrite the default_append value if it is defined. This allows default_append to be an empty string, if
that is ever required." to mga5, that changes our decade old behaviour.
Actually, I wouldn't change it in cauldron too w/o discussing.
I'm not sure there's a valid case where it could be empty.
Comment 11 Martin Whitaker 2017-08-04 21:31:23 CEST
(In reply to Thierry Vignaud from comment #10)
> However I wouldn't push "Care is taken to not overwrite the default_append
> value if it is defined. This allows default_append to be an empty string, if
> that is ever required." to mga5, that changes our decade old behaviour.
> Actually, I wouldn't change it in cauldron too w/o discussing.
> I'm not sure there's a valid case where it could be empty.

Well, this is to avoid overriding manual changes made by the user to the /etc/default/grub file - just in case someone has a valid reason for wanting GRUB_CMDLINE_LINUX_DEFAULT to be an empty string. I don't feel that strongly about it though, so feel free to change that (and wait for someone to complain...)

P.S. I forgot to say, perl_checker needs to be taught to recognise the // operator. I hacked my local copy to get it to run.
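
The behavioural difference being discussed can be illustrated outside Perl. In the Python sketch below, `falsy_default` behaves roughly like Perl's `||` (an empty string is replaced by the default), while `defined_default` behaves like `//` (only an undefined value is replaced). The function names and the sample default string are invented for illustration and are not taken from the patch.

```python
DEFAULT_APPEND = "splash quiet"  # illustrative value only, not the real default

def falsy_default(value, default=DEFAULT_APPEND):
    # Roughly Perl's `$value || $default`: an empty string counts as "not set".
    return value or default

def defined_default(value, default=DEFAULT_APPEND):
    # Like Perl's `$value // $default`: only None (undef) triggers the default.
    return default if value is None else value

print(repr(falsy_default("")))    # the empty string is overwritten
print(repr(defined_default("")))  # the empty string is preserved
```

With the second form, a user who has deliberately emptied the option list keeps that choice; with the first, the default is silently restored.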
Comment 12 macxi macxi 2017-08-14 22:35:04 CEST
In Mageia 6 I added "nokmsboot" by editing the "Append" options in the "Set up boot system" section of the MCC.
After reboot, the warning appears: "The system must be rebooted due to changing the video driver".
After 30 seconds, the system reboots and Mageia 6 opens, but with the nouveau driver.

/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT=" splash quiet noiswmd nokmsboot resume=UUID=885a30f8-76cb-482a-9eab-154662d80d9e audit=0"

lspci -v|less:
01:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Elitegroup Computer Systems Device 2015
        Flags: bus master, fast devsel, latency 0, IRQ 27
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        I/O ports at af00 [size=128]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb, nouveau
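
When comparing reports like this one, it can help to pull the "Kernel driver in use" value out of the lspci output automatically. The following Python sketch is a hypothetical helper (the name `drivers_in_use` is made up) that assumes the layout shown above: an unindented device line followed by indented detail lines.

```python
import re

def drivers_in_use(lspci_output):
    """Map each device header line to its 'Kernel driver in use' value."""
    result = {}
    device = None
    for line in lspci_output.splitlines():
        if line and not line[0].isspace():
            device = line.strip()  # unindented line starts a new device
            continue
        match = re.match(r"\s*Kernel driver in use:\s*(\S+)", line)
        if match and device is not None:
            result[device] = match.group(1)
    return result
```

The input would be the text produced by `lspci -v`; devices with no bound driver are simply absent from the result.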

CC: (none) => terraagua

