Bug 19984 - Installing nvidia 340 driver from the Classical DVD causes boot failure
Summary: Installing nvidia 340 driver from the Classical DVD causes boot failure
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: Cauldron
Hardware: All Linux
Priority: release_blocker critical
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard:
Keywords: 6sta2
Depends on: 17194
Blocks:
Reported: 2016-12-19 16:35 CET by Thomas Andrews
Modified: 2017-03-27 18:03 CEST
CC List: 6 users

See Also:
Source RPM:
CVE:
Status comment: Reproducible with nvidia340 regardless of boot option, and nouveau with nomodeset/nokmsboot


Attachments
report.bug.xz from 32-bit install (162.73 KB, application/x-xz)
2017-01-14 00:26 CET, Martin Whitaker
dkms build log from 32-bit install (5.92 KB, application/gzip)
2017-01-14 00:30 CET, Martin Whitaker
dkms build log from rebuild after booting system (5.23 KB, application/gzip)
2017-01-14 00:33 CET, Martin Whitaker
bugreport for nvidia304 (163.46 KB, application/x-xz)
2017-01-15 18:36 CET, papoteur
dkms build log from a 64-bit system (9.13 KB, application/x-tar)
2017-03-06 22:23 CET, Thomas Andrews

Description Thomas Andrews 2016-12-19 16:35:56 CET
Description of problem:
If I choose to install the nvidia 340 driver from the sta2 Classical DVD, the install seems to proceed normally. But, upon the first boot I see a screen that is blank except for three question marks. Each question mark is highlighted, and then I get another screen with a message that the system is switching to the nouveau driver. The boot then proceeds to a normal-looking Plasma desktop.

If I re-install, this time choosing NOT to install the 340 driver, the boot looks normal. But, if I then choose to use MCC to install the 340 driver, it again appears to go normally, but when I reboot I get a message that the driver needs the "nokmsboot" kernel option. Rebooting with that option again gives me the three question marks, and so on.

If I re-install again, this time ignoring the 340 driver entirely, everything works as it should.

On a related note, when the system is booted with the 340 driver installed, none of the "leave" options appear to work. I was able to use "shutdown now" from the command line to shut down the system. This problem does not occur with the nouveau driver when the 340 driver is NOT installed.
Thomas Andrews 2016-12-19 16:36:56 CET

Keywords: (none) => 6sta2

Thomas Andrews 2016-12-19 16:37:43 CET

Priority: Normal => High

Comment 1 Thomas Andrews 2016-12-19 16:41:55 CET
Forgot to mention, if the 340 driver is installed, and I then, through MCC, tell the system not to use it, upon rebooting the system attempts to rebuild the kernel module, apparently failing near the end.
Marja Van Waes 2016-12-19 17:18:34 CET

CC: (none) => marja11
Assignee: bugsquad => kernel

Comment 2 Martin Whitaker 2016-12-23 01:06:01 CET
I suspect you are hitting the same plymouth bug I found and described in bug 19890 (which I suspected might come back to bite us with proprietary drivers).

On a system with the 340 driver installed and using the nokmsboot option, could you try editing the file /usr/share/plymouth/plymouth.defaults and change the DeviceTimeout value to 1. If I'm right, that should "fix" the bootsplash issue.

I also once experienced a problem with non-working "leave" options in Plasma, but am not sure under what circumstances. In my case it was only the panel that was affected - selecting shutdown from the main menu worked.

CC: (none) => mageia

Comment 3 Thomas Andrews 2016-12-23 02:13:21 CET
If by "fix" you mean I don't see the screen with the three question marks, it didn't help. It is going ahead and booting with the 340 driver, though.

On the "leave" problem, that went away with the next boot after installing the 340 driver.
Comment 4 Martin Whitaker 2016-12-29 20:36:31 CET
Having finally got the Live DVD to install the 340 driver on boot, I have now observed the same grey screen/three question marks issue. I also see it if I add the nomodeset (or nokmsboot) option with the nouveau driver.
Comment 5 Thomas Andrews 2016-12-29 23:24:28 CET
Oh, good. I was beginning to wonder if it was just me.
Rémi Verschelde 2017-01-04 09:53:04 CET

Status comment: (none) => Reproducible with nvidia340 regardless of boot option, and nouveau with nomodeset/nokmsboot
Priority: High => release_blocker
CC: (none) => tmb
See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=19890
Summary: Installing nvidia 340 driver from the Classical DVD causes boot problems => Plymouth issues (grey screen/3 question marks) with nvidia 340 driver or nouveau with nomodeset/nokmsboot

Comment 6 Thomas Andrews 2017-01-04 15:27:20 CET
As of the updates to server kernel 4.9.0-3 and nvidia 340 (not sure of the number at the moment) driver last week, this problem has disappeared. I no longer see the gray screen with the three question marks, and the boot appears to go normally, not taking any longer than it should. I even see the screen with the nvidia symbol come up partway through the boot, which I didn't see before with Mageia 6.

I do remember that I had to add the "nokmsboot" kernel option manually. At the time I installed the nvidia 340 driver via MCC, it was not added as part of the installation. However, the drakxtools have been updated since then, and for all I know that has been fixed, too. 

I'll know more when I test the next round of pretest sta2 isos. There have been so many updates since the last round that I doubt another test of the old ones would be valid. If there is a problem with the proprietary nvidia340 installation then, I will file a new bug.

For now, I believe this bug should be marked as resolved and fixed.
Comment 7 Martin Whitaker 2017-01-04 18:38:23 CET
I still see this bug with a freshly built Live DVD using kernel 4.9.0-3, both with nvidia340 and nouveau drivers. In my case it is the same issue seen in bug 19890 - if the necessary video driver isn't present in the initrd or is prevented from loading by the nokmsboot option, the default value of the plymouth device timeout causes it to time out right in the gap when the framebuffer device is not available. Either increasing or decreasing the timeout value fixes this, but it might be hard to find a value that works well for all systems. Large values should be safe, but leave you looking at a blank screen for a long time.
Comment 8 Thomas Andrews 2017-01-04 19:08:46 CET
Hmmm. I had forgotten the change you asked to make in comment 2. I just went back and looked, and the value is still set at "1." One thing I didn't notice before, though, is that the file on my machine is named "plymouthd.defaults," not "plymouth.defaults."
Comment 9 Martin Whitaker 2017-01-04 19:19:03 CET
(In reply to Thomas Andrews from comment #8)
> Hmmm. I had forgotten the change you asked to make in comment 2. I just went
> back and looked, and the value is still set at "1." One thing I didn't
> notice before, though, is that the file on my machine is named
> "plymouthd.defaults," not "plymouth.defaults."

That's just me not typing the name right!

The other thing that could have made a difference is if your initrd got rebuilt and now includes the nvidia340 driver when previously it didn't (just speculating here...)
Comment 10 Thomas Andrews 2017-01-05 20:29:20 CET
(In reply to Martin Whitaker from comment #9)

> 
> The other thing that could have made a difference is if your initrd got
> rebuilt and now includes the nvidia340 driver when previously it didn't
> (just speculating here...)

Could be. I just tried changing the timeout value back to "5" and it still booted without the question marks. There was one point where I had a black screen with a mouse cursor in the center that was longer than before, but it still proceeded normally from there.
Comment 11 Thomas Andrews 2017-01-06 03:22:21 CET
I just did a new install (for other reasons) from the last Classical 64 DVD from pretesting, the one that never made it to QA. My procedure was to not install the proprietary driver at first, establish specific media on the first boot, get all updates so that I had the latest drakxtools and the latest kernel, then reboot and install the nvidia340 driver from Hardware in MCC.

Nokmsboot was NOT added to the kernel options for me. I had to do it manually. Please let me know if I should open another, separate bug concerning this specific issue. It needs to be fixed.

The three question marks are back. After a while, the desktop came up normally, using the 340 driver.

So, it appears that I was wrong in Comment 6. Sigh.
Comment 12 Thomas Andrews 2017-01-06 03:28:49 CET
Oh, yes, I almost forgot. In the session where I installed the nvidia 340 driver, the "Leave" options from the panel and the desktop context menu would not work. Other functions did. I used the "shutdown" command from Konsole to, well, shut down. On the next boot, "Leave" functioned normally.

Yet another bug report, perhaps?
Comment 13 Mageia Robot 2017-01-06 16:33:25 CET
commit bd5f5a5290e66ce1b90b1ff87b1ae134ee7e4a9b
Author: Martin Whitaker <mageia@...>
Date:   Fri Jan 6 15:08:10 2017 +0000

    live.cfg: try to avoid plymouth falling back to text mode (mga#19984)
    
    The default device timeout of 5s is not long enough for a Live boot.
---
 Commit Link:
   http://gitweb.mageia.org/software/build-system/draklive-config/commit/?id=bd5f5a5290e66ce1b90b1ff87b1ae134ee7e4a9b
Comment 14 Thomas Andrews 2017-01-13 17:22:51 CET
Still valid with the Jan 11 Classical iso, except worse. I believe there are several bugs involved here, and someone with more knowledge than I needs to sort this out. I'll describe what happened, which is similar to my original bug report, although I was looking for some things this time that I wasn't before:

I installed Plasma from the Classical iso on my MBR system, using the "Use existing partitions" partitioning option, the one I believe most current Mageia 5 users would want to choose. I chose to format both / and /home, intending to restore personal data on /home later from backups. 

The first time, I said "yes" to the proprietary driver, and it appeared to go normally, as did the rest of the install. I checked before attempting the boot, and the installer had added the "nokmsboot" kernel option, as expected. The boot failed altogether. I got the screen with the question marks, and after what seemed like a rather long time that went away and I had a blank, black screen with what looked like a blinking text cursor. I let that sit there for about 5 minutes, and aborted the boot.

I then did the whole install over again, choosing the same options except that I did not format /home this time, and I refused the proprietary driver. The install appeared to go normally. I checked before the first boot, and the "nokmsboot" option was not there, as expected. During the boot I again saw the three question marks, but not for as long, and the boot was successful.

After getting updates, which included drakconf and associated tools, and restoring /home data from backups, I booted again just to see if it would still work, and was successful.

I then used MCC/Hardware to install the proprietary driver. It appeared to go normally, but after installing it and closing MCC, an attempt to re-open it showed a blank, grey window. Also, the "Leave" options would not work, so I used "shutdown now" to exit. 

I rebooted, discovering that this time the "nokmsboot" kernel option had NOT been added, and I had to add it manually. With "nokmsboot" in place, the boot proceeded, once again to the three question marks, but as long as that long ago first boot, and the boot was successful, using the 340 driver.

Changing the description back to similar to my original, as I believe that is the most important part, and it is the reason I originally filed the bug. The Classical installer is not installing the driver correctly, resulting in a failed boot. In my opinion the question mark screen is a relatively minor issue, and not necessarily a release blocker as long as the boot is successful.

Summary: Plymouth issues (grey screen/3 question marks) with nvidia 340 driver or nouveau with nomodeset/nokmsboot => Installing nvidia 340 driver from the Classical DVD causes boot failure

Comment 15 Martin Whitaker 2017-01-14 00:26:41 CET
Created attachment 8853
report.bug.xz from 32-bit install

I was able to reproduce this bug by installing from the Jan 11 Mageia-6-sta2-i586-DVD ISO. The nvidia340 kernel module was built and installed by the installer, but on booting the installed system, the module could not be loaded. Booting to run level 3 and attempting to manually load the module (using 'modprobe -v') gave:

  insmod /lib/modules/4.9.2-server-1.mga6/dkms/drivers/char/drm/nvidia340.ko.xz 
  modprobe: ERROR: could not insert 'nvidia340': Exec format error

Forcing dkms to rebuild the module fixed the problem.

A previous attempt to reproduce the problem using the 64-bit ISO did not give any such error - the installed system booted to a working desktop straight away. One thing I noticed is that the 32-bit install has chosen to use the server kernel.
Comment 16 Martin Whitaker 2017-01-14 00:30:16 CET
Created attachment 8854
dkms build log from 32-bit install

Here's the dkms build log from the build done during installation
Comment 17 Martin Whitaker 2017-01-14 00:33:04 CET
Created attachment 8855
dkms build log from rebuild after booting system

And in case it helps, the log from the rebuild.
Comment 18 Thomas Andrews 2017-01-14 00:44:01 CET
(In reply to Martin Whitaker from comment #15)

> 
> A previous attempt to reproduce the problem using the 64-bit ISO did not
> give any such error - the installed system booted to a working desktop
> straight away. One thing I noticed is that the 32-bit install has chosen to
> use the server kernel.

Hmmm. Because my BIOS erroneously reports ECC memory instead of the non-ECC memory that the motherboard manual says is required, the Classical installers always choose the server kernel for this hardware.
Comment 19 papoteur 2017-01-15 18:36:49 CET
Created attachment 8857
bugreport for nvidia304

I seem to be affected by the same bug, with the nvidia304 driver.
What is similar:
server kernel, 32-bit
At first boot, I get a dialog box saying that nouveau will be selected.
What is different: nvidia304

CC: (none) => yves.brungard_mageia

Marja Van Waes 2017-01-16 00:07:12 CET

CC: (none) => laidlaws

Comment 20 papoteur 2017-01-21 12:52:55 CET
I just installed with sta2 dated 20th Jan.
I get the same behavior.
At first boot, I get:
....(bad exit status:10)
build failed Installation skipped
I also get a window named "Display driver setup":
"The display driver has been automatically switched to "nouveau".
Reason: the proprietary driver was not found for the X.org driver "nvidia""

At the next boot, I get 
nvidia304: Already installed on this kernel
 in the boot messages, without splash screen.

I don't know why kernel-server is selected. Ram size is 4Gb.
Comment 21 papoteur 2017-01-21 13:09:02 CET
@ Martin
How to get a make.log? Or where is it, if already stored?
Comment 22 Doug Laidlaw 2017-01-21 13:24:52 CET
> I don't know why kernel-server is selected. Ram size is 4Gb.

Even with 4 Gb, kernel-server is the default for a 32-bit installation.

> How to get a make.log?  

It is probably somewhere in the journal, which I don't fully understand.  Martin was talking about the log from DKMS.
Comment 23 Martin Whitaker 2017-01-21 16:57:50 CET
(In reply to papoteur from comment #21)
> @ Martin
> How to get a make.log? Or where is it, if already stored?

The make.log from the dkms build can be found in

  /var/lib/dkms/$driver_name/$driver_version/$kernel_version/$arch/log/make.log

where in my case

  $driver_name = nvidia340
  $driver_version = 340.101-2.mga6.nonfree
  $kernel_version = 4.9.2-server-1.mga6
  $arch = i586
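
For anyone else hunting for the log, the path pattern above can be put together mechanically. This is only a minimal sketch: the function name and the `dkms_tree` default are my own choices for illustration, and the example values are the ones listed above.

```python
from pathlib import PurePosixPath


def dkms_make_log(driver_name, driver_version, kernel_version, arch,
                  dkms_tree="/var/lib/dkms"):
    """Build the make.log path from the four components listed above."""
    return PurePosixPath(dkms_tree, driver_name, driver_version,
                         kernel_version, arch, "log", "make.log")


# Values from the 32-bit install described in this comment.
print(dkms_make_log("nvidia340", "340.101-2.mga6.nonfree",
                    "4.9.2-server-1.mga6", "i586"))
```

Substitute the driver, version, kernel and architecture of the affected install; `uname -r` on the running system gives the kernel version string.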
Comment 24 Rémi Verschelde 2017-03-06 10:41:21 CET
Is this issue still reproducible in current sta2 ISOs?
Comment 25 Doug Laidlaw 2017-03-06 11:54:06 CET
I was in only for Comment 22.  I shouldn't be on this bug at all.
Doug Laidlaw 2017-03-06 11:54:34 CET

CC: laidlaws => (none)

Comment 26 Thomas Andrews 2017-03-06 14:59:05 CET
(In reply to Rémi Verschelde from comment #24)
> Is this issue still reproducible in current sta2 ISOs?

I have not tried to install the nvidia driver from the iso since my Jan 13 report, as I had heard nothing to indicate that there had been any progress on it. Our attention had been elsewhere.

I tend to believe it is still valid, but I will try it again this morning and report back. Give me a couple of hours or so to get it done.
Comment 27 Thomas Andrews 2017-03-06 16:02:58 CET
Still valid with the current 64-bit Classical iso. I have not tried the 32-bit iso, but I have no reason to expect a different result.

This time I removed "splash quiet" on the first boot. Everything proceeded normally until the building of the nvidia module. This seemed to stop early with a message "bad exit status: 5" before proceeding with the boot. The boot finally stopped, hung up, after a series of messages about "no such helper" regarding "pptp," "netbios-ns," "irc," "irc-0," and "snmp."

On previous occasions, I re-installed, thus wiping out any log of that first boot. This time I will not do that. If someone could instruct me on how to obtain that log when I can't boot into the affected install, I would be happy to do so. 

There is another Mageia 6 install, my production install, on another hard drive of the same hardware. I should be able to boot into that, even with the failed install on the secondary hard drive. Or, I could boot one of the Lives to do the trick, if that would work better.
Comment 28 Thomas Andrews 2017-03-06 16:17:45 CET
I decided to try booting a second time, again without "splash quiet" and again it failed. So now there should be two logs of failed boots.

This time there was no delay while the nvidia module was built, but the boot hung up at the same point, with the same messages, except in a different order.

Typing this from my production install, meaning of course that I am still able to boot into it.
Comment 29 Thomas Andrews 2017-03-06 16:48:16 CET
Forgot to mention that I did NOT get updates through the iso at the end of the install. I never do. More than half the time MIRRORLIST hooks me up with an inadequate mirror on official releases, and it's even worse with Cauldron. I always connect with a specific mirror, one I know to usually be reliable, after the first boot.

So if there is something in the repos that would have fixed this, I didn't install it.
Comment 30 Alfred Kretschmer 2017-03-06 17:57:07 CET
Today I made a bare metal install with the netinstall ISO on my old laptop with nvidia 6100go; the driver version is 304.
I installed the Nvidia driver during installation, but after the first reboot nouveau was loaded. As a workaround I rebuilt the initrd omitting nouveau (dracut -f --omit-driver=nouveau --xz), and after the next reboot the Nvidia driver worked.

CC: (none) => alfred.kretschmer

Comment 31 Martin Whitaker 2017-03-06 20:40:47 CET
@TJ, I think your problem is the same as I am seeing - a failure in the dkms build. The boot log is not that helpful - the errors show up in the dkms build log (see comment 23). If you have a working install that can see that disk, you can use diskdrake to mount the root partition on any directory of your choice (I usually use /mnt/tmp) and then retrieve the log file.
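
Once the failed install's root partition is mounted, the log can be located without knowing the exact driver or kernel version. A rough sketch, assuming the directory layout given in comment 23 and the /mnt/tmp mount point mentioned above; the function name is made up for illustration:

```python
from pathlib import Path


def find_dkms_make_logs(mounted_root):
    """Return every dkms make.log below a mounted root partition, sorted."""
    dkms_tree = Path(mounted_root) / "var" / "lib" / "dkms"
    # Layout from comment 23: <driver>/<version>/<kernel>/<arch>/log/make.log
    return sorted(dkms_tree.glob("*/*/*/*/log/make.log"))


if __name__ == "__main__":
    # /mnt/tmp is just the mount point used as an example above.
    for log in find_dkms_make_logs("/mnt/tmp"):
        print(log)
```

An empty result means no dkms build was logged on that partition, or that the wrong partition is mounted.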

@Alfred, that sounds like the nokmsboot option didn't get added to your boot command line during the install. Can you check if that is so?
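
One way to check this from the booted system is to look at /proc/cmdline, which holds the options the running kernel was started with. A small sketch; the helper name is hypothetical, and it only tells you about the current boot, not what the bootloader configuration contains:

```python
import os


def has_kernel_option(cmdline, option="nokmsboot"):
    """True if `option` appears as a separate word on the kernel command line."""
    return option in cmdline.split()


if __name__ == "__main__":
    path = "/proc/cmdline"  # command line of the currently running kernel
    if os.path.exists(path):
        with open(path) as f:
            print(has_kernel_option(f.read()))
```

Splitting on whitespace avoids a false match on a longer option that merely contains the same characters.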
Comment 32 Alfred Kretschmer 2017-03-06 20:53:40 CET
(In reply to Martin Whitaker from comment #31)
> 
> @Alfred, that sounds like the nokmsboot option didn't get added to your boot
> command line during the install. Can you check if that is so?

I did not check the command line options, so I'll do a new install and report later.
Comment 33 Thomas Andrews 2017-03-06 22:23:45 CET
Created attachment 9033
dkms build log from a 64-bit system

OK, here's the dkms log from my 64-bit system that failed. To my untrained eye it looks like mostly gibberish, but maybe someone else can make some sense of it...
Comment 34 Alfred Kretschmer 2017-03-07 15:06:19 CET
(In reply to Alfred Kretschmer from comment #32)
> (In reply to Martin Whitaker from comment #31)
> > 
> > @Alfred, that sounds like the nokmsboot option didn't get added to your boot
> > command line during the install. Can you check if that is so?
> 
> I did not check the command line options, so I'll do a new install and
> report later.

Did a new install this morning, and yes, the nokmsboot option is set, but ... this time I ran into the "good luck" error :(
Comment 35 Doug Laidlaw 2017-03-08 14:22:40 CET
I can't boot either. This is the sta2 Classical that has just been released.

/boot/grub2/grub.cfg was created as grub.cfg.new. (twice.)
The bootup exits the 3 question marks, and hangs at "Checking for new hardware."

I will file a new detailed report tomorrow.

CC: (none) => laidlaws

Comment 36 Martin Whitaker 2017-03-11 00:58:14 CET
Just to recap, there are at least two different bugs described here:

1) On many systems, when using a proprietary video driver, plymouth times out and falls back to text mode (grey screen and three question marks) before the video driver gets loaded. The default timeout of 5 seconds is not long enough.

2) dkms often fails to build the proprietary video driver correctly, either resulting in the system falling back to using the free driver or failing to start the X server at all (usually resulting in the "good luck" message).

(1) can be worked around by increasing the DeviceTimeout value in /usr/share/plymouth/plymouthd.defaults. The worst case found so far required it to be increased to 11 seconds. This is not a very satisfactory solution, as it leaves the user looking at a blank screen (or maybe some text error messages) for a long time.
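
For testers who want to try workaround (1) repeatedly, the edit can be scripted. This is only a sketch under assumptions: it presumes the file contains a plain `DeviceTimeout=<n>` line, the function name is made up, and the sample text (including the theme name) is invented for illustration rather than copied from a real install.

```python
import re


def set_device_timeout(defaults_text, seconds):
    """Return the file text with the DeviceTimeout line set to `seconds`."""
    pattern = re.compile(r"^DeviceTimeout\s*=.*$", re.MULTILINE)
    updated, count = pattern.subn(f"DeviceTimeout={seconds}", defaults_text)
    if count == 0:
        # Refuse to guess where the setting belongs if it is missing.
        raise ValueError("no DeviceTimeout line found")
    return updated


# Hypothetical file content, for illustration only.
sample = "[Daemon]\nTheme=Example-Theme\nShowDelay=0\nDeviceTimeout=5\n"
print(set_device_timeout(sample, 11))
```

To apply it for real, read /usr/share/plymouth/plymouthd.defaults, pass its contents through the function, and write the result back as root.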

(2) is due to /sbin/dkms_autoinstaller being called twice during boot, once by the dkms-autoload service, and once by the mandriva-everytime service. As these services are started at the same time, this means two dkms processes are running simultaneously, hence the erratic results. This is the result of the changes described in bug 17194 not being applied correctly. A patch has been added to the initscripts package to remove the call to dkms_autoinstaller from the mandrake-everytime script, but it's not being applied because it's listed as Source106, not Patch106.
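
The Source106/Patch106 mix-up is easy to miss by eye, because a file declared with a Source tag is shipped in the SRPM but is not treated as a patch at build time. A quick check along these lines could flag it. This is a sketch, not an existing tool: the function name is hypothetical, and the file names in the sample spec are invented, not taken from the real initscripts package.

```python
import re

# Matches "Source0: file" / "Patch12: file" style tags at the start of a line.
TAG_RE = re.compile(r"^(Source|Patch)(\d*)\s*:\s*(\S+)",
                    re.IGNORECASE | re.MULTILINE)


def misdeclared_patches(spec_text):
    """Return (tag, filename) for Source entries whose file looks like a patch."""
    hits = []
    for kind, number, filename in TAG_RE.findall(spec_text):
        if kind.lower() == "source" and filename.endswith((".patch", ".diff")):
            hits.append((f"{kind}{number}", filename))
    return hits


# Hypothetical spec fragment; file names are made up for illustration.
sample_spec = """\
Source0: example-1.0.tar.xz
Patch105: example-other-fix.patch
Source106: example-remove-dkms-call.patch
"""
print(misdeclared_patches(sample_spec))
```

Any hit deserves a look at the spec file: either the tag should be PatchN, or the file is not really a patch.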
Martin Whitaker 2017-03-12 12:15:02 CET

Depends on: (none) => 17194

Comment 37 Doug Laidlaw 2017-03-12 12:40:56 CET
"it's listed as Source106, not Patch106."  I assume that you are referring to the sources I see in an SRPM.

I was in this only to explain that every 32-bit installation gets the "server" kernel. I have no issue with the main bug, and am unsubscribing.
Doug Laidlaw 2017-03-12 12:41:18 CET

CC: laidlaws => (none)

Comment 38 Maurice Batey 2017-03-16 18:26:00 CET
> I can't boot either. This is the sta2 Classical that has just been released.
>...
> The bootup ... hangs at "Checking for new hardware."

Same here:

 Aiming to retrieve the nVidia driver, rebooting showed:

"Kernel development file for 4.9.14-1 kernel-desktop not found."

-so installed it from MCC and rebooted:
   Result: "nvidia (340.102-1.mga6.nonfree): Installing module
                ..............
                ....  (bad exit status:5)
              "Checking for new h/w etc.etc then FROZE

Rebooted:  "nvidia (340.102-1.mga6.nonfree):  Already installed in the kernel!
               Checking for new h/w  then FROZE
    Used TTY2 to do 188-package Cauldron update

Rebooted: (No mention of nvidia...)   
      "Checking for new h/w etc.etc" then FROZE

etc, etc,

CC: (none) => maurice

Comment 39 Thomas Andrews 2017-03-25 00:21:33 CET
This bug has been fixed for the x86_64 nvidia340 driver on my hardware, as of the March 23 Classical DVD RC test isos. 

I still have to try my nvidia 304 machine for the same bug, but I will be unable to do that before tomorrow at the earliest.
Comment 40 Thomas Andrews 2017-03-25 16:54:39 CET
This bug has been fixed on my nvidia304 hardware (server kernel) in both 32-bit and 64-bit, as of the March 23 Classical rc test isos.

I'm tempted to mark it "Resolved," but I think it needs more tests from the others who have commented first.
Comment 41 papoteur 2017-03-26 12:57:48 CEST
I just redid an installation with DVD classical  bits. The driver is nvidia304.
I chose it during installation.
After reboot, nvidia304 is used in Plasma and in IceWM.
Thus this bug is solved.
But (there always has to be a but) the Plasma session is unusable; I think plasma-shell is crashing.
IceWM is usable, but I get a lot of display corruption (is this what is called tearing?)
I tried XRender instead of OpenGL 2.0 as the compositor; nothing changed.
Comment 42 papoteur 2017-03-27 18:03:27 CEST
Thus we can close this one.
The display problem is another bug.

Status: NEW => RESOLVED
Resolution: (none) => FIXED

