Bug 12893 - System won't boot up after recent kernel and Nvidia updates (plymouth problem)
Summary: System won't boot up after recent kernel and Nvidia updates (plymouth problem)
Status: RESOLVED OLD
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 4
Hardware: All Linux
Priority: Normal
Severity: critical
Target Milestone: ---
Assignee: Anssi Hannula
QA Contact:
URL:
Whiteboard:
Keywords: NEEDINFO
Depends on:
Blocks:
 
Reported: 2014-02-27 10:42 CET by Sascha Schroeder
Modified: 2015-10-27 06:57 CET
CC List: 12 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments
depmod: do not allow partial matches with "search" directive (1.25 KB, patch)
2014-03-19 00:31 CET, Anssi Hannula
troubleshooting data requested in comment 16 (25.16 KB, application/gzip)
2014-04-07 23:20 CEST, Mike Rambo
troubleshooting data requested in comment 19 (20.16 KB, application/gzip)
2014-04-11 01:25 CEST, Mike Rambo
Workaround for bug 12893 (25.45 KB, application/gzip)
2014-04-12 18:03 CEST, Mike Rambo

Description Sascha Schroeder 2014-02-27 10:42:14 CET
Description of problem:

After the recent updates (today) of kernel 3.12.13 and Nvidia 331.49 the system will not boot up anymore.

Only one failure is reported at boot, something like "failed to start plymouth to quit". It's a known Fedora bug.

Booting into console and "dracut -f" didn't help! 

Fix this please.

Right now I can boot into 3.12.9 and see Nvidia 331.49 working also, but this Nvidia driver isn't working with newest kernel it seems.

Version-Release number of selected component (if applicable):
Kernel 3.12.13
Nvidia 331.49

How reproducible:


Steps to Reproduce:
1. install recent kernel updates
2. also Nvidia is updated automatically to 331.49
3. reboot
4. one "failed" message appears regarding plymouth boot screen
5. system hangs here, won't boot up
6. even booting into console and doing "dracut -f" didn't help


Comment 1 Thomas Backlund 2014-02-27 11:07:42 CET
try to remove "splash quiet" from boot options

CC: (none) => tmb

Comment 2 Henrik Christiansen 2014-02-27 13:04:03 CET
I've had that too on a laptop (x586).
Changed xorg.conf to nv driver and it booted.
Had to uninstall/install all nvidia again to make nvidia work again.

CC: (none) => hc

Comment 3 claire robinson 2014-02-27 14:15:43 CET
With 'splash quiet' removed from boot options it will show you what is going on.

Also, when it reaches the point at which it fails, see if you are able to use ctrl-alt-f2 to get to a tty and if so log in as root and try to recover some logs.

journalctl -b -a --no-pager > journal.txt

Also /var/log/xorg.0.log might be useful.

CC: (none) => eeeemail

Comment 4 claire robinson 2014-02-27 14:16:23 CET
sorry, Xorg.0.log
Comment 5 Sascha Schroeder 2014-02-27 14:35:00 CET
I was able to follow the hint of Henrik: I uninstalled everything I had with "nvidia" in it and reconfigured everything again with

XFdrake

then rebooted.

Now it seems to run again, nothing to remove from grub.

But it's a bug, isn't it??
Comment 6 Maurice Batey 2014-03-02 16:20:42 CET
I hit the 'hung boot' problem yesterday, in the following circumstances:

On 64-bit Mageia-4, installed MCC's Virtualbox (4.3.6). This installed various kernels, then invited me to reboot.
  On reboot, I was told that the system had reverted to the Nouveau nVidia driver, so I went into MCC/Hardware/Graphics and finessed the proprietary driver, which was then installed.
  However, all reboots after that stalled after this sequence:

 --------------------------------------------
 [OK] Started LSB: Network monitoring daemon
 [FAILED] Failed to detect Wait for Plymouth Boot Screen to quit
         See ... for details.
 [OK] Reached target Multi-User System
 [OK] Reached target Graphical Interface
 --------------------------------------------

Recovered by booting Mageia-4 in Safe Mode, entered Root password, and running /usr/bin/XFdrake, which allowed me to change the graphics settings to 'Nouveau', after which I could boot into the system normally.

Then got MCC to remove all the nVidia stuff and re-finessed the nVidia proprietary driver in MCC/Hardware/Graphics.  
  Now all OK, and VirtualBox still works...

But until this is sorted out properly there will perhaps be many nVidia users who will try Mageia-4 and abandon it after being hit with this problem...

CC: (none) => maurice

Comment 7 Sascha Schroeder 2014-03-09 15:45:32 CET
Mageia 4 is a complete mess:

* tearing everywhere, even with official Nvidia drivers
* different resolutions by apps/games/etc. mess up my home screen symbols when quitting these programs
* KDE system settings is missing the normal screen adjustment window. I can not get it back, no matter what I do
* every second time I log out and back in again or reboot, KDE doesn't get that it should use oxygen everywhere. This leads to GTK elements inside all of the programs, e.g. installation window, Firefox, Thunderbird, etc. You always recognize this when your mouse cursor has changed. I don't know where the system is getting this information, it's just wrong. When hovering over the task bar or window frame, you see the mouse cursor YOU selected, everything else is messed up with this Gnome stuff

So far Mageia 4 is the worst ever. Sorry guys, this is simply not cool.
Comment 8 egc 2014-03-16 03:19:31 CET
(In reply to Sascha Schroeder from comment #7)
> Mageia 4 is a complete mess:
>
Unfortunately that's true concerning nvidia cards ... i get a black screen every second time the system wakes up from hibernate.

CC: (none) => egc

Comment 9 Mike Rambo 2014-03-18 13:09:30 CET
While the cause is still unknown afaik, doktor5000 and trex78 helped identify (what I understand to be) extra packages being installed that break the nvidia installation. A thread in the advanced support forum area has details (https://forums.mageia.org/en/viewtopic.php?f=8&t=7129), at least regarding the problem cited by the OP in this bug. I have not yet experienced any of the other problems above.

The bottom line from that forum thread is that having the nvidia-*-kernel packages installed alongside x11-driver-video-nvidia-current-331.49-1.mga4.nonfree and dkms-nvidia-current-331.49-1.mga4.nonfree results in broken nvidia. There were three nvidia-*-kernel packages that XFdrake forces to be installed and that, if left installed, resulted in the breakage. Manually removing them any time XFdrake installs them (leaving only the x11-driver and dkms packages listed above) prevents this breakage.

In my case switching to the nouveau driver also restored operation although there was then a warning at every boot about a possible conflict between the driver X was using and an already existing kernel driver.

I have a system which, at least at this point, very reliably produces this breakage if testing and reproducibility is a problem.

Thanks to doktor5000 and trex78 for helping me figure out what to do to restore operation.

CC: (none) => mrambo

Florian Hubold 2014-03-18 22:19:23 CET

CC: (none) => doktor5000

Comment 10 Anssi Hannula 2014-03-19 00:31:48 CET
Created attachment 5064
depmod: do not allow partial matches with "search" directive

I'd say this is a kmod bug: depmod does not respect the ordering specified in /etc/depmod.d/dkms.conf. The attached patch has been sent upstream.
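
To illustrate the kind of partial match the patch title refers to, here is a minimal shell sketch. It is not kmod's code: the function names are hypothetical, and it assumes the problem is a "search" entry such as "dkms" also matching a directory named "dkms-binary" (the two module directories discussed in this bug).

```shell
#!/bin/sh
# Hypothetical illustration only; this is not how depmod is implemented.

# Prefix comparison: the entry "dkms" also matches "dkms-binary".
matches_prefix() {
    case "$2" in
        "$1"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Whole-component comparison: the entry must equal the directory name,
# or be followed by a "/" path separator.
matches_component() {
    case "$2" in
        "$1"|"$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}

for dir in dkms dkms-binary; do
    if matches_prefix dkms "$dir"; then p=yes; else p=no; fi
    if matches_component dkms "$dir"; then c=yes; else c=no; fi
    echo "$dir: prefix=$p component=$c"
done
```

With a prefix comparison both directories match the same entry, so their relative order is no longer decided by the configuration; with a whole-component comparison only "dkms" matches.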

CC: (none) => anssi.hannula

Comment 11 Anssi Hannula 2014-03-27 01:34:54 CET
This should be fixed in kmod-12-2.2.mga3 and kmod-15-2.2.mga4, which are now available in the Mageia 3 and Mageia 4 core/updates_testing media.

See https://wiki.mageia.org/en/Enabling_the_Testing_media if you want to test it.

For the record, the fix was accepted upstream for the next version of kmod.

Suggested advisory:
============
The depmod tool in Mageia 3 and Mageia 4 contains a bug which causes, under some circumstances, an installed "X-kernel-version" package (e.g. "nvidia-current-kernel-3.12.13-desktop-2.mga4") of an older version to override a newer installed "dkms-X" package (e.g. "dkms-nvidia-current").

If this happens with the proprietary NVIDIA driver, for example, the system will not boot up properly as the X server startup will fail due to a kernel module version mismatch.

This update fixes that issue.

References:
https://bugs.mageia.org/show_bug.cgi?id=12893
https://github.com/cshorler/hal-flash
============

Uploaded to mga3+mga4 core/updates_testing:

Source packages:
kmod-12-2.2.mga3
kmod-15-2.2.mga4

Binary packages:
kmod-12-2.2.mga3
lib(64)kmod2-12-2.2.mga3
lib(64)kmod-devel-12-2.2.mga3
kmod-15-2.2.mga4
lib(64)kmod2-15-2.2.mga4
lib(64)kmod-devel-15-2.2.mga4


Testing information:
===============
1. Install both "dkms-nvidia-current" and a "nvidia-current-kernel-CURRENTKERNELVERSION" package, and make sure the installation succeeds. You also need the kernel-devel-CURRENTKERNELVERSION package installed beforehand. You can determine CURRENTKERNELVERSION by running "uname -r".
2. Check the first line of output of "modinfo nvidia-current" command.
WRONG output:
filename:       /lib/modules/X/dkms-binary/Y
CORRECT output:
filename:       /lib/modules/X/dkms/Y

Note that even the old kmod package may produce the CORRECT output: there is a 50% chance of reproducing the issue, and running the command repeatedly will not change the result.
===============
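
The check in step 2 can be scripted. The following is a minimal sketch, assuming the WRONG/CORRECT paths shown above; classify_module_path is a hypothetical helper name, and the commented-out call relies on modinfo's -n (--filename) option, which prints only the module path.

```shell
#!/bin/sh
# classify_module_path PATH: report which module tree PATH belongs to.
# Hypothetical helper, based only on the two paths given in the testing notes.
classify_module_path() {
    case "$1" in
        */dkms-binary/*) echo "WRONG (dkms-binary)" ;;
        */dkms/*)        echo "CORRECT (dkms)" ;;
        *)               echo "UNKNOWN" ;;
    esac
}

# On an affected system one would run something like:
#   classify_module_path "$(modinfo -n nvidia-current)"
# Here the two example paths from the testing notes are used instead.
classify_module_path "/lib/modules/X/dkms-binary/Y"
classify_module_path "/lib/modules/X/dkms/Y"
```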

Whiteboard: (none) => MGA3TOO
Status: NEW => ASSIGNED
Hardware: x86_64 => All
Assignee: bugsquad => qa-bugs
Source RPM: (none) => kmod

Comment 12 Anssi Hannula 2014-03-27 01:37:33 CET
(In reply to Anssi Hannula from comment #11)
> https://github.com/cshorler/hal-flash

The above line doesn't belong in the advisory, a copy-paste mistake.
Comment 13 user7 2014-03-27 11:16:05 CET
Sascha Schroeder: I had the same problem with my mouse cursor. I fixed it by changing the mouse cursor theme in KDE system settings to a different one, then changing it back to oxygen. After a reboot my mouse cursor now stays the oxygen cursor, in all applications. That being said, thank you for reporting bugs but please try to open a different bug report for each bug (or ask on the forums for help first). Otherwise bug reports will get messy and it will be harder to solve the problems. Thanks.

CC: (none) => wassi

Marcello Anni 2014-03-27 15:57:11 CET

CC: (none) => marcello.anni

Comment 14 Sascha Schroeder 2014-03-27 18:57:39 CET
(In reply to user7 from comment #13)
> Sascha Schroeder: I had the same problem with my mouse cursor. I fixed it by
> changing the mouse cursor theme in KDE system settings to a different one,
> then changing it back to oxygen. After a reboot my mouse cursor now stays
> the oxygen cursor, in all applications. That being said, thank you for
> reporting bugs but please try to open a different bug report for each bug
> (or ask on the forums for help first). Otherwise bug reports will get messy
> and it will be harder to solve the problems. Thanks.

Tried this, over and over again when encountering the problem. This didn't resolve my issue. What I did was a complete reinstall and right now I don't have this issue anymore. I think something inside KDE messed up, I don't know.

And of course you are right with your last statement, I totally agree with that. I apologize. I will open new tickets when I have the time. Regards
Comment 15 Mike Rambo 2014-03-29 21:04:14 CET
I just tried the process outlined in comment 11. It didn't help in my case.

I installed the updated kmod files specified. Rebooted (just in case).

Used urpme to remove the files specified below that have been implicated in this problem (this while the system was running with the nouveau driver).

Ran XFdrake to reconfigure for the proprietary driver. This reinstalled the five files mentioned above and specified below.

Rebooted to find X dead again.

Following are the files seemingly implicated in the problem as installed on my system when I ran XFdrake. The thing I notice is that two of them specify 331.49-2 and the rest 331.49-1. If this means what I took it to mean it is opposite what you mentioned in comment 11 (which was an older nvidia-current-kernel version with a newer dkms version). Here the dkms appears older than the nvidia-current-kernel. Or perhaps I'm over my head...

nvidia-current-doc-html-331.49-1.mga4.nonfree
nvidia-current-kernel-3.12.13-desktop-2.mga4-331.49-2.mga4.nonfree
nvidia-current-kernel-desktop-latest-331.49-2.mga4.nonfree
x11-driver-video-nvidia-current-331.49-1.mga4.nonfree
dkms-nvidia-current-331.49-1.mga4.nonfree
Comment 16 Anssi Hannula 2014-03-30 01:06:26 CET
(In reply to Mike Rambo from comment #15)

Thanks for testing. Unfortunately it seems something else is wrong on your system, let's try to figure it out.

First, enable display driver helper debugging by running the following command as root:
sed -i "s,^# DEBUG,DEBUG," /sbin/display_driver_helper

This will take effect starting from the next boot.

Now, get to the problematic situation and try to boot up the system - X will not start if your issue persists.

Then, please run the following commands as root user in this order, and attach boot.txt and ddh_debug.txt here:
journalctl -axb > boot.txt
cat /dev/ddh_debug > ddh_debug.txt

Then, please provide the output of the following commands when the system is in the failed state (run in this order, please):
cat /proc/cmdline
update-alternatives --display gl_conf
dkms status
cat /etc/modprobe.d/display-driver.conf
modinfo nvidia-current
modprobe nvidia
dmesg | tail

Also attach your /etc/X11/xorg.conf and /var/log/Xorg.0.log. I know you have provided some of this already, but it is better that every output and log is from the same boot, just in case.

I expect that ddh_debug.txt may contain the most important information in this case, but it is better to gather too many logs than too few.
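
If it helps, the command outputs requested above could be collected into one file with a small wrapper. This is only a sketch: run_logged, collect_all and the report file name are hypothetical, and the commands inside collect_all are the ones listed above, in the same order.

```shell
#!/bin/sh
# OUT is a hypothetical report file name; override it as needed.
OUT="${OUT:-ddh_report.txt}"

# Append a header line and the command's combined stdout/stderr to $OUT.
# A non-zero exit status is recorded too, so a silent failure stays visible.
run_logged() {
    printf '===== %s =====\n' "$*" >> "$OUT"
    "$@" >> "$OUT" 2>&1 || printf '(exit status %s)\n' "$?" >> "$OUT"
}

# The commands requested in this comment, in the requested order.
collect_all() {
    run_logged cat /proc/cmdline
    run_logged update-alternatives --display gl_conf
    run_logged dkms status
    run_logged cat /etc/modprobe.d/display-driver.conf
    run_logged modinfo nvidia-current
    run_logged modprobe nvidia
    run_logged sh -c 'dmesg | tail'
}
```

To use it, run collect_all as root while the system is in the failed state and attach the resulting file together with the other logs.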

Assignee: qa-bugs => anssi.hannula

Comment 17 Anssi Hannula 2014-03-30 20:00:03 CEST
(In reply to Anssi Hannula from comment #11)
I've opened bug #13119 for the kmod update, since it was actually an unrelated bug.

See comment #16 for required information for solving this bug.

Keywords: (none) => NEEDINFO
Source RPM: kmod => (none)
Whiteboard: MGA3TOO => (none)

Comment 18 Mike Rambo 2014-04-07 23:20:50 CEST
Created attachment 5105
troubleshooting data requested in comment 16

Here is the data requested in comment 16. I did not notice that 'modprobe nvidia' did not return anything (I was just redirecting to files) until after I undid the video driver and rebooted to upload the data. Of course, by then it was too late to make sure I typed it correctly or do any verification.

Thanks
Comment 19 Anssi Hannula 2014-04-08 13:34:40 CEST
Unfortunately nothing seemed off in the logs.

Next, please give me the output of these two commands:
journalctl -axb > boot.txt
ls -l /dev/nvidia*

This time from both unsuccessful NVIDIA boots and successful NVIDIA boots.
Comment 20 Mike Rambo 2014-04-11 01:25:44 CEST
Created attachment 5109
troubleshooting data requested in comment 19

This is the data requested (comment 19) with a failed nvidia boot (dead X). I cannot give you data from a successful nvidia boot as there have been no successful nvidia boots since the updates installed on March 2. I referenced a thread on the forums in my first comment above that has a few details. I don't recall everything in that set of updates but do know there was mariadb stuff and both a new kernel and nvidia package. The proprietary driver has not worked on this system since that batch of updates. With the nouveau driver X will work but there are no /dev/nvidia* files though you probably already know that.
egc 2014-04-12 01:44:26 CEST

CC: egc => (none)

Comment 21 Mike Rambo 2014-04-12 18:03:29 CEST
Created attachment 5111
Workaround for bug 12893

It looks like this may not be a Mageia problem but rather something with the new nvidia driver that was a part of that update March 2.

I manually downloaded and installed...

x11-driver-video-nvidia-current-325.15-1.mga4.nonfree
nvidia-current-doc-html-325.15-1.mga4.nonfree
dkms-nvidia-current-325.15-1.mga4.nonfree

I configured the nvidia driver with XFdrake. It installed two more packages...

nvidia-current-kernel-desktop-latest-331.49-2.mga4.nonfree
nvidia-current-kernel-3.12.13-desktop-2.mga4-331.49-2.mga4.nonfree

which I immediately removed.

After rebooting the nvidia driver works - at least so far. The newest tarball I supplied has this information plus some of the other information you have asked for in the past. At present my system appears to be working as it should. I have, however, already gotten update notifications to update the three packages with 325.15 up to their newer 331.49 variants. I'll have to look for a way to tell urpmi not to update these packages (which I at present do not know).

Thanks for the help. Let me know if you need anything more.
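
One possible way to hold the packages back, offered as an untested sketch rather than a confirmed fix for this bug: urpmi reads /etc/urpmi/skip.list, which lists packages that should not be updated automatically, one package name or /regular expression/ per line. The helper name add_skip and the patterns in the usage note are assumptions for illustration.

```shell
#!/bin/sh
# SKIP_LIST defaults to urpmi's skip list; point it at a scratch file
# to try the helper without touching the system configuration.
SKIP_LIST="${SKIP_LIST:-/etc/urpmi/skip.list}"

# add_skip EXPR: append EXPR to the skip list unless that exact line exists.
add_skip() {
    grep -qxF -- "$1" "$SKIP_LIST" 2>/dev/null || printf '%s\n' "$1" >> "$SKIP_LIST"
}
```

As root, something like add_skip '/^dkms-nvidia-current/', add_skip '/^x11-driver-video-nvidia-current/' and add_skip '/^nvidia-current-doc-html/' would cover the three packages named above; remove the lines again once a working driver version is available.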
Comment 22 Fabrice DANT 2014-07-02 20:21:51 CEST
I have the same problem with the latest 331.49 Nvidia driver version. Exactly as it is described up there. Since I made this driver update, both my laptop and desktop can't boot.
And also (since I upgraded to MGA4), just like "egc" had mentioned on comment 8: "Unfortunately that's true concerning nvidia cards ... i get a black screen every second time the system wakes up from hibernate."

It would be really appreciable to solve this problem without any command line, because it is absolutely not user friendly. Maybe an automatic uninstallation process of the previous drivers would prevent this kind of bug, because I had this problem several times now.

I switched to Nouveau drivers, but I can't play with Steam (Metro Last Light) anymore. I'm a little tired of manually uninstalling, then installing again these drivers each time this problem appears.

CC: (none) => fabricedant

Comment 23 Gerald 2015-02-05 14:22:47 CET
Same problem here (and i found this bug also in Mageia 5).

Since Mageia 3 is no longer supported, and all delays/problems with Mageia 5, I decided to install Mageia 4 today.

After a huge number of updates (urpmi --auto --auto-update) and a restart, the message "Failed to start Wait for Plymouth Boot Screen to Quit" appeared and the boot hangs. In Xorg.0.log I found: "NVIDIA: Failed to load the NVIDIA kernel module".

This bug report is almost a year old now, it is a critical bug, and the last months nothing seems to happen to fix it .....

This is very disappointing. Does nobody care to fix this? As Maurice Batey warned before: "But until this is sorted out properly there will perhaps be many nVidia users who will try Mageia-4 and abandon it after being hit with this problem..."

CC: (none) => g.sprik

Comment 24 claire robinson 2015-02-05 14:35:13 CET
You're probably experiencing a different issue.

See bug 14990 and try the command given in bug 14990 comment 5

The depmod line given is for server kernel so adapt for whichever you are using.
Comment 25 Pierre Jarillon 2015-02-06 15:32:38 CET
I have this problem with the latest update: "Failed to start Wait for Plymouth Boot to Quit."

Same if I try to boot with oldest kernels.
Plymouth is not started.

CC: (none) => jarillon

Comment 26 Samuel Verschelde 2015-09-21 13:20:15 CEST
Mageia 4 changed to end-of-life (EOL) status on 2015-09-19. It is no longer
maintained, which means that it will not receive any further security or bug 
fix updates.

Package Maintainer: If you wish for this bug to remain open because you plan to 
fix it in a currently maintained version, simply change the 'version' to a later 
Mageia version.

Bug Reporter: Thank you for reporting this issue and we are sorry that we weren't 
able to fix it before Mageia 4's end of life. If you are able to reproduce it 
against a later version of Mageia, you are encouraged to click on "Version" and 
change it against that version of Mageia. If it's valid in several versions, 
select the highest and add MGAxTOO in whiteboard for each other valid release.
Example: it's valid in cauldron and Mageia 5, set to cauldron and add MGA5TOO.

Although we aim to fix as many bugs as possible during every release's lifetime, 
sometimes those efforts are overtaken by events. Often a more recent Mageia 
release includes newer upstream software that fixes bugs or makes them obsolete.

If you would like to help fixing bugs in the future, don't hesitate to join the
packager team via our mentoring program [1] or join the teams that fit you 
most [2].

[1] https://wiki.mageia.org/en/Becoming_a_Mageia_Packager
[2] http://www.mageia.org/contribute/
Comment 27 Marja Van Waes 2015-10-27 06:57:31 CET
As announced over a month ago, Mageia 4 changed to end-of-life (EOL) status on 2015-09-19. It is no longer maintained, which means that it will not receive any further security or bug fix updates.

This issue may have been fixed in a later Mageia release, so, if you still see it and didn't already do so: please upgrade to Mageia 5 (or, if you read this much later than this is written: make sure you run a currently maintained Mageia version)

If you are able to reproduce it against a maintained version of Mageia, you are encouraged to 
1. reopen this bug report, by changing the "Status" from "RESOLVED - OLD" to "REOPENED"
2. click on "Version" and change it against that version of Mageia. If you know it's valid in several versions, select the highest and add MGAxTOO in whiteboard for each other valid release.
Example: it's valid in cauldron and Mageia 5, set to cauldron and add MGA5TOO.
3. give as much relevant information as possible. If you're not an experienced bug reporter and have some time: please read this page:
https://wiki.mageia.org/en/How_to_report_a_bug_properly

If you see a similar issue, but are _not_sure_ it is the same, with the same cause, then please file a new bug report and mention this one in it (please include the bug number, too). 


If you would like to help fixing bugs in the future, don't hesitate to join the
packager team via our mentoring program [1] or join the teams that fit you 
most [2].
[1] https://wiki.mageia.org/en/Becoming_a_Mageia_Packager
[2] http://www.mageia.org/contribute/

Status: ASSIGNED => RESOLVED
Resolution: (none) => OLD

