Bug 15288 - Enabling an nvidia discrete GPU with bbswitch/bumblebee causes hard freezes with kernel 3.19.0
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All (OS: Linux)
Priority: High (severity: normal)
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-14 02:05 CET by Rémi Verschelde
Modified: 2015-02-24 16:53 CET
CC list: 3 users

See Also:
Source RPM: kernel-desktop-3.19.0-1.mga5
CVE:
Status comment:


Attachments
/var/log/dmesg (65.47 KB, text/plain)
2015-02-14 02:09 CET, Rémi Verschelde
/var/log/Xorg.0.log (21.64 KB, text/plain)
2015-02-14 02:11 CET, Rémi Verschelde
/var/log/Xorg.8.log (11.20 KB, text/plain)
2015-02-14 02:13 CET, Rémi Verschelde
/var/log/syslog (131.01 KB, text/plain)
2015-02-14 02:14 CET, Rémi Verschelde

Description Rémi Verschelde 2015-02-14 02:05:17 CET
Since kernel 3.19.0 (RC7, and also the final release) landed in Cauldron, I have been having issues with bbswitch/bumblebee and the nvidia-current nonfree driver.

Running simple commands such as "optirun glxspheres64" causes a hard freeze. It does not happen reliably: it freezes most of the time, but it has also worked fine now and then, although in those cases the discrete GPU could no longer be turned off afterwards.
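To check whether the discrete GPU actually went back to sleep after such a run, the bbswitch state file can be inspected. A minimal Python sketch, assuming bbswitch's /proc/acpi/bbswitch file holds a single line made of the card's PCI address followed by ON or OFF; the function names and the example address are hypothetical:

```python
# Sketch: interpret bbswitch's state file.
# Assumption: /proc/acpi/bbswitch holds one line "<PCI address> <ON|OFF>";
# the address used below is only an example.


def parse_bbswitch_state(text: str) -> tuple[str, bool]:
    """Return (pci_address, powered_on) parsed from the bbswitch proc file."""
    fields = text.split()
    if len(fields) != 2 or fields[1] not in ("ON", "OFF"):
        raise ValueError(f"unexpected bbswitch output: {text!r}")
    return fields[0], fields[1] == "ON"


def read_bbswitch_state(path: str = "/proc/acpi/bbswitch") -> tuple[str, bool]:
    """Read the live state; only works while the bbswitch module is loaded."""
    with open(path) as f:
        return parse_bbswitch_state(f.read())


if __name__ == "__main__":
    address, powered_on = parse_bbswitch_state("0000:01:00.0 OFF\n")
    print(address, "ON" if powered_on else "OFF")  # prints "0000:01:00.0 OFF"
```

If the card "could not be turned off anymore", read_bbswitch_state() would keep reporting True after optirun has exited.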

I also reported this issue upstream on bumblebee's GitHub, since the issue might be a compatibility issue with kernel 3.19.0 on their side: https://github.com/Bumblebee-Project/Bumblebee/issues/632

Reproducible: 

Steps to Reproduce:
Comment 1 Rémi Verschelde 2015-02-14 02:07:40 CET
CC'ing Thomas. I sincerely hope you can help me debug this, because it is a huge regression for Optimus users (at least for those who want to use their Nvidia GPU) a couple of weeks before the release.

I'll attach relevant logs.

Priority: Normal => High
CC: (none) => tmb

Comment 2 Rémi Verschelde 2015-02-14 02:09:34 CET
Created attachment 5913
/var/log/dmesg

The "ACPI Warning" and "NVRM" entries seem particularly relevant to my issue, though I don't know how to interpret them.
Comment 3 Rémi Verschelde 2015-02-14 02:11:35 CET
Created attachment 5914
/var/log/Xorg.0.log

This one might not be that relevant, since bumblebee spawns applications on display :8, IIUC.
Comment 4 Rémi Verschelde 2015-02-14 02:13:38 CET
Created attachment 5915 [details]
/var/log/Xorg.8.log

Some interesting errors in this one. AFAIU it represents what happened when I started "optirun -b virtualgl glxspheres64" and the computer froze.
Comment 5 Rémi Verschelde 2015-02-14 02:14:33 CET
Created attachment 5916
/var/log/syslog

And last, the syslog snippet corresponding to this session.
Thomas Backlund 2015-02-14 08:10:52 CET

Attachment 5913 mime type: application/octet-stream => text/plain

Comment 6 Thomas Backlund 2015-02-14 08:35:07 CET
(In reply to Rémi Verschelde from comment #2)
> Created attachment 5913
> /var/log/dmesg
> 
> The "ACPI Warning" and "NVRM" entries seem particularly relevant to my
> issue, though I don't know how to interpret them.


I'd say the ACPI warning is a red herring in this case.
The NVRM messages about conflicting drivers should be mostly harmless, as the system is usually capable of unloading the conflicting ones when needed.


(In reply to Rémi Verschelde from comment #4)
> Created attachment 5915
> /var/log/Xorg.8.log
> 
> Some interesting errors in this one. AFAIU it represents what happened when
> I started "optirun -b virtualgl glxspheres64" and the computer froze.

This one seems more relevant:

(EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied
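Lines like this one can be pulled out of a long Xorg log mechanically. A minimal Python sketch, assuming Xorg tags error lines with "(EE)", optionally after a "[ uptime]" timestamp prefix; xorg_errors is a hypothetical helper and the first sample line is made up:

```python
# Sketch: pull the error lines out of an Xorg log.
# Assumption: Xorg tags errors with "(EE)", possibly after a
# "[  uptime]" timestamp prefix.
import re

ERROR_RE = re.compile(r"^(?:\[\s*[\d.]+\]\s*)?\(EE\)\s*(.*)$")


def xorg_errors(lines):
    """Yield the message text of every (EE) line."""
    for line in lines:
        match = ERROR_RE.match(line.rstrip("\n"))
        if match:
            yield match.group(1)


if __name__ == "__main__":
    sample = [
        "[    20.000] (II) an informational line (made up)\n",
        "(EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied\n",
    ]
    for message in xorg_errors(sample):
        print(message)
```

Running it over /var/log/Xorg.8.log (e.g. xorg_errors(open("/var/log/Xorg.8.log"))) would list only the error messages.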


(In reply to Rémi Verschelde from comment #5)
> Created attachment 5916
> /var/log/syslog
> 
> And last, the syslog snippet corresponding to this session.

And here bbswitch tried to register an already registered driver:

Feb  9 12:16:57 localhost kernel: [  148.152022] bbswitch: enabling discrete graphics
Feb  9 12:16:58 localhost kernel: [  148.867350] ------------[ cut here ]------------
Feb  9 12:16:58 localhost kernel: [  148.867358] WARNING: CPU: 3 PID: 5614 at fs/proc/generic.c:372 proc_register+0x135/0x1c0()
Feb  9 12:16:58 localhost kernel: [  148.867360] proc_dir_entry 'driver/nvidia' already registered
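The duplicate /proc registration can also be spotted automatically in a syslog. A minimal Python sketch, assuming kernel lines follow the "date host kernel: [uptime] message" layout quoted above; kernel_messages and duplicate_proc_entries are hypothetical helpers:

```python
# Sketch: find "already registered" proc entries in syslog kernel lines.
# Assumption: lines look like
# "Feb  9 12:16:58 localhost kernel: [  148.867360] message".
import re

KERNEL_RE = re.compile(
    r"^(?P<date>\w{3}\s+\d+ [\d:]{8}) (?P<host>\S+) kernel: "
    r"\[\s*(?P<uptime>[\d.]+)\] (?P<msg>.*)$"
)
DUP_RE = re.compile(r"proc_dir_entry '([^']+)' already registered")


def kernel_messages(lines):
    """Yield (uptime_seconds, message) for each kernel line."""
    for line in lines:
        match = KERNEL_RE.match(line.rstrip("\n"))
        if match:
            yield float(match.group("uptime")), match.group("msg")


def duplicate_proc_entries(lines):
    """Return the proc entries reported as already registered."""
    return [m.group(1) for _, msg in kernel_messages(lines)
            if (m := DUP_RE.search(msg))]


if __name__ == "__main__":
    log = [
        "Feb  9 12:16:57 localhost kernel: [  148.152022] bbswitch: enabling discrete graphics",
        "Feb  9 12:16:58 localhost kernel: [  148.867360] proc_dir_entry 'driver/nvidia' already registered",
    ]
    print(duplicate_proc_entries(log))  # prints ['driver/nvidia']
```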
Thierry Vignaud 2015-02-14 14:54:01 CET

CC: (none) => mageia, thierry.vignaud

Comment 7 Rémi Verschelde 2015-02-17 14:32:31 CET
Upstream says that my packaging is a bit funky:

> Your configuration looks very unusual. Normally nouveau kernel module is
> blacklisted, and nvidia kernel module is not loaded before bbswitch. In your
> syslog, both nouveau and nvidia modules are loaded before bumblebee loads
> bbswitch. I don't know what specifically can trigger freezes, but your
> situation is a minefield, so I strongly recommend you to look into why modules
> are loaded automatically rather than through optirun/bumblebeed, and fix that.

It used to work until now, but maybe something changed in the way modules are loaded, so that my bumblebee package no longer does what's needed? Should I blacklist nouveau in the bumblebee-nvidia flavour?
Comment 8 Rémi Verschelde 2015-02-24 16:46:34 CET
I added this file to the bumblebee package, and it seems to solve the hard freezes:
$ cat /etc/modprobe.d/bumblebee.conf 
blacklist nvidia-current
blacklist nouveau
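A packaging check can verify that both modules really end up blacklisted. A minimal Python sketch, assuming only plain "blacklist <module>" lines matter and that lines starting with "#" are comments; blacklisted_modules is a hypothetical helper:

```python
# Sketch: collect the modules blacklisted by a modprobe.d file.
# Assumptions: only plain "blacklist <module>" lines matter here, and
# lines starting with "#" are comments.


def blacklisted_modules(conf_text: str) -> set[str]:
    """Return the module names given on 'blacklist' lines."""
    names = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] == "blacklist" and len(fields) == 2:
            names.add(fields[1])
    return names


if __name__ == "__main__":
    conf = "blacklist nvidia-current\nblacklist nouveau\n"
    missing = {"nvidia-current", "nouveau"} - blacklisted_modules(conf)
    print("missing:", sorted(missing))  # prints "missing: []"
```

Note that a blacklist entry only makes modprobe ignore the module's aliases, so it is no longer autoloaded at boot; an explicit modprobe of the module still loads it, which is presumably why optirun keeps working with this file in place.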

Now the remaining issue is that, once started, the Nvidia GPU can no longer be powered off, since the nvidia module stays in use even after the process that was using it has been killed.
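Whether the module is still held can be read from its use count. A minimal Python sketch, assuming the standard /proc/modules layout where the third field is the use count, and assuming the module shows up as "nvidia" (the name may differ on Mageia); the helpers and the sample lines are hypothetical:

```python
# Sketch: read module use counts from /proc/modules.
# Assumption: layout "name size use_count deps state offset ...",
# where the third field is the use count. The sample lines are invented.


def module_use_counts(proc_modules_text: str) -> dict[str, int]:
    """Map each loaded module name to its current use count."""
    counts = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[0]] = int(fields[2])
    return counts


def still_in_use(proc_modules_text: str, module: str) -> bool:
    """True if `module` is loaded and something still holds a reference."""
    return module_use_counts(proc_modules_text).get(module, 0) > 0


if __name__ == "__main__":
    sample = (
        "nvidia 10573232 1 - Live 0x0000000000000000 (POE)\n"
        "bbswitch 12997 0 - Live 0x0000000000000000 (OE)\n"
    )
    print(still_in_use(sample, "nvidia"))  # prints True
```

On a live system, passing open("/proc/modules").read() and getting True after the optirun client has exited would match the behaviour described above.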

Severity: critical => normal

Comment 9 Rémi Verschelde 2015-02-24 16:47:40 CET
I'll close this one as fixed for now; I will see whether another bug report is needed for the issue mentioned in comment 8.
Comment 10 Rémi Verschelde 2015-02-24 16:53:44 CET
As per above comment.

Status: NEW => RESOLVED
Resolution: (none) => FIXED

