Bug 9216 - Nvidia driver not working and nouveau failing
Summary: Nvidia driver not working and nouveau failing
Status: RESOLVED DUPLICATE of bug 8773
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: Cauldron
Hardware: i586 Linux
Priority: release_blocker major
Target Milestone: ---
Assignee: Mageia Bug Squad
Keywords: PATCH
 
Reported: 2013-03-01 09:31 CET by Alejandro Vargas
Modified: 2013-03-03 12:48 CET

Attachments
service_harddrake with nvidia304.ko (18.54 KB, text/plain)
2013-03-01 23:20 CET, Luc Menut

Description Alejandro Vargas 2013-03-01 09:31:50 CET
Description of problem:

I had Mageia working without problems on my PC. I just downloaded and installed Mageia 3 beta2 and updated from the installer. No extra packages installed.

My video card is:

00:0d.0 VGA compatible controller: NVIDIA Corporation C61 [GeForce 6150SE nForce 430] (rev a2)

In boot.log I see:

nvidia304 (304.64-4.mga3.nonfree): Already installed on this kernel.
Checking for new hardware
Using `/etc/ld.so.conf.d/GL/standard.conf' to provide `gl_conf'.

But the screen shows a popup that says: "The display driver has been automatically switched to 'nouveau'. Reason: can't find the proprietary kernel driver for X.org 'nvidia'."

lsmod does not show any nvidia module but the rpms are installed:

# rpm -qa |grep nvidia
x11-driver-video-nvidia304-304.64-4.mga3.nonfree
nvidia304-doc-html-304.64-4.mga3.nonfree
nvidia304-kernel-3.8.0-server-3.mga3-304.64-14.mga3.nonfree
dkms-nvidia304-304.64-4.mga3.nonfree
nvidia304-kernel-desktop-latest-304.64-8.mga3.nonfree
nvidia304-kernel-server-latest-304.64-8.mga3.nonfree
nvidia304-kernel-3.8.0-desktop-0.rc4.1.mga3-304.64-8.mga3.nonfree
nvidia304-kernel-3.8.0-server-0.rc4.1.mga3-304.64-8.mga3.nonfree


Second problem: the nouveau driver hangs the system. If you select OK to use nouveau, the kdm login screen appears fine, but when you log in the screen turns into diagonal blue bars and the system hangs (numlock/capslock do not respond, and neither does ctrl+alt+backspace).


How reproducible:

Just install Mageia 3 beta2 on a system with a GeForce 6150SE



Reproducible: 

Steps to Reproduce:
Comment 1 Alejandro Vargas 2013-03-01 10:33:49 CET
New updates today (kernel and modules); the problem is still there:

# rpm -qa |grep -i nvidia

x11-driver-video-nvidia304-304.64-4.mga3.nonfree
nvidia304-doc-html-304.64-4.mga3.nonfree
nvidia304-kernel-3.8.0-desktop-3.mga3-304.64-14.mga3.nonfree
nvidia304-kernel-3.8.0-server-3.mga3-304.64-14.mga3.nonfree
dkms-nvidia304-304.64-4.mga3.nonfree
nvidia304-kernel-server-latest-304.64-14.mga3.nonfree
nvidia304-kernel-desktop-latest-304.64-14.mga3.nonfree
nvidia304-kernel-3.8.0-server-0.rc4.1.mga3-304.64-8.mga3.nonfree

# modprobe nvidia
modprobe: FATAL: Module nvidia not found.
Comment 2 Alejandro Vargas 2013-03-01 10:35:35 CET
More tests:

# dkms_autoinstaller restart

nvidia304 (304.64-4.mga3.nonfree): Already installed on this kernel.

# modprobe nvidia304
modprobe: ERROR: could not insert 'nvidia304': No such device
Comment 3 Sander Lepik 2013-03-01 10:48:09 CET
Which grub version are you using? Does that grub have nokmsboot added for current kernel?

Keywords: (none) => NEEDINFO
CC: (none) => sander.lepik

Comment 4 Alejandro Vargas 2013-03-01 11:20:19 CET
(In reply to Sander Lepik from comment #3)
> Which grub version are you using? 

The grub is the one installed by the Mageia 3 installer:

# rpm -qa |grep grub
grub-doc-0.97-38.mga3
grub-0.97-38.mga3

> Does that grub have nokmsboot added for
> current kernel?

Where can I check this? In menu.lst?


# cat /boot/grub/menu.lst

timeout 10
color black/cyan yellow/cyan
gfxmenu (hd0,1)/gfxmenu
default 0

title linux
kernel (hd0,1)/vmlinuz BOOT_IMAGE=linux root=/dev/vg0/raiz splash quiet resume=UUID=e65dc753-094e-452e-9f71-4957a62ab616 vga=788
root (hd0,1)
initrd /initrd.img

title linux-nonfb
kernel (hd0,1)/vmlinuz BOOT_IMAGE=linux-nonfb root=/dev/vg0/raiz resume=UUID=e65dc753-094e-452e-9f71-4957a62ab616
root (hd0,1)
initrd /initrd.img

title failsafe
kernel (hd0,1)/vmlinuz BOOT_IMAGE=failsafe root=/dev/vg0/raiz nokmsboot failsafe
root (hd0,1)
initrd /initrd.img

title server 3.8.0rc4-1.mga3
kernel (hd0,1)/vmlinuz-3.8.0-server-0.rc4.1.mga3 BOOT_IMAGE=server_3.8.0rc4-1.mga3 root=/dev/vg0/raiz splash quiet resume=UUID=e65dc753-094e-452e-9f71-4957a62ab616 vga=788
root (hd0,1)
initrd /initrd-3.8.0-server-0.rc4.1.mga3.img

title server 3.8.0-3.mga3
kernel (hd0,1)/vmlinuz-3.8.0-server-3.mga3 BOOT_IMAGE=server_3.8.0-3.mga3 root=/dev/vg0/raiz splash quiet resume=UUID=e65dc753-094e-452e-9f71-4957a62ab616 vga=788
root (hd0,1)
initrd /initrd-3.8.0-server-3.mga3.img

title desktop 3.8.0-3.mga3
kernel (hd0,1)/vmlinuz-3.8.0-desktop-3.mga3 BOOT_IMAGE=desktop_3.8.0-3.mga3 root=/dev/vg0/raiz splash quiet resume=UUID=e65dc753-094e-452e-9f71-4957a62ab616 vga=788
root (hd0,1)
initrd /initrd-3.8.0-desktop-3.mga3.img
Comment 5 Sander Lepik 2013-03-01 11:41:24 CET
So only the failsafe option seems to have nokmsboot. What if you add it manually to the "title linux" option as well?
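
A quick way to check this, as a minimal sketch: it assumes the GRUB legacy layout shown in comment 4 (a "title" line followed by a "kernel" line), and the function name is only illustrative.

```shell
# List the menu.lst entries whose kernel line lacks the nokmsboot parameter.
# Assumes GRUB legacy syntax: each entry is a "title" line followed by a "kernel" line.
entries_without_nokmsboot() {
    awk '
        # Remember the title of the entry currently being read.
        $1 == "title"  { sub(/^title[ \t]+/, ""); title = $0; next }
        # Print that title if nokmsboot is not a separate word on the kernel line.
        $1 == "kernel" { if ($0 !~ /(^|[ \t])nokmsboot([ \t]|$)/) print title }
    ' "$1"
}
```

Run it as: entries_without_nokmsboot /boot/grub/menu.lst. Against the menu.lst from comment 4 it should list every entry except failsafe.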
Comment 6 Alejandro Vargas 2013-03-01 11:54:22 CET
(In reply to Sander Lepik from comment #5)
> So only the failsafe option seems to have nokmsboot. What if you add it manually
> to the "title linux" option as well?

Ok, changed this:

title linux
kernel (hd0,1)/vmlinuz BOOT_IMAGE=linux root=/dev/vg0/raiz splash quiet resume=UUID=e65dc753-094e-452e-9f71-4957a62ab616 vga=788 nokmsboot

(added nokmsboot)

Executed drakx11 and answered "yes" to use the proprietary driver. Re-enabled "X at startup" (which is unchecked by default, reported at https://bugs.mageia.org/show_bug.cgi?id=9218 ) and rebooted.

Result: nothing changed. Graphics mode starts OK, the framebuffer changes the screen resolution, and then the window appears saying "the display driver has been automatically switched to 'nouveau'".

# dkms_autoinstaller restart

nvidia304 (304.64-4.mga3.nonfree): Already installed on this kernel.

# modprobe nvidia304
modprobe: ERROR: could not insert 'nvidia304': No such device
Comment 7 Alejandro Vargas 2013-03-01 12:01:34 CET
By the way, after selecting nouveau, the kdm login works OK, but once you log in the system hangs (even ssh is dead).

Resetting produces the same results, but after 2 or 3 tries the login works and I am able to log in to KDE. But if I select the proprietary nvidia driver again, the "cycle" starts over.
Comment 8 Sander Lepik 2013-03-01 12:04:55 CET
Is nokmsboot still present in menu.lst? I'm afraid that reconfiguring X is removing it and that's the problem. Check menu.lst again. If nokmsboot is missing then add it back and just reboot. Don't try to reconfigure X again, it will probably remove it. Or first reconfigure X to use nvidia and before rebooting check for nokmsboot and add it if needed.
Comment 9 Sander Lepik 2013-03-01 12:07:54 CET
Also try to add nokmsboot right after root=/dev/vg0/raiz - I'm not 100% sure but I kinda remember that the position of this parameter was important and adding it at the end might not work.
Comment 10 Alejandro Vargas 2013-03-01 13:26:06 CET
(In reply to Sander Lepik from comment #8)
> Is nokmsboot still present in menu.lst? I'm afraid that reconfiguring X is
> removing it and that's the problem. Check menu.lst again. If nokmsboot is

You are right. I reconfigured X with drakx11, then changed menu.lst and rebooted but with the same result. 

Also, when I checked menu.lst again, nokmsboot had been removed. I think it was removed by the program that switches to nouveau.

I checked again and saw that drakx11 adds nokmsboot to menu.lst automatically.

I think the problem is that the proprietary driver can't be loaded because it conflicts with another driver.

I am trying to disable dm at startup, but chkconfig says it is not a service... where is it started? What happened to the rc scripts? And rc.local, I use it frequently!! Well... I will need to disable it using drakx11 (which disables X11 by default). Checked that nokmsboot is there and rebooted...

Tried modprobe nvidia304.
The dmesg messages are these:

[   60.631870] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[   60.631878] NVRM: This can occur when a driver such as nouveau, rivafb,
NVRM: nvidiafb, or rivatv was loaded and obtained ownership of
NVRM: the NVIDIA device(s).
[   60.631882] NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
[   60.631886] NVRM: No NVIDIA graphics adapter probed!


Then... nouveau is loaded???

# lsmod |grep nouveau
nouveau               833186  1 
mxm_wmi                12893  1 nouveau
wmi                    18590  2 mxm_wmi,nouveau
video                  18690  1 nouveau
ttm                    71480  1 nouveau
drm_kms_helper         43161  1 nouveau
drm                   227906  3 ttm,drm_kms_helper,nouveau
i2c_algo_bit           13197  1 nouveau
i2c_core               30189  5 drm,drm_kms_helper,i2c_algo_bit,nouveau,i2c_nforce2
button                 13599  1 nouveau

Yes!! So this is preventing nvidia from loading... Let's try to boot with linux-nonfb...

A new error to report: linux-nonfb loads framebuffer... ( https://bugs.mageia.org/show_bug.cgi?id=9220 )

OK, let's try rmmod -f nouveau... The framebuffer is broken but, happily, I am connected via ssh. Now let's try modprobe nvidia304

OK!! It loads now!! 

[  357.638071] nvidia: module license 'NVIDIA' taints kernel.
[  357.668053] nvidia 0000:00:0d.0: setting latency timer to 64
[  357.668064] vgaarb: device changed decodes: PCI:0000:00:0d.0,olddecodes=io+mem,decodes=none:owns=io+mem
[  357.668414] NVRM: loading NVIDIA UNIX x86 Kernel Module  304.64  Tue Oct 30 11:09:29 PDT 2012

OK, now I ran drakx11 to activate X at startup, and "service dm start" starts X fine. Logging in to KDE works OK.


So the problem is that the system loads nouveau even with the nonfb option.
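
The check that exposed this can be sketched as a small filter over lsmod output. The module list is taken from the NVRM message above; the function name is an assumption made for illustration.

```shell
# Read lsmod-style output on stdin and print any loaded module that the
# NVRM message names as a possible owner of the NVIDIA device.
conflicting_modules() {
    # The first column of lsmod output is the module name.
    awk '$1 ~ /^(nouveau|rivafb|nvidiafb|rivatv)$/ { print $1 }'
}
```

Run it as: lsmod | conflicting_modules. Any output means another driver may already own the card, so the nvidia module probe is likely to fail with "No such device".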
Sander Lepik 2013-03-01 13:46:41 CET

CC: (none) => thierry.vignaud, tmb

Comment 11 Alejandro Vargas 2013-03-01 13:47:27 CET
New test: added the following to /etc/modules:

blacklist nouveau
nvidia304

But nouveau is still being loaded, maybe because many other modules depend on it:

nouveau               833186  1 
mxm_wmi                12893  1 nouveau
wmi                    18590  2 mxm_wmi,nouveau
video                  18690  1 nouveau
ttm                    71480  1 nouveau
drm_kms_helper         43161  1 nouveau
drm                   227906  3 ttm,drm_kms_helper,nouveau
i2c_algo_bit           13197  1 nouveau
i2c_core               30189  5 drm,drm_kms_helper,i2c_algo_bit,nouveau,i2c_nforce2
button                 13599  1 nouveau

Googling I've found that it is a known problem: https://bugzilla.redhat.com/show_bug.cgi?id=611427

ftp://download.nvidia.com/XFree86/Linux-x86/256.44/README/commonproblems.html


I've tried all the suggested solutions: added these lines to /etc/modules

blacklist nouveau
options nouveau modeset=0

And added rdblacklist=nouveau to menu.lst, but the problem persists.
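
For reference, blacklist and options are modprobe.d directives, so they are normally read from a .conf file under /etc/modprobe.d/ rather than from /etc/modules. A minimal sketch follows, assuming a hypothetical file name blacklist-nouveau.conf. Note that a blacklist entry only stops alias-based autoloading, so an explicit modprobe (or a copy of nouveau loaded from the initrd) may still bring the module in.

```shell
# Write a modprobe.d snippet that blacklists nouveau and disables its modesetting.
# The target directory defaults to /etc/modprobe.d; the file name is a
# hypothetical choice for this example.
write_nouveau_blacklist() {
    dir=${1:-/etc/modprobe.d}
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' \
        > "$dir/blacklist-nouveau.conf"
}
```

Run it as root with no argument to write into /etc/modprobe.d, or pass another directory to inspect the result first.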
Comment 12 Thomas Backlund 2013-03-01 14:10:56 CET
Is your system fully updated ?

what is the version of dracut ?

rpm -q dracut 

what is the contents of /boot ?

ls -l  /boot



when you have booted into a working system,

what does this command return:

lsinitrd | grep grep


If that returns nothing, do:

dracut -f


and try this again:

lsinitrd | grep grep


If it now finds grep, try to reboot and see if it's booting correctly again.
Comment 13 Alejandro Vargas 2013-03-01 14:39:03 CET
Found a solution for the problem:


The problem is the FILENAME of the nvidia driver. Its name is nvidia304 instead of nvidia. So it is loaded OK, but some program thinks it is not loaded and switches to nouveau.

Solution: rename or link nvidia304 to nvidia

What I did:

# cd /lib/modules/3.8.0-desktop-3.mga3/dkms-binary/drivers/char/drm
# ln -s nvidia304.ko.xz nvidia.ko.xz
# depmod -a

Then I re-ran drakx11, selected the proprietary driver, rebooted, and everything works OK... until a kernel update. It must be fixed in the kernel. I am posting a new bug related to the kernel module.
Alejandro Vargas 2013-03-01 14:39:36 CET

Priority: Normal => release_blocker

Alejandro Vargas 2013-03-01 14:42:41 CET

Keywords: NEEDINFO => PATCH

Comment 14 Thomas Backlund 2013-03-01 15:55:52 CET
(In reply to Alejandro Vargas from comment #13)
> Found a solution for the problem:
> 
> 
> The problem is the FILENAME of the nvidia driver. Its name is nvidia304
> instead of nvidia. So it is loaded OK, but some program thinks it is not
> loaded and switches to nouveau.
> 
> Solution: rename or link nvidia304 to nvidia
> 


That's not the correct fix; it only hides the problem in your case.

We have a module aliasing and update-alternatives setup that copes with
the names of the differently named nvidia modules we provide, and this has worked for a long time...

So something else is broken in your system....




And you didn't reply to my questions in comment 12
Comment 15 Alejandro Vargas 2013-03-01 16:58:38 CET
My system is a completely fresh and updated installation from the Mageia 3 DVD, so if there is a problem on my system, it was caused by the installer or the packages.

I will run the commands indicated in comment 12, but the problem is very clear: either the name is wrong or some other config related to the name is wrong. I think the normal practice is for a module's filename to match the module name, but any other solution you can find would be OK as long as nvidia cards work...
Comment 16 Luc Menut 2013-03-01 23:19:00 CET
Hmm, I wonder if we don't have a bug specific to nvidia304 in harddrake.

@Alejandro, after making a copy of the original /usr/share/harddrake/service_harddrake, could you edit this file as root and add "nvidia304.ko" to the list of module_names at line 179 (or replace it with the service_harddrake in the attachment)?
After this change, could you try to reconfigure X with drakx11, choosing the nvidia driver?
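
To illustrate the kind of lookup involved (this is not the actual service_harddrake code, which is not quoted in this report), here is a sketch that searches a module tree for the first file matching a list of accepted module names. The function name and the name list are assumptions.

```shell
# Print the first kernel module file under DIR whose base name matches one of
# the given module names, trying the names in order.
# Returns non-zero if none of the names is found.
find_driver_module() {
    dir=$1
    shift
    for name in "$@"; do
        # Match name.ko as well as compressed variants such as name.ko.xz.
        match=$(find "$dir" -type f -name "$name.ko*" 2>/dev/null | head -n 1)
        if [ -n "$match" ]; then
            printf '%s\n' "$match"
            return 0
        fi
    done
    return 1
}
```

For example: find_driver_module "/lib/modules/$(uname -r)" nvidia nvidia304. With only "nvidia" in the list, a file named nvidia304.ko.xz is not found, which would match the symptom described here.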

CC: (none) => lmenut

Comment 17 Luc Menut 2013-03-01 23:20:07 CET
Created attachment 3570
service_harddrake with nvidia304.ko
Comment 18 Thomas Backlund 2013-03-01 23:39:29 CET
(In reply to Luc Menut from comment #16)
> Hmm, I wonder if we don't have a bug specific to nvidia304 in harddrake.
> 


Ah, indeed... nice catch. We do miss the 304 one.

I'll commit the fix to svn and release a new drakxtools
Comment 19 Manuel Hiebel 2013-03-01 23:53:37 CET
(and we have already a bug about nvidia 304: https://bugs.mageia.org/show_bug.cgi?id=8773 )
Comment 20 Thomas Backlund 2013-03-02 00:07:30 CET
Ok, harddrake-15.24.1-1.mga3 with a fix for nvidia304 driver. 

Please install that, reconfigure your system to use the proprietary driver, 
and reboot.

Does the system work now?
Comment 21 Manuel Hiebel 2013-03-03 12:13:54 CET
definitely a duplicate

*** This bug has been marked as a duplicate of bug 8773 ***

Status: NEW => RESOLVED
Resolution: (none) => DUPLICATE

Comment 22 Alejandro Vargas 2013-03-03 12:34:57 CET
Thomas: what if the version of nvidia304 changed to nvidia305?

I think the problem is the name itself. It is the first time I have seen a kernel module whose filename is different from the module name (the module name is nvidia, but the filename is nvidia304). Adding nvidia304 to harddrake will fix the problem for now, but the next mandriva version or a user custom update may have the same problem.
Comment 23 Thomas Backlund 2013-03-03 12:48:58 CET
Well,
if/when we add another named module we will update service_harddrake again,

for the specially named kernel modules, this is needed as we must be able to
install them all on a system, which is something we rely on with LiveCDs


and I assume you mean "next _mageia_ version"

As for what a "user custom update" does, that is not something we can do anything about. If an end user decides to "break" his/her system, they get to keep the pieces.
