Bug 28825 - Problems with nvidia and kernel compatibility in Mageia 8?
Summary: Problems with nvidia and kernel compatibility in Mageia 8?
Status: RESOLVED OLD
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 8
Hardware: All Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-04-22 00:13 CEST by Adelson Oliveira
Modified: 2024-01-01 15:09 CET
3 users

See Also:
Source RPM: nvidia and kernel
CVE:
Status comment:


Attachments
Xorg.0.log when X server works (49.64 KB, text/plain)
2021-04-22 03:44 CEST, Adelson Oliveira
xorg.conf generated by XFdrake (14.41 KB, text/plain)
2021-04-22 16:40 CEST, Adelson Oliveira
xorg.conf after mageia-prime-install (4.09 KB, text/plain)
2021-04-22 17:07 CEST, Adelson Oliveira

Description Adelson Oliveira 2021-04-22 00:13:35 CEST
Just to give a chronological description:

I upgraded from Mageia 7 to 8 about 8 hours ago. Everything went fine and I could boot Mageia 8 without problems. Then I added the repo:

https://mirrors.mageia.org/mirrors/mageia.c3sl.ufpr.br

and got a notification of new updates to glibc, the kernel, and many other packages.

After updating, the following happened:
1- My second monitor was no longer detected;
2- After reconfiguring the X server with XFdrake to recover the second monitor, I got a black screen;
3- After removing /etc/X11/xorg.conf, the X server came back, and so did the second monitor, but something is still odd.

During boot, I get a warning that the X server may fail to load because of a conflict between the loaded kernel module and the X server driver selected for use. Although the X server does not fail, when I run:

$ nvidia-settings
(nvidia-settings:192925): Gtk-WARNING **: 19:08:32.934: Theme parsing error: gtk.css:2:33: Failed to import: Error in opening file /home/adhefe/.config/gtk-3.0/window_decorations.css: file or directory not found

(nvidia-settings:192925): GLib-GObject-CRITICAL **: 19:08:32.986: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

ERROR: nvidia-settings could not find the registry key file or the X server is not
       accessible. This file should have been installed along with this driver at
       /usr/share/nvidia/nvidia-application-profiles-key-documentation. The application
       profiles will continue to work, but values cannot be prepopulated or validated, and
       will not be listed in the help text. Please see the README for possible values and
       descriptions.

*****
Nvidia packages:

x11-driver-video-nvidia-current-460.67-1.mga8.nonfree
nvidia-current-utils-460.67-1.mga8.nonfree
nvidia-current-doc-html-460.67-1.mga8.nonfree
dkms-nvidia-current-460.67-1.mga8.nonfree
nvidia-current-cuda-opencl-460.67-1.mga8.nonfree
lib64nvidia-egl-wayland1-1.1.5-3.mga8

Kernel packages:
kernel-desktop-5.10.30-1.mga8-1-1.mga8
kernel-firmware-nonfree-20210310-1.mga8.nonfree
kernel-userspace-headers-5.10.30-1.mga8
kernel-firmware-20201218-1.mga8
kernel-desktop-devel-latest-5.10.30-1.mga8
kernel-desktop-latest-5.10.30-1.mga8
kernel-desktop-devel-5.10.30-1.mga8-1-1.mga8
Comment 1 Dave Hodgins 2021-04-22 00:49:27 CEST
It's working ok for me ...
[dave@x8t ~]$ uname -a
Linux x8t.hodgins.homeip.net 5.10.30-desktop-1.mga8 #1 SMP Wed Apr 14 09:10:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[dave@x8t ~]$ rpm -qa|grep -e nvidia -e kernel|sort -V
dkms-nvidia-current-460.67-1.mga8.nonfree
kernel-desktop-5.10.30-1.mga8-1-1.mga8
kernel-desktop-5.10.32-1.mga8-1-1.mga8
kernel-desktop-devel-5.10.30-1.mga8-1-1.mga8
kernel-desktop-devel-5.10.32-1.mga8-1-1.mga8
kernel-desktop-devel-latest-5.10.32-1.mga8
kernel-desktop-latest-5.10.32-1.mga8
kernel-firmware-20201218-1.mga8
kernel-firmware-nonfree-20210310-1.mga8.nonfree
kernel-userspace-headers-5.10.32-1.mga8
lib64nvidia-egl-wayland1-1.1.5-3.mga8
libnvidia-egl-wayland1-1.1.5-3.mga8
nvidia-current-cuda-opencl-460.67-1.mga8.nonfree
nvidia-current-doc-html-460.67-1.mga8.nonfree
nvidia-current-lib32-460.67-1.mga8.nonfree
nvidia-current-utils-460.67-1.mga8.nonfree
virtualbox-kernel-5.10.30-desktop-1.mga8-6.1.20-1.mga8
virtualbox-kernel-desktop-latest-6.1.20-1.mga8
x11-driver-video-nvidia-current-460.67-1.mga8.nonfree

It's also working under the 5.10.32 kernel currently in updates testing.

I'm using mageia-prime, as this is a dual-GPU system...
# lspcidrake -v|grep Card
Card:ATI Volcanic Islands and later (amdgpu): Advanced Micro Devices, Inc. [AMD/ATI]|Renoir [DISPLAY_VGA] (vendor:1002 device:1636 subv:1043 subd:1e21) (rev: c6)
Card:NVIDIA GeForce 635 series and later: NVIDIA Corporation|TU106M [GeForce RTX 2060 Mobile] [DISPLAY_VGA] (vendor:10de device:1f15 subv:1043 subd:1e21) (rev: a1)

$ tree -ifa|grep window_decorations.css
./.config/gtk-3.0/window_decorations.css -> /usr/share/themes/Breeze/window_decorations.css
$ rpm -q -f /usr/share/themes/Breeze/window_decorations.css
kde-gtk-config-5.20.4-1.mga8

Is kde-gtk-config installed?

CC: (none) => davidwhodgins

Comment 2 Adelson Oliveira 2021-04-22 00:51:30 CEST
I'd like to change this bug to refer to mageia-prime instead of the kernel and nvidia.

This is a hybrid graphics notebook.

If I remove xorg.conf, the X server initializes correctly.

If I run mageia-prime-install, the X server does not initialize correctly.

The configuration worked fine with mageia-prime until the updates about 3 hours ago...

Thanks and sorry for the confusing report...
Comment 3 Adelson Oliveira 2021-04-22 00:53:15 CEST
Yes, 

kde-gtk-config-5.20.4-1.mga8
Comment 4 Adelson Oliveira 2021-04-22 01:03:06 CEST
I'm really confused!
There is no xorg.conf from mageia-prime-install in /etc/X11/ because I moved it to another file.

but

glxspheres64 seems to be using my nvidia card!

$ glxspheres64
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
GLX FB config ID of window: 0xad (8/8/8/0)
Visual ID of window: 0x27
Context is Direct
OpenGL Renderer: GeForce GTX 1050 Ti/PCIe/SSE2
60.070471 frames/sec - 67.038646 Mpixels/sec
59.998720 frames/sec - 66.958571 Mpixels/sec
60.001469 frames/sec - 66.961639 Mpixels/sec
60.034977 frames/sec - 66.999035 Mpixels/sec
60.003420 frames/sec - 66.963816 Mpixels/sec
Comment 5 Dave Hodgins 2021-04-22 01:47:49 CEST
On my system ...
$ glxspheres64
Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
GLX FB config ID of window: 0xad (8/8/8/0)
Visual ID of window: 0x27
Context is Direct
OpenGL Renderer: GeForce RTX 2060/PCIe/SSE2
5961.431066 frames/sec - 6652.957070 Mpixels/sec
6362.893808 frames/sec - 7100.989489 Mpixels/sec
6233.991825 frames/sec - 6957.134877 Mpixels/sec
6288.607928 frames/sec - 7018.086447 Mpixels/sec

$ head -n 2 /etc/X11/xorg.conf
#
# automatically generated by mageia-prime-install

I'd try running mageia-prime-uninstall and then mageia-prime-install.
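Before rerunning the tools, it can help to confirm whether the active xorg.conf still carries the header shown above. A minimal sketch, assuming the header sits within the first 5 lines of the file; the function name generated_by_mageia_prime is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: report whether an xorg.conf still carries the
# "automatically generated by mageia-prime-install" header.
# Assumption: the header appears within the first 5 lines of the file.
generated_by_mageia_prime() {
    conf=${1:-/etc/X11/xorg.conf}
    if [ ! -f "$conf" ]; then
        echo "missing: $conf"
        return 2
    fi
    if head -n 5 "$conf" | grep -q 'automatically generated by mageia-prime-install'; then
        echo "mageia-prime: $conf"
        return 0
    fi
    echo "other: $conf"
    return 1
}
```

A missing header suggests that something else (e.g. XFdrake) rewrote the file; a present header does not rule out later manual edits.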
Comment 6 Dave Hodgins 2021-04-22 01:51:25 CEST
I'd also try either switching the theme to Breeze, or manually creating the
symlink ~/.config/gtk-3.0/window_decorations.css -> /usr/share/themes/Breeze/window_decorations.css

with
ln -s /usr/share/themes/Breeze/window_decorations.css ~/.config/gtk-3.0/
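The manual symlink can be made repeatable with a small shell function. A minimal sketch; the function name fix_window_decorations is hypothetical, and the default target is the Breeze file from Comment 1, which rpm attributes to kde-gtk-config:

```shell
#!/bin/sh
# Hypothetical helper: recreate ~/.config/gtk-3.0/window_decorations.css,
# the file the GTK theme failed to import when nvidia-settings started.
fix_window_decorations() {
    target=${1:-/usr/share/themes/Breeze/window_decorations.css}
    dir=${2:-$HOME/.config/gtk-3.0}
    if [ ! -e "$target" ]; then
        echo "target not found: $target (is kde-gtk-config installed?)" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    # -s: symbolic link; -f: replace an existing (possibly dangling) entry
    ln -sf "$target" "$dir/window_decorations.css"
}
```

Note that -f also overwrites a regular file of that name, so check first if you keep a customized window_decorations.css.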
Comment 7 Adelson Oliveira 2021-04-22 02:02:12 CEST
I've already done the mageia-prime uninstall and install. If you're surprised by the performance of my nvidia card, it changes significantly when I'm not using the second monitor. I reported this a few years ago and still don't know why. For now, I'd like to know why I have to remove xorg.conf for the X server to start.
Comment 8 Dave Hodgins 2021-04-22 02:17:02 CEST
Assigning to the kernel team. Hopefully they'll be able to figure out
what's needed to get the second monitor working (I only use one on each of
my systems).

Assignee: bugsquad => kernel

Comment 9 Adelson Oliveira 2021-04-22 03:34:37 CEST
This is how it is here:

Just after installing the Mageia 8 packages about 12 hours ago, everything was fine. Then another update (about 7 or 8 hours ago) arrived for the kernel, etc. After this Mageia 8 update, my second monitor was not detected; only the first (main) screen worked.

Please disregard the XFdrake tests I reported above; they were not useful.

Now, if I reinstall mageia-prime (uninstall + install), the X server doesn't start. If I then move xorg.conf to xorg.conf_old, the X server comes back.

As reported above, even without xorg.conf (which is likely the cause of the conflicting-module warning also reported above), glxspheres uses the nvidia card to render.

Below is a copy of Xorg.0.log from when the X server did not start (xorg.conf was present). It refers to the Intel Device in xorg.conf. According to mcc => hardware => video cards, I have:

CoffeeLake-H GT2 [UHD Graphics 630]
GP107M [GeForce GTX 1050 Ti Mobile]

This is the Xorg.0.log:


[    24.005] (--) Log file renamed from "/var/log/Xorg.pid-6915.log" to "/var/log/Xorg.0.log"
[    24.006] 
X.Org X Server 1.20.11
X Protocol Version 11, Revision 0
[    24.006] Build Operating System: rabbit 5.10.25-server-1.mga7 
[    24.006] Current Operating System: Linux localhost.localdomain 5.10.30-desktop-1.mga8 #1 SMP Wed Apr 14 09:10:47 UTC 2021 x86_64
[    24.006] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.10.30-desktop-1.mga8 root=UUID=ae29d4dc-eefa-4f63-bb09-bc71884913e1 ro splash quiet noiswmd resume=/dev/nvme0n1p1 audit=0 rd.driver.blacklist=nouveau driver.blacklist=nouveau xorg.blacklist=nouveau vga=788
[    24.006] Build Date: 13 April 2021  09:02:48PM
[    24.006] Build ID: x11-server 1.20.11-1.mga8 
[    24.006] Current version of pixman: 0.40.0
[    24.006] 	Before reporting problems, check https://bugs.mageia.org
	to make sure that you have the latest version.
[    24.006] Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[    24.006] (==) Log file: "/var/log/Xorg.0.log", Time: Wed Apr 21 21:38:33 2021
[    24.006] (==) Using config file: "/etc/X11/xorg.conf"
[    24.006] (==) Using config directory: "/etc/X11/xorg.conf.d"
[    24.006] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[    24.006] Data incomplete in file /etc/X11/xorg.conf
	Undefined Device "intel" referenced by ServerLayout "layout".
[    24.006] (EE) Problem parsing the config file
[    24.006] (EE) Error parsing the config file
[    24.006] (EE) 
Fatal server error:
[    24.006] (EE) no screens found(EE) 
[    24.006] (EE) 
Please consult the Mageia support 
	 at https://bugs.mageia.org
 for help. 
[    24.006] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[    24.006] (EE) 
[    24.006] (EE) Server terminated with error (1). Closing log file.
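The fatal error above means that the ServerLayout section references a Device identifier ("intel") that no Section "Device" defines. A minimal awk-based sketch for spotting such dangling references, assuming the reference comes from an Inactive line in ServerLayout (the line later discussed in this thread); the function name check_inactive_devices is hypothetical, and it simplifies Xorg's parsing by matching keywords and section names case-sensitively:

```shell
#!/bin/sh
# Hypothetical helper: list Device identifiers named by a ServerLayout
# "Inactive" line that no Section "Device" defines.
# Simplifications: keywords/section names are case-sensitive here, and
# identifiers are compared case-insensitively but otherwise exactly.
check_inactive_devices() {
    awk '
        # first double-quoted token on the current line, lowercased
        function quoted() {
            if (match($0, /"[^"]*"/)) return tolower(substr($0, RSTART + 1, RLENGTH - 2))
            return ""
        }
        /^[ \t]*#/                               { next }
        /^[ \t]*Section[ \t]+"Device"/           { sec = "device"; next }
        /^[ \t]*Section[ \t]+"ServerLayout"/     { sec = "layout"; next }
        /^[ \t]*Section[ \t]/                    { sec = "other";  next }
        /^[ \t]*EndSection/                      { sec = "";       next }
        sec == "device" && $1 == "Identifier"    { defined[quoted()] = 1 }
        sec == "layout" && $1 == "Inactive"      { used[quoted()] = 1 }
        END {
            bad = 0
            for (d in used)
                if (!(d in defined)) { print "undefined Inactive device: " d; bad = 1 }
            exit bad
        }
    ' "${1:-/etc/X11/xorg.conf}"
}
```

It exits with status 1 and prints each dangling identifier, or exits 0 silently when every Inactive entry has a matching Device section.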
Comment 10 Adelson Oliveira 2021-04-22 03:44:46 CEST
Created attachment 12661
Xorg.0.log when X server works

This is the Xorg.0.log with the X server working (xorg.conf is absent)
Comment 11 Adelson Oliveira 2021-04-22 03:47:15 CEST
For reference, bug 24552,
https://bugs.mageia.org/show_bug.cgi?id=24552
is about this same machine and how the second monitor affects performance when using mageia-prime.

Thanks a lot.
Comment 12 Giuseppe Ghibò 2021-04-22 12:55:41 CEST
A few notes:

To narrow down the problem, you have to determine whether it is due to the driver, the configuration, or something else (e.g. a hardware/technology limit).

The starting point for a mageia-prime configuration is a working configuration of the Intel card (or any other integrated graphics card; some AMD/Nvidia combinations have an AMD one) made with XFdrake. You don't need to configure the proprietary Nvidia driver with XFdrake first: for now, XFdrake with Nvidia is reserved for desktop Nvidia cards, not for those based on Optimus/PRIME technology.

A second possible source of problems is an nvidia driver mismatch. In a particular combination of upgrades, i.e. when both the kernel and the nvidia driver are updated in the same run, a dkms script may build a mismatched driver (in that case the mismatched modules need to be removed by hand).

Once you have verified that there is no mismatch (e.g. with modinfo), i.e. that the dkms modules are correctly built, you can dig into further problems.
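The modinfo comparison can be scripted. A minimal sketch, assuming the userspace driver package is x11-driver-video-nvidia-current (as in the package lists above); nvidia_versions_match is a hypothetical name. "modinfo -F version" prints only the module's version field for the running kernel, and "rpm -q --qf '%{VERSION}'" prints only the package version:

```shell
#!/bin/sh
# Hypothetical helper: compare the nvidia kernel module version (built by
# dkms) with the installed userspace driver version.
nvidia_versions_match() {
    mod_ver=$1
    pkg_ver=$2
    if [ -z "$mod_ver" ] || [ -z "$pkg_ver" ]; then
        echo "unknown: module='$mod_ver' package='$pkg_ver'"
        return 2
    fi
    if [ "$mod_ver" = "$pkg_ver" ]; then
        echo "match: $mod_ver"
        return 0
    fi
    echo "MISMATCH: module=$mod_ver package=$pkg_ver"
    return 1
}

# Real invocation, commented out so sourcing this file has no side effects:
# nvidia_versions_match \
#     "$(modinfo -F version nvidia 2>/dev/null)" \
#     "$(rpm -q --qf '%{VERSION}' x11-driver-video-nvidia-current 2>/dev/null)"
```

"dkms status" additionally lists which kernels each dkms module was built for, which helps locate a stale build.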

As for performance, you need to disable vblank syncing before benchmarking; otherwise you'll always get the refresh rate of your monitor. vblank syncing can be disabled on a per-application basis in different ways, depending on whether you are using Mesa or nvidia. For Mesa you have to set the env var vblank_mode=0; nvidia ignores vblank_mode, and the right env var is __GL_SYNC_TO_VBLANK=0. By setting both env vars, you disable syncing to vblank in both the nvidia and Mesa cases, e.g.:

vblank_mode=0  __GL_SYNC_TO_VBLANK=0 glxspheres64
vblank_mode=0  __GL_SYNC_TO_VBLANK=0 glxgears

And so on. As for the performance drop when attaching the second monitor, is it significant while the benchmark is "running"? E.g. start the benchmark, then attach the second monitor without stopping it. Maybe it's intrinsic to the hardware (e.g. because the second monitor is not physically connected to the nvidia hardware). Does Windows, if you have it installed, show the same drop?
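The per-application override described above can be wrapped in a tiny shell function so every benchmark run uses it. A minimal sketch; the name novsync is hypothetical. It sets both variables, so the same command works whether Mesa or the NVIDIA driver ends up rendering:

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with vblank sync disabled for both
# Mesa (vblank_mode=0) and the NVIDIA proprietary driver
# (__GL_SYNC_TO_VBLANK=0), so results are not capped at the refresh rate.
novsync() {
    env vblank_mode=0 __GL_SYNC_TO_VBLANK=0 "$@"
}
```

Usage: "novsync glxspheres64", then plug in the second monitor while it runs to see whether the frame rate drops.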

CC: (none) => ghibomgx

Comment 13 Adelson Oliveira 2021-04-22 16:38:56 CEST
Hi Giuseppe,

I've configured the Intel card with XFdrake and will attach the corresponding xorg.conf. In this case, the second monitor is not detected, regardless of whether it is plugged in or unplugged during XFdrake configuration. The X server does load, however.

The symbolic link X => ../../usr/bin/Xorg used to be present, and during mageia-prime-install it reported that I was running X without an xorg.conf, regardless of whether xorg.conf was in /etc/X11. I then tried running mageia-prime with and without this symbolic link. In both cases, the X server does not load after mageia-prime-install.

The only way to have the X server load and the second monitor detected is to remove the xorg.conf generated by mageia-prime-install.

As for a mismatch between the nvidia driver and the kernel, dmesg shows nothing about it, and modinfo (modinfo nvidia_drm) reports the correct nvidia version, 460.67.

I'll report the benchmark results later to check for hardware problems...
Comment 14 Adelson Oliveira 2021-04-22 16:40:14 CEST
Created attachment 12664
xorg.conf generated by XFdrake

This xorg.conf is what XFdrake produced when configuring the Intel video card.
Comment 15 Adelson Oliveira 2021-04-22 16:56:04 CEST
The benchmark test:

I booted without the second monitor attached.
glxspheres, the benchmark, ran at more than 3000 frames per second.
Upon plugging in the second monitor, glxspheres dropped to about 60 frames per second.

When I plug in the second monitor, the screen goes black for one or two seconds, so I can't tell whether glxspheres is automatically stopped and restarted.
Comment 16 Adelson Oliveira 2021-04-22 17:07:52 CEST
Created attachment 12665
xorg.conf after mageia-prime-install

It may be useful to see what xorg.conf looks like after mageia-prime, since these changes cause the errors in Xorg.0.log listed above.
Comment 17 Giuseppe Ghibò 2021-04-22 17:34:53 CEST
The latest attachment doesn't seem to have been generated by mageia-prime-install; rather, it was later overwritten by XFdrake or some other tool, or tweaked manually, for use with Xinerama. It looks like a multi-monitor configuration for desktop discrete nvidia cards, not for Optimus. What is also strange is that it mixes "nouveau" and "nvidia": device1 uses nvidia, device2 uses nouveau. I don't think mixing the nvidia and nouveau modules at the same time would ever have worked.

As for the 60 fps, that's exactly the refresh rate of your display, not the maximum fps of your card. Are you sure you used "vblank_mode=0 __GL_SYNC_TO_VBLANK=0 glxspheres64"?

AFAIK the extra monitor is detected automatically once you plug in its HDMI cable. Just configure XFdrake in Intel mode, with no extra monitor and no Xinerama, then plug in the monitor's HDMI cable (or even an HDMI TV set): in Plasma5 the desktop is simply extended to the new monitor. With the Plasma applet/widget "Display Configuration" you can then adjust the position of the new screen (right of, duplicated, left of, etc.). The same applies if, after configuring Intel for a single monitor, you switch to "mageia-prime-install" to use NVIDIA Optimus: Xorg automatically detects the extended screen when you plug in the HDMI cable.
Comment 18 Adelson Oliveira 2021-04-22 19:23:17 CEST
Sorry, I left out __GL_SYNC_TO_VBLANK=0 in the test; with it, performance stays above 3000 fps! Thanks for that.

It seems that xorg.conf was altered only at its end and on one other line. But yes, that is how it looks just after running mageia-prime. This is what I have in /etc/X11:

$ ls -al /etc/X11/
total 48
drwxr-xr-x 1 root root  450 abr 22 11:17 ./
drwxr-xr-x 1 root root 5664 abr 22 11:12 ../
drwxr-xr-x 1 root root    6 abr 13 18:05 app-defaults/
drwxr-xr-x 1 root root 1244 abr 13 18:05 fontpath.d/
drwxr-xr-x 1 root root   22 abr 21 17:33 gdm/
drwxr-xr-x 1 root root   24 abr 21 11:43 mwm/
drwxr-xr-x 1 root root    0 fev 12 16:42 wmsession.d/
drwxr-xr-x 1 root root  190 abr 21 11:45 xdm/
drwxr-xr-x 1 root root   92 abr 21 11:45 xinit/
drwxr-xr-x 1 root root  416 abr 22 11:12 xinit.d/
-rw-r--r-- 1 root root  213 fev 12 16:42 Xmodmap
-rw-r--r-- 1 root root 7662 abr 22 11:12 xorg.conf.bak.beforenvidiaprime
drwxr-xr-x 1 root root  100 abr 22 11:12 xorg.conf.d/
-rw-r--r-- 1 root root 4189 abr 22 11:12 xorg.conf.nvidiaprime
-rw-r--r-- 1 root root 4189 abr 22 10:51 xorg.conf.nvidiaprime.preserve
-rw-r--r-- 1 root root    0 abr 21 21:37 xorg.conf.nvidiaprime.xorgfree
-rw-r--r-- 1 root root 4189 abr 22 11:12 xorg.conf_velho
-rw-r--r-- 1 root root 1652 fev 12 16:42 Xresources
-rwxr-xr-x 1 root root 5252 fev 12 16:42 Xsession*
drwxr-xr-x 1 root root  344 abr 22 11:12 xsetup.d/
Comment 19 Adelson Oliveira 2021-04-22 19:25:13 CEST
But I'll repeat the XFdrake and mageia-prime steps ASAP. I didn't choose Xinerama in XFdrake; I just changed the resolution to 1920x1080 with 24-bit color.
Comment 20 Adelson Oliveira 2021-04-22 19:56:48 CEST
I can already say that attachments 12664 and 12665 differ essentially in the ServerLayout section, where the errors in Xorg.0.log are reported. This section was created when I plugged in the second monitor and had to choose how the second monitor would be used. I chose "an extension to the left".

Mageia-prime inserted the line "Inactive Intel" into xorg.conf and moved the commented-out #Option Xinerama line. This "Inactive Intel" line seems to be what Xorg.0.log reports as an error.
Comment 21 Giuseppe Ghibò 2021-04-22 20:37:34 CEST
Mageia-prime doesn't insert those lines into an existing xorg.conf file; it must have been a modification introduced by XFdrake or some other process. mageia-prime just restores the previous configuration (either the Intel one, or one from a previous mageia-prime run, which could later have been modified by hand).

Try running mageia-prime-uninstall and then "mageia-prime-install -f -z"; if you run Plasma5, you may have to wait 2 minutes for the next login to complete after X11 is zapped (that's a separate bug in plasma-workspace or kwin).
Comment 22 Adelson Oliveira 2021-04-22 23:37:12 CEST
Now I'm really in trouble!

I ran the tests up to mageia-prime-install -f -z, and then X restarted with everything OK; xorg.conf did not contain those lines about Inactive intel...

Then, just to find out what the -f and -z options meant, I tried, as root:

mageia-prime-install --help

mageia-prime-install started reconfiguring again, but I pressed Ctrl+C to stop it.

Now the system does not boot; it hangs with a kernel panic.

I'm using a Live ISO to access the computer. How can I recover?
Comment 23 Adelson Oliveira 2021-04-23 00:23:34 CEST
I solved the kernel panic from the Live ISO by copying initrd-...img.old to initrd...img.
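The recovery step can be done a little more safely by keeping the broken image aside instead of overwriting it. A minimal sketch, assuming it runs from a Live ISO with the installed system's /boot already mounted, and that the backup is the image path plus ".old" as described above; restore_initrd and the ".broken" suffix are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: restore "<image>.old" over "<image>", first saving
# the current (possibly broken) image as "<image>.broken".
# Pass the full path of the initrd image the bootloader expects.
restore_initrd() {
    img=$1
    if [ -z "$img" ] || [ ! -f "$img.old" ]; then
        echo "no backup found: $img.old" >&2
        return 1
    fi
    if [ -e "$img" ]; then
        cp -p "$img" "$img.broken" || return 1
    fi
    cp -p "$img.old" "$img"
}
```

Usage: "restore_initrd /mnt/boot/<initrd image name>", where /mnt is wherever the installed root was mounted.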

And yes, I don't know how, but everything seems to be OK now: two monitors, a working xorg.conf in /etc/X11...

I'm still wondering what the -f option does in mageia-prime-install.

I couldn't find out what changed xorg.conf to insert "Inactive intel" after all, but if it seems worthwhile I can run more tests. If not, I would like to close this issue.

I would just like to point out a danger when using mageia-prime-install. I pressed Ctrl+C just after it had appended ".old" to the initrd. In fact, mageia-prime had nothing to do, because the system was already configured for Optimus, and nothing should have been done to the initrd in the first place...

Thanks a lot.
Comment 24 Giuseppe Ghibò 2021-04-23 10:24:34 CEST
"mageia-prime-install --help" is not a valid option, use "mageia-prime-install -h".
Comment 25 Giuseppe Ghibò 2021-04-23 10:39:39 CEST
Furthermore, the full documentation is in:

/usr/share/doc/mageia-prime/README.md

You can view it with any Markdown viewer, e.g. cutemarked:

cutemarked /usr/share/doc/mageia-prime/README.md
Comment 26 Morgan Leijström 2023-06-23 09:03:11 CEST
Reporter, how are things today?

CC: (none) => fri

Comment 27 Morgan Leijström 2024-01-01 15:09:02 CET
No reply for a very long time, and Mageia 8 is now EOL.

If you still have problems, consult https://wiki.mageia.org/en/Mageia-prime_for_Optimus; if problems persist, ask on the forum and/or open a new bug.

Status: NEW => RESOLVED
Resolution: (none) => OLD

