Bug 31897 - Repeated Xorg login failures with nvidia graphics hardware
Summary: Repeated Xorg login failures with nvidia graphics hardware
Status: NEEDINFO
Alias: None
Product: Mageia
Classification: Unclassified
Component: Release (media or process)
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal minor
Target Milestone: ---
Assignee: Mageia Bug Squad
Reported: 2023-05-08 20:22 CEST by Len Lawrence
Modified: 2024-06-22 19:36 CEST

Attachments
Journal output from test run of Mageia-9-beta2 on x86_64 box with nvidia graphics (19.00 KB, application/octet-stream)
2023-05-09 11:27 CEST, Len Lawrence

Description Len Lawrence 2023-05-08 20:22:56 CEST
Description of problem:
On a newly installed system with several desktop environments and nvidia graphics hardware it is impossible to log in to DEs which use Xorg graphics rather than Wayland.  Attempts to work around this by switching the graphics driver have so far always failed; this applies to the proprietary nvidia-current driver, nouveau and modesetting.  Graphics card: GeForce GTX 1080 Ti


Version-Release number of selected component (if applicable):
Mageia-9-beta2-x86_64

How reproducible:
On one specific machine it is consistent across all reboots.

Steps to Reproduce:
1. Install the OS (on a machine with an nvidia graphics card) from the latest
   iso on a USB stick and install the proprietary nvidia driver.
   Include several DEs : GNOME, GNOME Classic, Plasma, Mate, Xfce, Cinnamon
2. UEFI boot
3. Select either GNOME desktop (GNOME or GNOME Classic) and log in as a user
   This should succeed.  Logout.
4. Select Plasma or any of the listed DEs apart from GNOME
   Login
5. Observe the screen clearing and the login greeter reappearing after a pause.
6. It should be possible to go back to GNOME.
7. Try another Xorg-based DE.
8. At this point the user may wish to try other graphics drivers.
   Ctrl-Alt-F2 to raise a console.  Login as root from a user session and run 
   drakx11 to select an alternative driver, reboot and check login behaviour
   again.
Comment 1 Lewis Smith 2023-05-08 21:15:37 CEST
Thanks for this curious report.
Do you think it is related to the specific nVidia card GeForce GTX 1080 Ti?
Please post the output of:
 $ inxi -G
to clarify the graphics.
You refer to "one specific machine". Do you have another with nVidia graphics and Mageia 9 in some form that does not show the problem? [This is almost the inverse of an earlier problem in which Gnome and its relatives would not log in because of a fault in a specific version of glib.]

Perhaps after a fresh boot, one successful login/logout to Gnome, and a bounced login to a different desktop, attach the journal:
 $ journalctl -b --no-hostname | xz > journal.txt.xz

CC: (none) => lewyssmith

Comment 2 Len Lawrence 2023-05-09 11:22:38 CEST
This is the only nvidia machine here so I cannot comment on how specific this might be to the graphics card.  Thanks for the suggestions.  The problem has been aired on QA Discuss but nobody else seems to have encountered the problem.  Not many nvidia users left.

$ inxi -G
Graphics:
  Device-1: NVIDIA GP102 [GeForce GTX 1080 Ti] driver: nouveau v: kernel
  Device-2: PCTV Systems tripleStick 292e type: USB driver: em28xx
  Display: wayland server: X.Org v: 22.1.9 with: Xwayland v: 22.1.9
    compositor: gnome-shell v: 44.1 driver: X: loaded: nouveau,v4l dri: nouveau
    gpu: nouveau resolution: 2560x1440~60Hz
  API: OpenGL v: 4.3 Mesa 23.0.3 renderer: NV132

Booted the test system.  Logged in to GNOME Classic; OK.
Logged out and attempted login to Cinnamon; failed.
Logged in to GNOME Classic.

$ journalctl -b --no-hostname | xz > journal.txt.xz

Attaching the journal output.
Comment 3 Len Lawrence 2023-05-09 11:27:14 CEST
Created attachment 13821
Journal output from test run of Mageia-9-beta2 on x86_64 box with nvidia graphics

CC: (none) => tarazed25

Comment 4 Len Lawrence 2023-05-09 12:56:26 CEST
Repeated earlier experiments with graphics drivers nouveau and modesetting.
DEs GNOME and GNOME Classic were accessible but none of the others (Cinnamon, Mate, Xfce, Plasma).
For GNOME login with modesetting:
$ inxi -G

Graphics:
  Device-1: NVIDIA GP102 [GeForce GTX 1080 Ti] driver: nouveau v: kernel
  Device-2: PCTV Systems tripleStick 292e type: USB driver: em28xx
  Display: wayland server: X.Org v: 22.1.9 with: Xwayland v: 22.1.9
    compositor: gnome-shell v: 44.1 driver: X: loaded: modesetting,v4l
    dri: nouveau gpu: nouveau resolution: 2560x1440~60Hz
  API: OpenGL v: 4.3 Mesa 23.0.3 renderer: NV132
Comment 5 Len Lawrence 2023-05-09 13:36:34 CEST
Looking at comment 2 I have to wonder where is the proprietary driver?
Tried reinstalling it via drakx11 and ended up with a system which refused ALL logins on the test partition.  I had not seen the usual message about rebuilding the nvidia driver after reboot.  Next removed dkms-nvidia-current and reinstalled it and rebooted.  Ended up with a blank screen with a blinking cursor.  Raised a console and ran startx - nothing doing - there was a problem with xauth.  At this point a reinstallation or upgrade is in order but need to capture the journal.
Comment 6 Dave Hodgins 2023-05-09 15:35:01 CEST
Regarding comment 5, has ~/.Xauthority become owned by root?
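
The ownership question above can be answered directly, since a permission mode such as 0600 does not by itself show who owns the file. The following is a minimal sketch, assuming GNU `stat` is available; `check_xauth_owner` is a hypothetical helper name, not part of any Mageia tool.

```shell
# Hypothetical helper: compare the numeric owner of a file with an expected uid.
# $1 = path to the authority file, $2 = expected numeric uid
check_xauth_owner() {
    file=$1
    expected_uid=$2
    if [ ! -e "$file" ]; then
        echo "missing"
        return 2
    fi
    # GNU stat: %u prints the numeric user ID of the owner.
    owner_uid=$(stat -c '%u' "$file")
    if [ "$owner_uid" = "$expected_uid" ]; then
        echo "ok"
    else
        echo "owned by uid $owner_uid"
        return 1
    fi
}
```

Run as the affected user, `check_xauth_owner "${XAUTHORITY:-$HOME/.Xauthority}" "$(id -u)"` prints `ok` when the file belongs to that user; an `owned by uid 0` result would point to root having taken over the file.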

CC: (none) => davidwhodgins

Comment 7 Len Lawrence 2023-05-09 19:10:37 CEST
In reply to Dave Hodgins, comment 6:
No, the permissions are 0600.
The Xorg.0.log file says module nvidia cannot be found.
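
For anyone repeating this check, the error lines can be pulled out of the X server log rather than read by eye. This is a sketch only: `xorg_errors` is a hypothetical helper, and the log path is an assumption (commonly /var/log/Xorg.0.log, or ~/.local/share/xorg/Xorg.0.log when the server runs without root rights).

```shell
# Hypothetical helper: print only the error lines of an Xorg log.
# The X server tags errors with "(EE)" right after the timestamp; anchoring
# on "] (EE)" skips the marker legend near the top of the log, which also
# contains the string "(EE)".
xorg_errors() {
    grep -E '^\[ *[0-9]+\.[0-9]+\] \(EE\)' "$1"
}
```

A call such as `xorg_errors /var/log/Xorg.0.log` would then list a missing driver module together with any other server errors from the same start-up.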

Removed /etc/X11/xorg.conf and tried again.  This time the login to GNOME Classic succeeded.  User's .Xauthority contains the MIT-MAGIC-COOKIE.

# updatedb
# locate nvidia |grep -v lib|grep -v /home|grep -v share|grep -v src| less
/etc/dracut.conf.d/99-nvidia.conf
/usr/bin/nvidia-modprobe
/usr/bin/nvidia-ngx-updater
/usr/bin/nvidia-persistenced
/usr/bin/nvidia-smi

$ inxi -G
Graphics:
  Device-1: NVIDIA GP102 [GeForce GTX 1080 Ti] driver: nouveau v: kernel
  Device-2: PCTV Systems tripleStick 292e type: USB driver: em28xx
  Display: wayland server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9
    compositor: gnome-shell v: 44.1 driver: X: loaded: v4l gpu: nouveau
    resolution: 2560x1440
  API: OpenGL Message: No GL data found on this system.
Comment 8 Len Lawrence 2023-05-09 19:46:55 CEST
Continuing from comment 7.
No xorg.conf

Logged in to GNOME (Wayland) and GNOME Classic.
Failed for IceWM and Cinnamon Software Rendering.
Back to GNOME Classic.

# nvidia-settings

ERROR: NVIDIA driver is not loaded

Segmentation fault (core dumped)

Ran drakx11.
Enabled Translucency, RENDER acceleration, Force display mode of DVI
Forcing DVI enforces the use of the frame buffer (potential slowdown?).  Looks like a bad idea; drakx11 stalls waiting for the udev queue to empty.
Went back and undid the framebuffer connection.  Tried to complete and hit "Fatal server error - try to change some parameters"
Comment 9 Len Lawrence 2023-05-09 19:59:42 CEST
This is the output from the stalled drakx11:

Too late to run INIT block at /usr/lib64/perl5/vendor_perl/Glib/Object/Introspection.pm line 257.
Ignore the following Glib::Object::Introspection & Gtk3 warnings
Subroutine Gtk3::main redefined at /usr/share/perl5/vendor_perl/Gtk3.pm line 539.
Timed out for waiting the udev queue being empty.

Crashed out at that point.  nvidia-settings crashes with a core dump.
Repeated drakx11 and accepted default options.

xorg.conf contains

Section "Device"
    Identifier "device1"
    VendorName "NVIDIA Corporation"
    BoardName "NVIDIA GeForce 745 series and later"
    Driver "nvidia"
    Option "DPMS"
    Option "DynamicTwinView" "false"
    Option "AddARGBGLXVisuals"
EndSection

Reboot.
Comment 10 Len Lawrence 2023-05-09 20:13:48 CEST
Continuing from comment 9.
Could not login at all.
Removed xorg.conf and performed a cold boot.
Back to GNOME and GNOME Classic.

During the reboot there were several error messages including the string "maybe USB cable is bad".

No xorg.conf and no rebuilding of the nvidia kmod during boot.
Comment 11 Len Lawrence 2023-05-09 20:27:28 CEST
Installed dkms-nvidia-current and rebooted.
No USB cable error messages this time and no sign of the nvidia kmod being built.  `inxi -G` reports that the Xorg nouveau driver is being used.  No xorg.conf file in /etc/X11.
Comment 12 Len Lawrence 2023-05-10 01:48:51 CEST
Switched to another partition with a GNOME system already installed (from a Live session).  That appeared to be running nouveau so I tried upgrading using the classic iso and installing the nvidia driver.  No errors but it did not appear to take because this was the result:
$ inxi -G
Graphics:
  Device-1: NVIDIA GP102 [GeForce GTX 1080 Ti] driver: nouveau v: kernel
  Device-2: PCTV Systems tripleStick 292e type: USB driver: em28xx
  Display: x11 server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9 driver: X:
    loaded: modesetting,v4l dri: nouveau gpu: nouveau resolution: 2560x1440~60Hz
  API: OpenGL v: 4.3 Mesa 23.0.3 renderer: NV132

GNOME was functioning OK and it was possible to install Mate (task-mate-minimal and task-mate) and then to login to Mate.

About to test successive logins....
Comment 13 Len Lawrence 2023-05-10 02:52:52 CEST
So far so good:
Gnome Classic on Wayland
IceWM session
GNOME on Xorg

xorg.conf identifies the driver as modesetting with option DPMS.
Going back to Mate.
Comment 14 Len Lawrence 2023-05-10 03:12:16 CEST
That worked.
One major issue that emerges from all this is that the nvidia-current kernel module does not get built.

$ echo $XDG_SESSION_TYPE
x11
$ echo $XAUTHORITY
/home/lcl/.Xauthority

$ locate nvidia |grep -v lib|grep -v /home|grep -v share|grep -v src|grep -v /data
/etc/nvidia-current
/etc/OpenCL/vendors/nvidia.icd
/etc/X11/xorg.conf.d/10-nvidia.conf
/etc/dracut.conf.d/99-nvidia.conf
/etc/nvidia-current/ld.so.conf
/etc/nvidia-current/modprobe.conf
/etc/nvidia-current/nvidia-settings.xinit
/etc/vulkan/icd.d/nvidia_icd.json
/etc/vulkan/implicit_layer.d/nvidia_layers.json
/usr/bin/nvidia-bug-report.sh
/usr/bin/nvidia-cuda-mps-control
/usr/bin/nvidia-cuda-mps-server
/usr/bin/nvidia-debugdump
/usr/bin/nvidia-modprobe
/usr/bin/nvidia-ngx-updater
/usr/bin/nvidia-persistenced
/usr/bin/nvidia-powerd
/usr/bin/nvidia-settings
/usr/bin/nvidia-sleep.sh
/usr/bin/nvidia-smi
/usr/bin/nvidia-xconfig

$ ls /etc/nvidia-current
ld.so.conf  modprobe.conf  nvidia-settings.xinit
Comment 15 Dave Hodgins 2023-05-10 04:31:25 CEST
What's the output of "dkms status"?
Comment 16 Len Lawrence 2023-05-10 09:28:13 CEST
# dkms status
nvidia-current, 525.116.03-1.mga9.nonfree, 6.3.1-desktop-1.mga9, x86_64: installed 

That was after having tried to reinstall nvidia:
$ drakx11
Too late to run INIT block at /usr/lib64/perl5/vendor_perl/Glib/Object/Introspection.pm line 257.
Ignore the following Glib::Object::Introspection & Gtk3 warnings
Subroutine Gtk3::main redefined at /usr/share/perl5/vendor_perl/Gtk3.pm line 539.
Timed out for waiting the udev queue being empty.

Timed out for waiting the udev queue being empty.
Timed out for waiting the udev queue being empty.
Timed out for waiting the udev queue being empty.
^C

As root:
# drakx11
Too late to run INIT block at /usr/lib64/perl5/vendor_perl/Glib/Object/Introspection.pm line 257.
Ignore the following Glib::Object::Introspection & Gtk3 warnings
Subroutine Gtk3::main redefined at /usr/share/perl5/vendor_perl/Gtk3.pm line 539.

(drakx11:447041): IBUS-WARNING **: 08:09:19.080: The owner of /home/lcl/.config/ibus/bus is not root!
Timed out for waiting the udev queue being empty.
^C

The proprietary driver was chosen to cover the range "GeForce 745 and later" to match GeForce GTX 1080 Ti.
Comment 17 Len Lawrence 2023-05-10 09:39:53 CEST
Tried root again but switching to root in the proper fashion (su -) and the IBUS warning disappeared.  Otherwise the same result needing ^C to exit.
Comment 18 Len Lawrence 2023-05-10 10:30:39 CEST
The situation seems to be going from bad to worse.  Logout failed.  In the end had to crash out with Ctrl-Alt-Del and log in to a Mageia 8 partition, which took longer than expected.

Going to hose the bad mga9 partition, update the bootloader and reinstall on the blank partition.
Comment 19 Len Lawrence 2023-05-10 16:52:25 CEST
Cleaned the partition and formatted it looking for bad blocks.
Tried to install all desktops except GNOME.  Seemed to work but the installation eventually appeared to be hung.  Hammered the cancel key until something happened.  The installation restarted and was soon finished with a free graphics driver installed.  It rebooted with nouveau.   Selected Cinnamon, Xfce, Mate, Plasma (X11), IceWM, LXDE.  All logins successful.  In LXDE firefox produced "Invalid menu entry" and aborted but the browser could be launched from the cli.
Back to Mate.  One oddity is that the splash screen is stuck with one of the standard Xfce backgrounds but Mageia 9 Plymouth appears intermittently.

Next step is to install the GNOME DE, if possible.
Comment 20 Len Lawrence 2023-05-10 16:53:54 CEST
On second thoughts - try the nvidia driver first.
Comment 21 Len Lawrence 2023-05-10 17:03:44 CEST
Lots of dracut modules could not be installed, such as squash, memstrack, nvmf, iscsi... because associated commands could not be found and others like
"dracut: dracut module 'ifcfg' depends on 'network', which can't be installed"

# dkms status
nvidia-current, 525.116.03-1.mga9.nonfree, 6.3.1-desktop-1.mga9, x86_64: installed
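
`dkms status` reports the module as installed both here and in comment 16, although `inxi -G` in earlier comments kept showing nouveau. A rough way to separate "built for the running kernel" from "actually loaded" is sketched below. It assumes the comma-separated `dkms status` layout quoted in this report; `dkms_built_for` is a hypothetical helper.

```shell
# Hypothetical helper: read `dkms status` output on stdin and succeed only if
# the given module is marked "installed" for the given kernel release.
# $1 = DKMS module name, $2 = kernel release as printed by `uname -r`
dkms_built_for() {
    awk -F', *' -v mod="$1" -v kver="$2" '
        $1 == mod && $3 == kver && $0 ~ /: installed/ { found = 1 }
        END { exit !found }
    '
}
```

Typical use would be `dkms status | dkms_built_for nvidia-current "$(uname -r)" && echo built`, followed by `lsmod | grep -E '^(nvidia|nouveau)'` to see which kernel driver is really in use. A module that is built but not loaded suggests looking at the initrd or at module blacklisting rather than at DKMS.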
Comment 22 Len Lawrence 2023-05-10 17:59:14 CEST
After reboot, in Mate
$ inxi -G
Graphics:
  Device-1: NVIDIA GP102 [GeForce GTX 1080 Ti] driver: nvidia v: 525.116.03
  Device-2: PCTV Systems tripleStick 292e type: USB driver: em28xx
  Display: x11 server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9 driver: X:
    loaded: nvidia,v4l gpu: nvidia resolution: 2560x1440~60Hz
  API: OpenGL v: 4.6.0 NVIDIA 525.116.03 renderer: NVIDIA GeForce GTX 1080
    Ti/PCIe/SSE2

So, it can be done but we have no explanation about what went wrong before.

Switching to Plasma - working fine. 
Cinnamon software rendering works also. 
Installed GNOME.
Logged in to GNOME Classic on Xorg : OK
Plain GNOME : OK
Was expecting a Wayland session but XDG_SESSION_TYPE=x11
GNOME on Xorg : OK

There must be something else to be installed to get Wayland on board.

Logged in to Mate.
So the login process works fine in an all Xorg environment.
Comment 23 Lewis Smith 2023-05-10 20:59:06 CEST
Thanks for all your extra tests, Len.
There was/is one fundamental question which got overlooked:
- What display manager?
- And if you changed it to another?
Chasing display drivers may have been barking up the wrong tree.

We know that if a multi-desktop installation includes Gnome, GDM is the default display manager, which should use Wayland only for Gnome itself and X11 for the others.
[I have found that sometimes other display managers use X11 even for Gnome...]

I find an easy way to check, once in a session, is:
 $ env | grep -Ei 'wayland|x11'
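
A slightly more explicit variant of the same check is sketched here, so that the result can be noted after every login. `session_kind` is a hypothetical helper; it relies only on the standard `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY` and `DISPLAY` environment variables.

```shell
# Hypothetical helper: report the type of the current graphical session.
session_kind() {
    # Prefer XDG_SESSION_TYPE; fall back to the display variables.
    if [ -n "${XDG_SESSION_TYPE:-}" ]; then
        echo "$XDG_SESSION_TYPE"
    elif [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo wayland
    elif [ -n "${DISPLAY:-}" ]; then
        echo x11
    else
        echo unknown
    fi
}
```

On a systemd system, `loginctl show-session "$XDG_SESSION_ID" -p Type -p Service` should report the same type along with the PAM service name, which usually identifies the display manager that started the session.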
Comment 24 Len Lawrence 2023-05-11 17:13:22 CEST
There were three that I can remember but it tended to be GDM for systems containing GNOME.  When GNOME was left out Plasma attracted SDDM.  There was a third one at some point - in the latest round of tests I used whatever was offered.  The third is the simple one with a pale blue background - is that XDM?
I checked XDG_SESSION_TYPE at random rather than systematically and did not note the results.
Comment 25 Len Lawrence 2023-05-11 17:25:32 CEST
Looking back at comment 22, which covers the system where nvidia was successfully installed: as far as I can remember there was no Wayland session offered although Xwayland was present.  That probably makes sense because GNOME was added after the first reboot, i.e. on an Xorg system.  Plasma Wayland was also missing, for the same reason.
Comment 26 Len Lawrence 2023-05-11 21:37:26 CEST
From what other users say this bug does not reproduce on other nvidia hardware of similar age.  It will be EOL as far as nvidia is concerned fairly soon so maybe the incompatibility with GNOME Wayland should simply be accepted and the bug closed as WONTFIX.
Comment 27 Marja Van Waes 2024-06-22 19:36:57 CEST
Is this bug still present in Cauldron? It is over a year later and most of Cauldron has changed.

Status: NEW => NEEDINFO
CC: (none) => marja11

