Bug 17329 - [TRACKER] gdm does not start / gnome-session does not start
Summary: [TRACKER] gdm does not start / gnome-session does not start
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages (show other bugs)
Version: Cauldron
Hardware: All Linux
Priority: Normal major
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard: 6dev1
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-13 16:39 CET by Florian Hubold
Modified: 2017-01-17 10:29 CET (History)
5 users (show)

See Also:
Source RPM: gdm / gnome-session
CVE:
Status comment:


Attachments
output of "journalctl -ab|grep -i gdm|grep -v audit" with gdm debug logging enabled (55.89 KB, application/x-xz)
2015-12-13 17:21 CET, Florian Hubold
Details

Description Florian Hubold 2015-12-13 16:39:51 CET
Description of problem:

When booting the recent GNOME live images, e.g. Mageia-6-dev1-LiveDVD-GNOME-x86_64-DVD or trying to start gdm manually from runlevel 3, it does not start an X server but instead starts a lot of sessions/seats for the gdm user. When booting the live images usually the last few messages are repeated for every newly spawned session and look like the following example:

Starting User Manager for UID 1000
[ OK] Started Session c124 of user live.
[ OK] Started User Manager for UID 1000.
Stopping User Manager for UID 1000 ...
[ OK ] Stopped User Manager for UID 1000 ...
[ OK ] Removed slice user-1000.slice.
[ OK ] Created slice user-1000.slice.

(in the example it spawned 124 sessions before it hit a ulimit for open files)

Steps to Reproduce:

My current reproducer is as follows, can be done either from the live images or in cauldron. For live images, first 3 steps are not needed, one can boot directly to runlevel 3. For cauldron, disable other display manager systemd units, enable gdm and set default runlevel to 3 (multi-user, no X)

# systemctl disable prefdm.service sddm.service
# systemctl enable gdm.service
# systemctl set-default multi-user.target

Then reboot, login as root and start gdm via
# systemctl start gdm

This bug report tries to keep track of the issues encountered and to fix them.
FWIW, gdm debug logging was enabled via Enable=true in /etc/X11/gdm/custom.conf

Reproducible: 

Steps to Reproduce:
Comment 1 Florian Hubold 2015-12-13 17:05:05 CET
First issue encountered from reading the debug logs, gdm does try to start a wayland-type session by default. journalctl -u gdm excerpt:


Dec 13 10:17:06 localhost gdm[2115]: GdmDisplay: session type: wayland
Dec 13 10:17:06 localhost gdm[2115]: GdmSession: setting session to type 'wayland'
Dec 13 10:17:06 localhost gdm[2115]: GdmSession: type wayland, program? yes, seat seat0
Dec 13 10:17:06 localhost gdm-launch-environment][2169]: GdmSessionWorker: Set PAM environment variable: 'XDG_SESSION_TYPE=wayland'
Dec 13 10:17:06 localhost gdm[2115]: GdmSession: type wayland, program? yes, seat seat0
Dec 13 10:17:06 localhost gdm[2115]: GdmSession: type wayland, program? yes, seat seat0
Dec 13 10:17:06 localhost gdm-launch-environment][2169]: GdmSessionWorker: start program: /usr/libexec/gdm-wayland-session "/usr/bin/gnome-session --autostart /usr/share/gdm/greeter/autostart --debug  --session gnome-wayland"
Dec 13 10:17:06 localhost gdm-launch-environment][2169]: GdmSessionWorker: opening session for program '/usr/libexec/gdm-wayland-session'


This can be fixed in /etc/X11/gdm/custom.conf by uncommenting "WaylandEnable=false" and we should probably default to that, no?


Second issue, which seems to be the main cause for GDM not being able to start an X session is that it cannot write the Xorg.0.log file. Judging by the weird location ( /var/lib/gdm/.local/share/xorg/Xorg.0.log ), and reading documentation¹ I'd say this comes from some malformed value passed by PAM.

1: https://help.gnome.org/admin/gdm/3.18/gdm.html#logging

Dec 13 10:17:07 localhost gdm-x-session[2221]: Preparing auth file for X server
Dec 13 10:17:07 localhost gdm-x-session[2221]: Running X server
Dec 13 10:17:07 localhost /usr/libexec/gdm-x-session[2221]: (EE)
Dec 13 10:17:07 localhost /usr/libexec/gdm-x-session[2221]: Fatal server error:
Dec 13 10:17:07 localhost /usr/libexec/gdm-x-session[2221]: (EE) Cannot open log file "/var/lib/gdm/.local/share/xorg/Xorg.0.log"
Dec 13 10:17:07 localhost /usr/libexec/gdm-x-session[2221]: (EE)

gdm will try to restart the X server like crazy, everytime failing to write the logfile and only stopping when it hits the soft ulimit for open files, which will usually be a little over 100 sessions as mentioned in comment 0.

CC: (none) => doktor5000, mageia, mageia, olav, thierry.vignaud
Whiteboard: (none) => 6dev1

Florian Hubold 2015-12-13 17:07:02 CET

Blocks: (none) => 15527

Comment 2 Florian Hubold 2015-12-13 17:21:02 CET
Created attachment 7280 [details]
output of "journalctl -ab|grep -i gdm|grep -v audit" with gdm debug logging enabled

output of "journalctl -ab|grep -i gdm|grep -v audit" with gdm debug logging enabled

This may not be 100% representable but should give a good overview on what actually happens and where stuff breaks.
Comment 3 Olav Vitters 2015-12-14 00:16:16 CET
FYI: gdm and pam are too low level for me to understand. With that said, looked at it anyway.

Last time I GDM issues were caused by PAM. For Mageia, I don't think anything other than prefdm is supported? I use gdm using prefdm. My gdm uses Wayland. It doesn't start any session due to pam though.

An strace -ff would be helpful IMO.

Seems gnome-shell is initially crashing due to driver problem. Then later it starts X and that has issues as well.

For /var/lib/gdm/.local/share/xorg/Xorg.0.log; it is weird that it is placed here, but there shouldn't be any permissions errors for this. This is the $HOME for gdm user. Please check this directories permissions.

There's also a lot of vt1 vs vt7 weirdness going on

Interestingly enough, I didn't see the usual PAM error messages I usually get!


Dec 13 10:17:06 localhost /usr/libexec/gdm-wayland-session[2185]: (gnome-shell:2197): GLib-GObject-CRITICAL **: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
Dec 13 10:17:06 localhost /usr/libexec/gdm-wayland-session[2185]: pci id for fd 10: 80ee:beef, driver (null)
Dec 13 10:17:06 localhost /usr/libexec/gdm-wayland-session[2185]: pci id for fd 10: 80ee:beef, driver (null)
Dec 13 10:17:06 localhost /usr/libexec/gdm-wayland-session[2185]: pci id for fd 10: 80ee:beef, driver (null)
Dec 13 10:17:06 localhost /usr/libexec/gdm-wayland-session[2185]: (gnome-shell:2197): Clutter-CRITICAL **: Unable to initialize Clutter: Unable to initialize the Clutter backend
Dec 13 10:17:06 localhost /usr/libexec/gdm-wayland-session[2185]: (gnome-shell:2197): mutter-WARNING **: Unable to initialize Clutter.
Dec 13 10:17:06 localhost /usr/libexec/gdm-wayland-session[2185]: gnome-session-binary[2188]: DEBUG(+): GsmAutostartApp: (pid:2197) done (status:1)
Dec 13 10:17:06 localhost /usr/libexec/gdm-wayland-session[2185]: gnome-session-binary[2188]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
Comment 4 Olivier Blin 2015-12-14 00:34:40 CET
The Xorg log file path does not look weird, this is the usual gdm home directory.
Comment 5 Olivier Blin 2015-12-14 00:36:23 CET
The wayland errors seem to come from a VirtualBox VM (80ee:beef IDs), but I am not sure if we have 3D and EGL working by default on vbox.
This is probably a requirement for a wayland session, since I think only the wayland-egl API is supported.
Comment 6 Olivier Blin 2015-12-14 00:38:13 CET
About the Xorg permissions issues, what is the result of:
    ls -lRa /var/lib/gdm/.local
Comment 7 Florian Hubold 2015-12-14 02:50:13 CET
(In reply to Olav Vitters from comment #3)
> Last time I GDM issues were caused by PAM. For Mageia, I don't think
> anything other than prefdm is supported? I use gdm using prefdm. My gdm uses
> Wayland. It doesn't start any session due to pam though.

Well, prefdm isn't really what I would call "supported". It's ugly and it should die, disabling/masking it and enabling the native gdm.service should work too, at least it does for Olivier :)
But I can try again, this time not disabling prefdm.

(In reply to Olav Vitters from comment #3)
> An strace -ff would be helpful IMO.

Sure :) Problem is it currently locks up hard when I try e.g.
strace -o gdm_strace.log -ff /usr/sbin/gdm
and the live images don't have strace installed. But maybe I can grab the strace logs via ssh or otherwise from a fresh cauldron install, we'll see. 


When I tried via strace earlier I noticed something else, too:
/var/run/gdm does not exist and gdm will try to create it.

According to the earlier strace output it should have the same owner/group/permissions as /var/log/gdm, that is root:gdm and 0711 permissions.
Otherwise gdm will create and chown/chmod that folder.


(In reply to Olivier Blin from comment #5)
> The wayland errors seem to come from a VirtualBox VM (80ee:beef IDs), but I
> am not sure if we have 3D and EGL working by default on vbox.
> This is probably a requirement for a wayland session, since I think only the
> wayland-egl API is supported.

So nowadays we need real openGL/EGL support just to start a DM? This could be a similar problem with SDDM as seen in bug 17221 - SDDM fails to start because it cannot create an GL surface via Qt/XCB. But for my installed cauldron VM which has 3D acceleration disabled it works just fine the whole time ... Maybe some issue with mesa/xcb/libdrm/ or something else in the stack? I'm lost somehow :/

Will test again on my physical box.


(In reply to Olivier Blin from comment #6)
> About the Xorg permissions issues, what is the result of:
>     ls -lRa /var/lib/gdm/.local

For the permissions, /var/lib/gdm itself has 1770 permissions and belongs to gdm:gdm which looks OK to me, there shouldn't be an issue creating files there.
Although the missing parent folder xorg/ might be an issue if it only tries to create a file in there - that will fail. Added it manually for testing reasons now, but results are still the same.

[user@localhost]â[20:44:42]â[~] sudo ls -lRa /var/lib/gdm/.local
/var/lib/gdm/.local:
total 28
drwxr----- 3 gdm  gdm   4096 Nov 17 15:17 .
drwxrwx--T 5 gdm  gdm  20480 Nov 17 15:17 ..
drwxr-xr-x 4 root root  4096 Dec 13 20:34 share

/var/lib/gdm/.local/share:
total 16
drwxr-xr-x 4 root root 4096 Dec 13 20:34 .
drwxr----- 3 gdm  gdm  4096 Nov 17 15:17 ..
drwxr-x--T 2 gdm  gdm  4096 Nov 17 15:17 applications
drwxr-x--T 2 gdm  gdm  4096 Dec 13 20:34 xorg

/var/lib/gdm/.local/share/applications:
total 8
drwxr-x--T 2 gdm  gdm  4096 Nov 17 15:17 .
drwxr-xr-x 4 root root 4096 Dec 13 20:34 ..

/var/lib/gdm/.local/share/xorg:
total 8
drwxr-x--T 2 gdm  gdm  4096 Dec 13 20:34 .
drwxr-xr-x 4 root root 4096 Dec 13 20:34 ..
-rw-r--r-- 1 gdm  gdm     0 Dec 13 20:34 Xorg.0.log
Comment 8 Florian Hubold 2015-12-14 03:34:18 CET
Yay! Managed to grab the strace -ff logs - who wants the xz-compressed 4MB tarball containing the logs for the ~4500 forked processes? :p
Comment 9 Olav Vitters 2015-12-14 16:03:46 CET
Please send to ovitters@gmail.com or olav@vitters.nl.
Comment 10 Olav Vitters 2015-12-14 16:06:15 CET
btw, try making /var/lib/gdm/.local/share/xorg chmod 777 (and no special bit). I'm wondering if it starts xorg as root or maybe as gdm.
Comment 11 Olav Vitters 2015-12-14 21:02:24 CET
1. Dumped Mageia-6-dev1-LiveDVD-GNOME-x86_64-DVD.iso to an USB stick
2. Booted from it
3. Went to a shell
4. Logged in as root on a tty
5. Stopped prefdm and gdm (both seem to start?)
6. Logged in as live on a tty
7. Started prefdm in the root tty
7. GDM starts, then complains (size allocation thing)
Comment 12 Olav Vitters 2015-12-14 21:44:58 CET
Only logging in as live user is enough.

Noticed gdm service is started instead of prefdm, while prefdm is marked as dead. This is different from what it used to be. Whomever is changing things, please assist. It is pretty frustrating to stab in the dark!

I copied the gdm pam.d files from gdm 3.14 RPM. Still failed to login.

No clue.
Comment 13 Olav Vitters 2015-12-14 22:39:38 CET
http://pkgs.fedoraproject.org/cgit/xorg-x11-server.git/tree/0001-Fedora-hack-Make-the-suid-root-wrapper-always-start-.patch

I think we need to replace the xwrapper logic with what Fedora has.
Comment 14 Florian Hubold 2015-12-14 22:48:53 CET
Tried today again on my physical box, same results as described in comment #0 - gdm does not start but spawns hundreds of sessions. Manual start from runlevel 3 via "systemctl start prefdm" or "systemctl start gdm" results in the same issue.

@Olav: Uploaded the compressed strace logs to https://mega.nz/#!dYdkyQBR so other can also take a look and no need to sending them back and forth.

FWIW I briefly checked through the first of the forked processes, and looks OK to me. gdm tries to ping and end plymouth, which doesn't seem to succeed, then gdm-session-worker seems to start, then a lot of dbus communication for negotiation of filehandles, some dbus services are queried, then /usr/libexec/gdm-wayland-session seem to get started, then the wayland session tries to run /usr/bin/gnome-session ... but I don't see why it doesn't work.

First start of wayland-session/gnome-session happens in gdm_strace.log.2560

Maybe the strace logs would be more useful with at least relative timestamps or normal timestamps were used, and not using -ff so we don't end up with thousands of logs ... any preferences?

(In reply to Olav Vitters from comment #11)
> 1. Dumped Mageia-6-dev1-LiveDVD-GNOME-x86_64-DVD.iso to an USB stick
> 2. Booted from it
> 3. Went to a shell
> 4. Logged in as root on a tty

Meaning you boot to runlevel 3, or do a regular boot and then login as root on a different tty? And which tty? tty1 or tty2? Or any other? Details might matter.

> 5. Stopped prefdm and gdm (both seem to start?)
> 6. Logged in as live on a tty
> 7. Started prefdm in the root tty

Blino mentioned something like that, your "workaround" might explain why you don't see the VT allocation issues I seem to see.

> 7. GDM starts, then complains (size allocation thing)

I'd be happy to even get it to start at all :/
Comment 15 Olav Vitters 2015-12-14 23:26:31 CET
The "maybe do not have console permissions" is from /usr/bin/Xwrapper. Which is a Mageia patch. The way of starting X probably changed or something and Xwrapper now cannot deal with that. Next up is trying to use the suid wrapper from xorg, building that locally and trying it on my normal system (not on the live thing).

I think there are multiple issues though:
1. These problems starting X with "console permissions"
2. Something else

The debugging is difficult because some see #1, some see #2. There might be others.


FYI: Boot live cd, ctrl-alt-f2 or f3 (really doesn't matter), type in live, wait a bit.
Comment 16 Olav Vitters 2015-12-16 11:15:51 CET
(In reply to Florian Hubold from comment #14)
> @Olav: Uploaded the compressed strace logs to https://mega.nz/#!dYdkyQBR so
> other can also take a look and no need to sending them back and forth.

If I try to download that it says the decompression key is missing.

> Maybe the strace logs would be more useful with at least relative timestamps
> or normal timestamps were used, and not using -ff so we don't end up with
> thousands of logs ... any preferences?

I like thousands of lines! 

But I guess -ff was wrong. I thought -ff just follows forks. I actually often use it as strace -ff $SHOMETHING 2> ~/tmp/my-log. From reading the manpage, seems -f is way better.

strace -f -ttt -s 2000 ?

Note that I think I had a different problem and for me changing x11-server to use the upstream suid wrapper instead of Xwrapper solved things. It might introduce new bugs though (noticed Fedora specific patches). E.g. needed a patch to xinitrc (didn't do yet because Fedora and Mageia patches don't align.. pff).
Comment 17 Florian Hubold 2015-12-16 13:50:47 CET
(In reply to Olav Vitters from comment #16)
> (In reply to Florian Hubold from comment #14)
> > @Olav: Uploaded the compressed strace logs to https://mega.nz/#!dYdkyQBR so
> > other can also take a look and no need to sending them back and forth.
> 
> If I try to download that it says the decompression key is missing.

Argh, misinterpreted how link exporting works, please try again:
https://mega.nz/#!gcthSIzD!v44cTwG5HJajLyU7yb1u_bSun4nvGyWwGZ4a7Uvgg7I and click on "download in browser".

> But I guess -ff was wrong. I thought -ff just follows forks. I actually
> often use it as strace -ff $SHOMETHING 2> ~/tmp/my-log. From reading the
> manpage, seems -f is way better.

Actually it worked this way as I used -o to write only the strace output into files, and then it writes separate files for each forked process with the PID in the filename.

> 
> strace -f -ttt -s 2000 ?
> 
> Note that I think I had a different problem and for me changing x11-server
> to use the upstream suid wrapper instead of Xwrapper solved things. It might
> introduce new bugs though (noticed Fedora specific patches). E.g. needed a
> patch to xinitrc (didn't do yet because Fedora and Mageia patches don't
> align.. pff).

Updated my fresh cauldron installation to new gdm build yesterday but the x11-server build wasn't on my mirror yet. Tried again today with the FC patches tv added, but no luck.
Comment 18 Olav Vitters 2015-12-16 14:53:03 CET
(In reply to Florian Hubold from comment #17)
> Updated my fresh cauldron installation to new gdm build yesterday but the
> x11-server build wasn't on my mirror yet. Tried again today with the FC
> patches tv added, but no luck.

I didn't submit it because I wanted tv to review it. Could you try building it locally and test? Else just submit x11-server to updates_testing? Probably requires the related change to xinitrc else I might break other peoples setup.
Comment 19 Thierry Vignaud 2015-12-16 16:05:45 CET
Actully I build it locally this morning.
A normal user can still run startx.
So just submit...
Comment 20 Florian Hubold 2015-12-16 22:12:03 CET
\o/ SUCCESS.

With x11-server-xorg-1.18.0-10.mga6 which now provides /usr/bin/X as /usr/libexec/Xorg.wrap, gdm start works just fine, and also gdm succeeds to start a gnome session. No PAM problems visible.


From what I can see, gdm starts gdm-session-worker, which starts gdm-x-session, which starts gnome-session which starts the gdm greeter which runs /usr/libexec/Xorg on vt1.


Currently trying to get some strace logs for comparison.


I'll mark this as fixed for now, but keep it under observation as this may reappear. Thanks to everyone involved who helped fix this :)

Status: NEW => RESOLVED
Resolution: (none) => FIXED

Samuel Verschelde 2017-01-17 10:29:39 CET

Blocks: 15527 => (none)


Note You need to log in before you can comment on or make changes to this bug.