Bug 28643

Summary: systemd(1) kills kwin_x11 after timeout when system shuts down, makes shutdown very long
Product: Mageia
Reporter: Mészáros Csaba <csablak>
Component: RPM Packages
Assignee: KDE maintainers <kde>
Status: NEW ---
QA Contact:
Severity: normal
Priority: Normal
CC: andrewsfarm, davidwhodgins, ouaurelien
Version: 8
Target Milestone: ---
Hardware: All
OS: Linux
Whiteboard:
Source RPM: kwin-5.20.4-3.mga8.src.rpm
CVE:
Status comment: Drivers? Compositor settings? Why session-4.scope? Multi-user?
Attachments: slow shutdown, shutdown from LDM, shutdown debug

Description Mészáros Csaba 2021-03-25 14:12:33 CET
Description of problem:
Unfortunately, I don't know why, but shutting down the system takes a long time: about 1 minute 30 seconds.
According to the journalctl log, kwin_x11 seems to be at fault.

session-4.scope: Killing process 1835 (kwin_x11) with signal SIGKILL.

Partial output of journalctl | grep kwin:

márc 24 11:03:00 csablakPC.home systemd[1]: session-6.scope: Stopping timed out. Killing.
márc 24 11:03:00 csablakPC.home systemd[1]: session-6.scope: Killing process 2981 (kwin_x11) with signal SIGKILL.
márc 24 11:03:00 csablakPC.home systemd[1]: session-6.scope: Killing process 108757 (FreezeDetector) with signal SIGKILL.
márc 24 11:03:01 csablakPC.home systemd[1]: session-6.scope: Failed with result 'timeout'.
Comment 1 Lewis Smith 2021-03-25 19:28:59 CET
*** Bug 28644 has been marked as a duplicate of this bug. ***
Comment 2 Lewis Smith 2021-03-25 19:37:33 CET
See also: https://bugs.mageia.org/show_bug.cgi?id=28644#c0

Sorry for the inconvenience of this problem. I suggest the following procedure to get the evidence:
- Just before shutting down the system, note exactly the time.
- Shut down (& note roughly how long that takes).
- Reboot.
- As root (you see more) do:
 journalctl --no-hostname -b -1
- From the paged output, copy/paste the last part of the journal of the previous session, from the time you noted, to a text file.
- If that is not very big, attach that file to this bug.
  If it is very big, compress it first with 'xz' before attaching it.
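
The steps above could be sketched as a small shell helper, to be run as root after the reboot. This is only an illustration: the function name save_prev_boot_tail, the example timestamp and the output file name are made up here. It assumes --since is given the time noted just before shutting down.

```shell
# save_prev_boot_tail SINCE OUTFILE
# Save the end of the previous boot's journal, starting at the time noted
# just before shutdown, and also produce an xz-compressed copy.
# (Hypothetical helper name; SINCE is e.g. "2021-03-25 19:50:00".)
save_prev_boot_tail() {
    since=$1
    out=$2
    # -b -1 selects the previous boot; --since drops everything before SINCE.
    journalctl --no-hostname -b -1 --since "$since" > "$out" &&
    # --keep leaves the uncompressed file in place next to OUTFILE.xz.
    xz --keep "$out"
}

# Example call (as root):
#   save_prev_boot_tail "2021-03-25 19:50:00" shutdown-journal.txt
```

Attach the plain text file if it is small, otherwise the .xz copy.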

CC: (none) => lewyssmith
Status: NEW => NEEDINFO

Comment 3 Mészáros Csaba 2021-03-25 20:12:30 CET
Created attachment 12517 [details]
slow shutdown
Comment 4 Aurelien Oudelet 2021-03-25 20:20:46 CET
Thanks for providing the file.

I did see:
márc 25 19:52:15 systemd[1]: NetworkManager-wait-online.service: Succeeded.
márc 25 19:52:15 systemd[1]: Stopped Network Manager Wait Online.

TIME OUT HERE: systemd then kills session-4.scope (my annotation)

márc 25 19:53:44 systemd[1]: session-4.scope: Stopping timed out. Killing.
márc 25 19:53:44 systemd[1]: session-4.scope: Killing process 1845 (kwin_x11) with signal SIGKILL.
márc 25 19:53:44 systemd[1]: session-4.scope: Failed with result 'timeout'.
márc 25 19:53:44 systemd[1]: Stopped Session 4 of user csablak.
márc 25 19:53:44 systemd[1]: session-4.scope: Consumed 25.919s CPU time.

According to the system log, we see it kills kwin_x11.


Normally, Plasma should terminate all the processes it owns by itself. Here, the window manager and compositor (kwin_x11) seems not to respond to any signal other than SIGKILL.

This can be related to driver errors or wrong compositor settings.

I see that your system uses LightDM as display manager instead of SDDM. Why? By choice? Are other desktops installed? Could you try SDDM, which is more Plasma-friendly?

Source RPM: kwin-5.20.4-3.mga8.src.rpm => (none)
CC: (none) => ouaurelien

Aurelien Oudelet 2021-03-25 20:22:51 CET

Summary: the system shuts down takes a long time => systemd(1) kills kwin_x11 after timeout when system shuts down
Source RPM: (none) => kwin-5.20.4-3.mga8.src.rpm
Status comment: (none) => Drivers? Compositor settings? Why session-4.scope? Multi-user?

Comment 5 Lewis Smith 2021-03-25 20:39:56 CET
Yes, thank you Csaba for that journal.
It is in two sections:
 19:52:01 - 19:52:15
and
 19:53:44 - 19:53:45
 The gap is where the delay is:
márc 25 19:52:15 systemd[1]: NetworkManager-wait-online.service: Succeeded.
márc 25 19:52:15 systemd[1]: Stopped Network Manager Wait Online.
        ^v^v^v^v
márc 25 19:53:44 systemd[1]: session-4.scope: Stopping timed out. Killing.
márc 25 19:53:44 systemd[1]: session-4.scope: Killing process 1845 (kwin_x11) with signal SIGKILL.
márc 25 19:53:44 systemd[1]: session-4.scope: Failed with result 'timeout'.
márc 25 19:53:44 systemd[1]: Stopped Session 4 of user csablak.
márc 25 19:53:44 systemd[1]: session-4.scope: Consumed 25.919s CPU time.
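
For longer journals, the gap could also be located mechanically. The following is a minimal sketch, not part of the original analysis: the function name largest_gap is made up, and it assumes the short journalctl format shown above (month, day, HH:MM:SS as the third field) and an excerpt that does not cross midnight.

```shell
# largest_gap: read journal lines on stdin and print the largest gap
# (in seconds) between two consecutive timestamps, plus the time at
# which that gap ended. Lines without an HH:MM:SS third field are skipped.
largest_gap() {
    awk '
        $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($3, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (seen && now - prev > max) { max = now - prev; at = $3 }
            prev = now
            seen = 1
        }
        END { printf "%d s gap ending at %s\n", max, at }
    '
}

# Example call:
#   journalctl --no-hostname -b -1 | largest_gap
```

Fed the two lines on either side of the gap above (19:52:15 and 19:53:44), it reports 89 s, which is close to systemd's default stop timeout of 90 s.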

I would be surprised if the display manager mattered. I forgot to ask Csaba before for his basic system information:
 $ inxi -Sxx
Comment 6 Mészáros Csaba 2021-03-25 21:14:28 CET
$ inxi -Sxx 
System:    Host: csablakPC.home Kernel: 5.10.25-desktop-1.mga8 x86_64 bits: 64 compiler: gcc v: 10.2.1 
           Desktop: KDE Plasma 5.20.4 tk: Qt 5.15.2 wm: kwin_x11 dm: LightDM Distro: Mageia 8 mga8 

LightDM: because sddm is difficult to configure. I had no problems with Mageia 7. Lightdm is not reconciled to wayland alone.
Unfortunately, I encountered many errors. I had to disable many of the preconfigured systemd service units, because the journal was full of red error messages.
Ehci ohci was also a problem, but I know the solution. 
https://bugs.mageia.org/show_bug.cgi?id=3667 #36

Google Earth segfaults, but I already know that is because of lib64curl4. Has nobody reported this yet?
I had to install an earlier version of curl, and now it works.
It may not be ideal.
mpv has changed so much compared to the version in Mageia 7 that I also had to rework my scripts that depend on it a bit.
But by reporting bugs, we will eliminate all of this. 
Come on guys!
Comment 7 Lewis Smith 2021-03-25 21:37:54 CET
Assigning to the KDE team for starters.

CC: lewyssmith => (none)
Status: NEEDINFO => NEW
Summary: systemd(1) kills kwin_x11 after timeout when system shuts down => systemd(1) kills kwin_x11 after timeout when system shuts down, makes shutdown very long
Assignee: bugsquad => kde

Comment 8 Mészáros Csaba 2021-03-25 22:16:43 CET
Well, I noticed an interesting thing. If I log out to LightDM first and then shut down or restart the machine from there, the 1.5-minute wait does not occur. It stops quickly.
Comment 9 Mészáros Csaba 2021-03-26 15:42:51 CET
Created attachment 12523 [details]
shutdown from LDM

Shutdown from LDM. This is a log file from a session where I logged out first and then shut down the machine from LightDM.
There was no waiting.
But that is tiresome. Previously, all it took was pressing the Power button and the machine stopped. Of course it still stops now, just very slowly.
Comment 10 Mészáros Csaba 2021-03-28 21:36:50 CEST
Created attachment 12542 [details]
shutdown debug

I made a more detailed log to see if smarter people would figure out where the error was.
To do this, I put a small script in the systemd shutdown folder.
/usr/lib/systemd/system-shutdown/debug.sh
#!/bin/sh
mount -o remount,rw /
dmesg > /shutdown-log.txt
mount -o remount,ro /

And I booted with the following kernel parameters:
GRUB_CMDLINE_LINUX_DEFAULT="nosplash quiet nopti noiswmd resume=UUID=1ef96f4c-097b-432b-b736-5e1f62a83956 systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M printk.devkmsg=on enforcing=0 audit=0 vga=34C"

The result is in the attachment.

Nevertheless, I think I will have to set up an entropy daemon (haveged).
Comment 11 Mészáros Csaba 2021-03-29 22:26:56 CEST
I think I have found the cause of the problem. The logs show that there were problems with PipeWire. Maybe it didn't load properly.

pipewire-media-session[1515]: error id:0 seq:158 res:-32 (Broken pipe): connection error

I also encountered authentication problems.

polkitd[727]: Unregistered Authentication Agent for unix-session:4 (system bus name :1.39, object path /org/kde/PolicyKit1/AuthenticationAgent, locale hu_HU.UTF-8)

So I removed pipewire and its dependencies, and then the automatically orphaned packages.
This also removed flatpak, ostree, xdg-desktop-portal-kde and xdg-desktop-portal-gtk, but I did not mind.
And now the shutdown is lightning fast, and according to the log:

Registered Authentication Agent for unix-session:4 (system bus name :1.40 [/usr/libexec/polkit-kde-authentication-agent-1], object path /org/kde/PolicyKit1/AuthenticationAgent, locale hu_HU.UTF-8)

I don't know how these things are connected, but I am leaving it this way for now.

I only found 1 problem: 
modprobe: ERROR: missing parameters. See -h.

But journalctl -b0 does not say which module is the problem.
Comment 12 Thomas Andrews 2021-03-31 17:47:12 CEST
(In reply to Mészáros Csaba from comment #6)
> GoogleEarth segfault. But I already know it's because of lib64curl4. Haven't
> I even watched this announced by anyone?
> I had to install an earlier version of curl and so it works now.
> It may not be ideal.
> The mpv has changed so much compared to the one in Mageia7 that I also had
> to carve my scripts based on it a bit.
> But by reporting bugs, we will eliminate all of this. 
> Come on guys!

There is a workaround for getting Google Earth Pro working with Mageia 8, without the old version of lib64curl. 

See https://bugs.mageia.org/show_bug.cgi?id=28018#c17

CC: (none) => andrewsfarm

Comment 13 Mészáros Csaba 2021-04-30 09:38:02 CEST
Thanks for your advice on Google Earth. As for the long shutdown time, unfortunately I still don't know what causes it. Sometimes it is fine, but often it is not. I was too quick to write that it was fixed. It is not.
I switched to sddm, but the problem is the same with it.
But as I wrote, if I log out first and turn off the machine from the display manager, everything is fine. But that is a hassle.
Maybe I need to write a few lines in a script to log out first and then power off? An ugly hack. How do I find the problem?
Comment 14 Mészáros Csaba 2021-07-07 20:19:44 CEST
This bug is annoying. Is there no solution?
Comment 15 Dave Hodgins 2021-07-07 22:41:26 CEST
The kwin program is designed to keep running after the user logs off, to speed
up start up if the user logs in again without rebooting. That's a design choice
made by kde plasma.

Until kde plasma is redesigned to more fully integrate with systemd (assuming
it will be someday), systemd uses the defaults for non systemd services.

The defaults are conservative to allow usage on older systems with slow hard
drives and large databases that need to be synced, etc.

As a workaround, they can be reduced by overriding the defaults. See the
comments at the top of /etc/systemd/user.conf and /etc/systemd/system.conf
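
A minimal sketch of such an override, assuming the setting meant here is DefaultTimeoutStopSec in the [Manager] section (its default is 90s). The function name, the drop-in file name 10-stop-timeout.conf and the value 15 are only examples, not a recommendation.

```shell
# write_stop_timeout_dropin DIR SECONDS
# Write a drop-in that lowers the default stop timeout of the systemd
# manager reading DIR. (Hypothetical helper; file name chosen arbitrarily.)
write_stop_timeout_dropin() {
    dir=$1
    secs=$2
    mkdir -p "$dir" &&
    printf '[Manager]\nDefaultTimeoutStopSec=%ss\n' "$secs" \
        > "$dir/10-stop-timeout.conf"
}

# Example calls (as root), then reboot so the new default is picked up:
#   write_stop_timeout_dropin /etc/systemd/system.conf.d 15   # system manager
#   write_stop_timeout_dropin /etc/systemd/user.conf.d 15     # user managers
```

After the reboot, systemctl show -p DefaultTimeoutStopUSec should report the new value for the system manager.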

CC: (none) => davidwhodgins