Bug 14682 - poweroff or reboot blocks if I had first logged in and out as root in runlevel 3. Also "stop job" timeouts of various sorts even in X.
Summary: poweroff or reboot blocks if I had first logged in and out as root in runlevel 3. Also "stop job" timeouts of various sorts even in X.
Status: RESOLVED OLD
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 4
Hardware: i586 Linux
Priority: Normal
Severity: major
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-28 19:02 CET by w unruh
Modified: 2015-10-27 06:59 CET
0 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments
journalctl -a output during reboot with stop job notice (152.13 KB, application/octet-stream)
2014-11-28 19:40 CET, w unruh
ps -auxww before reboot to see what jobs are running (9.93 KB, text/plain)
2014-11-28 19:53 CET, w unruh

Description w unruh 2014-11-28 19:02:24 CET
Description of problem:
I have my system coming up in runlevel 3.
If I log in as myself as user, I can run poweroff without problem. However, if I log in as root first, log out of root, log in as user, I get 

User root is logged in on tty1
Please retry operation after closing inhibitors and logging out other users.
Alternatively ignore inhibitors and users with 'systemctl reboot -i'.

root is NOT logged in on tty1. I logged out and logged in as user unruh on tty1.
Ie, something is not cleaning up after itself properly on logout. 

If I then log out as user, log back in as root, and do reboot, I get no message but do get a LONG timeout wait before the system shuts down.
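A quick way to confirm whether logind really still holds a stale session for root (a sketch; assumes loginctl is available, and the stale_root_sessions helper is hypothetical):

```shell
# Print the session ids that logind still attributes to root.
# Expects `loginctl list-sessions --no-legend` output on stdin
# (columns: SESSION UID USER SEAT TTY).
stale_root_sessions() {
  awk '$3 == "root" { print $1 }'
}

# Typical use after logging out of root:
#   loginctl list-sessions --no-legend | stale_root_sessions
# Any output here would explain the "User root is logged in on tty1" refusal.
```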


Version-Release number of selected component (if applicable):
Mageia 4.1 updated as of Nov 20. I do not know which item is responsible for cleaning up the logins and inhibitors (whatever they are). 

How reproducible: Always


Steps to Reproduce:
1. Set the system up to boot to runlevel 3.
2. Log in as root, and then log out as root. Log in as user.
3. Run /usr/bin/poweroff as user.
4. If I did not log in as root first but only as user, I get normal behaviour and the system shuts down very quickly. If I logged in as root first, it takes a long time to shut down.


Comment 1 w unruh 2014-11-28 19:40:38 CET
Created attachment 5658 [details]
journalctl -a output during reboot with stop job notice

This is the journalctl -a file for the period between the issue of the reboot, and the end of the bootup of the system afterwards. 
Clearly the stop job is the line 
user@500.service stopping timed out. Killing
which took 90 sec to time out. (90 sec seems to be the universal systemd timeout)

I have no idea where user@500.service is located -- it seems to be a temporary .service unit created by systemd in some hidden location.
Comment 2 w unruh 2014-11-28 19:53:44 CET
Created attachment 5659 [details]
ps -auxww before reboot to see what jobs are running

Here is the ps -auxww file from before I issued the reboot call. 

Also, I found the temporary .service directory
/sys/fs/cgroup/systemd/user.slice/user-500.slice/user@500.service/
Looking at the file cgroup.procs I get 6998 and 7001, where 
6998 is /usr/lib/systemd/systemd --user
while
7001 is (sd-pam)
In notify_on_release there is 1
In tasks there is 6998 and 7001
It would seem that sd-pam is not shutting down properly.
(Just for completeness, cgroup_clone_children contains 0.)
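The lingering processes can be mapped back to commands straight from the cgroup. A minimal sketch (the cgroup_procs helper is hypothetical; the path is the one found above):

```shell
# Print "PID COMM" for every process still held in a cgroup directory.
cgroup_procs() {
  while read -r pid; do
    printf '%s %s\n' "$pid" "$(cat "/proc/$pid/comm" 2>/dev/null || echo '?')"
  done < "$1/cgroup.procs"
}

# e.g.:
#   cgroup_procs /sys/fs/cgroup/systemd/user.slice/user-500.slice/user@500.service
# which here should show the lingering systemd --user and (sd-pam) pair.
```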
Comment 3 w unruh 2014-11-28 20:08:48 CET
Note that in journalctl -a, I get a bunch of
"Failed to open private bus connection: Failed to connect to socket /run/user/500/dbus/user_bus_socket"
(and the same with user/0/ if I logged in as root)
messages. Could this be the problem? Note that there is no directory "dbus" under either /run/user/500 or /run/user/0.
Comment 4 Bit Twister 2014-11-28 20:54:39 CET
(In reply to w unruh from comment #3)
> Note that in journalctl -a, I get a bunch of 
> "Failed to open private bus connection:Failed to connect to socket
> /run/user/500/dbus/user_bus_socket"
> (and teh same with user/0/ if I logged in as root) 
> messages. Could this be the problem? Note that there is no directory "dbus"
> under either /run/user/500 or /run/user/0

Yeah, I have seen lots of those kinds of errors when using sudo su -l some_user.
I thought it was a sudo problem but nobody else was complaining.

My solution: created /etc/profile.d/xx_local.sh to create those directories and to get cleaner/smaller journals. xx_local.sh was chosen so it would run last.
Snippet from my /etc/profile.d/xx_local.sh 

      
      if [ -z "$XDG_RUNTIME_DIR" ] ; then
        export XDG_RUNTIME_DIR=/run/user/$(id --user)
      fi
      
      for _d in dconf dbus pulse gvfs systemd ; do
        mkdir -p $XDG_RUNTIME_DIR/$_d 2> /dev/null
        chmod 700 $XDG_RUNTIME_DIR/$_d 2> /dev/null
      done
      touch $XDG_RUNTIME_DIR/systemd/notify
      chmod 700 $XDG_RUNTIME_DIR 2> /dev/null

CC: (none) => junknospam

Comment 5 w unruh 2014-11-28 23:41:07 CET
The problem is that every time you go to a new user, you need that directory there. I.e., it seems to be a problem with either pam or the login program, in that it is not creating these directories. But your kludge might fix it if it runs early enough. I made it 99local.sh.
I am not sure what your xx_local.sh meant -- is xx literal, or supposed to stand for some number?

The other problem seems to be that (sd-pam), run from systemd, is the process that is not exiting. I will try to see if putting in the above script fixes this.
Comment 6 w unruh 2014-11-28 23:46:34 CET
Nope, just rebooted and I get the same error:
"Failed to open private bus connection: Failed to connect to socket /run/user/500/dbus/user_bus_socket: No such file or directory."
It seems that something should have opened that socket and has not done so.

Also making those directories does not solve the problem of 
A stop job is running for User Manager for 500
giving a 90 sec timeout for the poweroff/shutdown.

I have not tried the "login as root and then as user" trick yet to see if it still blocks that.
Comment 7 Bit Twister 2014-11-29 00:29:34 CET
(In reply to w unruh from comment #6)

> Nope, 99local.sh did not fix the error, just rebooted and I get
> the same error. 

Looking around I just found where I created those directories for all my users during boot.

I do remember that rc.local was not the best place to run my create_run_user script. Had to add it to a custom service I created.


> It seems that something should have opened that socket and has not done so. 

I was guessing the Desktop Manager startup created it.

> Also making those directories does not solve the problem of 
> A stop job is running for User Manager for 500

Yup, saw those and numerous processes for users not logged in.
Those are features. :(

Solution in /etc/systemd/logind.conf

$ diff /var/local/vorig/etc/systemd/logind.conf_vinstall /etc/systemd/logind.conf
13c13,14
< #KillUserProcesses=no
---
> # changed by /local/bin/logind_conf_changes Tue 11 Nov 17:55 2014
> KillUserProcesses=yes
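For context (general systemd behaviour, not from this report): KillUserProcesses=yes tells logind to kill all of a user's remaining processes when the user's last session ends, which also kills things meant to outlive logout, such as screen or tmux sessions. The change above amounts to this fragment in /etc/systemd/logind.conf:

```ini
[Login]
KillUserProcesses=yes
```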



> I am not sure what your xxlocal.sh meant-- is xx literal or supposed
>  to stand for some number?

$ ls -l /etc/profile.d | tail -3
-rwxr-xr-x 1 root root  269 Oct 18 10:59 numlock.sh
-rw-r--r-- 1 root root 1940 Nov 11 04:14 vte.sh
lrwxrwxrwx 1 root root   22 Nov 11 16:42 xx_local.sh -> /local/bin/xx_local.sh

Create a post over in alt.os.linux.mageia and we can play around there instead of doing it here in this bug report.
Comment 8 w unruh 2014-11-29 21:35:43 CET
This bug would seem to be related to http://thread.gmane.org/gmane.comp.sysutils.systemd.devel/16363
but that was supposed to have been fixed in Feb 2014, and the Mageia build was in Apr. If so, that bug was not fixed.

Note that none of Bittwister's suggestions made any difference to this bug.
Comment 9 w unruh 2014-11-30 00:23:06 CET
I have not figured out how to fix the first part -- namely that the system thinks root is still logged in although it is not -- but I have figured out how to get around the other -- namely that poweroff or reboot hangs for 90 sec with the
"A stop job is running for User Manager" message.
I placed the line
TimeoutSec=10
after KillSignal=SIGCONT in /usr/lib/systemd/system/user@.service.

This means that the timeout is only 10 sec rather than 90 sec.
I have no idea if this has other side effects. It certainly does not cure the bug, but it makes it less painful.
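A less invasive way to apply the same workaround (a sketch; assumes a systemd version that supports drop-in override directories, and the file name timeout.conf is an arbitrary choice) is to leave the unit under /usr/lib untouched and add a drop-in:

```ini
# /etc/systemd/system/user@.service.d/timeout.conf
[Service]
TimeoutSec=10
```

After creating the file, `systemctl daemon-reload` picks it up, and package updates to the original unit still apply.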
Comment 10 w unruh 2014-11-30 08:58:26 CET
Strangely, I get no error and no timeout if I run poweroff or reboot from an X terminal (i.e., run startx, open a terminal and run poweroff).
Race condition? (E.g., the system is so busy with X that it gives systemd time to kill the user process?)
I also have no idea how to debug these bugs further.
Comment 11 w unruh 2014-11-30 20:50:27 CET
Sorry, the above comment is wrong. I just got the "A stop job.." timeout on a reboot run from within KDE. Strangely, while I had put TimeoutSec=10 into
/etc/systemd/system/user@.service (otherwise a copy of the one in /usr/lib/systemd),
this time the timeout took 25 sec instead of the 10 it usually took.
I.e., it often does not give that error if poweroff is run from KDE, but sometimes it does. I.e., it sounds like some sort of race condition.
Comment 12 w unruh 2014-12-03 01:43:33 CET
Spoke too soon. Now, when I do reboot from X, I get
"A stop job exists for user unruh c1 Session", which takes a long time to time out.

Yesterday I got the message that a stop job exists for Avahi. Each takes a long time to time out. It seems there really is a very deep problem in systemd.
w unruh 2014-12-03 01:45:01 CET

Summary: poweroff or reboot blocks if I had first logged in and out as root in runlevel 3 => poweroff or reboot blocks if I had first logged in and out as root in runlevel 3. Also "stop job" timeouts of various sorts even in X.

Comment 13 Samuel Verschelde 2015-09-21 13:22:09 CEST
Mageia 4 changed to end-of-life (EOL) status on 2015-09-19. It is no longer 
maintained, which means that it will not receive any further security or bug 
fix updates.

Package Maintainer: If you wish for this bug to remain open because you plan to 
fix it in a currently maintained version, simply change the 'version' to a later 
Mageia version.

Bug Reporter: Thank you for reporting this issue and we are sorry that we weren't 
able to fix it before Mageia 4's end of life. If you are able to reproduce it 
against a later version of Mageia, you are encouraged to click on "Version" and 
change it against that version of Mageia. If it's valid in several versions, 
select the highest and add MGAxTOO in whiteboard for each other valid release.
Example: it's valid in cauldron and Mageia 5, set to cauldron and add MGA5TOO.

Although we aim to fix as many bugs as possible during every release's lifetime, 
sometimes those efforts are overtaken by events. Often a more recent Mageia 
release includes newer upstream software that fixes bugs or makes them obsolete.

If you would like to help fixing bugs in the future, don't hesitate to join the
packager team via our mentoring program [1] or join the teams that fit you 
most [2].

[1] https://wiki.mageia.org/en/Becoming_a_Mageia_Packager
[2] http://www.mageia.org/contribute/
Bit Twister 2015-09-23 11:42:15 CEST

CC: bittwister2 => (none)

Comment 14 Marja Van Waes 2015-10-27 06:59:03 CET
As announced over a month ago, Mageia 4 changed to end-of-life (EOL) status on 2015-09-19. It is no longer maintained, which means that it will not receive any further security or bug fix updates.

This issue may have been fixed in a later Mageia release, so, if you still see it and didn't already do so: please upgrade to Mageia 5 (or, if you read this much later than this is written: make sure you run a currently maintained Mageia version)

If you are able to reproduce it against a maintained version of Mageia, you are encouraged to 
1. reopen this bug report, by changing the "Status" from "RESOLVED - OLD" to "REOPENED"
2. click on "Version" and change it against that version of Mageia. If you know it's valid in several versions, select the highest and add MGAxTOO in whiteboard for each other valid release.
Example: it's valid in cauldron and Mageia 5, set to cauldron and add MGA5TOO.
3. give as much relevant information as possible. If you're not an experienced bug reporter and have some time: please read this page:
https://wiki.mageia.org/en/How_to_report_a_bug_properly

If you see a similar issue, but are _not_sure_ it is the same, with the same cause, then please file a new bug report and mention this one in it (please include the bug number, too). 


If you would like to help fixing bugs in the future, don't hesitate to join the
packager team via our mentoring program [1] or join the teams that fit you 
most [2].
[1] https://wiki.mageia.org/en/Becoming_a_Mageia_Packager
[2] http://www.mageia.org/contribute/

Status: NEW => RESOLVED
Resolution: (none) => OLD

