Description of problem: Using the classic DVD ISO, the system takes several minutes to shut down and restart. The Plymouth splash screen is displayed with five bubbles, then four. Ctrl-Alt-F1 shows the message "watchdog failed to stop", but in the end the system does reboot; presumably whatever was stalling it timed out.
Version-Release number of selected component (if applicable): mga6-dev1
How reproducible: Same behaviour for three DEs and the sddm login screen.
Yes, this has been happening in cauldron for a while now. I can't remember the last time a reboot didn't involve the SysRq keys - E causes a flurry of activity, but I is what puts it to bed. Colin, could we have a quick tutorial on how to collect diagnostics for these types of hang ?
There are at least 3 players in that problem: users, several services, and finally sddm. I opened bug 17530 - mga6: plymouth-reboot is delaying reboots for that one. For the user, disabling kwallet and setting KillUserProcesses=yes in /etc/systemd/logind.conf seems to help. For the services, I found that shutting the main ones down before issuing reboot/shutdown/halt helps. Here is the current list for my install in my shutdown_servers script:
_servers="mythbackend named postfix httpd dovecot stunnel mysqld chronyd dhcpd"
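The shutdown_servers script itself was not posted, so the following is only a sketch of what such a helper might look like, assuming each name in _servers corresponds to a systemd unit of the same name. The function name stop_servers and the STOP_CMD override (useful for a dry run with echo) are hypothetical.

```shell
# Hypothetical sketch of a shutdown_servers helper; the original script was not posted.
_servers="mythbackend named postfix httpd dovecot stunnel mysqld chronyd dhcpd"

# STOP_CMD is an assumed override hook; it defaults to "systemctl stop".
stop_servers() {
    for _svc in $_servers; do
        # Assumes each service name maps to <name>.service on this install.
        ${STOP_CMD:-systemctl stop} "${_svc}.service" \
            || echo "could not stop ${_svc}" >&2
    done
}
```

Calling `STOP_CMD=echo stop_servers` only prints the unit names; calling `stop_servers` as root just before reboot/shutdown/halt would actually stop them.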
CC: (none) => bittwister2
Generally speaking, you can make sure the debug shell is active (boot with "systemd.debug-shell" on the kernel command line); this allows you to switch to tty9. When things are hanging, try switching there and seeing whether any specific jobs are listed (systemctl list-jobs). I suspect it could be the bug I'm struggling to find time to get to the bottom of, which relates to .socket unit jobs clogging up the system (or at least something related to it).
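As a sketch of how that advice could be used in practice, the helper below saves the job list to a file so it can be attached to a report later. The name collect_hang_info and the default output path are assumptions; `systemctl list-jobs` is the command mentioned above, while the `list-units --state=deactivating` call is added here only as a guess at what else might be informative.

```shell
# Run from the debug shell on tty9 while the shutdown is stalled.
# The default path is an assumption; choose a filesystem that is still mounted.
collect_hang_info() {
    _out="${1:-/root/shutdown-hang.txt}"
    {
        echo "== queued jobs =="
        systemctl list-jobs --no-pager
        echo "== units still stopping =="
        systemctl list-units --state=deactivating --no-pager
    } > "$_out" 2>&1
}
```

From tty9 during a stalled shutdown, `collect_hang_info /root/hang.txt` would then leave a snapshot that survives the reboot, provided that filesystem is still writable at that point.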
At least in my experience, these delays happen during shutdown, not startup. Once you do Alt-SysRq-E, it looks like journalctl was the blocker, and the ones that -I takes down appear to be unmounts of NFS drives, maybe because the network is already down ? Is any of the stuff that shows up on tty when you shutdown with ESC logged anywhere ?
Some more shutdown info... As shutdown starts, CUPS seems to take quite a while to shut down (but eventually does). The next thing that blocks is "Session c2 for ftg" (ftg is the user who initiated the reboot from plasma). If I'm reading this right, this will only time out in 1.5 minutes (1.5 minutes ?!), and it looks like that's what happens. Finally, there's the unmounting of the NFS partitions which appear to have timeouts of 3 minutes (3 minutes ?! Really ?). After -I, there are some messages about systemd-journald not having a restart timeout so it is being restarted.
CC: (none) => jyri2000
Made a log of the last boot. Here the delay in the poweroff process is 1 min 20 sec. The relevant place in the log:
jaan 25 17:32:46 localhost systemd[1]: Received SIGRTMIN+20 from PID 11008 (plymouthd).
jaan 25 17:34:08 localhost systemd[1]: session-c5.scope: Stopping timed out. Killing.
jaan 25 17:34:08 localhost systemd[1]: Stopped Session c5 of user acrux.
jaan 25 17:34:08 localhost systemd[1]: session-c5.scope: Unit entered failed state.
jaan 25 17:34:08 localhost systemd[1]: session-c2.scope: Stopping timed out. Killing.
jaan 25 17:34:08 localhost systemd[1]: Stopped Session c2 of user acrux.
jaan 25 17:34:08 localhost systemd[1]: session-c2.scope: Unit entered failed state.
jaan 25 17:34:08 localhost systemd[1]: Removed slice User Slice of acrux.
jaan 25 17:34:08 localhost systemd[1]: Stopping Login Service...
So after plymouthd shuts down successfully, the delay occurs while stopping sessions c5 and c2 of user acrux...
Created attachment 7455
journalctl of long shutdown

I've seen user sessions taking long to stop while shutting down, too, including the last time I shut down:
feb 14 00:13:20 cldrn_64 systemd[1]: session-c2.scope: Stopping timed out. Killing.

Last night, the two lines with the longest time between them in journalctl output were:
feb 14 00:12:09 cldrn_64 systemd[1]: Received SIGRTMIN+20 from PID 30202 (plymouthd).
feb 14 00:13:20 cldrn_64 systemd[1]: mandi.service: State 'stop-sigterm' timed out. Killing.

All red lines in journalctl output were:
feb 14 00:11:52 cldrn_64 login[29550]: pam_systemd(login:session): Failed to release session: Connection reset by peer
feb 14 00:11:53 cldrn_64 nmbd[5728]: [2016/02/14 00:11:53.038574, 0] ../source3/nmbd/nmbd.c:58(terminate)
feb 14 00:11:53 cldrn_64 nmbd[5728]: Got SIGTERM: going down...
feb 14 00:11:51 cldrn_64 systemd-logind[790]: Failed to abandon session scope: Transport endpoint is not connected
feb 14 00:11:52 cldrn_64 systemd-logind[790]: Failed to abandon session scope: Transport endpoint is not connected
feb 14 00:11:54 cldrn_64 systemd[1]: Failed unmounting /RedHD2.
feb 14 00:11:55 cldrn_64 cupsd[1327]: Unable to communicate with avahi-daemon: An unexpected D-Bus error occured

Attaching full shutdown log
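Finding the two lines with the longest time between them can be automated. The following is a minimal sketch, not a polished tool: largest_gap is a hypothetical name, it assumes the default short journalctl format ("Mon DD HH:MM:SS host ...") as in the excerpts above, and it assumes all lines fall on the same day (no handling of a midnight wrap).

```shell
# largest_gap: read journal lines on stdin and print the biggest pause
# between two consecutive timestamped lines, followed by those two lines.
# Assumption: timestamps are HH:MM:SS in field 3 and do not cross midnight.
largest_gap() {
    awk '
    # Skip headers and continuation lines that carry no timestamp.
    $3 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { next }
    {
        split($3, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (seen && now - prev > max) {
            max = now - prev
            before = prevline
            after = $0
        }
        seen = 1
        prev = now
        prevline = $0
    }
    END {
        if (max > 0) {
            printf "%d s gap\n", max
            print before
            print after
        }
    }'
}
```

Something like `journalctl -b -1 | largest_gap` would apply it to the previous boot, assuming the journal is kept persistently so that the previous boot is still available.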
CC: (none) => marja11
@ Colin Assigning to you, since you were already looking into it. Feel free to reassign when needed :-)
Assignee: bugsquad => mageia
and assigning to systemd :-D
Component: Release (media or process) => RPM Packages
Source RPM: (none) => systemd
After commenting in this report, I shut down my cauldron and it was amazingly fast! I didn't really look at which updates I had; I only remember a kernel. Oh, and I may have rebooted earlier today, after some updates. So... did this get fixed for all and for every reboot or shutdown?
(In reply to Marja van Waes from comment #10)
> after commenting in this report, i shutdown my cauldron and it was amazingly
> fast!
>
> so.... did this get fixed for all and for every reboot or shutdown?

The issue returned, probably within a week after I wrote this. For me, it is always the case that every Plasma5 session started from sddm has to reach a timeout before being killed (on a side note: after logging out and back into Plasma5, the previous session keeps "State=closing" forever, which is the same state as for sddm).

Anyway, shutting down:
mrt 07 09:06:55 cldrn_64 systemd[1438]: Reached target Shutdown.
mrt 07 09:06:55 cldrn_64 systemd[1438]: Starting Exit the Session...
mrt 07 09:06:55 cldrn_64 sddm[972]: Authentication error: "Process crashed"
mrt 07 09:06:55 cldrn_64 sddm[972]: Auth: sddm-helper crashed (exit code 15)
mrt 07 09:06:55 cldrn_64 sddm[972]: Authentication error: "Process crashed"
mrt 07 09:06:55 cldrn_64 sddm[972]: Auth: sddm-helper exited with 15
mrt 07 09:06:55 cldrn_64 sddm[972]: Socket server stopping...
mrt 07 09:06:55 cldrn_64 sddm[972]: Socket server stopped.
mrt 07 09:06:55 cldrn_64 sddm[972]: Display server stopping...
mrt 07 09:06:55 cldrn_64 systemd-logind[886]: Failed to abandon session scope: Connection reset by peer
mrt 07 09:06:55 cldrn_64 systemd-logind[886]: Failed to abandon session scope: Transport endpoint is not connected
..........
mrt 07 09:08:25 cldrn_64 systemd[1]: session-c2.scope: Stopping timed out. Killing.
mrt 07 09:08:25 cldrn_64 systemd[1]: Stopped Session c2 of user marja.

Of course, the delay is n times 90 seconds if a Plasma5 session was started n times.

If I still cannot reproduce this using XFCE, and if this persists after the updates neoclust is pushing, then I'll file a separate bug report for this issue and let it block this one. However: should it then be assigned to neoclust or to coling?
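For anyone wanting to check for leftover sessions in the "closing" state before shutting down, a rough sketch follows. The helper name closing_sessions is hypothetical; it relies only on `loginctl list-sessions` and `loginctl show-session ... -p State`, and assumes the session ID is the first column of the list output.

```shell
# closing_sessions: print every logind session whose State is "closing".
# Assumption: the first column of "loginctl list-sessions" is the session ID.
closing_sessions() {
    loginctl list-sessions --no-legend | while read -r _id _rest; do
        # show-session prints "State=<value>"; strip the key to compare.
        _state=$(loginctl show-session "$_id" -p State)
        if [ "${_state#State=}" = "closing" ]; then
            echo "$_id $_rest"
        fi
    done
}
```

If the observation above holds, each session printed by `closing_sessions` would be a candidate for one of the 90-second stop timeouts at shutdown.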
CC: (none) => neoclust
Seems to be fixed? Shutdown or reboot from Plasma 5 desktop is very fast again...
Must be nice. I will agree that the time to the login prompt has improved, I'm guessing because of a reduction in some timeout value. The Plymouth start manager is still crashing on the first attempt. Once that times out and gets restarted, the actual login is still slow.

Times to boot and to login (vb = VirtualBox):

          boot   login
mga5vb    0:11   0:21
mga6vb    0:22   0:34
mga6      0:47   1:51   (real hardware)
Should this bug be closed? Does anybody still see this. Not evident here any longer.
(In reply to Len Lawrence from comment #14)
> Should this bug be closed? Does anybody still see this. Not evident here
> any longer.

Boot is still failing.

journalctl | grep plymouth
May 09 09:23:29 systemd[1]: Starting Show Plymouth Reboot Screen...
May 09 09:23:30 systemd[1]: Started Show Plymouth Reboot Screen.
May 09 09:24:03 dracut: Starting plymouth daemon
May 09 09:24:03 systemd[1]: Starting Show Plymouth Boot Screen...
May 09 09:24:04 plymouth[540]: error: unexpectedly disconnected from boot status daemon
May 09 09:24:04 systemd[1]: plymouth-start.service: Main process exited, code=killed, status=11/SEGV
May 09 09:24:04 systemd[1]: Failed to start Show Plymouth Boot Screen.
May 09 09:24:04 systemd[1]: plymouth-start.service: Unit entered failed state.
May 09 09:24:04 systemd[1]: plymouth-start.service: Failed with result 'signal'.
May 09 09:24:06 systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
May 09 09:24:06 systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
(In reply to Bit Twister from comment #15)
> (In reply to Len Lawrence from comment #14)
> > Should this bug be closed? Does anybody still see this. Not evident here
> > any longer.
>
> Boot is still failing.
> journalctl | grep plymouth

I'd suggest opening a new bug report for this. Personally, I can't reproduce the issue in comment 0 anymore, even though I did have it back in January, so as far as I'm concerned this bug is fixed.
Status: NEW => RESOLVED
Resolution: (none) => FIXED
Summary: Reboot from login or desktop takes a long time => Reboot from login or desktop takes a long time ("watchdog failed to stop")
(In reply to Rémi Verschelde from comment #16)
> Personally I can't reproduce the issue in comment 0 anymore, even though I
> did have it back in January, so as far as I'm concerned this bug is fixed.

Likewise.
CC: (none) => zen25000