Description of problem: systemd hangs forever when stopping a service journalctl will show Looping too fast. Throttling execution a little Version-Release number of selected component (if applicable): How reproducible: always Steps to Reproduce: 1. click up a terminal 2. su - root 3. xterm journalctl -fa --nopager 4. urpmi --auto --auto-update I can not remember if problem showed up after reboot or not but when you see Looping too fast. Throttling execution a little anytime a service package install does a stop/restart. I added /^systemd/ to /etc/urpmi/skip.list to prevent update and started over. Problem did not happen when running systemd-217 Reproducible: Steps to Reproduce:
Assignee: bugsquad => mageia
Source RPM: systemd-224-3.mag6.x86_64.src.rpm => systemd-225-1.mga6.src.rpm
installing lib64atspi-gir2.0-2.17.90-2.mga6.x86_64.rpm lib64atspi0-2.17.90-2.mga6.x86_64.rpm lib64atk-gir1.0-2.16.0-2.mga6.x86_64.rpm lib64atk1.0-devel-2.16.0-2.mga6.x86_64.rpm at-spi2-core-2.17.90-2.mga6.x86_64.rpm colord-1.2.11-2.mga6.x86_64.rpm lib64atk1.0_0-2.16.0-2.mga6.x86_64.rpm atk1.0-common-2.16.0-2.mga6.x86_64.rpm lib64colord2-1.2.11-2.mga6.x86_64.rpm lib64atk-bridge2.0_0-2.17.90-1.mga6.x86_64.rpm lib64sane1-1.0.24-11.mga6.x86_64.rpm from /var/cache/urpmi/rpms Preparing... ################# 440/2077: at-spi2-core ############################### 441/2077: lib64atspi0 ############################# 442/2077: lib64sane1 ########################### 443/2077: lib64colord2 ############################## 444/2077: colord ####################### ^Cwarning: %post(colord-1.2.11-2.mga6.x86_64) scriptlet failed, signal 2 Waited hours, before Ctrl c. # systemctl status colord colord.service - Manage, Install and Generate Color Profiles Loaded: loaded (/usr/lib/systemd/system/colord.service; static) Active: inactive (dead) colord.service, httpd, cups are normally disabled on my setup to speed shutdown. Default run level is three
Yeah, there is definitely an issue in cauldron. There will be 226 coming very soon, and I'll debug more thoroughly then!
Temporary workaround is to abort urpmi when systemd starts spooling "Looping too fast. Throttling execution a little" into the journal and reboot.
FWIW, there was an issue related to enabling/disabling units that had both systemd native units and legacy sysvinit scripts shipped (the latter being unnecessary in Mageia). This is now fixed in newer chkconfig. I thought there was a bug report about it, but I can't seem to find it now - the was the closest one I found, so figured I'd mention it here. Running 227 here so will push it shortly (needs new util-linux which I've also updated but not yet pushed).
(In reply to Colin Guthrie from comment #4) > FWIW, there was an issue related to enabling/disabling units that had both > systemd native units and legacy sysvinit scripts shipped (the latter being > unnecessary in Mageia). > > This is now fixed in newer chkconfig. I thought there was a bug report about > it, but I can't seem to find it now - the was the closest one I found, so > figured I'd mention it here. > > Running 227 here so will push it shortly (needs new util-linux which I've > also updated but not yet pushed). Can't wait. Guessing it is not fixing the problem but will fix a symptom. I hit control c to get past each systemd restart which indicates it is not just chkconfig scripts. Then again I bow to your knowledge of what is going on. If it runs in your setup, turn it loose in cauldron and I'll run another mga5->mag6 upgrade.
last 2 rounds of updates no longer hang. Could be because install procedure changed. Now running step 19 in bug 16894 because my guess was systemd hanging because dbus never returned completion and systemd went into looping wait.
Status: NEW => RESOLVEDResolution: (none) => FIXED
It could certainly have been the chkconfig thing. Either way, glad it's working now. I'm sure I'll have new problems for you soon tho' :)
So, FTR, I've also been seeing some problems with the "Looping too fast" thing recently. In systemd v227 it seems to show up for me pretty regularly related to jobs for starting and stopping various socket units related to rfkill switch handling. This has knock on effects and even blocks suspend, reboot, automounts and other types of jobs. I reported my issue here: https://github.com/systemd/systemd/issues/1579 but I suspect the root cause is the same as https://github.com/systemd/systemd/issues/1505 which has apparently existed since v225 but has become easier to trigger in v227. Reverting an old commit (see the latter link) seems to "fix" things for me, although that's not the proper "fix" long term.
(In reply to Colin Guthrie from comment #8) > but I suspect the root cause is the same as > https://github.com/systemd/systemd/issues/1505 which has apparently existed > since v225 but has become easier to trigger in v227. > > Reverting an old commit (see the latter link) seems to "fix" things for me, > although that's not the proper "fix" long term. I can agree. plugging "Looping too fast. Throttling execution a little" into https://encrypted.google.com/advanced_search seems to indicate to me there is a basic missed signal logic problem that have been there for awhile. Also would not hurt for it to back off to a ~5 second rate instead of every second just to conserve journal disk space. I have seen several intermittent service shutdown timeouts and have resorted to shutting down those services prior a system reboot/shutdown. I have been trying to hook my shutdown_services script into the shutdown service chain with no luck so far. Snippet of my script follows: #**************************** #* shutdown these apps first #**************************** _servers="mythbackend named postfix httpd dovecot stunnel mysqld chronyd dhcpd" pkill -f thunderbird # helps dovecot shutdown faster for _serv in $_servers ; do _state=$(systemctl is-active $_serv.service 2> /dev/null) if [ $? -eq 0 ] ; then if [ "$_state" = "active" ] ; then echo "Shutting down $_serv" systemctl stop $_serv _state=$(systemctl is-active $_serv.service 2> /dev/null) while [ "$_state" != "inactive" ] ; do sleep 1 _state=$(systemctl is-active $_serv.service 2> /dev/null) done fi fi done
I have this with systemd hands issue on Cauldron. This may be caused by suspend/resume. Should this report be reopened?
CC: (none) => krnekit
(In reply to Nikita Krupenko from comment #10) > I have this with systemd hands issue on Cauldron. This may be caused by > suspend/resume. Should this report be reopened? I certainly see it affecting suspend and resume but don't think it's specifically *caused* by it. Do you get the issue with "looping too fast" being logged every second in the journal? When this happens can you do a "systemctl list-jobs" and let me know what it contains (I suspect it'll have some .socket units in there, specifically the rfkill socket jobs). If so, I'm actively tracking this upstream.
Yes, I see this in journal. I'll try your command next time I'll encounter this issue. Ok, so I reopen it.
Status: RESOLVED => REOPENEDResolution: FIXED => (none)
Hi! Here is the list of jobs: JOB UNIT TYPE STATE 55225 NetworkManager-dispatcher.service start waiting 55224 systemd-rfkill.socket stop waiting 2 jobs listed. Network is working and connected through ethernet. The wi-fi is turned off right now.
Also, from journal: Dec 12 13:49:34 localhost.localdomain systemd[1]: sys-devices-virtual-misc-rfkill.device: Job sys-devices-virtual-misc-rfkill.device/start timed out. Dec 12 13:49:34 localhost.localdomain systemd[1]: Timed out waiting for device sys-devices-virtual-misc-rfkill.device. Dec 12 13:49:34 localhost.localdomain systemd[1]: Dependency failed for Load/Save RF Kill Switch Status. Dec 12 13:49:34 localhost.localdomain systemd[1]: systemd-rfkill.service: Job systemd-rfkill.service/start failed with result 'dependency'. Dec 12 13:49:34 localhost.localdomain systemd[1]: systemd-rfkill.socket: Unit is bound to inactive unit sys-devices-virtual-misc-rfkill.device. Stopping, too. Dec 12 13:49:34 localhost.localdomain systemd[1]: sys-devices-virtual-misc-rfkill.device: Job sys-devices-virtual-misc-rfkill.device/start failed with result 'timeout'. And then it starts "looping too fast". This happened after I resume suspended laptop.
Yeah definitely the same issue I'm getting too. Trying to nail it but keep missing upstream folks. It's something to do with reloading systemd (i.e. if you never call systemctl reload it doesn't seem to trigger - this will often happen as a result of installing/updating a package or enabling/disabling a unit etc). Anyway it's being tracked here: https://github.com/systemd/systemd/issues/1579 Not sure if it's Mageia Only, so will have to try some "purer" packages at some point (i.e. no downstream patches)
Yes, I usually found, that systemd hangs when installing new kernel (with broadcom-wl module) and post scriptlet hangs too.
When this hang occurs it can (usually) be cleared out by running the following command in another window: systemctl list-jobs | grep -vi JOB | sort -u | grep rfkill.*waiting | cut -d' ' -f1 | xargs systemctl cancel Rubbish work around, but I try and keep my system in this state a while so I can do various debugging (none of which is useful it seems) and this at least lets me carry on.
I'll try this next time. Usually, it makes unable for my system to reboot/power off (and sleep too) as it is done now by systemd, so I'm using SysRq keys for this :)
FWIW, a better workaround is to mask the relevant units. You lose the loading/saving of rfkill status but that's hardly a major headache compared to the bug! systemctl mask systemd-rfkill.service systemctl mask systemd-rfkill.socket systemctl mask sys-devices-virtual-misc-rfkill.device At some point you'll want to unmask them again, but this will make general operation much nicer in the mean time!
CC: (none) => doktor5000
Problem no longer occurs. Colin will have to decide if this bug is resolved.
Summary: mga6: systemd hangs forever when stopping a service => 6_s1: systemd hangs forever when stopping a service
Seems my rfkill issue also has a fix upstream now too (which I've applied). This could very easily have been a completely separate issue, so if you think it's fixed and I think the rfkill issue is fixed, then let's call this bug fixed!
Status: REOPENED => RESOLVEDResolution: (none) => FIXED