| Summary: | systemd detects ordering cycle on prefdm.service/start - in return deletes plymouth-quit.service/stop and getty@tty1.service/stop effectively preventing X to start | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Florian Hubold <doktor5000> |
| Component: | RPM Packages | Assignee: | Colin Guthrie <mageia> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | major | ||
| Priority: | Normal | CC: | doktor5000, hc |
| Version: | 5 | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Source RPM: | systemd-217-11.mga5.src.rpm | CVE: | |
| Status comment: | |||
| Attachments: |
journalctl -ab output of boot with ordering cycle issues
journalctl -ab output of boot with ordering cycle issues in debug mode ( boot option systemd.log_level=debug ) proper journalctl -ab output of boot with ordering cycle issues in debug mode ( boot option systemd.log_level=debug ) |
||
|
Description
Florian Hubold
2015-06-30 21:39:49 CEST
Ideally a full log with systemd.log_level=debug on the kernel command line from the boot would be good. Created attachment 6796 [details] journalctl -ab output of boot with ordering cycle issues (In reply to Colin Guthrie from comment #2) > Ideally a full log with systemd.log_level=debug on the kernel command line > from the boot would be good. Yep, would probably be. Unfortunately the issue already vanished, and I've got no clue what particular action fixed it, only vague guess about rc.local. I've attached the journalctl -ab output when all the issues where there, after obtaining that I advised the user to fix some of his basic issues. This is what I asked him to do: - assign a hostname via "hostnamectl set-hostname" - disable rc.local via "systemctl mask rc-local" as it failed beforehand, so might have caused some ordering issues. This is what he did in addition: - unplug all external HDDs - unplug all USB devices - plug keyboard/mouse receiver directly into mainboard - move away /etc/rc.d/rc.local file somewhere else After several reboots and plugging his devices again as they were beforehand, everything is fine again. I assume that systemd doesn't really care about the hostname, so the only relevant change was masking rc.local ... So my question would be, for such ordering issues, booting with "systemd.log_level=debug" and then "journalctl -ab" output should be sufficient to check why such cycles appeared? CC:
(none) =>
doktor5000 Closing as the issue has vanished after masking rc.local ... :/ Status:
NEW =>
RESOLVED (In reply to Florian Hubold from comment #3) > Created attachment 6796 [details] > journalctl -ab output of boot with ordering cycle issues > > (In reply to Colin Guthrie from comment #2) > > Ideally a full log with systemd.log_level=debug on the kernel command line > > from the boot would be good. > > So my question would be, for such ordering issues, booting with > "systemd.log_level=debug" and then "journalctl -ab" output should be > sufficient to check why such cycles appeared? It can certainly help. For most issues it can provide a lot of insights that regular logs do not, so it's a good one to keep in the armoury. You can just use "debug" on the command line too but the kernel spews out a lot of stuff then too. Reopening, after the user unmasked rc-local the issue is back. I cannot really comprehend how rc-local has an effect of the ordering of prefdm and getty/plymouth-quit-wait ... but anyways, at least it's reproducible somehow. Will ask him to create a debug journal log and attach it here. Resolution:
WORKSFORME =>
(none) Created attachment 6798 [details]
journalctl -ab output of boot with ordering cycle issues in debug mode ( boot option systemd.log_level=debug )
@Colin: Here's the debug log, although it looks a bit weird. Not one occurence of prefdm in there, no ordering cycle issues.
And when the last child process of plymouth died, it seems to automatically invoke an emergency shell, although the user says he still just saw the regular plymouth bootsplash ...
Attachment 6796 is obsolete:
0 =>
1 Created attachment 6799 [details] proper journalctl -ab output of boot with ordering cycle issues in debug mode ( boot option systemd.log_level=debug ) (In reply to Florian Hubold from comment #7) > @Colin: Here's the debug log, although it looks a bit weird. Not one > occurence of prefdm in there, no ordering cycle issues. Talked with the user again, this is now the correct debug log, which also shows the prefdm ordering cycle as expected. And it is easily reproducible, unmasking rc-local.server makes it appear, masking it makes it vanish.
Attachment 6798 is obsolete:
0 =>
1 @colin: Another issue, reported via https://forums.mageia.org/en/viewtopic.php?f=7&t=10134 - a restart of prefdm seems to help there for normal login. But this thing with the ordering cycles seems ridiculous to me: Aug 07 19:26:19 gilraen systemd[1]: Found ordering cycle on preload.service/start Aug 07 19:26:19 gilraen systemd[1]: Breaking ordering cycle by deleting job prefdm.service/start Aug 07 19:26:19 gilraen systemd[1]: Job prefdm.service/start deleted to break ordering cycle starting with preload.service/start Aug 07 19:26:19 gilraen systemd[1]: Found ordering cycle on preload.service/start Aug 07 19:26:19 gilraen systemd[1]: Breaking ordering cycle by deleting job remote-fs.target/start Aug 07 19:26:19 gilraen systemd[1]: Job remote-fs.target/start deleted to break ordering cycle starting with preload.service/start Aug 07 19:26:19 gilraen systemd[1]: Found ordering cycle on preload.service/start Aug 07 19:26:19 gilraen systemd[1]: Breaking ordering cycle by deleting job time-sync.target/start Aug 07 19:26:19 gilraen systemd[1]: Job time-sync.target/start deleted to break ordering cycle starting with preload.service/start And one more time, masking the triggering unit, in this case preload.service, fixes the issue. @colin: Can we please get this fixed? As it doesn't seem like an upstream problem, what can be done about this?
Florian Hubold
2015-08-09 16:48:43 CEST
Blocks:
(none) =>
15527 I think the solution decided upon was just to kill off preload. It's not really been tested and probably provides minimal benefits even if it did work (due to it not being tested or integrated). We can likely just release a new systemd version that obsoletes it. I meant "can we please get this fixed" in regards to the ridiculous ordering cycles, not especially for preload - the initial report was triggered by rc.local causing ordering cycles. I mean, c'mon, how is rc.local, whether it runs or not anyhow relevant to prefdm? Sorry but I simply fail to see the logic upon which systemd decides what ordering cycles are - those seem completely arbitrary with no actual relation. They are not completely arbitrary. They are very much written down in the rules of the units/init scripts. It's very specifically the exact opposite of arbitrary. In this case, preload has Required-Start $local_fs and $remote_fs and $time These translate to systemd having an After rule in MGA5 for preload with: After=remote-fs.target time-sync.target systemd-journald.socket basic.target system.slice prefdm.service With prefdm having: After=livesys-late.service systemd-user-sessions.service getty@tty1.service plymouth-quit.service systemd-journald.socket basic.target system.slice And rc-local having: After=network.target systemd-journald.socket basic.target system.slice And before rules: preload: Before=resolvconf.service graphical.target shutdown.target partmon.service network.service numlock.service irqbalance.service postfix.service icecream.service icecream-scheduler.service msec.service virtualbox.service network-up.service prefdm: Before=dracut-shutdown.service shutdown.target preload.service graphical.target rc-local: Before=plymouth-quit.service shutdown.target getty@tty1.service plymouth-quit-wait.service Now the main problem here is that prefdm has to start before preload but after after systemd-user-sessions.service. s-u-s has to start after remote-fs.target which has to start after remote-fs-pre.target which has to start after nfs-lock.target which has to start after network.target. So s-u-s has to start after network.target So preferdm has to start before preload but after network.target. preload itself has to start after network.target, and thus there is a deadlock (aka an ordering cycle). All of this only shows up and is activated under certain circumstances - i.e. if remote-fs.target is not started or there are no remote fs's detected then a lot of these loops will be closed. There are likely other ways that generators (such as the rc-local one) can get in the way by adding extra deps. So these things are far from arbitrary. They are very much determined, although the use of generators (such as in the rc-local case or when remote filesystems are detected in /etc/fstab) can make things harder to spot. But while they may not be arbitrary, they are bloody annoying :p (In reply to Florian Hubold from comment #0) > Description of problem: > > A user in the german forum reported this after his upgrade to mga5 via > https://forums.mageia.org/de/viewtopic.php?f=8&t=2507 . From his point of > view, the system boots normally, but plymouth never goes away and X never > starts. He has to switch to another tty, login and run "startx" to get his > normal desktop. > I just upgraded from 4 to 5 with urpmi --replacefiles --auto-update --auto and got almost the same problem, except that my boot goes directly to login and after login, I have to run startx to get to the desktop. systemctl mask preload.service make it boot normal. CC:
(none) =>
hc So one half of the bug should be solved by removing preload via Bug 16086 As the originally reported issue also involved preload as a dependency in the ordering cycle on prefdm, I'll close this one. But we should reopen against cauldron if we see this again before or after mga6 release ... Status:
REOPENED =>
RESOLVED
Samuel Verschelde
2017-01-17 10:29:39 CET
Blocks:
15527 =>
(none) |