Description of problem: A user in the german forum reported this after his upgrade to mga5 via https://forums.mageia.org/de/viewtopic.php?f=8&t=2507 . From his point of view, the system boots normally, but plymouth never goes away and X never starts. He has to switch to another tty, login and run "startx" to get his normal desktop. Looking further, it's due to an ordering cycle where systemd shoots itself in the foot, killing getty on tty1 where X will start and also killing plymouth-wait-quit stop, hence plymouth will display forever. There are some other small issues on his box, but they seem unrelated and I have advised him for solutions on those already, awaiting feedback ... ---- Relevant excerpt from journalctl -ab via grep -i -E "plymouth|prefdm" journal.log Jun 30 17:26:06 localhost.localdomain systemd[1]: Found ordering cycle on prefdm.service/start Jun 30 17:26:06 localhost.localdomain systemd[1]: Breaking ordering cycle by deleting job plymouth-quit.service/stop Jun 30 17:26:06 localhost.localdomain systemd[1]: Job plymouth-quit.service/stop deleted to break ordering cycle starting with prefdm.service/start Jun 30 17:26:06 localhost.localdomain systemd[1]: Found ordering cycle on prefdm.service/stop Jun 30 17:26:06 localhost.localdomain systemd[1]: Breaking ordering cycle by deleting job getty@tty1.service/stop Jun 30 17:26:06 localhost.localdomain systemd[1]: Job getty@tty1.service/stop deleted to break ordering cycle starting with prefdm.service/stop Jun 30 17:26:06 localhost.localdomain systemd[1]: tmp.mount: Directory /tmp to mount over is not empty, mounting anyway. [snip - boot progresses here] Jun 30 17:26:18 localhost.localdomain shorewall6[2738]: done. Jun 30 17:26:36 localhost.localdomain systemd[1]: plymouth-quit-wait.service start operation timed out. Terminating. Jun 30 17:26:36 localhost.localdomain systemd[1]: Failed to start Wait for Plymouth Boot Screen to Quit. Jun 30 17:26:36 localhost.localdomain systemd[1]: Unit plymouth-quit-wait.service entered failed state. Jun 30 17:26:36 localhost.localdomain systemd[1]: plymouth-quit-wait.service failed. I have a complete journalctl -ab log from that system, which I can also attach if necessary. I have also asked him to give me the output of "systemd-delta" and "systemctl show -p Requires,Wants,Requisite,BindsTo,PartOf,Before,After" for prefdm and plymouth-quit-wait. Please advise what is required for further analysis and to break this ordering cycle. Maybe a dump via "systemd-analyze dump" at the point when the issue persists or when he has started his X session normally? Reproducible: Steps to Reproduce:
See also: Bug 15168 Bug 15691 Bug 16086
Ideally a full log with systemd.log_level=debug on the kernel command line from the boot would be good.
Created attachment 6796 [details] journalctl -ab output of boot with ordering cycle issues (In reply to Colin Guthrie from comment #2) > Ideally a full log with systemd.log_level=debug on the kernel command line > from the boot would be good. Yep, would probably be. Unfortunately the issue already vanished, and I've got no clue what particular action fixed it, only vague guess about rc.local. I've attached the journalctl -ab output when all the issues where there, after obtaining that I advised the user to fix some of his basic issues. This is what I asked him to do: - assign a hostname via "hostnamectl set-hostname" - disable rc.local via "systemctl mask rc-local" as it failed beforehand, so might have caused some ordering issues. This is what he did in addition: - unplug all external HDDs - unplug all USB devices - plug keyboard/mouse receiver directly into mainboard - move away /etc/rc.d/rc.local file somewhere else After several reboots and plugging his devices again as they were beforehand, everything is fine again. I assume that systemd doesn't really care about the hostname, so the only relevant change was masking rc.local ... So my question would be, for such ordering issues, booting with "systemd.log_level=debug" and then "journalctl -ab" output should be sufficient to check why such cycles appeared?
CC: (none) => doktor5000
Closing as the issue has vanished after masking rc.local ... :/
Status: NEW => RESOLVEDResolution: (none) => WORKSFORME
(In reply to Florian Hubold from comment #3) > Created attachment 6796 [details] > journalctl -ab output of boot with ordering cycle issues > > (In reply to Colin Guthrie from comment #2) > > Ideally a full log with systemd.log_level=debug on the kernel command line > > from the boot would be good. > > So my question would be, for such ordering issues, booting with > "systemd.log_level=debug" and then "journalctl -ab" output should be > sufficient to check why such cycles appeared? It can certainly help. For most issues it can provide a lot of insights that regular logs do not, so it's a good one to keep in the armoury. You can just use "debug" on the command line too but the kernel spews out a lot of stuff then too.
Reopening, after the user unmasked rc-local the issue is back. I cannot really comprehend how rc-local has an effect of the ordering of prefdm and getty/plymouth-quit-wait ... but anyways, at least it's reproducible somehow. Will ask him to create a debug journal log and attach it here.
Resolution: WORKSFORME => (none)Status: RESOLVED => REOPENED
Created attachment 6798 [details] journalctl -ab output of boot with ordering cycle issues in debug mode ( boot option systemd.log_level=debug ) @Colin: Here's the debug log, although it looks a bit weird. Not one occurence of prefdm in there, no ordering cycle issues. And when the last child process of plymouth died, it seems to automatically invoke an emergency shell, although the user says he still just saw the regular plymouth bootsplash ...
Attachment 6796 is obsolete: 0 => 1
Created attachment 6799 [details] proper journalctl -ab output of boot with ordering cycle issues in debug mode ( boot option systemd.log_level=debug ) (In reply to Florian Hubold from comment #7) > @Colin: Here's the debug log, although it looks a bit weird. Not one > occurence of prefdm in there, no ordering cycle issues. Talked with the user again, this is now the correct debug log, which also shows the prefdm ordering cycle as expected. And it is easily reproducible, unmasking rc-local.server makes it appear, masking it makes it vanish.
Attachment 6798 is obsolete: 0 => 1
@colin: Another issue, reported via https://forums.mageia.org/en/viewtopic.php?f=7&t=10134 - a restart of prefdm seems to help there for normal login. But this thing with the ordering cycles seems ridiculous to me: Aug 07 19:26:19 gilraen systemd[1]: Found ordering cycle on preload.service/start Aug 07 19:26:19 gilraen systemd[1]: Breaking ordering cycle by deleting job prefdm.service/start Aug 07 19:26:19 gilraen systemd[1]: Job prefdm.service/start deleted to break ordering cycle starting with preload.service/start Aug 07 19:26:19 gilraen systemd[1]: Found ordering cycle on preload.service/start Aug 07 19:26:19 gilraen systemd[1]: Breaking ordering cycle by deleting job remote-fs.target/start Aug 07 19:26:19 gilraen systemd[1]: Job remote-fs.target/start deleted to break ordering cycle starting with preload.service/start Aug 07 19:26:19 gilraen systemd[1]: Found ordering cycle on preload.service/start Aug 07 19:26:19 gilraen systemd[1]: Breaking ordering cycle by deleting job time-sync.target/start Aug 07 19:26:19 gilraen systemd[1]: Job time-sync.target/start deleted to break ordering cycle starting with preload.service/start
And one more time, masking the triggering unit, in this case preload.service, fixes the issue. @colin: Can we please get this fixed? As it doesn't seem like an upstream problem, what can be done about this?
Blocks: (none) => 15527
I think the solution decided upon was just to kill off preload. It's not really been tested and probably provides minimal benefits even if it did work (due to it not being tested or integrated). We can likely just release a new systemd version that obsoletes it.
I meant "can we please get this fixed" in regards to the ridiculous ordering cycles, not especially for preload - the initial report was triggered by rc.local causing ordering cycles. I mean, c'mon, how is rc.local, whether it runs or not anyhow relevant to prefdm? Sorry but I simply fail to see the logic upon which systemd decides what ordering cycles are - those seem completely arbitrary with no actual relation.
They are not completely arbitrary. They are very much written down in the rules of the units/init scripts. It's very specifically the exact opposite of arbitrary. In this case, preload has Required-Start $local_fs and $remote_fs and $time These translate to systemd having an After rule in MGA5 for preload with: After=remote-fs.target time-sync.target systemd-journald.socket basic.target system.slice prefdm.service With prefdm having: After=livesys-late.service systemd-user-sessions.service getty@tty1.service plymouth-quit.service systemd-journald.socket basic.target system.slice And rc-local having: After=network.target systemd-journald.socket basic.target system.slice And before rules: preload: Before=resolvconf.service graphical.target shutdown.target partmon.service network.service numlock.service irqbalance.service postfix.service icecream.service icecream-scheduler.service msec.service virtualbox.service network-up.service prefdm: Before=dracut-shutdown.service shutdown.target preload.service graphical.target rc-local: Before=plymouth-quit.service shutdown.target getty@tty1.service plymouth-quit-wait.service Now the main problem here is that prefdm has to start before preload but after after systemd-user-sessions.service. s-u-s has to start after remote-fs.target which has to start after remote-fs-pre.target which has to start after nfs-lock.target which has to start after network.target. So s-u-s has to start after network.target So preferdm has to start before preload but after network.target. preload itself has to start after network.target, and thus there is a deadlock (aka an ordering cycle). All of this only shows up and is activated under certain circumstances - i.e. if remote-fs.target is not started or there are no remote fs's detected then a lot of these loops will be closed. There are likely other ways that generators (such as the rc-local one) can get in the way by adding extra deps. So these things are far from arbitrary. They are very much determined, although the use of generators (such as in the rc-local case or when remote filesystems are detected in /etc/fstab) can make things harder to spot. But while they may not be arbitrary, they are bloody annoying :p
(In reply to Florian Hubold from comment #0) > Description of problem: > > A user in the german forum reported this after his upgrade to mga5 via > https://forums.mageia.org/de/viewtopic.php?f=8&t=2507 . From his point of > view, the system boots normally, but plymouth never goes away and X never > starts. He has to switch to another tty, login and run "startx" to get his > normal desktop. > I just upgraded from 4 to 5 with urpmi --replacefiles --auto-update --auto and got almost the same problem, except that my boot goes directly to login and after login, I have to run startx to get to the desktop. systemctl mask preload.service make it boot normal.
CC: (none) => hc
So one half of the bug should be solved by removing preload via Bug 16086 As the originally reported issue also involved preload as a dependency in the ordering cycle on prefdm, I'll close this one. But we should reopen against cauldron if we see this again before or after mga6 release ...
Status: REOPENED => RESOLVEDResolution: (none) => FIXED
Blocks: 15527 => (none)