Bug 14452 - systemd started journald as wrong type, so "After=" jobs did not wait for it to initialise
Summary: systemd started journald as wrong type, so "After=" jobs did not wait for it ...
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages (show other bugs)
Version: Cauldron
Hardware: All Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Colin Guthrie
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-04 08:53 CET by Morgan Leijström
Modified: 2014-11-05 18:46 CET (History)
6 users (show)

See Also:
Source RPM:
CVE:
Status comment:


Attachments

Description Morgan Leijström 2014-11-04 08:53:50 CET
Example of problem:
After update yesterday a laptop get stuck at boot with the message

[ **   ] A start job is running for Trigger Flushing of Journal to Persistent Storage (14 min 7s / no limit)

And I found no way to get past it to debug, fix - or user scenario: get important work done.

If it can not save journal to /var in a couple seconds, (ok set timeout to 30) forget it and move on and let us get the system up, that is important for any use inluding debug...!

Forum reference: https://forums.mageia.org/en/viewtopic.php?f=15&t=8666

Reproducible: 

Steps to Reproduce:
Comment 1 Nikita Krupenko 2014-11-04 10:01:10 CET
Confirm. I have the same problem on two PCs and cannot boot. One is laptop with EFI and grub2, another is a PC with grub1.

CC: (none) => krnekit

Sander Lepik 2014-11-04 10:08:53 CET

CC: (none) => mageia
Assignee: bugsquad => mageia

Comment 2 Colin Guthrie 2014-11-04 10:43:27 CET
Hmm, in classic style, it works fine here :D

Can I ask, do both of you have separate /var - i.e. /var mounted as a separate partition.

(if you use magic sysreq + e it should let things continue, although this really is a horrible work around and I'll certainly look to get this fixed ASAP once I understand the trigger case).
Comment 3 Nikita Krupenko 2014-11-04 11:07:15 CET
I have no separate /var partition.
sysrq + e helped and I can now boot!
Also looking at journal on both machines I've found, that hang started after this lines:

Nov 04 11:54:49 localhost.localdomain service_harddrake[6615]: ### Program is starting ###
Nov 04 11:54:50 localhost.localdomain service_harddrake[6615]: running: dmidecode
Nov 04 11:54:50 localhost.localdomain service_harddrake[6615]: modified file /etc/sysconfig/harddrake2/kernel
Nov 04 11:54:52 localhost.localdomain service_harddrake[6615]: created file /etc/sysconfig/harddrake2/previous_hw
Nov 04 11:54:52 localhost.localdomain service_harddrake[6615]: launched command: display_driver_helper --check-loaded
Comment 4 Colin Guthrie 2014-11-04 11:14:31 CET
(In reply to Nikita Krupenko from comment #3)
> I have no separate /var partition.

Thanks for confirming. That rules out that avenue of investigation.

> sysrq + e helped and I can now boot!
> Also looking at journal on both machines I've found, that hang started after
> this lines:
> 
> Nov 04 11:54:49 localhost.localdomain service_harddrake[6615]: ### Program
> is starting ###
> Nov 04 11:54:50 localhost.localdomain service_harddrake[6615]: running:
> dmidecode
> Nov 04 11:54:50 localhost.localdomain service_harddrake[6615]: modified file
> /etc/sysconfig/harddrake2/kernel
> Nov 04 11:54:52 localhost.localdomain service_harddrake[6615]: created file
> /etc/sysconfig/harddrake2/previous_hw
> Nov 04 11:54:52 localhost.localdomain service_harddrake[6615]: launched
> command: display_driver_helper --check-loaded

OK, that's interesting. I disable harddrake here, so I'll reneable it and see if I can reproduce the problem.

Do you see anything else interesting in journal logs before that (and it may be that for now, the logs are kept in /run and will not be saved permanently - i.e. a reboot might lose the logs - as will an extended period of time)

After using sysrq + e and a boot, does "systemctl status systemd-journal-flush.service" or "journalctl -b -u systemd-journal-flush.service" give any useful information? (both run as root).
Comment 5 Nikita Krupenko 2014-11-04 11:25:34 CET
[root@localhost nekit]# systemctl status systemd-journal-flush.service
â systemd-journal-flush.service - Trigger Flushing of Journal to Persistent Storage
   Loaded: loaded (/usr/lib/systemd/system/systemd-journal-flush.service; static)
   Active: active (exited) since ÐÑо 2014-11-04 11:55:15 EET; 29min ago
     Docs: man:systemd-journald.service(8)
           man:journald.conf(5)
  Process: 6649 ExecStart=/usr/bin/journalctl --flush (code=exited, status=0/SUCCESS)
 Main PID: 6649 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/systemd-journal-flush.service
[root@localhost nekit]# journalctl -b -u systemd-journal-flush.service
-- Logs begin at ÐÑк 2014-09-28 22:41:35 EEST, end at ÐÑо 2014-11-04 12:18:05 EET. --

Before hang I also see this in logs:

Nov 04 11:54:28 localhost.localdomain systemd[1]: systemd-journald.service failed.
Nov 04 11:54:28 localhost.localdomain systemd-journal[746]: Journal started
-- Subject: The Journal has been started
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- The system journal process has been starting up, opened the journal
-- files for writing and is now ready to process requests.
Nov 04 11:54:28 localhost.localdomain systemd-sysctl[740]: Overwriting earlier assignment of kernel/sysrq in file '/etc/sysctl.d/51-alt-sysrq.conf'.
Nov 04 11:54:28 localhost.localdomain systemd-sysctl[740]: Overwriting earlier assignment of net/ipv4/icmp_ignore_bogus_error_responses in file '/etc
Nov 04 11:54:28 localhost.localdomain systemd-sysctl[740]: Overwriting earlier assignment of net/ipv4/conf/all/log_martians in file '/etc/sysctl.d/99
Nov 04 11:54:28 localhost.localdomain systemd[1]: systemd-journald.service: main process exited, code=killed, status=10/USR1
Nov 04 11:54:28 localhost.localdomain systemd[1]: Unit systemd-journald.service entered failed state.
Comment 6 Morgan Leijström 2014-11-04 11:53:56 CET
As i see it here are 2 bugs

1) IMO, by principle there should not exist any limitless timeouts on anything non critical for booting (this is actually what I filed this bug for, i guess I should clone it)  Long timeout OK for such important stuff, but not limitless!

2) the issue that we discuss which is "just" an example of what may trigger 1)

---

IIRC the system i have do not have separate /var
When i got back to it an hour after i made this bug it had black screen - possibly failed suspend - and i could not get it boot.  I will reaad up on how to issue Sysrq on a T43.  Later.
Comment 7 Colin Guthrie 2014-11-04 12:00:28 CET
(In reply to Morgan Leijström from comment #6)
> 1) IMO, by principle there should not exist any limitless timeouts on
> anything non critical for booting (this is actually what I filed this bug
> for, i guess I should clone it)  Long timeout OK for such important stuff,
> but not limitless!

Please could you just take this one straight upstream? I agree with you but I'm not sure I'd have time to act as the middle man here.

FWIW, this is generally how upstream works already, so you'd have to be specific about which jobs you think are setup incorrectly. Generally upstream will be welcome of this kind of feedback.


> IIRC the system i have do not have separate /var
> When i got back to it an hour after i made this bug it had black screen -
> possibly failed suspend - and i could not get it boot.  I will reaad up on
> how to issue Sysrq on a T43.  Later.

Thanks! If this doesn't work for you I'm sure I can give you another way to get things to work via kernel command line tweaks!
Comment 8 Colin Guthrie 2014-11-04 12:06:33 CET
Also seeing other reports on IRC... they seem to be 64 bit only at present... is anyone here on 32 bit with this issue? (I have two test systems, one 64 and one 32. I can't make it happen on either, but I do have mandriva-everytime.service masked on my 64 bit install which may hide the issue... systemd.mask=mandriva-everytime.service on the kernel command line *might* help, but this is being checked now by Marja)

CC: (none) => marja11

Comment 9 Colin Guthrie 2014-11-04 12:10:04 CET
OK, so Marja confirms that adding "systemd.mask=mandriva-everytime.service" to the kernel command line is a good work around for now.
Comment 10 Marja Van Waes 2014-11-04 12:25:49 CET
(In reply to Colin Guthrie from comment #9)
> OK, so Marja confirms that adding "systemd.mask=mandriva-everytime.service"
> to the kernel command line is a good work around for now.

I didn't expect that to survive reboots (it can't, can it?), but I don't hit the problem again on two reboots _without_ that string added
Comment 11 Morgan Leijström 2014-11-04 12:48:10 CET
I only tried update on 32 bit and there it hit.
(thinkpad T43, earlier upgraded from mga4 to mga5 alpha2 and continuing updates)

I will report the missing timeout upstream in weekend or so.
Uh, where is upstream more exactly?
Comment 12 Frank Griffin 2014-11-04 12:54:53 CET
Confirming, as mentioned in bug#14444.  64-bit, mandriva-everytime not masked.

CC: (none) => ftg

Comment 13 Frank Griffin 2014-11-04 13:02:15 CET
Oh, and /var is on root partition.
Comment 14 Morgan Leijström 2014-11-04 13:02:51 CET
1)  When I issue [Alt] + [PrtSc/SysRq] + [e] , it just append a timestamp and
    " SysRq : This sysrq operation is disabled. "
And then continuing on new line as before:
    [ **   ] A start job is running for Trigger Flushing of Journal to Persistent Storage
  - continuing time counting from where it were.


2) adding "systemd.mask=mandriva-everytime.service" to the kernel command line (by pressing F3 at grub boot menu) also do not help...
Comment 15 Frank Griffin 2014-11-04 13:06:04 CET
By the way, when I first saw this, my root partition was 99% full, and I thought it was a disk space issue.  But I freed up a GB or so and retried, with the same result.
Comment 16 Morgan Leijström 2014-11-04 13:15:18 CET
Disk partitions: separate ext4 /boot, 2 LVM pv for same LVM, two MSwinXP ntfs partitions. The LVM contains ext4 /, ext4 /home, and swap.
No partition was close to full.

Anything else to try on kernel boot line?

Anything i can check or edit from an alternate booting system?
 I use sysresccd.org on USB stick.
Comment 17 Frank Griffin 2014-11-04 13:21:08 CET
(In reply to Morgan Leijström from comment #14)
> 1)  When I issue [Alt] + [PrtSc/SysRq] + [e] , it just append a timestamp and
>     " SysRq : This sysrq operation is disabled. "

I thought Colin had fixed this a while back.  Upstream decided that the SYSRQ stuff should be disabled by default, and we added a sysctl rule to revoke that.  See bug#10473.  You should be able to boot rescue, and add /etc/sysctl.d/51-alt-sysrq.conf containing 

kernel.sysrq = 1
Comment 18 Morgan Leijström 2014-11-04 13:38:51 CET
Thank you Frank, that did it; Alt] + [PrtSc/SysRq] + [e] now got me out of it.

Should we open a bug about /etc/sysctl.d/51-alt-sysrq.conf ?
Note this system was upgraded online by changing repos from mga4 when mga5 was alpha - something may have borked during upgrade.
Comment 19 Morgan Leijström 2014-11-04 13:43:44 CET
Ah, reading bug#10473 i realise this system maybe was cauldron at that time so it did not get that update then because of that.
Comment 20 Nikita Krupenko 2014-11-04 14:14:50 CET
All my systems are 64-bit.
Checked systemd.mask=mandriva-everytime.service - it does not helps.
Comment 21 Colin Guthrie 2014-11-04 15:34:03 CET
Thanks for the test people. I'll try to get to the bottom of the issue as soon as I can (likely this evening).

In the mean time, you might want to try adding "systemd.debug-shell" to the kernel command line which should enable a debugging shell on tty9. This might let you inspect where things are failing (i.e. see which jobs are still activating) via systemctl list-jobs, and systemctl status and such like.

Hopefully the debug shell kicks in early enough to be useful here. I'm still hoping to reproduce the issue so I can take a proper look.
Barry Jackson 2014-11-04 17:47:58 CET

CC: (none) => zen25000

Comment 22 Colin Guthrie 2014-11-04 19:53:14 CET
OK, so I managed to reproduce this on my 32 bit test system and the debug shell works fine (manually enabled rather than via kernel command line but it should be the same either way).

Annoyingly the "A start job..." thing is still outputting there which makes the actual debugging a bit annoying! But getting there.
Comment 23 Morgan Leijström 2014-11-04 20:50:40 CET
Interesting: When adding "systemd.debug-shell" to the kernel command line, the problem does not appear!
Removing it again and problem is back.
Repeated on another thinkpad T43. 
So on those systems adding systemd.debug-shell is a *workaround*...

(I also intended to try on thinkpad R61 = 64 bit, but it do not have the same issue however it fail to load nvidia usin glatest kernel, steppig back it is OK)
Comment 24 Morgan Leijström 2014-11-04 20:57:35 CET
Maybe it is a timing issue, because only some systems are affected, and on different systems different methods works...  maybe something need happen before something else, and changing some things alters that order...
Comment 25 Nikita Krupenko 2014-11-04 21:28:04 CET
In my case with first try to boot with systemd.debug-shell, boot was hanging in different place. systemctl list-jobs displayed, that mandriva-everytime.service was running. Seems debug shell affects this issue.
With second try all was as usual, and running job was systemd-journal-flush.service
Comment 26 Nikita Krupenko 2014-11-04 22:44:04 CET
Installed updates for systemd and systemd-units and boot does not hangs anymore!
Comment 27 Morgan Leijström 2014-11-04 22:46:45 CET
Which versions do you have now, and do you know what you had before?
Comment 28 Colin Guthrie 2014-11-04 23:06:03 CET
OK, so I think we've gotten to the bottom of the issue. This was a real cross distro debugging session upstream as we had fedora, arch and mga guys all hacking away at the issue!

Latest build should fix it, but for the record, this is the what was happening.

systemd started the journal daemon. It is marked as Type=simple init's unit file (or rather it has no Type= anything, so defaults to simple). This means systemd starts it and contines on straight away with any other units which come after it. The systemd-journal-flush unit comes after it, and tries to send a signal to journald to make it do it's flush then waits for word back that this has happened. Sadly, the flush operation happened so quickly after journald itself started that it had not yet fully initialised and had not setup proper signal handlers, and thus when the flush operation sent its signal, it actually killed journald (as expected without any signal handling!) and a new journald was respawned and the flush operation hung waiting for a reply that would never come.

So, the correct fix is to use the proper Type=notify (which journald supports). This means that systemd will not continue with After= jobs until the journal is properly initialised and it's signal handler is all setup.

This was very much a race condition and as a result, doing other things (like enabling the debug shell) could easily cause it not to trigger. Some boots would also work fine.

Hopefully the latest build will solve all these problems tho'!

All the best and thanks for your help and patience debugging this issue :)

See Also: (none) => https://bugs.freedesktop.org/show_bug.cgi?id=85871

Comment 29 Morgan Leijström 2014-11-04 23:17:34 CET
Sounds like it have been a nice hacking party - thank you all :)

And thank you for the report and explanation Colin.
Morgan Leijström 2014-11-04 23:21:57 CET

Summary: Please do not use limitless timeout on boot blocking processes unless absolutely needed => systemd started journald as wrong type, so "After=" jobs did not wait for it to initialise

Comment 30 Barry Jackson 2014-11-04 23:26:35 CET
Yes, nice job Colin!

New build works fine for me - tested many cold boots and warm re-boots with no further sign of the issue. :)
Comment 31 Frank Griffin 2014-11-05 01:43:51 CET
Absolutely well done !
Comment 32 Pierre Jarillon 2014-11-05 02:15:06 CET
Congratulations Colin and very nice explanation.
Here the boot x86_64 is now correct. IMO, the bug can be closed.

CC: (none) => jarillon

Comment 33 Morgan Leijström 2014-11-05 08:46:13 CET
I confirm the fix works on my affected systems too.
Comment 34 Morgan Leijström 2014-11-05 08:50:45 CET
... before closing, maybe someone ( =Colin ;) ) could check wether the same potential problem exist in mga4 so it do not potentially hit there in a later update.  I mean systemd to use the proper Type=notify for journald.

Status: NEW => RESOLVED
Resolution: (none) => FIXED

Comment 35 Morgan Leijström 2014-11-05 08:52:31 CET
Doh.
I was about to close it, then wrote we should not yet, but I still had selected to close...  time for coffee!

Status: RESOLVED => REOPENED
Resolution: FIXED => (none)

Comment 36 Colin Guthrie 2014-11-05 10:47:08 CET
Well, technically the same problem does exist there, but it's much less likely to trigger as the After= line for the flush service has more things in it (local-fs.target remote-fs.target).

So this reduces the likelihood of the race a lot. I'm also not 100% sure if journald the binary supported using the notification protocol back then, but I'll take a look and double check and if it can be set to Type=notify, we may as well do so.

FWIW, if you are patient enough the machine will eventually shut itself down again - I think it takes about half an hour to wait for soemthing to happen, but it did happen to my VM while I was debugging this issue and left it there :)

As a secondary note, a really nice command to run in the debug shell if you ever have to use it and get the status update "Eye of Sauron" animation thing on tty9 getting in the way of your commands, just type "kill -s 55 1" to stop it :)

Anyway, I think I will close this issue for now but I will do some tests on MGA4 systems for this issue too.

Status: REOPENED => RESOLVED
Resolution: (none) => FIXED

Comment 37 Morgan Leijström 2014-11-05 11:12:10 CET
OK. Thanks for the tips.
As an industrial machine designer i am used to hunt any fraction of percent of anything going wrong.
Comment 38 Morgan Leijström 2014-11-05 18:46:30 CET
My son reminded me that we had something similar on my old workstation which we refurbished as family computer using new graphics, an SSD, Mageia 4.

At most (but not all) boots it hang with some similar line about some filesystem problem, and an animation - possibly same issue, but at that time I thought it may just be that the old main board BIOS and SATA port hardware is not working perfectly with a modern SSD, so we changed it for an old drive, and it worked perfectly, but slower of course.

Rethinking now, it may have been a timing issue, possibly the same.
Mageia 4, 64 bit, on an old 2x dual opteron high end main board - a power hungry monster.

So now it may work if we install mga5 on an SSD in that old chap :)

Note You need to log in before you can comment on or make changes to this bug.