5220 – various systemd service failures on T420

Status:	RESOLVED FIXED

Alias:	None

Product:	Mageia
Classification:	Unclassified
Component:	RPM Packages
Version:	Cauldron
Hardware:	x86_64 Linux

Priority:	Normal
Severity:	normal
Target Milestone:	---
Assignee:	Colin Guthrie
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-04-04 02:26 CEST by Herbert Poetzl
Modified:	2015-03-31 03:28 CEST
CC List:	7 users

See Also:
Source RPM:
CVE:
Status comment:

Attachments
contents of /var/log/dmesg (99.39 KB, text/plain) 2012-04-13 19:50 CEST, Wolfgang Bornath
Saved output of dmesg as of 14 Apr 2012 08:04 GMT +0200 (49.48 KB, text/plain) 2012-04-14 10:33 CEST, Wolfgang Bornath
/var/log/syslog starting at 14 Apr 2012 08:04 GMT +0200 (21.00 KB, text/plain) 2012-04-14 10:35 CEST, Wolfgang Bornath

Description Herbert Poetzl 2012-04-04 02:26:11 CEST

Description of problem:
After a reboot (not the first one) of a newly installed Cauldron system (on a Lenovo T420), several services fail:


# systemctl status plymouth-quit-wait.service
plymouth-quit-wait.service - Wait for Plymouth Boot Screen to Quit
	  Loaded: loaded (/lib/systemd/system/plymouth-quit-wait.service; static)
	  Active: failed (Result: timeout) since Tue, 03 Apr 2012 16:50:35 +0100; 9h ago
	Main PID: 2384
	  CGroup: name=systemd:/system/plymouth-quit-wait.service

# systemctl status nscd.service
nscd.service - Name Service Cache Daemon
	  Loaded: loaded (/lib/systemd/system/nscd.service; enabled)
	  Active: failed (Result: timeout) since Tue, 03 Apr 2012 16:51:37 +0100; 9h ago
	  CGroup: name=systemd:/system/nscd.service

# systemctl status ct_sync.service
ct_sync.service - LSB: Connection tracking state replication
	  Loaded: loaded (/etc/rc.d/init.d/ct_sync)
	  Active: failed (Result: exit-code) since Tue, 03 Apr 2012 16:50:16 +0100; 9h ago
	  CGroup: name=systemd:/system/ct_sync.service

# systemctl status microcode_ctl.service
microcode_ctl.service - LSB: Update the Intel / AMD CPU microcode
	  Loaded: loaded (/etc/rc.d/init.d/microcode_ctl)
	  Active: failed (Result: exit-code) since Tue, 03 Apr 2012 16:50:08 +0100; 9h ago
	  CGroup: name=systemd:/system/microcode_ctl.service

Comment 1 Wolfgang Bornath 2012-04-13 14:55:59 CEST

(In reply to comment #0)
> Description of problem:
> after reboot (not the first one) of a newly installed cauldron system (on a
> Lenovo T420) several services fail:

Same situation after a fresh Cauldron install on a Lenovo S10e netbook (i586):
> 
> # systemctl status plymouth-quit-wait.service
> plymouth-quit-wait.service - Wait for Plymouth Boot Screen to Quit
>       Loaded: loaded (/lib/systemd/system/plymouth-quit-wait.service; static)
>       Active: failed (Result: timeout) since Tue, 03 Apr 2012 16:50:35 +0100;
> 9h ago
>     Main PID: 2384
>       CGroup: name=systemd:/system/plymouth-quit-wait.service

This is reproducible and stops the boot process for good. For me this is a release-critical issue.

CC: (none) => molch.b

Comment 2 Wolfgang Bornath 2012-04-13 15:47:59 CEST

Adding to the last sentence of comment #1 :

I can't even do updates or corrections, the system is hosed!

Booting failsafe (RL1) ends in 
"Starting Rescue Shell...
Welcome to rescue mode. Use "systemctl default" or "^D" to activate default mode.
Failed to issue method call: Transaction is destructive "

Some more info:
The installation was done via i586 DVD, first a minimal system which booted fine. Network came up, media were set. 
Then I did 'urpmi task-kde4-minimal --no-suggests', which worked fine. After that I did nothing but reboot, which resulted in the plymouth-quit-wait timeout.

Comment 3 Martin Whitaker 2012-04-13 18:21:56 CEST

I wonder if this is related to bug #5262. If you boot with the extra boot options

  systemd.log_level=debug systemd.log_target=kmsg

then log in as root on a virtual terminal and inspect /var/log/dmesg, do you see any messages about services being deleted?
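A rough sketch of that inspection, assuming the debug boot options above were used and the log was written to /var/log/dmesg. The function name deleted_jobs is illustrative only, and the pattern assumes systemd's debug messages use the wording "... by deleting job <unit>/<type>". A non-empty result is a hint, not proof of a fault.

```shell
# Print every job that systemd dropped from the boot transaction.
# Takes the path of a saved log; defaults to /var/log/dmesg.
deleted_jobs() {
    log_file="${1:-/var/log/dmesg}"
    # Matches lines such as
    #   systemd[1]: Breaking ordering cycle by deleting job X/start
    #   systemd[1]: Fixing conflicting jobs by deleting job X/start
    # and prints only the job name X/start.
    sed -n 's/.*systemd\[1\]: .* by deleting job \([^ ]*\).*/\1/p' "$log_file"
}

# Usage (as root, after booting with the debug options):
#   deleted_jobs /var/log/dmesg
```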

CC: (none) => mageia

Comment 4 Wolfgang Bornath 2012-04-13 19:19:48 CEST

Well, I added the extra boot options and let the system boot until it stopped at the same point. Then I looked at dmesg but could not find any more info about plymouth-quit-wait.service. It is started, then hits the timeout and fails.

Comment 5 Martin Whitaker 2012-04-13 19:39:56 CEST

Just to check: did you look in /var/log/dmesg rather than run the 'dmesg' command? (I always assumed these were the same, but they aren't...)

It might be worth attaching a copy of /var/log/dmesg, just in case it gives anyone some clues as to what's wrong.

Comment 6 Wolfgang Bornath 2012-04-13 19:50:52 CEST

Created attachment 1985 [details]
contents of /var/log/dmesg

Comment 7 Martin Whitaker 2012-04-13 20:12:39 CEST

Ah yes, the interesting lines are:

[    5.017795] systemd[1]: Found ordering cycle on basic.target/start
[    5.021320] systemd[1]: Walked on cycle path to sockets.target/start
[    5.024799] systemd[1]: Walked on cycle path to syslog.socket/start
[    5.028209] systemd[1]: Walked on cycle path to basic.target/start
[    5.031579] systemd[1]: Breaking ordering cycle by deleting job syslog.socket/start
[    5.035132] systemd[1]: Looking at job getty@tty1.service/stop conflicted_by=yes
[    5.035146] systemd[1]: Looking at job getty@tty1.service/start conflicted_by=no
[    5.035159] systemd[1]: Fixing conflicting jobs by deleting job getty@tty1.service/start
[    5.035185] systemd[1]: Looking at job prefdm.service/start conflicted_by=no
[    5.035197] systemd[1]: Looking at job prefdm.service/stop conflicted_by=no
[    5.035208] systemd[1]: Fixing conflicting jobs by deleting job prefdm.service/stop
[    5.035229] systemd[1]: Looking at job plymouth-quit.service/stop conflicted_by=yes
[    5.035241] systemd[1]: Looking at job plymouth-quit.service/start conflicted_by=no
[    5.035253] systemd[1]: Fixing conflicting jobs by deleting job plymouth-quit.service/start

Comment 8 Martin Whitaker 2012-04-13 21:24:56 CEST

No, comparing this to a working system, these messages appear to be normal. So this looks to be different to bug #5262.

If you attach the output of the 'dmesg' command (saved to a file) and a copy of /var/log/syslog, I'll take a look to see if I can spot anything else - but I'm not an expert on this.

Comment 9 Wolfgang Bornath 2012-04-14 07:08:16 CEST

Neither am I. What makes me nervous is that this release-critical bug has been open for 10 days, we are at Beta3, and it has not been assigned to anybody yet.

Ok, could you give me a hint how I save the output of dmesg into a file?

Comment 10 Martin Whitaker 2012-04-14 10:00:33 CEST

At a terminal prompt, type

  dmesg > output-file

and the output will be saved in output-file (substitute whatever you like for output-file).

Comment 11 Wolfgang Bornath 2012-04-14 10:30:32 CEST

Duh! I did that this morning because I always do it like that, but I received a failure message, so I thought dmesg needs a different syntax.

Ok, now it worked :)

Comment 12 Wolfgang Bornath 2012-04-14 10:33:59 CEST

Created attachment 1986 [details]
Saved output of dmesg as of 14 Apr 2012 08:04 GMT +0200

Comment 13 Wolfgang Bornath 2012-04-14 10:35:39 CEST

Created attachment 1987 [details]
/var/log/syslog starting at 14 Apr 2012 08:04 GMT +0200

Wolfgang Bornath 2012-04-14 10:36:40 CEST

Attachment 1986 description: Saved output as of 14 Apr 2012 08:04 GMT +0200 => Saved output of dmesg as of 14 Apr 2012 08:04 GMT +0200

Comment 14 Martin Whitaker 2012-04-14 12:32:55 CEST

OK, in syslog we have:

Apr 14 04:17:21 localhost kdm[2077]: X server died during startup
Apr 14 04:17:21 localhost kdm[2077]: X server for display :0 cannot be started, session disabled

I'm fairly sure this is a different problem to the systemd service failures (some of which, like the failure to load Intel/AMD microcode, are probably harmless), so rather than hijack this bug report, can I suggest you open a new bug and attach /var/log/Xorg.0.log and also any logs you can find for kdm (for gdm, there is a directory /var/log/gdm - so try looking for /var/log/kdm). From your post in the forum, I guess this problem is due to you trying out a minimal install, and it will probably turn out to be a missing dependency that means a vital package hasn't been installed. Put me on the cc list, and I'll take a look.

Comment 15 Wolfgang Bornath 2012-04-14 23:56:49 CEST

Hmm, I'm not sure about your assessment. I'll check Xorg.0.log.

There I see that the end of the log shows failures to load the modules intel, vesa and fbdev, followed by "No drivers available" and the well-known "Fatal server error: no screens found".

kdm log shows exactly the same ("failed to load module" for all 3 modules).

But usually in this case you are dropped to a text login, not just hanging in the air.

Anyway, I'll open another bug report.

Comment 16 Martin Whitaker 2012-04-15 00:22:40 CEST

My guess is that you aren't dropped to a text login because systemd has deleted the necessary job:

[    5.035159] systemd[1]: Fixing conflicting jobs by deleting job getty@tty1.service/start

Colin has responded to your post in the forum - perhaps he will be able to shed more light on this.

Comment 17 Wolfgang Bornath 2012-04-15 00:39:07 CEST

(In reply to comment #15)

> Anyway, I'll open another bug report.

No, I did not because I found out that it has nothing to do with X.

I booted, system ended as reported with the failure message.
 - logged in as root on tty2
 - started XFdrake, several packages were installed and X was configured
 - rebooted -> system hangs on failure message as reported
 - logged in as user on tty2 and did 'startx' -> X and KDE came up just fine!

So, leaving 2 questions open:

The bug as reported about plymouth-quit-wait.service is still open.

Why did task-kde4-minimal not install the drivers for X? This is certainly a
topic for another bug report. But if the system dropped me at a prompt
because X was not yet configured, no problem. I could live with that.

Comment 18 Marja Van Waes 2012-05-26 13:04:49 CEST

Hi,

This bug was filed against cauldron, but we do not have cauldron at the moment.

Please report whether this bug is still valid for Mageia 2.

Thanks :)

Cheers,
marja

Keywords: (none) => NEEDINFO

Comment 19 Matteo Pasotti 2012-09-01 15:11:26 CEST

Hi,
user gribodo (from mageiaonline.it) is experiencing this issue after Mageia 2 installation.
I'm helping him following this bug report.

Link to the Italian discussion:
http://www.mageiaonline.it/bb/viewtopic.php?f=26&t=38

Let me know which kind of data I can provide to be of help.

Regards,
matteo

CC: (none) => pasotti.matteo

roelof Wobben 2013-01-02 20:16:34 CET

CC: (none) => r.wobben
Assignee: bugsquad => mageia

Comment 20 roelof Wobben 2013-01-02 20:17:04 CET

@Colin: can you shed some light on this matter?

Roelof

Comment 21 roelof Wobben 2013-01-02 20:18:15 CET

And can anyone confirm this bug on Mageia 2 or Mageia 3?

Roelof

Comment 22 Colin Guthrie 2013-01-03 18:05:38 CET

Chances are this problem stems from something in the boot sequence causing a breakage resulting in one or more jobs being deleted. In my experience, the job to start prefdm.service is typically one that gets ejected.

This can happen when third party non-LSB initscripts are used.

In order to tell if the problem is that, please do a fresh boot, switch to tty2 and do "systemctl show prefdm.service | grep ^Active".

If it says: 
ActiveEnterTimestampMonotonic=0

in there, then the job hasn't even tried to start, but its conflict with the getty on tty1 has been honoured. That means the unit was parsed but then ejected from the transaction due to a conflict.
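The same check can be wrapped in a small helper. This is only a sketch: the function name never_started is made up for illustration, and it assumes the property appears on its own line exactly as quoted above.

```shell
# Succeeds (exit status 0) if the "systemctl show <unit>" output on stdin
# says the unit never entered the active state during this boot.
never_started() {
    # -x: the whole line must match, so a value such as 0123 is not accepted.
    grep -qx 'ActiveEnterTimestampMonotonic=0'
}

# Usage (on tty2, after a fresh boot):
#   systemctl show prefdm.service | never_started && echo "prefdm.service never started"
```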

The next step would be narrowing down which job caused this. Check the /etc/init.d folder for anything not provided by a package (rpm -qf /etc/init.d/*) and for anything provided by unofficial packages (e.g. from VMware, Cisco, etc.). If you find such files in there, add proper LSB headers to them, which will hopefully fix the job ejection.
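A minimal sketch of that search, under some assumptions: the function name audit_initscripts is illustrative, "owned by a package" is judged only by the exit status of rpm -qf, and an LSB header is detected by its standard opening line "### BEGIN INIT INFO". Spotting scripts from unofficial packages still needs a manual look at the rpm -qf output.

```shell
# Flag init scripts that no installed RPM owns, or that lack an LSB header.
# Takes the directory to scan; defaults to /etc/init.d.
audit_initscripts() {
    init_dir="${1:-/etc/init.d}"
    for script in "$init_dir"/*; do
        [ -f "$script" ] || continue
        # rpm -qf exits non-zero when no installed package owns the file.
        if ! rpm -qf "$script" >/dev/null 2>&1; then
            echo "unowned: $script"
        fi
        # LSB init scripts open their header with this exact comment line.
        if ! grep -q '^### BEGIN INIT INFO' "$script"; then
            echo "no LSB header: $script"
        fi
    done
}

# Usage:
#   audit_initscripts /etc/init.d
```

Any script reported with "no LSB header" is a candidate for the header fix described above.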

Comment 23 Dimitrios Glentadakis 2013-03-20 20:44:14 CET

I have a similar problem here:
https://bugs.mageia.org/show_bug.cgi?id=6719

The system does not fail to boot for me; it only gets stuck for a while at "Failed to start Wait for Plymouth Boot Screen to Quit".

CC: (none) => dglent

Comment 24 Richard Neill 2014-01-27 02:31:41 CET

The microcode problem is fixed by installing the package:
microcode-0.20131009-3.mga4.nonfree

Q1: does it matter if this is omitted?
Q2: should this be installed by default? It's not free software, but I think most of the arguments against non-free software don't apply to this package.

CC: (none) => mageia

Comment 25 Nic Baxter 2015-03-31 03:28:30 CEST

Removing NEEDINFO as all info has been given. This appears to be fixed by installing microcode, so closing.

Keywords: NEEDINFO => (none)
Status: NEW => RESOLVED
CC: (none) => nic
Resolution: (none) => FIXED
