Bug 4339 - 2_a3: Any system service failure is adding 5+ minutes to initrd completion
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal, Severity: major
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 2120
Reported: 2012-01-29 19:54 CET by Bit Twister
Modified: 2012-03-25 03:27 CEST

See Also:
Source RPM: systemd-39-3.mga2.src.rpm
CVE:
Status comment:


Description Bit Twister 2012-01-29 19:54:52 CET
Description of problem:
Any system service failure is adding 5+ minutes to initrd completion

 systemd[1]: Startup finished in 1s 874ms 778us (kernel) + 12s 709ms 948us (initrd) + 5min 21s 658ms 364us (userspace) = 5min 36s 243ms 90us.
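
To see which boot phase absorbs the delay, the per-phase figures can be pulled out of a line like the one above. The following is only a sketch: the function name phase_time is made up for illustration, and it assumes the line has the same "A (kernel) + B (initrd) + C (userspace) = total" layout as the sample. In the sample, the five minutes show up in the userspace figure.

```shell
# Print the duration systemd reported for one boot phase.
# $1: phase name (kernel, initrd or userspace)
# The "Startup finished" log line is read from standard input.
phase_time() {
    # Drop everything up to the first duration and the trailing total,
    # put each phase on its own line, then keep the requested phase.
    sed -e 's/.*Startup finished in //' -e 's/ = .*//' |
        tr '+' '\n' |
        sed -n "s/^ *\(.*[^ ]\) *($1) *\$/\1/p"
}

# Example: grep 'Startup finished' /var/log/messages | tail -n 1 | phase_time userspace
```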

Version-Release number of selected component (if applicable):


How reproducible: Always


Steps to Reproduce:
1. Fix any service to fail on start
2. reboot
3. watch "grep finished /var/log/messages"
4. Ctrl+c to abort watch when "Startup finished" shows up.
Comment 1 Bit Twister 2012-01-29 21:12:30 CET
Revised steps to reproduce. Note that I did not test these, but apart from the ln -s location they might recreate the problem. I had a link to my /local/bin/rc.local, but it would fail unless I copied it to /etc/rc.d.

1. cd /etc/rc.d
2. mkdir hold
3. mv rc.local hold/
4. ln -s hold/rc.local
5. reboot
6. watch "grep finished /var/log/messages"
7. Ctrl+c to abort watch when "Startup finished" shows up.
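
The last two steps (watch, then Ctrl+c) can also be done with a small polling loop that returns on its own. This is a minimal sketch, not part of the original report: the function name and the 2 second default interval are assumptions, and it searches for "Startup finished" rather than just "finished".

```shell
# Poll a log file until a "Startup finished" line appears, then print it.
# $1: log file to search (the report uses /var/log/messages)
# $2: seconds to sleep between checks (assumed default: 2)
wait_for_startup_finished() {
    log=$1
    interval=${2:-2}
    # grep prints the matching line(s) and returns 0 once one exists.
    until grep 'Startup finished' "$log"; do
        sleep "$interval"
    done
}

# Example: wait_for_startup_finished /var/log/messages
```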
Manuel Hiebel 2012-01-30 01:05:08 CET

Blocks: (none) => 2120
Severity: normal => major

Comment 2 D Morgan 2012-01-30 01:29:18 CET
Fix any service to fail on start ?


Can you give the output of: watch "grep finished /var/log/messages"

CC: (none) => dmorganec

Comment 3 Bit Twister 2012-01-30 02:15:33 CET
(In reply to comment #2)
> Fix any service to fail on start ?

Yes, that would be required to recreate the problem.
Would you like me to change it to "Modify any service so it will fail to start"?

> can you give the output of ""watch "grep finished /var/log/messages""

See last paragraph of Description of problem:
I snipped the date, time, and node name from the output.
Comment 4 D Morgan 2012-01-31 09:39:13 CET
If the services fail, this is not really a systemd issue; you should open bug reports against each failing service and add them to the https://bugs.mageia.org/show_bug.cgi?id=2120 tracker bug report.
Comment 5 Bit Twister 2012-01-31 10:27:11 CET
(In reply to comment #4)
> if the services fail this is not really a systemd issue but you should open
> bugreports against each failing services, and adding them in the
> https://bugs.mageia.org/show_bug.cgi?id=2120  tracker bugreport.

Hmm, I can agree with reports on any failing service, but I disagree with not having a bug report against systemd when a non-impacting service failure is preventing a downstream service from completing.

I think 5 minutes to decide a service is a failure is a bit long.

I just checked again, and runlevel is "unknown" until initrd completes. :(

I'll reopen bug 4198.
Comment 6 D Morgan 2012-01-31 15:07:12 CET
Please don't discuss several bug reports in one.

For systemd it depends on the init script: if it hangs, how is systemd supposed to know whether that is normal or not?

It is OUR job to make sure the init scripts we provide are OK.

(btw, this is just my own opinion)
Comment 7 Bit Twister 2012-01-31 20:27:04 CET
(In reply to comment #6)
> don't speak about several bugreport in one please.

I am going to guess what you are saying is, don't use "bug nnnn" numbers.

> For systemd it depend of the init script, if it hang, how do you think systemd
> must know if this is normal or not ?

Then what picked 5 minutes? Was there not a conversation somewhere about shortening or lengthening the 30 second timeout in hotplugging or some service?
It looks like the new value is 20 seconds.

I can see the 5 minute impact that has on the runlevel bug. 

In my opinion, systemd should have a default failure timer shorter than 5 minutes, and a new keyword could be used in the service's unit file to extend the failure detection timeout.

This kind of systemd design problem should be pushed upstream.
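
For reference, systemd unit files accept a TimeoutSec= setting in the [Service] section (see systemd.service(5)), which is close to the keyword asked for above. Whether that setting governs the 5 minute wait seen in this report is an assumption; the thread does not confirm it. A small sketch that prints such a fragment for pasting into a copy of a unit file, where the function name and the value 30 are only illustrative:

```shell
# Print a [Service] fragment that sets a per-unit timeout.
# $1: timeout in seconds (illustrative; pick a value that suits the service)
service_timeout_fragment() {
    printf '[Service]\nTimeoutSec=%s\n' "$1"
}

# Example: service_timeout_fragment 30
```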
Comment 8 Dave Hodgins 2012-01-31 22:27:39 CET
Any arbitrary timeout must be easy to override.

The current 30 second timeout is not long enough to allow me to run
any of the alpha live cds under virtualbox on my i586 host.

The udev default is 180 seconds, but dracut overrides that to 30.

I think dracut should add "boot parameters" (with reasonable defaults
that will work on most hardware) for any timeouts it imposes.

CC: (none) => davidwhodgins

Comment 9 Bit Twister 2012-01-31 23:18:03 CET
That is food for thought, although I would rather have a /etc/modprobe.d/fault_timeout.conf or, better yet, a /etc/sysctl.conf control.
Comment 10 Dave Hodgins 2012-02-01 02:09:53 CET
(In reply to comment #9)
> That is food for thought. Although I would rather have a
> /etc/modprob.d/fault_timeout.conf or better yet a /etc/sysctl.conf control.

The problem with that is that it can't be modified on a live cd.
There has to be a relatively easy way to override it.
Comment 11 Bit Twister 2012-02-01 03:28:12 CET
(In reply to comment #10)
> (In reply to comment #9)
> > That is food for thought. Although I would rather have a
> > /etc/modprob.d/fault_timeout.conf or better yet a /etc/sysctl.conf control.
> 
> The problem with that is that it can't be modified on a live cd.
> There has to be a relatively easy way to override it.

I hear where you are coming from. I wanted an on-system file so I do not have to keep tweaking /boot/grub/menu.lst. I have seen boot arguments cause panic halts, with dracut/kernel giving no clue as to what it did not like.
Comment 12 Dave Hodgins 2012-02-01 03:46:25 CET
Take a look at
lsinitrd /boot/initrd-3.2.2-server-1.mga2.img |grep getarg
(with appropriate kernel version of course)
for existing debugging and break options that can
allow you to make tweaks in a bash session before or
after / gets mounted.
Comment 13 Bit Twister 2012-02-01 04:20:34 CET
(In reply to comment #12)
> Take a look at
> lsinitrd /boot/initrd-3.2.2-server-1.mga2.img |grep getarg

I have already been reprimanded for talking too much about problems when requesting a retest result and providing other bug reports created by any bug I happen to be working with a maintainer. 
 
We need to take this over to Newsgroups: alt.os.linux.mageia since your command failed to provide output.
Comment 14 D Morgan 2012-02-11 03:34:40 CET
what about this bug with systemd 40 ?
Comment 15 Bit Twister 2012-02-11 05:01:17 CET
(In reply to comment #14)
> what about this bug with systemd 40 ?

No idea. In comment 4, you said it was not systemd's problem, so I resolved the service failure problem.

I did yet another clean install Feb 1 + updates and systemd has gotten worse with additional updates. Shutdown is taking minutes, there is no output on runlevel 3 terminals/consoles, nothing in logs as to where/what failing service is hanging shutdown or startup.

For a system which is supposed to run somewhat in a parallel mode I do not understand why we seem to be getting/hitting serial timeouts.

I have to reboot several times just to get it to pick the correct nic for internet access. My testing proves to me that an upgrade from the alpha3 dvd is not a stable test platform. My request for an up-to-date dvd was rejected, so we lost 3 weeks of quality bug reports and testing, which I thought was pretty short-sighted.

You will just have to wait for Beta 1 for any re-testing by me.
If your system test proves you fixed the problem, close the bug.

I'll open new bug reports in Beta 1 if there is no open bug report.
 
I will suggest you do one last test first.
Set system default runlevel to 3
urpmi invictus-firewall
reboot
urpme invictus-firewall
reboot

So far Mageia 2 will be a joke for any Radeon video cards with fglrx driver installed and user wants to play a video or run mythtv.  :(
Comment 16 Bit Twister 2012-02-11 16:00:34 CET
(In reply to comment #15)
> (In reply to comment #14)
> > what about this bug with systemd 40 ?
> 
Sorry missed a step, it should be:

Set system default runlevel to 3
urpmi invictus-firewall
reboot
systemctl disable ct_sync.service
urpme invictus-firewall
reboot
Comment 17 Colin Guthrie 2012-03-25 01:54:52 CET
Many of the issues in this bug report are fixed now (things like using the right nic etc), but is the underlying issue still here?

CC: (none) => mageia

Comment 18 Bit Twister 2012-03-25 03:27:31 CEST
(In reply to comment #17)
> Many of the issues in this bug report are fixed now (things like using the
> right nic etc), but  is the underlying issue still here?

It looks like it has been reduced to a more manageable level. I modified
/lib/systemd/system/named.service to fail on pid and commented out the timer, and the delay biting me was somewhere around 1 to 2 minutes. But that is another problem report which I'll get to in a little bit.  :)

I'll go ahead and mark it resolved.

Status: NEW => RESOLVED
Resolution: (none) => FIXED

