Bug 4509 - With systemd, NetworkManager.service take a while at boot
Summary: With systemd, NetworkManager.service take a while at boot
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages (show other bugs)
Version: Cauldron
Hardware: All Linux
Priority: release_blocker major
Target Milestone: ---
Assignee: Colin Guthrie
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 2120
  Show dependency treegraph
 
Reported: 2012-02-13 16:11 CET by Damien Lallement
Modified: 2012-04-04 12:24 CEST (History)
4 users (show)

See Also:
Source RPM: networkmanager
CVE:
Status comment:


Attachments
systemd-analyze blame (1.67 KB, text/plain)
2012-02-13 16:11 CET, Damien Lallement
Details
systemd-analyze time (91 bytes, text/plain)
2012-02-13 16:12 CET, Damien Lallement
Details
systemd-analyze plot (435.75 KB, image/svg+xml)
2012-02-13 16:12 CET, Damien Lallement
Details
Some debug from my system when I noticed the problem. (5.71 KB, text/plain)
2012-04-02 12:00 CEST, Colin Guthrie
Details

Description Damien Lallement 2012-02-13 16:11:10 CET
Description of problem: My computer needs 5 minutes to start with Cauldron when it was needing less than 1 minutes in Mageia 1. When I check with systemd-analyze blame, I can see that NetworkManager.service take 3650ms to be available!
My network is configured in DHCP and if I use an formal IP (no DHCP): same thing.

Version-Release number of selected component (if applicable):
systemd-40-2.mga2.src.rpm
networkmanager-0.9.2.0-1.mga2.src.rpm
Comment 1 Damien Lallement 2012-02-13 16:11:44 CET
Created attachment 1545 [details]
systemd-analyze blame
Comment 2 Damien Lallement 2012-02-13 16:12:04 CET
Created attachment 1546 [details]
systemd-analyze time
Comment 3 Damien Lallement 2012-02-13 16:12:34 CET
Created attachment 1547 [details]
systemd-analyze plot
Manuel Hiebel 2012-02-14 00:04:49 CET

CC: (none) => balcaen.john, dmorganec, mageia
Blocks: (none) => 2120
Source RPM: systemd-40-2.mga2.src.rpm => networkmanager

Comment 4 Dave Hodgins 2012-02-14 02:03:30 CET
Looking at the plot, the service ksmtuned starts in second 29,
and ends in second 329.

Why do you have qemu installed?

The network-up service starts in second 35 end in second 335.

Try specifying
MII_NOT_SUPPORTED=yes
in /etc/sysconfig/network-scripts/ifcfg-eth0

Replace eth0 with the appropriate device name.

CC: (none) => davidwhodgins

Comment 5 Damien Lallement 2012-02-17 01:38:13 CET
FYI:
- qemu is because I need it for work.
- my network card is entirely supported by mii-tools, so...

Priority: Normal => release_blocker
Status: NEW => ASSIGNED
Assignee: bugsquad => mageia

Comment 6 Colin Guthrie 2012-02-17 10:18:42 CET
OK, having a quick look, the kmstuned problem is quite simple.

In the initscript it says in the LSB header:

# pidfile: /var/run/ksmtuned.pid

but later in the file:

pidfile=${PIDFILE-/var/run/ksmtune.pid}

(note the discrepancy in the naming)

Since systemd 40, it looks at the pidfile LSB header and waits for that file. This is likely why it waits so long (and it will likely be recorded in the log file that the pid file does not exist too).

I'm not sure which PID is better, but I'd guess ksmtuned.pid?


The network stuff is likely more interesting of course, but one problem at a time :)
Comment 7 Colin Guthrie 2012-03-22 13:38:51 CET
There was a problem with initscripts that the network sysvinit script was interpreting ifcfg-* files generated by (I think Network Manager) and starting e.g. ifplugd for my wlan adapter.

I've fixed initscripts to not do that and manually removed a few old network ifcfg-eth1 files etc. (created at some point in the past but with no corresponding adapter) and now network.service exits successfully and doesn't start any ifplugd processes.

NetworkManager has also been fine for me all along, so it's hard to work out exactly where the delay is coming from :(
Comment 8 Colin Guthrie 2012-03-24 23:36:18 CET
Some fixes from blino and myself in initscripts might help here. Do things work a bit better with latest initscripts and a fully updated cauldron?
Comment 9 Damien Lallement 2012-03-27 16:35:20 CEST
Nothing better on my 3 updated cauldron... :-(
Comment 10 Colin Guthrie 2012-03-27 17:12:44 CEST
Just for reference, there is a semi-related issue that we're looking at now that will decouple prefdm from network a bit more thoroughly than is done now. Obviously for network-auth and network-homedir people, we will still need to wait for network to be up.

As for the network-up itself, I've put a few things into initscripts that will at least help a bit, but this won't actually affect NetworkManager service itself.

I'm not honestly sure what's causing it to be so slow directly. It starts very quickly for me: 658ms. :s Your's takes 3.5s ish... maybe that's not the bit to be worried about tho'. Anyway, hopefully a few changes I'll try and push out tonight will help, and I'll try and look at network-up specifically too. Sander has also been helping here, but any debug of exactly why network-up takes so long would be appreciated.
Comment 11 Colin Guthrie 2012-04-02 11:57:47 CEST
OK, I may have found a/the problem.

When booting recently I realised that one of the things NetworkManager does is try to restart nscd. It does this by calling an old initscript (ugg).

Anyway the problem is that nscd requires the network to be up.

So when nscd is restarted, systemd will ultimately wait until it's dependanances are ready.

Now, network-up script will use NetworkManager via the nmcli utility to get the status of connections. When NM is busy spawning nscd, it will hang on any connections on the command line...

Thus a deadlock is formed: NetworkManager is waiting for network to be up to restart nscd, but network-up is using nmcli to see *when* the network is up.

I've fixed this by writing a little wrapper to restart nscd in a more systemd friendly way. This should avoid starting it on boot and thus hopefully things will go smoother.

I'm not sure this is the only problem here, but it would be nice if this at least helped the situation.
Comment 12 Colin Guthrie 2012-04-02 12:00:47 CEST
Created attachment 1902 [details]
Some debug from my system when I noticed the problem.

I should note that due to other changes I've done recently, it should be easy enough to login when waiting for the network (unless you have NFS shares in fstab marked without nofail option or have the network-auth service enabled, both of which will delay login screen start until network is ready). It therefore becomes possible to run the status commands while the problem is still occurring. Of course for delays to network start that delay things less than the time it takes to login and execute those commands, this doesn't really aid debugging.
Comment 13 Colin Guthrie 2012-04-03 14:53:43 CEST
I'm pretty sure this is now resolved.

I think it was a deadlock that occured when NetworkManager itself tried to restart the nscd service.

I've now created a little script that pokes nscd and only restarts it when it really needs to rather than unconditionally.

I'll wait until you can test it too tho', as I know you've also got the other "network-up" ticket open that will also hopefully be closable too now... (I live in hope)
Comment 14 Damien Lallement 2012-04-04 12:24:32 CEST
[dams@maximus ~]$ systemd-analyze time
Startup finished in 8290ms (kernel) + 6019ms (initramfs) + 40806ms (userspace) = 55116ms
[dams@maximus ~]$

Great, fixed, I close the bug.
Thanks!

Status: ASSIGNED => RESOLVED
Resolution: (none) => FIXED


Note You need to log in before you can comment on or make changes to this bug.