Description of problem: My computer needs 5 minutes to start with Cauldron when it was needing less than 1 minutes in Mageia 1. When I check with systemd-analyze blame, I can see that NetworkManager.service take 3650ms to be available! My network is configured in DHCP and if I use an formal IP (no DHCP): same thing. Version-Release number of selected component (if applicable): systemd-40-2.mga2.src.rpm networkmanager-0.9.2.0-1.mga2.src.rpm
Created attachment 1545 [details] systemd-analyze blame
Created attachment 1546 [details] systemd-analyze time
Created attachment 1547 [details] systemd-analyze plot
CC: (none) => balcaen.john, dmorganec, mageiaBlocks: (none) => 2120Source RPM: systemd-40-2.mga2.src.rpm => networkmanager
Looking at the plot, the service ksmtuned starts in second 29, and ends in second 329. Why do you have qemu installed? The network-up service starts in second 35 end in second 335. Try specifying MII_NOT_SUPPORTED=yes in /etc/sysconfig/network-scripts/ifcfg-eth0 Replace eth0 with the appropriate device name.
CC: (none) => davidwhodgins
FYI: - qemu is because I need it for work. - my network card is entirely supported by mii-tools, so...
Priority: Normal => release_blockerStatus: NEW => ASSIGNEDAssignee: bugsquad => mageia
OK, having a quick look, the kmstuned problem is quite simple. In the initscript it says in the LSB header: # pidfile: /var/run/ksmtuned.pid but later in the file: pidfile=${PIDFILE-/var/run/ksmtune.pid} (note the discrepancy in the naming) Since systemd 40, it looks at the pidfile LSB header and waits for that file. This is likely why it waits so long (and it will likely be recorded in the log file that the pid file does not exist too). I'm not sure which PID is better, but I'd guess ksmtuned.pid? The network stuff is likely more interesting of course, but one problem at a time :)
There was a problem with initscripts that the network sysvinit script was interpreting ifcfg-* files generated by (I think Network Manager) and starting e.g. ifplugd for my wlan adapter. I've fixed initscripts to not do that and manually removed a few old network ifcfg-eth1 files etc. (created at some point in the past but with no corresponding adapter) and now network.service exits successfully and doesn't start any ifplugd processes. NetworkManager has also been fine for me all along, so it's hard to work out exactly where the delay is coming from :(
Some fixes from blino and myself in initscripts might help here. Do things work a bit better with latest initscripts and a fully updated cauldron?
Nothing better on my 3 updated cauldron... :-(
Just for reference, there is a semi-related issue that we're looking at now that will decouple prefdm from network a bit more thoroughly than is done now. Obviously for network-auth and network-homedir people, we will still need to wait for network to be up. As for the network-up itself, I've put a few things into initscripts that will at least help a bit, but this won't actually affect NetworkManager service itself. I'm not honestly sure what's causing it to be so slow directly. It starts very quickly for me: 658ms. :s Your's takes 3.5s ish... maybe that's not the bit to be worried about tho'. Anyway, hopefully a few changes I'll try and push out tonight will help, and I'll try and look at network-up specifically too. Sander has also been helping here, but any debug of exactly why network-up takes so long would be appreciated.
OK, I may have found a/the problem. When booting recently I realised that one of the things NetworkManager does is try to restart nscd. It does this by calling an old initscript (ugg). Anyway the problem is that nscd requires the network to be up. So when nscd is restarted, systemd will ultimately wait until it's dependanances are ready. Now, network-up script will use NetworkManager via the nmcli utility to get the status of connections. When NM is busy spawning nscd, it will hang on any connections on the command line... Thus a deadlock is formed: NetworkManager is waiting for network to be up to restart nscd, but network-up is using nmcli to see *when* the network is up. I've fixed this by writing a little wrapper to restart nscd in a more systemd friendly way. This should avoid starting it on boot and thus hopefully things will go smoother. I'm not sure this is the only problem here, but it would be nice if this at least helped the situation.
Created attachment 1902 [details] Some debug from my system when I noticed the problem. I should note that due to other changes I've done recently, it should be easy enough to login when waiting for the network (unless you have NFS shares in fstab marked without nofail option or have the network-auth service enabled, both of which will delay login screen start until network is ready). It therefore becomes possible to run the status commands while the problem is still occurring. Of course for delays to network start that delay things less than the time it takes to login and execute those commands, this doesn't really aid debugging.
I'm pretty sure this is now resolved. I think it was a deadlock that occured when NetworkManager itself tried to restart the nscd service. I've now created a little script that pokes nscd and only restarts it when it really needs to rather than unconditionally. I'll wait until you can test it too tho', as I know you've also got the other "network-up" ticket open that will also hopefully be closable too now... (I live in hope)
[dams@maximus ~]$ systemd-analyze time Startup finished in 8290ms (kernel) + 6019ms (initramfs) + 40806ms (userspace) = 55116ms [dams@maximus ~]$ Great, fixed, I close the bug. Thanks!
Status: ASSIGNED => RESOLVEDResolution: (none) => FIXED