| Summary: | NetworkManager: 100% CPU and 10+ minutes to reboot (no network if NM is enabled) | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Pierre Fortin <pfortin> |
| Component: | RPM Packages | Assignee: | Mageia Bug Squad <bugsquad> |
| Status: | NEW --- | QA Contact: | |
| Severity: | normal | ||
| Priority: | Normal | CC: | davidwhodgins, ftg, lewyssmith |
| Version: | Cauldron | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| See Also: | https://bugs.mageia.org/show_bug.cgi?id=32373 | ||
| Whiteboard: | |||
| Source RPM: | CVE: | ||
| Status comment: | |||
| Attachments: | photo of bootup after NM installed | ||
| | NM even takes a long time to shutdown... | ||
| | journal | ||
Description

Pierre Fortin 2023-10-18 22:56:55 CEST

Created attachment 14069 [details]
photo of bootup after NM installed

Created attachment 14070 [details]
NM even takes a long time to shutdown...
It seems you did not follow the correct procedure, since you still have net_applet services running. See here how to switch properly to NM: https://wiki.mageia.org/en/Switching_to_networkmanager
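The switch procedure can be sketched as a short script. This is a hedged sketch, not the wiki verbatim: the mask and enable commands are the ones quoted elsewhere in this report, while the initial `stop` step is my assumption. The `run` stub only prints each command so the sequence is safe to show here; on a real system, drop the stub and run the commands directly as root.

```shell
# Sketch of the switch-to-NM steps (mask/enable commands as quoted in this
# report; the initial "stop" is an assumed extra step, not from the wiki).
run() { echo "+ $*"; }   # stub: print instead of execute; drop for real use
run systemctl stop network.service
run systemctl mask network.service
run systemctl mask network-up
run systemctl enable --now NetworkManager.service
```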
sturmvogel
2023-10-19 04:24:14 CEST
Source RPM: networkmanager-qt-5.110.0-1.mga10.x86_64 Tue 19 Sep 2023 02:37:07 PM EDT => (none)

As described in the wiki, long boot times are caused by not properly masking the legacy network startup services. According to your output, they are not masked at all...

Pierre Fortin

(In reply to sturmvogel from comment #3)
> It seems you did not follow the correct procedure when you still have
> net_applet services running. See here how to switch properly to NM:
> https://wiki.mageia.org/en/Switching_to_networkmanager

I followed that EXACT procedure, using copy/paste of every step. When I pasted the last step (systemctl mask network.service; systemctl mask network-up), the commands did not return the prompt. See https://bugs.mageia.org/show_bug.cgi?id=32373#c4 and note the 3 "Created symlink" messages -- maybe they were delayed from a previous command.

This issue is about 100% CPU, which is a bug no matter what. Was it coincidence that issuing "systemctl mask network.service; systemctl mask network-up" did not return the prompt, or did NetworkManager going to 100% CPU cause one of these commands not to return? Either way, this indicates a flaw in the procedure and/or a bug. I provided what information I could; sorry if some of it was after doing what I could to restore networking.

> As described in the wiki, long boot times are caused by not properly masking
> the legacy network startup services. According your output they are not masked
> at all...

Already addressed in https://bugs.mageia.org/show_bug.cgi?id=32373#c5, where making the 2 changes identified therein was the ONLY way I could get mcc past "Please wait" and get any network interface up -- I was surprised that it was the WiFi that came up, given the problem I reported in https://bugs.mageia.org/show_bug.cgi?id=32373, whose summary I subsequently changed (2023-10-18 20:23:47 CEST): "no WiFi after reboot" => "no networking after switch to NetworkManager and reboot", as a result of this issue. From my perspective, if unmasking got any networking back, the slow boot is secondary.
Quoting the wiki:

> It is also recommended disabling the legacy network startup services by running
> # systemctl mask network.service; systemctl mask network-up
> as otherwise this would introduce unnecessary delays during boot.

OK; but there's something wrong when, at the very point of issuing this pair, they didn't return and NM was now at 100%, which is still the case...

```
$ top | grep Network
 451003 root      20   0  508104 190592  17472 R 100.0   0.1   7:41.32 NetworkManager
 451003 root      20   0  509104 191552  17472 R 100.0   0.1   7:44.33 NetworkManager
```

Created attachment 14076 [details]
journal

Migrated the laptop to NetworkManager (same procedure, and it's idle, not 100%).

Found: https://bugs.archlinux.org/task/61688

On the main system, "systemctl restart NetworkManager" just goes into S state and only returns when killed. No journal entries occur when the restart command is issued/killed.

https://unix.stackexchange.com/questions/700464/check-current-active-network-manager

```
$ nmcli connection
Warning: nmcli (1.44.0) and NetworkManager (Unknown) versions don't match. Restarting NetworkManager is advised.
Error: NetworkManager is not running.
$ NetworkManager --version
1.44.0
```

So I can't issue any nmcli commands. Attached is what I see in the journal about every 10 minutes.
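The runaway process can also be watched from a script by parsing `top` in batch mode. A minimal sketch, with the sample line copied from the output above; the awk field positions assume top's default column layout, so on a live system the sample would come from `top -bn1 | grep NetworkManager` instead.

```shell
# Extract the %CPU column (field 9 in top's default layout) for the
# NetworkManager process from a sample top line copied from this report.
sample='451003 root 20 0 508104 190592 17472 R 100.0 0.1 7:41.32 NetworkManager'
cpu=$(printf '%s\n' "$sample" | awk '$NF == "NetworkManager" {print $9}')
echo "NetworkManager CPU: ${cpu}%"
```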
Marja Van Waes
2023-10-25 22:06:04 CEST
See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=32373

Lewis Smith

Having no experience of NetworkManager, nor wishing to try it... I wonder whether the related bug 32373 is not a variation on this one - at least from its c4. Both bugs show multiple problems, including the 100% CPU usage. The ArchLinux URL given in comment 6 is about that, but dates from 2019 and talks mostly of curl, also IPv6 and firewalls. It is long and implies various remedies which I do not think apply here.

I am unsure whether Pierre has yet got NM (strictly alone) working on any machine, Ethernet and/or WiFi. Frank does (https://bugs.mageia.org/show_bug.cgi?id=32373#c6), but not without some manipulations, apparently complicated by Plasma...

Without seeing the wiki, common sense says that if you change to NM you should first stop then inhibit our networking as noted, configure NM, and re-boot after the switch. Some comments note trying one alongside the other, which, to say the least, complicates the issue. Is the wiki procedure watertight?

I see these issues:

* The fact that the command
  systemctl mask network.service; systemctl mask network-up
  did not return. And when interrupted with ^Z, it shows:
  [11]+ Stopped systemctl enable --now NetworkManager.service
  which suggests that NM had already been configured before killing our networking.
* The 100% CPU usage.
* The huge and accumulating number of files named ifcfg-veth* in /etc/sysconfig/network-scripts, created at _random_ times, though on average about 4 per minute most of the time; some repeated within one second -- no repeating time pattern (https://bugs.mageia.org/show_bug.cgi?id=32373#c7).

CC:
(none) => lewyssmith

Frank Griffin

You do have to systemctl enable/start NetworkManager, since we install it disabled. When using NM I don't mess with systemctl mask; I just use mcc/drakconnect to remove the ifcfg interfaces that the install created, and then remove the ifcfg support from /etc/NetworkManager/NetworkManager.conf. Finally, execute "nmtui" to activate the interfaces you want to use. NM will remember them thereafter.

The Plasma bit is that you have to install plasma-nm-applet because we don't do that by default. NM support is enabled automatically for GNOME, but not for Plasma.

CC:
(none) => ftg

Lewis Smith

(In reply to Lewis Smith from comment #7)
> I see these main issues:
>
> * the 100% CPU usage
>
> * The huge & accumulating number of files named ifcfg-veth* in
> /etc/sysconfig/network-scripts, created at _random_ times; though on average
> about 4 per minute most of the time; some repeated within one second -- no
> repeating time pattern (https://bugs.mageia.org/show_bug.cgi?id=32373#c7).

Is this the ongoing situation? Does any other NetworkManager user here see these things, or just Pierre (and are you still seeing them?).

(In reply to Frank Griffin from comment #8)
> You do have to systemctl enable/start NetworkManager since we install it
> disabled.

Enabling it is covered in https://wiki.mageia.org/en/Switching_to_networkmanager

> When using NM I don't mess with systemctl mask, I just use mcc/drakconnect
> to remove the ifcfg interfaces that the install created and then remove the
> ifcfg support from /etc/NetworkManager/NetworkManager.conf. Finally,
> execute "nmtui" to activate the interfaces you want to use. NM will
> remember them thereafter.

Sounds like https://wiki.mageia.org/en/Switching_to_networkmanager needs some updating, as it contains: systemctl mask network.service; systemctl mask network-up

nmtui: first time hearing about this; maybe add it to https://wiki.mageia.org/en/Switching_to_networkmanager ... Also, in nmtui, what does "Please select an option: Radio" even mean? "Radio" needs an action; guessing: "enable/disable Radio"...?

> The Plasma bit is that you have to install plasma-nm-applet because we don't
> do that by default. NM support is enabled automatically for GNOME, but not
> for Plasma.
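Frank's step of "remove the ifcfg support from /etc/NetworkManager/NetworkManager.conf" presumably means dropping the ifcfg plugin from the plugins= line of the [main] section; that reading, and the upstream plugin name "ifcfg-rh", are assumptions on my part, and the exact name on Mageia may differ. A sketch on a sample config rather than the live file:

```shell
# Drop an assumed "ifcfg-rh" entry from the plugins= line of a sample
# NetworkManager.conf (plugin name and file contents are illustrative).
conf='[main]
plugins=ifcfg-rh,keyfile'
printf '%s\n' "$conf" | sed -e 's/ifcfg-rh,//' -e 's/,ifcfg-rh//'
```

On a real system the same sed would be applied with `-i` (after backing the file up), followed by a NetworkManager restart.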
Pierre Fortin

That should also be in https://wiki.mageia.org/en/Switching_to_networkmanager, but plasma-nm-applet is not available:

```
$ urpmi plasma-nm-applet
No package named plasma-nm-applet
Did you mean?: urpmi plasma-applet-nm
To satisfy dependencies, the following packages are going to be installed:
  Package                     Version  Release  Arch
  (medium "Core Release")
  plasma-applet-nm            5.27.9   1.mga10  x86_64
  plasma-applet-nm-libreswan  5.27.9   1.mga10  x86_64
  plasma-applet-nm-openvpn    5.27.9   1.mga10  x86_64
4.1KB of additional disk space will be used.
1.2MB of packages will be retrieved.
```

Installed it.

```
$ findcmd applet
/usr/bin: mate-panel-test-applets mgaapplet mgaapplet-config mgaapplet-update-checker mgaapplet-upgrade-helper net_applet nm-applet
$ nm-applet --help
Usage:
  nm-applet
This program is a component of NetworkManager (https://networkmanager.dev).
It is not intended for command-line interaction but instead runs in the GNOME desktop environment.
```

So, plasma-applet-nm installed the Gnome stuff? Not seeing any network applet in the systray; is there a command to get a systray applet like I used to see before NetworkManager?

(In reply to Lewis Smith from comment #9)
> (In reply to Lewis Smith from comment #7)
> > I see these main issues:
> >
> > * the 100% CPU usage
> >
> > * The huge & accumulating number of files named ifcfg-veth* in
> > /etc/sysconfig/network-scripts, created at _random_ times; though on average
> > about 4 per minute most of the time; some repeated within one second -- no
> > repeating time pattern (https://bugs.mageia.org/show_bug.cgi?id=32373#c7).
> Is this the ongoing situation?

No; no idea where those came from. They only appeared between these reboots:
Fri Sep 29 11:35:01 PM EDT 2023
Wed Oct 11 02:34:24 AM EDT 2023
See https://bugs.mageia.org/show_bug.cgi?id=32373#c0

I keep track of all installed RPMs via this in my root crontab:
@reboot rpm -qa | sort > /home/ROOT/RPM.history/RPMS.`/bin/date +%Y%m%d`

Hmmm...
I apparently had @daily until:

```
-rw-r--r-- 1 root root 176811 May 24 00:00 RPMS.20230524
-rw-r--r-- 1 root root 176811 May 25 00:00 RPMS.20230525
-rw-r--r-- 1 root root 176811 May 26 00:00 RPMS.20230526
```

but when I changed it to @reboot, the command has not worked since; though running it manually works:

```
-rw-r--r-- 1 root root 196093 Nov  7 09:19 RPMS.20231107
```

> Does any other NetworkManager user here see these things, or just Pierre
> (and are you still seeing them?).

Maybe ask on the 'discuss' mailing list...

Dave Hodgins

"man veth" - Virtual Ethernet Device - has very little info on the uses. Are you using docker, golang, or fop-javadoc? I suspect the devices are being created by badly configured containers of one sort or another, and then being detected by NetworkManager, which automatically adds a configuration file for any un-configured network device.

CC:
(none) => davidwhodgins

Pierre Fortin

I tried with a docker-compose image briefly. My journal for that period is expunged; but from 'history', I started docker-compose Sep 24 and installed jitsi, days before the Sep 29 reboot, and left it running until the reboot on Oct 11. Haven't done anything with docker since.

(In reply to Pierre Fortin from comment #11)
> > > * The huge & accumulating number of files named ifcfg-veth* in
> > > /etc/sysconfig/network-scripts, created at _random_ times; though on average
> > > about 4 per minute most of the time; some repeated within one second -- no
> > > repeating time pattern (https://bugs.mageia.org/show_bug.cgi?id=32373#c7).
> > Is this the ongoing situation?
> No; no idea where those came from. They only appeared between these reboots:
> Fri Sep 29 11:35:01 PM EDT 2023
> Wed Oct 11 02:34:24 AM EDT 2023

(In reply to Pierre Fortin from comment #13)
> I tried with a docker-compose image briefly. My journal for that period is
> expunged; but from 'history', I started docker-compose Sep 24 and installed
> jitsi, days before the Sep 29 reboot; and left it running until reboot on
> Oct 11. Haven't done anything with docker since.

(In reply to Dave Hodgins from comment #12)
> "man veth" - Virtual Ethernet Device has very little info on the uses.
> Are you using docker, golang, or fop-javadoc? I suspect the devices are
> being created by badly configured containers of one sort or another, and then
> being detected by network manager which automatically adds a configuration
> file for any un-configured network device.

This corresponds to Dave's suggestion. So at least they have gone. Accepting the need to refine the NM wiki, does that leave just the 100% CPU utilisation? Ah - but what about the very long startup time? Gone or ongoing?

(In reply to Pierre Fortin from comment #10)
> urpmi plasma-nm-applet
> No package named plasma-nm-applet
> Did you mean?:
> urpmi plasma-applet-nm

A fair cop!
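On the @reboot snapshot line quoted earlier that "has not worked since": a common cause, offered here only as an assumption and not verified against Pierre's system, is that @reboot jobs fire as soon as crond starts, possibly before the target filesystem is mounted or with a minimal PATH, and the failure is silent. A sketch that builds the same date-stamped filename but captures stderr so the cause would at least be visible (the paths are stand-ins, not Pierre's):

```shell
# Same snapshot logic as the crontab entry, with stderr logged instead of
# discarded; /tmp/RPM.history stands in for /home/ROOT/RPM.history.
outdir=/tmp/RPM.history
mkdir -p "$outdir"
{ rpm -qa | sort > "$outdir/RPMS.$(date +%Y%m%d)"; } 2>>"$outdir/cron.err"
ls "$outdir" | grep '^RPMS\.'
```

Checking `cron.err` after the next boot would show whether rpm, the PATH, or the output directory is the problem.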
It seems that all these Plasma NM pkgs exist:

plasma-applet-nm
plasma-applet-nm-fortisslvpnui
plasma-applet-nm-l2tp
plasma-applet-nm-libreswan
plasma-applet-nm-openconnect
plasma-applet-nm-openvpn
plasma-applet-nm-pptp
plasma-applet-nm-ssh
plasma-applet-nm-strongswan
plasma-applet-nm-vpnc

(In reply to Dave Hodgins from comment #12)
> "man veth" - Virtual Ethernet Device has very little info on the uses.
>
> Are you using docker, golang, or fop-javadoc? I suspect the devices are
> being created by badly configured containers of one sort or another, and then
> being detected by network manager which automatically adds a configuration
> file for any un-configured network device.

Docker was removed months ago. Now, I found another crazy issue: while I'm not seeing new veth* interfaces being created in /etc/sysconfig/network-scripts, iptables has been growing with (example):

```
$ iptables -L -n | grep vetha5188d3
vetha5188d3_in   0  --  0.0.0.0/0  0.0.0.0/0
vetha5188d3_fwd  0  --  0.0.0.0/0  0.0.0.0/0
vetha5188d3_out  0  --  0.0.0.0/0  0.0.0.0/0
Chain vetha5188d3_fwd (1 references)
Chain vetha5188d3_in (1 references)
Chain vetha5188d3_out (1 references)
```

One wouldn't be a big deal; but:

```
$ iptables -L -n | grep Chain | grep veth | grep _fwd | wc -l
5066
```

^^^^!!
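The chain-counting pipeline above can be collapsed into a single counting grep; shown here on a small sample standing in for live `iptables -L -n` output, since the real command needs root and a populated ruleset.

```shell
# Count veth forward chains; the sample stands in for `iptables -L -n` output.
sample='Chain vetha5188d3_fwd (1 references)
Chain vethb77aa21_fwd (1 references)
Chain INPUT (policy ACCEPT)'
printf '%s\n' "$sample" | grep '^Chain veth' | grep -c '_fwd'
```

On the live system this would be `iptables -L -n | grep '^Chain veth' | grep -c '_fwd'`, replacing the three greps and `wc -l`.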
```
$ grep -Rls vetha5188d3 /etc
/etc/shorewall/interfaces
$ ll /etc/shorewall/interfaces
-rw------- 1 root root 117027 Sep 29 12:51 /etc/shorewall/interfaces
$ grep veth /etc/shorewall/interfaces | wc -l
5066
$ grep -v veth /etc/shorewall/interfaces
net  p5p1             detect          # ethernet - not connected
net  br-b8ea9ef8ed7d  detect  bridge
net  enp5s0           detect
net  docker0          detect  bridge
net  wlp10s0          detect          # WiFi
net  br-935570e85ea1  detect  bridge
net  enp9s0           detect
net  vboxnet0         detect
```

```
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: p5p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 74:86:e2:14:83:3d brd ff:ff:ff:ff:ff:ff
    altname enp9s0
3: wlp10s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 44:e5:17:fd:11:87 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.46/24 brd 192.168.1.255 scope global noprefixroute wlp10s0
       valid_lft forever preferred_lft forever
    inet6 fe80::46e5:17ff:fefd:1187/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
```

So why does iptables contain all these rules with no actual interfaces?

```
# from: https://stackoverflow.com/questions/31989426/how-to-identify-orphaned-veth-interfaces-and-how-to-delete-them
$ for name in $(ifconfig -a | sed 's/[ \t].*//;/^\(lo\|\)$/d' | grep veth)
do
  echo $name
  # ip link delete $name   # uncomment this
done
# nothing... but...
$ wc -l /var/lib/shorewall/.iptables-restore-input
76226 /var/lib/shorewall/.iptables-restore-input
```

Deleted all the veth interfaces from /etc/shorewall/interfaces and rebooting... iptables -L -n is now clean... If it doesn't stay clean, try uninstalling mandi-ifw and mandi.
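The StackOverflow loop above relies on ifconfig, which is deprecated on many systems. An equivalent sketch using the `ip -o link` one-line-per-interface layout, demonstrated on sample output rather than the live command, with the destructive delete left commented out as in the original:

```shell
# List veth interfaces from `ip -o link` style output; the sample stands in
# for the live command. The delete stays commented out, as in the original.
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
5: vetha5188d3@if4: <BROADCAST,MULTICAST> mtu 1500'
printf '%s\n' "$sample" | awk -F': ' '{print $2}' | sed 's/@.*//' |
grep '^veth' | while read -r name; do
    echo "orphan candidate: $name"
    # ip link delete "$name"   # uncomment to actually remove
done
```

The `sed 's/@.*//'` strips the `@ifN` peer suffix that `ip link` appends to veth names, which the original ifconfig-based loop never had to handle.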