Bug 23280 - Systemd ordering cycle problem between dnf and drakx-net often keeps network from coming up.
Summary: Systemd ordering cycle problem between dnf and drakx-net often keeps network ...
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages (show other bugs)
Version: Cauldron
Hardware: All Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Base system maintainers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-04 06:12 CEST by aguador
Modified: 2019-02-24 12:17 CET (History)
11 users (show)

See Also:
Source RPM: systemd, dnf-3.0.2, drakx-net
CVE:
Status comment:


Attachments
DNF output of the update after which I first noticed this issue (66.70 KB, text/plain)
2018-07-17 10:17 CEST, Marja Van Waes
Details

Description aguador 2018-07-04 06:12:33 CEST
Install kernel 4.17.3-2 (rc1 of 4.17.4) and reboot.

Ethernet will be off. In some cases may be restored by MCC, in others must use terminal command.

Problem appears to be resolved with kernel 4.17.4.
Comment 1 Marja Van Waes 2018-07-04 13:21:36 CEST Comment hidden (obsolete)

Resolution: (none) => FIXED
Status: NEW => RESOLVED
CC: (none) => marja11

Comment 2 Marja Van Waes 2018-07-08 08:07:50 CEST Comment hidden (obsolete)

CC: (none) => kernel, mageiatools
Status: RESOLVED => REOPENED
Resolution: FIXED => (none)

Comment 3 Barry Jackson 2018-07-13 00:39:50 CEST
I'm seeing this for the first time with 4.17.6 but 4.17.5 is OK.
I don't have 'networkmanager' installed, although there are some networkmanager-* packages.

CC: (none) => zen25000

Comment 4 Barry Jackson 2018-07-13 00:46:43 CEST
Interesting, on re-boot back into 4.17.6 after booting 4.17.5 to test and report above, the network came straight up.
Comment 5 Marja Van Waes 2018-07-17 10:17:52 CEST
Created attachment 10284 [details]
DNF output of the update after which I first noticed this issue

In case it helps: attaching the dnf output from the updates after which I had this issue on the laptop that always uses LAN. 

If an update is the culprit, then I fail to see what other than kernel-4.17.2-2.mga7.src.rpm can have caused this.

However, there's this line in my "journalctl -a" logs that only occurred on that laptop when I hit this issue:

jul 03 07:18:04 localhost systemd[1]: network.target: Job network.service/start deleted to break ordering cycle starting with network.target/start
Comment 6 Marja Van Waes 2018-07-17 10:21:11 CEST
(In reply to Marja Van Waes from comment #5)

> 
> However, there's this line in my "journalctl -a" logs that only occurred on
> that laptop when I hit this issue:
> 
> jul 03 07:18:04 localhost systemd[1]: network.target: Job
> network.service/start deleted to break ordering cycle starting with
> network.target/start

CC'ing the base system maintainers

(In reply to Barry Jackson from comment #4)
> Interesting, on re-boot back into 4.17.6 after booting 4.17.5 to test and
> report above, the network came straight up.

Wondering whether that would work for all

Summary: Update to 4.17.3-2 sets ethernet "off" => Kernel update sets ethernet "off"
Status comment: (none) => Might be a workaround: reboot into a previous kernel and then into the new kernel again
CC: (none) => basesystem

Comment 7 Barry Jackson 2018-07-17 14:42:26 CEST
No, it seemed to work for a soft reboot but last night after being turned off for a day a cold boot had no ethernet again.
Hitting 'connect' in the netapplet config tool brought it up straight away.
Comment 8 aguador 2018-07-17 15:13:48 CEST
I have had the exact same experience as Barry with the 4.17.6-3 kernel. It survived a reboot yesterday, but this morning I had to connect manually via network center.
Comment 9 Thomas Backlund 2018-07-17 15:49:19 CEST
(In reply to Marja Van Waes from comment #5)


> However, there's this line in my "journalctl -a" logs that only occurred on
> that laptop when I hit this issue:
> 
> jul 03 07:18:04 localhost systemd[1]: network.target: Job
> network.service/start deleted to break ordering cycle starting with
> network.target/start


This is the real bug.

There is a dep loop somewhere forcing systemd to break ordering, and then all bets are off...

Instead of rebooting ... 

just do a "systemctl restart network" as root and see if that works...

And ... does this only happend with network service, not networkmanager, right ?

CC: (none) => tmb

Comment 10 Marja Van Waes 2018-07-17 19:13:46 CEST
(In reply to Thomas Backlund from comment #9)
> (In reply to Marja Van Waes from comment #5)
> 
> 
> > However, there's this line in my "journalctl -a" logs that only occurred on
> > that laptop when I hit this issue:
> > 
> > jul 03 07:18:04 localhost systemd[1]: network.target: Job
> > network.service/start deleted to break ordering cycle starting with
> > network.target/start
> 
> 
> This is the real bug.
> 
> There is a dep loop somewhere forcing systemd to break ordering, and then
> all bets are off...
> 
> Instead of rebooting ... 
> 
> just do a "systemctl restart network" as root and see if that works...
> 
> And ... does this only happend with network service, not networkmanager,
> right ?

Forget about comment #2, I had both network.service and NetworkManager.service enabled there #2 :-(
Obsoleting that comment.

My other laptops don't use NetworkManager

I'll try "systemctl restart network.service" if I hit the issue again (on the first laptop, or on a third one, where I couldn't, yet, reproduce the "ethernet off" problem).

On all three laptops, there aren't any "deleted to break ordering cycle starting with" messages in my journalctl logs from before 1 July 2018. I have such messages for dnf-makecache.timer, too, and it is funny that, for each single laptop, the "break ordering cycle" messages started right after I updated dnf to 3.0.2

Is that a coincidence?

@ aguador and Barry

Do you have dnf installed?
Do you see similar ordering cycle messages in your logs?

Status comment: Might be a workaround: reboot into a previous kernel and then into the new kernel again => (none)
CC: (none) => ngompa13

Comment 11 aguador 2018-07-17 22:13:10 CEST
dnf installed and, yes, here is the entry from this morning's boot:

jul 17 07:05:35 localhost systemd[1]: network-online.target: Job network.target/start deleted to break ordering cycle starting with network-online.target/start
Comment 12 Barry Jackson 2018-07-18 00:42:03 CEST
dnf is installed here too, though I rarely use it and oddly this evening it booted with network up straight away from cold.

However late last night I ran urpme --auto-orphans to clear out a stack of *-devel files etc. so maybe the culprit was removed.

I now have:

[baz@localhost ~]$ rpm -qa|grep dnf
dnfdragora-1.0.1-3.mga7
dnf-3.0.2-1.mga7
python3-dnfdaemon-0.3.19-1.mga7
dnf-yum-3.0.2-1.mga7
dnfdragora-common-1.0.1-3.mga7
python3-dnf-plugins-core-3.0.1-1.mga7
python3-libdnf-0.15.2-1.mga7
dnf-plugins-core-3.0.1-1.mga7
mageia-dnf-conf-1.0-1.mga6
lib64dnf2-0.15.2-1.mga7
dnf-data-3.0.2-1.mga7
dnfdragora-gtk-1.0.1-3.mga7
python3-dnf-3.0.2-1.mga7
dnfdaemon-0.3.19-1.mga7

[baz@localhost ~]$ sudo journalctl -a |grep "network.target"
Jul 03 23:54:53 localhost systemd[1]: network.service: Found dependency on network.target/start
Jul 04 20:27:24 localhost systemd[1]: network.service: Found dependency on network.target/start
Jul 06 00:00:20 localhost systemd[1]: basic.target: Found dependency on network.target/start
Jul 08 17:30:07 localhost systemd[1]: basic.target: Found dependency on network.target/start
Jul 08 23:41:52 localhost systemd[1]: network-up.service: Found dependency on network.target/start
Jul 10 20:38:51 localhost systemd[1]: basic.target: Found dependency on network.target/start
Jul 12 23:24:52 localhost systemd[1]: network.target: Found ordering cycle on network.service/start
Jul 12 23:24:52 localhost systemd[1]: network.target: Found dependency on iptables.service/start
Jul 12 23:42:01 localhost systemd[1]: network.service: Found dependency on network.target/start
Jul 15 23:01:17 localhost systemd[1]: basic.target: Found dependency on network.target/start
Jul 16 21:25:28 localhost systemd[1]: network-online.target: Found ordering cycle on network.target/start
Jul 16 21:25:28 localhost systemd[1]: network-online.target: Job network.target/start deleted to break ordering cycle starting with network-online.target/start

NOTE that last error was yesterday - tonight it worked fine

[baz@localhost ~]$ sudo systemctl | grep net
  sys-devices-pci0000:00-0000:00:1c.2-0000:02:00.0-net-enp2s0.device                                    loaded active plugged   AR8151 v2.0 Gigabit Ethernet                                                         
  sys-subsystem-net-devices-enp2s0.device                                                               loaded active plugged   AR8151 v2.0 Gigabit Ethernet                                                         
● network.service                                                                                       loaded failed failed    LSB: Bring up/down networking                                                        
  systemd-networkd.service                                                                              loaded active running   Network Service                                                                      
  systemd-networkd.socket                                                                               loaded active running   Network Service Netlink Socket                                                       
  network-online.target                                                                                 loaded active active    Network is Online                                                                    
  network.target                                                                                        loaded active active    Network
Comment 13 Barry Jackson 2018-07-18 00:55:30 CEST
@Thomas
I don't know when this changed but it is confusing, network.service no longer seems used but it exists, and I see several failed messages during boot and have done for a long time now in cauldron: "Failed to start LSB: Bring up/down networking."

[baz@localhost ~]$ sudo systemctl restart network
Job for network.service failed because the control process exited with error code.
See "systemctl status network.service" and "journalctl -xe" for details.

Seems replaced with systemd-networkd which reports no errors:

[baz@localhost ~]$ sudo systemctl restart systemd-networkd
[baz@localhost ~]$

[baz@localhost ~]$ systemctl status network.service
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; generated; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2018-07-17 23:44:54 BST; 4min 56s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 10470 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=1/FAILURE)
   CGroup: /system.slice/network.service
           └─32669 /sbin/ifplugd -I -b -i enp2s0

Jul 17 23:44:54 localhost network[10470]: RTNETLINK answers: File exists
Jul 17 23:44:54 localhost network[10470]: RTNETLINK answers: File exists
Jul 17 23:44:54 localhost network[10470]: RTNETLINK answers: File exists
Jul 17 23:44:54 localhost network[10470]: RTNETLINK answers: File exists
Jul 17 23:44:54 localhost network[10470]: RTNETLINK answers: File exists
Jul 17 23:44:54 localhost network[10470]: RTNETLINK answers: File exists
Jul 17 23:44:54 localhost network[10470]: RTNETLINK answers: File exists
Jul 17 23:44:54 localhost systemd[1]: network.service: Control process exited, code=exited status=1
Jul 17 23:44:54 localhost systemd[1]: network.service: Failed with result 'exit-code'.
Jul 17 23:44:54 localhost systemd[1]: Failed to start LSB: Bring up/down networking.

[baz@localhost ~]$ systemctl status systemd-networkd.service
● systemd-networkd.service - Network Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-networkd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-07-17 23:45:51 BST; 5min ago
     Docs: man:systemd-networkd.service(8)
 Main PID: 10752 (systemd-network)
   Status: "Processing requests..."
   CGroup: /system.slice/systemd-networkd.service
           └─10752 /usr/lib/systemd/systemd-networkd

Jul 17 23:45:51 localhost systemd[1]: Starting Network Service...
Jul 17 23:45:51 localhost systemd-networkd[10752]: enp2s0: Gained IPv6LL
Jul 17 23:45:51 localhost systemd-networkd[10752]: Enumeration completed
Jul 17 23:45:51 localhost systemd[1]: Started Network Service.
[baz@localhost ~]$
Comment 14 Marja Van Waes 2018-07-23 08:14:19 CEST
(In reply to Thomas Backlund from comment #9)
> (In reply to Marja Van Waes from comment #5)

> > 
> > jul 03 07:18:04 localhost systemd[1]: network.target: Job
> > network.service/start deleted to break ordering cycle starting with
> > network.target/start
> 
> 
> This is the real bug.
> 
> There is a dep loop somewhere forcing systemd to break ordering, and then
> all bets are off...
> 
> Instead of rebooting ... 
> 
> just do a "systemctl restart network" as root and see if that works...
> 

Yes, that worked fine this morning. Going by /root/.bash_history I hadn't tried that on that system before :-/

(Sorry for the late reply, but this was the third time I booted that system since you asked, and I only hit the issue again now)
Comment 15 Marja Van Waes 2018-07-23 08:21:38 CEST
(In reply to Thomas Backlund from comment #9)
> (In reply to Marja Van Waes from comment #5)
> 
> 

> > 
> > jul 03 07:18:04 localhost systemd[1]: network.target: Job
> > network.service/start deleted to break ordering cycle starting with
> > network.target/start
> 

> 
> There is a dep loop somewhere forcing systemd to break ordering, and then
> all bets are off...
> 

So, should this bug be assigned to drakx-net, or to systemd?

Is there a chance dnf wants something that drakx-net can't supply, or can't the dnf-makecache.timer ordering issue be related?
Comment 16 Marja Van Waes 2018-07-23 22:22:38 CEST
The dnf ordering issue messages that I had seen so far, were always something similar to:

basic.target: Job dnf-makecache.timer/start deleted to break ordering cycle starting with basic.target/start  

but now I found this on the third laptop:

jul 21 20:27:42 cldrn_64 systemd[1]: network.service: Job dnf-makecache.timer/start deleted to break ordering cycle starting with network.service/start

jul 23 16:00:33 cldrn_64 systemd[1]: dnf-makecache.timer: Job network-online.target/start deleted to break ordering cycle starting with dnf-makecache.timer/start

So, even if I don't understand what that means, I'm now convinced that dnf and drakx-net don't get along well.

It affects wlan, too, btw, but it is easy to think you're too far away from an access point or that there's something wrong with the network. 

I've disabled  dnf-makecache.timer and dnf-makecache.service for now, dunno whether disabling the latter, too, was overkill.

CC'ing ignatenkobrain because dnf is really involved.

Summary: Kernel update sets ethernet "off" => Systemd ordering cycle problem between dnf and drakx-net often keeps network from coming up.
Source RPM: kernel-4.17.3-2.mga7.src.rpm => dnf-3.0.2, drakx-net, systemd
CC: (none) => ignatenko

Marja Van Waes 2018-07-23 22:23:10 CEST

Attachment 10284 is obsolete: 0 => 1

Comment 17 Adrien Guichard 2018-07-24 13:48:47 CEST
FYI, I was facing the same problem, just after installed latest Mageia Cauldron.

The network could not start and every attempt to activate network failed (ifup does not return).

As soon as I disabled "dnf-makecache.timer", I could start the network again.

CC: (none) => guichard.adrien

Comment 18 Marja Van Waes 2018-08-02 19:20:17 CEST
Assuming the basesystem (systemd) maintainers have the best knowledge to fix this, so assigning to them

Assignee: bugsquad => basesystem
Source RPM: dnf-3.0.2, drakx-net, systemd => systemd, dnf-3.0.2, drakx-net
CC: (none) => mageia

Comment 19 aguador 2018-09-04 15:00:01 CEST
Given that I have Cauldron running on a laptop and am not using dnf (now part of task-packager to which I may/may not return), I have just disabled dnf-makecache.service completely. This bug not only requires bringing up the network manually virtually all the time, it also adds about 2 full minutes to boot time when it fails!

There is a relevant RH bug report, about the appropriateness of the timer itself rather than breaking ordering, closed as "WONTFIX" 3 years ago here: https://bugzilla.redhat.com/show_bug.cgi?id=1187111

As an aside, Barry (Comment 13) raises a good point about service.network as it fails and seems to have been replaced with systemd-networkd.service, which means keeping service.network simply lengthens boot time for no reason. Could not find a separate bug report on this. Is there one?
Comment 20 Bit Twister 2018-09-04 16:22:10 CEST
(In reply to aguador from comment #19)
 
> As an aside, Barry (Comment 13) raises a good point about service.network as
> it fails and seems to have been replaced with systemd-networkd.service,
> which means keeping service.network simply lengthens boot time for no
> reason. 

Hmmm, that is odd, since I converted to systemd-networkd.service back on mga5, I shaved 20 milliseconds off my boot time for each of my nic cards.


$ grep -iE "ONBOOT|NM_CONTROLLED" /etc/sysconfig/network-scripts/ifcfg-en*
/etc/sysconfig/network-scripts/ifcfg-enp3s0:ONBOOT=no
/etc/sysconfig/network-scripts/ifcfg-enp3s0:NM_CONTROLLED=no
/etc/sysconfig/network-scripts/ifcfg-enp4s0:ONBOOT=no
/etc/sysconfig/network-scripts/ifcfg-enp4s0:NM_CONTROLLED=no

$ cd /usr/lib/systemd/network/
$ grep -v ^#  *enp* 
10_xx__enp3s0.network:
10_xx__enp3s0.network:[Match]
10_xx__enp3s0.network:Name=enp3s0
10_xx__enp3s0.network:
10_xx__enp3s0.network:[Network]
10_xx__enp3s0.network:Description=LAN_NIC
10_xx__enp3s0.network:DNS=127.0.0.1
10_xx__enp3s0.network:DNS=8.8.8.8
10_xx__enp3s0.network:Domains=home.test
10_xx__enp3s0.network:IPv6AcceptRouterAdvertisements=false
10_xx__enp3s0.network:
10_xx__enp3s0.network:[Address]
10_xx__enp3s0.network:Address=192.168.11.132/24
10_xx__enp3s0.network:
10_xx__enp3s0.network:[Route]
10_xx__enp3s0.network:Gateway=192.168.11.1
11_xx__enp4s0.network:
11_xx__enp4s0.network:[Match]
11_xx__enp4s0.network:Name=enp4s0
11_xx__enp4s0.network:
11_xx__enp4s0.network:[Network]
11_xx__enp4s0.network:Description=SPA_NIC
11_xx__enp4s0.network:IPv6AcceptRouterAdvertisements=false
11_xx__enp4s0.network:
11_xx__enp4s0.network:[Address]
11_xx__enp4s0.network:Address=192.168.15.135/24
11_xx__enp4s0.network:
11_xx__enp4s0.network:

$ systemctl list-units --type=service | grep netw
  systemd-networkd-wait-online.service  loaded active exited  Wait for Network to be Configured
  systemd-networkd.service              loaded active running Network Service

CC: (none) => bittwister2

Comment 21 aguador 2018-10-30 23:43:37 CET
I will leave this open until others have chimed in, but it appears that today's initscript update (I am guessing) fixed this problem. I have rebooted a couple of times without having to connect manually.
Comment 22 aguador 2019-02-24 12:17:08 CET
As this is not working for me and there have been no more reports, I am closing this report.

Note that Barry's observation (Comment 13) still seems to me worthy of a separate bug report (unless fixed as part of the systemd update I have yet to install). My boot log constantly shows the LSB failure to bring up network service, and then delays in bringing up the systemd service.

Status: REOPENED => RESOLVED
Resolution: (none) => FIXED


Note You need to log in before you can comment on or make changes to this bug.