Bug 25000

Summary: Under mga7 systemd starts SYSV init scripts that require the network before the network comes up.
Product: Mageia
Reporter: Robert Munro <robert.munro>
Component: RPM Packages
Assignee: Mageia Bug Squad <bugsquad>
Status: RESOLVED INVALID
QA Contact:
Severity: normal
Priority: Normal
CC: ftg, lewyssmith, marja11, ouaurelien
Version: 7
Keywords: NEEDINFO
Target Milestone: ---
Hardware: All
OS: Linux
Whiteboard:
Source RPM: systemd-241-8.mga7.src.rpm
CVE:
Status comment:
Attachments: init script, test script for /sbin

Description Robert Munro 2019-06-25 13:46:52 CEST
Description of problem: Under mga6, systemd waits for the network to come up before it starts SYSV init scripts that require the network. Under mga7, systemd starts SYSV init scripts that require the network before the network comes up.


Version-Release number of selected component (if applicable): systemd-241-8.mga7

How reproducible: Install the script test-network-up, reboot the system and use systemctl status to view the test results.

Steps to Reproduce:
1. Place the test script test-network-up in the /etc/rc.d/init.d directory.
2. Define the soft link S12test-network-up in the /etc/rc.d/rc3.d directory pointing to /etc/rc.d/init.d/test-network-up.
3. Ensure the soft links S10network and S11network-up are in /etc/rc.d/rc3.d pointing to the scripts network and network-up in the /etc/rc.d/init.d directory.
4. Reboot the system.
5. Use systemctl status to view the test results.

The test-network-up script is below:

#!/bin/sh
#
# test-network-up       this tests whether the network is up
#
# Get the name of the interface that is the default network gateway
#
DEFAULT_NETWORK_IFACE=`netstat -nr | awk '$1 == "0.0.0.0" {print $8}'`;
#
#       echo -n "${DEFAULT_NETWORK_IFACE}"      # uncomment to debug
#       echo                                    #
#
if      [ -z "${DEFAULT_NETWORK_IFACE}" ] ; then
        echo -n "The network is not up."
        echo ;
else    echo -n "The network is up."
        echo ;
fi;
# end   test-network-up
Comment 1 Robert Munro 2019-06-25 14:03:44 CEST
There is a forum thread about the process that led to discovering this bug at https://forums.mageia.org/en/viewtopic.php?f=15&t=12829 .
Comment 2 Marja Van Waes 2019-06-26 08:46:58 CEST
(In reply to Robert Munro from comment #1)
> There is a forum thread about the process that led to discovering this bug
> at https://forums.mageia.org/en/viewtopic.php?f=15&t=12829 .

IIUC what doktor5000 wrote in his reply https://forums.mageia.org/en/viewtopic.php?f=15&t=12829#p75267 then there isn't a bug.

Did you already try to adjust your script like he suggested?

Keywords: (none) => NEEDINFO
CC: (none) => marja11

Comment 3 Robert Munro 2019-06-26 14:10:45 CEST
But there *is* a bug, in that *something changed* from mageia 6 to mageia 7 such that the bastille firewall script works under mageia 6 but fails under mageia 7.

What changed, as proven by the tests in that forum thread, is that under mageia 6 the network is up when the bastille script is run, but under mageia 7 the network is not up yet when the bastille script is run.

With all due respect, merely saying "there isn't a bug" does not make that true.

Issue the command [ rpm -qif `which systemd` ] (without the brackets) and read:

"Description :
systemd is a system and session manager for Linux, compatible with SysV and LSB
init scripts. systemd provides aggressive parallelization capabilities, uses
socket and D-Bus activation for starting services, offers on-demand starting of
daemons, keeps track of processes using Linux cgroups, supports snapshotting and
restoring of the system state, maintains mount and automount points and
implements an elaborate transactional dependency-based service control logic. It
can work as a drop-in replacement for sysvinit."

Note that it says "compatible with SysV and LSB init scripts", and further, says:
"It can work as a drop-in replacement for sysvinit."

Either those statements are true or they're not. If they're true, there's a bug.
Comment 4 Frank Griffin 2019-06-26 15:06:34 CEST
This goes back to an old debate years ago even before systemd came in.  systemd exacerbated it by making service start as asynchronous as possible, but it was possible under SYSV as well.

Any service which launches off an asynchronous process and then ends is considered to have "finished".  If "network" or "network-up" start the network, they don't necessarily block until the network is actually there and usable.
The crux of the debate was that newer services that require the network come up and do their own blocking/retry until the network is usable.  The idea was, "well, why should other services which are robust enough to do this have to block while older services expect everything to stop until they can proceed blindly?"

On one side of the debate were people (like me) who said "this is breaking the way it always was", and on the other side were people saying "these older services ought to be redesigned to do their own blocking/retry and not hold everyone else up".  The kicker was that no one was redesigning these services, and in the meantime you got anomalous results like DM login screens which displayed "localhost localdomain" because DHCP retrieval of a hostname had not yet completed.  This was particularly annoying with wireless NICs since kernel initialization of wireless occurs long after ethX initialization, so you would get one result with ethX and a different one with wloX.

This was eventually addressed by creating additional network services which actually did attempt to wait for a usable network to be present, and making the older services dependent on those.  IIRC there were several flavors of these with minor differences.  These were regarded as kludges which would go away as services were upgraded.

systemd doesn't know about any of these kludges.  Therefore, if you create a service that unconditionally requires network access, you either have to wrap it in blocking/retry code of your own, or else make your own service file that explicitly requires one of the kludge services.
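The "blocking/retry code of your own" option could look roughly like the sketch below. It is only an illustration: the function names default_iface and wait_for_network, the retry counts, and the choice of "a default route exists" as the test are assumptions, not something taken from the bastille script or from Mageia's initscripts.

```sh
#!/bin/sh
# Hypothetical helper: print the interface that carries the IPv4 default
# route, or nothing if no default route exists yet.
default_iface() {
    ip -4 route show default 2>/dev/null |
        awk '$1 == "default" {
                 for (i = 2; i < NF; i++)
                     if ($i == "dev") { print $(i + 1); exit }
             }'
}

# Hypothetical helper: poll until a default route appears or attempts run out.
# $1 = maximum attempts (default 30), $2 = seconds between attempts (default 1).
# Returns 0 once a default route is seen, 1 if it never appears.
wait_for_network() {
    tries=${1:-30}
    delay=${2:-1}
    while [ "$tries" -gt 0 ]; do
        [ -n "$(default_iface)" ] && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}
```

An init script would then call something like "wait_for_network 30 1 || exit 1" before doing any work that needs the network. Whether a default route is the right condition depends on what the script actually needs.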

As to which one you need, doktor5000's list seems more current than my dated view.

CC: (none) => ftg

Comment 5 Robert Munro 2019-06-27 01:45:51 CEST
The "Steps to Reproduce" in the initial bug report above are hereby replaced with the following, which have been checked and modified to work and actually tested:

1. Place the test script test-network-up in the /sbin directory.
2. Create the soft link S11testnetup in the /etc/rc.d/rc5.d directory pointing to /etc/rc.d/init.d/testnetup with the commands:
   cd /etc/rc.d/rc5.d
   ln ../init.d/S11testnetup testnetup
3. Install the script /etc/rc.d/init.d/testnetup.
4. Use command "systemctl enable testnetup" to enable the service testnetup.
5. Ensure the soft link S10network is in /etc/rc.d/rc5.d pointing to the script network in the /etc/rc.d/init.d directory.
6. Reboot the system.
7. Use command "systemctl status testnetup" to reveal the test results. Under Mageia 7 the test run right after $network runs will say "The system is not up."
Comment 6 Robert Munro 2019-06-27 01:52:50 CEST
Created attachment 11130 [details]
init script
Comment 7 Robert Munro 2019-06-27 01:54:26 CEST
Created attachment 11131 [details]
test script for /sbin
Comment 8 Robert Munro 2019-06-27 02:01:19 CEST
(In reply to Robert Munro from comment #5)
The link command in the above steps should be "ln -s ..." to create a soft link.
The test output will say "The network is not up." Not "The system is not up."
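For reference, ln expects the target first and the link name second, so the arguments in step 2 of comment #5 also appear to be reversed. A sketch of the corrected step follows; the function name link_testnetup and its optional argument are illustrative only and exist so the command can be tried outside /etc/rc.d.

```sh
#!/bin/sh
# Create the S11testnetup start link under rc5.d.
# $1 = rc.d root directory; defaults to /etc/rc.d as in the steps above.
link_testnetup() {
    rc_root=${1:-/etc/rc.d}
    # ln -s takes the target first and the link name second.
    ( cd "$rc_root/rc5.d" && ln -s ../init.d/testnetup S11testnetup )
}
```

Run as root with no argument, this is equivalent to "cd /etc/rc.d/rc5.d" followed by "ln -s ../init.d/testnetup S11testnetup".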
Comment 9 Robert Munro 2019-06-27 12:43:46 CEST
Objectively there can be no doubt that a bug was introduced in Mageia 7 that was not present in Mageia 6. This bug leads to systemd starting SYSV init scripts after the SYSV network script has ended but before the network is actually up.

That this bug exists is proven in the forum thread linked above in Comment 1, which includes systemctl status logs and summarizes the situation as follows:

"To be clear, Mageia 6 doesn't start the bastille script until after the command subtasks issued by the network script end, but Mageia 7 starts the bastille script four seconds earlier, after the network script reports it has completed execution, but before the command subtasks end. Thus under Mageia 7 the network is not really up when the bastille script starts, which causes the bastille iptables netfilter firewall script to fail."

It appears that some bright spark decided to shorten the Mageia 7 bootup time by four (4) seconds by exiting the network startup script before network interface startup commands complete processing, but Mageia 6 waits for those to finish.

To fix this bug, reverse that change.

I don't care what you do, because I have a DSL line with a static IP address, so I just use my internet-facing interface name and IP address to set up my firewall.

But people who have dhcp connections might have problems.
Comment 10 Frank Griffin 2019-06-27 19:23:08 CEST
It's not quite that simple.  One issue that surfaced during the previous discussion is what it means for the network to be up.  Link beat signal from the NIC?  Assignment of an IP address?  On which NIC, if you have more than one?
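To make the distinction concrete, here is a rough sketch in which each meaning of "up" becomes a separate check. The helper names and the SYSFS_NET override are made up for illustration; the checks rely only on the kernel's /sys/class/net/<iface>/carrier file and on "ip addr show".

```sh
#!/bin/sh
# Hypothetical helpers; each answers a different meaning of "the network is up".

# 1. Link level: the kernel reports carrier on the interface.
#    SYSFS_NET is an illustrative override, defaulting to /sys/class/net.
link_has_carrier() {
    [ "$(cat "${SYSFS_NET:-/sys/class/net}/$1/carrier" 2>/dev/null)" = "1" ]
}

# 2. Address level: the interface holds a global-scope IPv4 address.
iface_has_ipv4() {
    ip -4 -o addr show dev "$1" scope global 2>/dev/null | grep -q .
}

# 3. "Which NIC?": report the state of every interface named as an argument.
report_ifaces() {
    for iface in "$@"; do
        if ! link_has_carrier "$iface"; then
            echo "$iface: no carrier"
        elif ! iface_has_ipv4 "$iface"; then
            echo "$iface: carrier, no IPv4 address"
        else
            echo "$iface: carrier and IPv4 address"
        fi
    done
}
```

Calling "report_ifaces eth0 wlo1" shortly after boot can give a different answer for each interface, which is the ambiguity at issue here.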

Anyway, as I said before, this problem has been around for years and predates systemd by quite a bit.  The concept of "network-up" has also been around for years, to attempt to block until a best guess at network available (which wasn't always correct).

The original application for this dealt with systems that authenticate logins via LDAP.  That obviously can't work unless the network is available, and "network-up" or whatever we called it then was used to delay display of the DM login screen until LDAP was usable.  Several of us who wanted to see accurate hostname and domain names on the DM login screen pretended we were using LDAP (I think it was called "network authentication") to make this happen.

The point is that this having worked for your particular case in MGA6 was "luck of the draw".  I doubt that anybody threw a switch that broke it for MGA7.  More likely, other environmental and kernel changes have altered the timing of the relative boot activities in such a way that your bastille now starts before it used to.

You have two choices: code a systemd service file for bastille that depends on one of the services that doktor5000 mentioned, or else make it depend on some other service which does this.
Comment 11 Robert Munro 2019-06-28 01:46:23 CEST
This bastille firewall script didn't just happen to work "for my particular case" in Mageia 6. Actually, it has worked since Linux Mandrake, before Mageia existed. And I don't have just two choices. I've taken a third choice to work around the Mageia 7 network-not-up-yet bug and set up my iptables netfilter firewall early. Although I hope Mageia finds and fixes this bug, I've fixed my system: I'm done.
Comment 12 Lewis Smith 2019-06-28 18:58:57 CEST
(In reply to Robert Munro from comment #11)
> I've taken a third choice
> to work around the Mageia 7 network-not-up-yet bug and set up my iptables
> netfilter firewall early. Although I hope Mageia finds and fixes this bug,
> I've fixed my system: I'm done.
Thank you for your persistence, and for having won through.

Despite:
> compatible with SysV and LSB init scripts
> a drop-in replacement for sysvinit
life is never that simple for refined DIY usage.
The forum thread
 https://forums.mageia.org/en/viewtopic.php?f=15&t=12829
and this bug are both very long, with generic conclusions about re-doing SysV scripts as systemd ones, or using surer 'network up' indicators.
I think this bug could be closed now, but hesitate to do so myself.

CC: (none) => lewyssmith

Comment 13 Robert Munro 2019-06-28 19:37:17 CEST
What I've done is build a workaround for my own system that doesn't fix the bug, so no, this bug report should not be closed. If Mageia 7 developers didn't make the change that created this bug, then this bug report should be sent upstream.
Comment 14 Aurelien Oudelet 2020-08-19 22:29:32 CEST
Currently, the only good way to have scripts start after the network is available is to write a .service file that depends on one of these units:
network-auth.service, which you have to enable first, or
NetworkManager-wait-online.service if you're using NetworkManager,
or systemd-networkd-wait-online.service if you use the native systemd-networkd to manage your network connectivity.
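As a minimal sketch of such a .service file, the shell function below prints a unit that is ordered after whichever wait unit is passed in. The function name print_firewall_unit, the unit description, and the default script path /etc/rc.d/init.d/bastille-firewall are assumptions made for illustration; the real script name and location may differ.

```sh
#!/bin/sh
# Print a unit file that starts a firewall script after a chosen
# wait-for-network unit.
# $1 = unit to wait for, e.g. network-auth.service
# $2 = command to run (hypothetical default path; adjust to the real script)
print_firewall_unit() {
    wait_unit=$1
    start_cmd=${2:-/etc/rc.d/init.d/bastille-firewall start}
    cat <<EOF
[Unit]
Description=Firewall started after $wait_unit (example)
Wants=$wait_unit
After=$wait_unit

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=$start_cmd

[Install]
WantedBy=multi-user.target
EOF
}
```

One possible use, as root: "print_firewall_unit NetworkManager-wait-online.service > /etc/systemd/system/bastille-firewall.service", followed by "systemctl daemon-reload" and "systemctl enable bastille-firewall.service". The unit file name here is likewise only an example.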

Sadly, Bastille is not in our repos, closing this.

Resolution: (none) => INVALID
Status: NEW => RESOLVED
CC: (none) => ouaurelien