Bug 19566 - long timeouts at boot when network interfaces topology changes
Summary: long timeouts at boot when network interfaces topology changes
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All Linux
Priority: Normal major
Target Milestone: ---
Assignee: Mageia tools maintainers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-11 18:17 CEST by Giuseppe Ghibò
Modified: 2020-12-29 11:59 CET

See Also:
Source RPM: drakx-net-2.52-1.mga8.src.rpm
CVE:
Status comment:


Attachments

Description Giuseppe Ghibò 2016-10-11 18:17:17 CEST
Description of problem:

When the network interface topology changes (e.g. you add, remove or replace a network card, or plug in a USB Ethernet adapter), the next boot stalls on very long timeouts that can look like a freeze: 4-6 minutes or more pass before the remaining services start and the login prompt finally appears.

I tried to track the problem down in the sources. It seems that /usr/share/harddrake/service_harddrake makes a first call for each network interface added or removed:

        harddrake::autoconf::network_conf($modules_conf, $in, [ @ID{@added} ]);

That is, it calls network_conf(), defined in /usr/lib/libDrakX/harddrake/autoconf.pm, which calls setup_ethernet_device() and other functions, which in turn call further functions. At the end of that call chain, write_hostname() is called; it is defined in /usr/lib/libDrakX/network/network.pm, around line 290, as follows:

sub write_hostname {
    #- ovitters: adding 127.0.0.1 to /etc/hosts is obsolete as nss-myhostname handles it
    my ($hostname) = @_;

    addVarsInSh($::prefix . $network_file, { HOSTNAME => $hostname }, qw(HOSTNAME));
    output($::prefix . $hostname_file, $hostname || "localhost");

    unless ($::isInstall) {
        my $rc = syscall_("sethostname", $hostname, length $hostname);
        run_program::run("/usr/bin/run-parts", "--arg", $hostname, "/etc/sysconfig/network-scripts/hostname.d");
    }
}

The real slowdown occurs when write_hostname() runs the command:

run-parts --arg localhost.localdomain /etc/sysconfig/network-scripts/hostname.d

run-parts calls each script in /etc/sysconfig/network-scripts/hostname.d with the argument "localhost.localdomain" (or whatever the hostname is). One problem is that localhost.localdomain is not recognized as the loopback host, both because it is not automatically added to /etc/hosts for 127.0.0.1 and because many network scripts treat only "localhost" as the loopback exception, not its localhost.localdomain counterpart. The other is that the directory contains two scripts, "avahi" and "s2u". s2u is already infamous for bug https://bugs.mageia.org/show_bug.cgi?id=15737, which is not yet fixed. It is supposed to call the dbus-send executable to send a D-Bus message about the hostname change to all running X11 D-Bus sessions. Strangely, at the time of the call the X11 server has not been started yet, due to systemd's ordering of the graphical server startup.
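To see which hook accounts for the delay, each script can be run and timed separately. The following is only a rough sketch: time_hooks is a hypothetical helper, not part of drakx-net or run-parts, and it merely imitates what run-parts --arg does by looping over the executables in a directory. Note that it really executes the hooks, so it should only be tried on a test machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of drakx-net): run every executable file in a
# hook directory with a single argument, roughly as "run-parts --arg" would,
# and print the exit status and elapsed seconds of each one.
time_hooks() {
    dir=$1
    arg=$2
    for script in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        [ -f "$script" ] && [ -x "$script" ] || continue
        start=$(date +%s)
        "$script" "$arg" >/dev/null 2>&1
        status=$?
        end=$(date +%s)
        printf '%s exit=%s seconds=%s\n' "${script##*/}" "$status" "$((end - start))"
    done
}
```

For example: time_hooks /etc/sysconfig/network-scripts/hostname.d localhost.localdomain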

The trouble, however, is with avahi, which is called several times (I counted at least 3 for a single interface change), always with the same localhost.localdomain (or just localhost) as argument, in particular with:

su avahi -s /bin/bash -c  "avahi-set-host-name $1"

which is translated as a call to

"avahi-set-host-name localhost.localdomain"

This program, i.e. "avahi-set-host-name localhost", never completes. If you run "avahi-set-host-name localhost.localdomain" standalone once the timeout has expired, it may just complain with the error "Failed to create host name resolver: Invalid host name" for localhost.localdomain, or simply exit; that is not the same behaviour as when it is called from network.pm after a boot with changed network interfaces. In that case the avahi call never completes, and each time it is killed by the two-minute timeout originally imposed in service_harddrake. This may also be a bug in avahi.
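One possible mitigation in the avahi hook, sketched here under assumptions, would be to pass only the short host name and to bound the call with a much shorter limit than the two minutes of service_harddrake. short_hostname and run_bounded are hypothetical names, and whether the short name avoids the hang at boot has not been verified; the sketch only relies on timeout from GNU coreutils, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helpers; the names do not come from drakx-net or avahi.

# Strip everything from the first dot on: localhost.localdomain -> localhost
short_hostname() {
    printf '%s\n' "${1%%.*}"
}

# Run a command with an upper bound on its run time.
# timeout (GNU coreutils) exits with status 124 if the limit was hit.
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
}
```

Inside the hook this could be used as: run_bounded 10 avahi-set-host-name "$(short_hostname "$1")"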

IMHO, given the information above and the newer systemd and avahi, the entire process is pretty fragile and should be reviewed.
Giuseppe Ghibò 2016-10-11 18:19:17 CEST

CC: (none) => shlomif, thierry.vignaud

Marja Van Waes 2016-10-13 12:07:59 CEST

CC: (none) => marja11
Assignee: bugsquad => mageiatools

Angelo Naselli 2016-10-13 19:07:23 CEST

CC: (none) => anaselli

Florian Hubold 2018-01-21 14:41:28 CET

CC: (none) => doktor5000

Comment 1 papoteur 2020-06-06 13:38:04 CEST
It seems that this bug is still valid.

avahi-set-host-name localhost.localdomain
Failed to create host name resolver: Invalid host name

CC: (none) => yves.brungard_mageia

Comment 2 papoteur 2020-06-07 22:39:21 CEST
We found, with neoclust, that 
avahi-set-host-name localhost
is working. 
It seems that the hostname passed to this command should not include the domain.
Comment 3 Giuseppe Ghibò 2020-06-08 11:43:47 CEST
Maybe we should strace it to find where the timeout is.
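A trace could be captured along these lines. This is only a sketch: trace_hostname_cmd is a hypothetical wrapper, and to be useful it would have to run in the same boot-time context where the hang occurs, which may need the hook script itself to be modified.

```shell
#!/bin/sh
# Hypothetical wrapper: trace one command and keep the log in a file.
trace_hostname_cmd() {
    log=$1
    shift
    # -f follows child processes, -tt adds a timestamp to each line,
    # -T shows the time spent in each syscall, -o writes the trace to a file.
    strace -f -tt -T -o "$log" "$@"
}
```

For example: trace_hostname_cmd /tmp/avahi.trace avahi-set-host-name localhost.localdomain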
Comment 4 Aurelien Oudelet 2020-12-29 11:59:36 CET
SRPM updated.

Is this still the case?

Source RPM: drakx-net-2.27-1.mga6.src.rpm => drakx-net-2.52-1.mga8.src.rpm
CC: (none) => ouaurelien

