| Summary: | long timeouts at boot when network interfaces topology changes | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Giuseppe Ghibò <ghibomgx> |
| Component: | RPM Packages | Assignee: | Mageia tools maintainers <mageiatools> |
| Status: | NEW --- | QA Contact: | |
| Severity: | major | ||
| Priority: | Normal | CC: | anaselli, doktor5000, marja11, ouaurelien, shlomif, thierry.vignaud, yvesbrungard |
| Version: | Cauldron | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Source RPM: | drakx-net-2.52-1.mga8.src.rpm | CVE: | |
| Status comment: | |||
|
Giuseppe Ghibò
2016-10-11 18:19:17 CEST
CC:
(none) =>
shlomif, thierry.vignaud
Marja Van Waes
2016-10-13 12:07:59 CEST
CC:
(none) =>
marja11
Angelo Naselli
2016-10-13 19:07:23 CEST
CC:
(none) =>
anaselli
Florian Hubold
2018-01-21 14:41:28 CET
CC:
(none) =>
doktor5000
It seems that this bug is still valid.
avahi-set-host-name localhost.localdomain
Failed to create host name resolver: Invalid host name
CC:
(none) =>
yves.brungard_mageia
We found, with neoclust, that avahi-set-host-name localhost works. It seems that the hostname passed to this command should not include the domain. Maybe we should strace it to find where the timeout occurs.
SRPM updated. Is this still the case?
Source RPM:
drakx-net-2.27-1.mga6.src.rpm =>
drakx-net-2.52-1.mga8.src.rpm |
Description of problem:

When the network interface topology changes (e.g. you add, remove or replace a network card, or plug in a USB Ethernet adapter), the next reboot suffers very long timeouts that can look like a freeze: 4-6 minutes or more pass before the remaining services start and the login prompt finally appears.

I tried to track the problem down in the sources. In /usr/share/harddrake/service_harddrake there is a first call, for each network interface added or removed:

    harddrake::autoconf::network_conf($modules_conf, $in, [ @ID{@added} ]);

That is, it calls network_conf(), defined in /usr/lib/libDrakX/harddrake/autoconf.pm, which calls setup_ethernet_device() and other functions, which in turn call further functions. At the end of this call chain, the subroutine write_hostname() is called. It is defined in /usr/lib/libDrakX/network/network.pm, around line 290, as follows:

    sub write_hostname {
        #- ovitters: adding 127.0.0.1 to /etc/hosts is obsolete as nss-myhostname handles it
        my ($hostname) = @_;
        addVarsInSh($::prefix . $network_file, { HOSTNAME => $hostname }, qw(HOSTNAME));
        output($::prefix . $hostname_file, $hostname || "localhost");
        unless ($::isInstall) {
            my $rc = syscall_("sethostname", $hostname, length $hostname);
            run_program::run("/usr/bin/run-parts", "--arg", $hostname, "/etc/sysconfig/network-scripts/hostname.d");
        }
    }

The real slowdown occurs at the call to the command:

    run-parts --arg localhost.localdomain /etc/sysconfig/network-scripts/hostname.d

The run-parts program calls each script in /etc/sysconfig/network-scripts/hostname.d with the argument "localhost.localdomain" (or whatever the hostname is).
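To make the blocking behaviour concrete, here is a minimal sketch of what the run-parts call amounts to. The run_hooks function is a hypothetical stand-in written for illustration, not the real run-parts (which also filters file names), and the two demo hooks are throwaway scripts, not the shipped "avahi" and "s2u" ones. The point it illustrates is that the hooks run one after another in the foreground, so a single hook that never returns holds up the caller.

```shell
#!/bin/sh
# Simplified, hypothetical stand-in for:
#   run-parts --arg "$hostname" /etc/sysconfig/network-scripts/hostname.d
run_hooks() {
    dir=$1
    arg=$2
    for script in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        [ -f "$script" ] && [ -x "$script" ] || continue
        # Each hook runs in the foreground: the loop cannot advance
        # until the current hook exits.
        "$script" "$arg"
    done
}

# Demo with two throwaway hooks in a temporary directory.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "avahi hook got: $1"\n' > "$demo_dir/avahi"
printf '#!/bin/sh\necho "s2u hook got: $1"\n' > "$demo_dir/s2u"
chmod +x "$demo_dir/avahi" "$demo_dir/s2u"
run_hooks "$demo_dir" localhost.localdomain
rm -rf "$demo_dir"
```

If the first demo hook were replaced by something that never exits, the second hook would never run and the caller would wait until some outer timeout kills it, which matches the behaviour described in this report.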
Besides the fact that localhost.localdomain is not recognized as the loopback host (partly because it is not automatically added to /etc/hosts for IP 127.0.0.1, and partly because many network scripts check only for "localhost" as the loopback exception, not for its localhost.localdomain counterpart), the directory /etc/sysconfig/network-scripts/hostname.d contains two scripts: "avahi" and "s2u".

s2u is already infamous for bug https://bugs.mageia.org/show_bug.cgi?id=15737, which is not yet fixed. This program is supposed to call the dbus-send executable and send a D-Bus message about the hostname change to all running X11 D-Bus sessions. Strangely, at the time of the call the X11 server has not been started yet, apparently because of some systemd priority when starting the graphical server.

The trouble, however, occurs with avahi, which is called several times (I counted at least 3 for 1 interface move), always with the same localhost.localdomain (or just localhost) argument, in particular with:

    su avahi -s /bin/bash -c "avahi-set-host-name $1"

which translates to a call to "avahi-set-host-name localhost.localdomain".

This program, i.e. "avahi-set-host-name localhost", never completes. If you call "avahi-set-host-name localhost.localdomain" standalone, once the timeout has expired, it may just complain with the error "Failed to create host name resolver: Invalid host name" for localhost.localdomain, or simply exit, but this is not the same behaviour as when it is called from within network.pm after the network interfaces have changed and the system has rebooted. In fact the avahi call never completes; each time it is killed by the two-minute timeout originally imposed in service_harddrake. With at least three such calls per interface change, this would presumably add up to the 4-6 minute delay observed. Maybe this is also a bug in avahi.

IMHO the entire process, given the information above and given the new systemd and avahi, is pretty fragile and should be reviewed.
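One possible direction for such a review, sketched under stated assumptions: the hook could drop the domain part (a comment above reports that avahi-set-host-name accepts "localhost" but rejects "localhost.localdomain"), skip loopback-only names, and bound the call with its own short timeout. The function names below are hypothetical, the 10 second limit is an arbitrary choice, and skipping loopback names entirely is an assumption rather than established avahi policy. This is not the shipped hostname.d script.

```shell
#!/bin/sh
# Hypothetical guard logic for a hostname.d hook; not the shipped script.

# Strip the domain part: "host.example.org" -> "host".
short_hostname() {
    printf '%s\n' "${1%%.*}"
}

# Succeed when the name only designates the loopback host.
is_loopback_name() {
    case $1 in
        localhost|localhost.localdomain) return 0 ;;
        *) return 1 ;;
    esac
}

# Publish the name via avahi, but never block for long.
set_avahi_name() {
    # Assumption: a loopback-only name is not worth publishing.
    is_loopback_name "$1" && return 0
    name=$(short_hostname "$1")
    # timeout(1) is part of GNU coreutils; 10 seconds is an arbitrary limit.
    timeout 10 avahi-set-host-name "$name"
}
```

A hook built this way would end with something like set_avahi_name "$1". The existing script runs the command through su as the avahi user, and that wrapper would need to sit inside the timeout call as well.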