Bug 5830 - Network Test fails adding first connection after install
Summary: Network Test fails adding first connection after install
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 2
Hardware: All Linux
Priority: Normal, Severity: normal
Target Milestone: Mageia 3
Assignee: Olivier Blin
QA Contact:
URL:
Whiteboard: 3alpha1
Keywords: PATCH
Depends on:
Blocks:
 
Reported: 2012-05-10 19:30 CEST by Derek Jennings
Modified: 2012-09-09 14:23 CEST

See Also:
Source RPM: drakx-net
CVE:
Status comment:


Attachments
Wireshark log during connection test (10.45 KB, application/octet-stream)
2012-05-10 20:40 CEST, Derek Jennings
Patch to netconnect.pm (634 bytes, patch)
2012-05-11 13:27 CEST, Derek Jennings
Revised patch for netconnect.pm (1.29 KB, patch)
2012-05-11 14:24 CEST, Derek Jennings

Description Derek Jennings 2012-05-10 19:30:35 CEST
Description of problem:
If Mageia is installed on a computer with no network connected during installation, then adding a connection with drakconnect afterwards makes the network test fail, even though the connection actually works.

How reproducible:
Seen whenever no network is available during install, e.g. on laptops.
I believe it affects wireless connections too, but this is harder for me to test in a virtual box.


Steps to Reproduce:
1. Install Mageia from DVD in a VirtualBox VM. Ensure the virtual Ethernet adapter has its cable pulled out (right-click the network icon in the VirtualBox window).
Save the machine state when the install is complete but before rebooting, to make retesting easier.

2. Boot into the completed install. Start MCC > drakconnect. Insert the virtual Ethernet cable and go through the "add a connection" wizard.

3. Observe that the network connection test fails, but the connection is actually up and working.

This bug is very similar and related to Bug 5772, but the fix for 5772 does not resolve it, even after changing the patch to apply to an installed system.
Comment 1 Derek Jennings 2012-05-10 20:40:36 CEST
Created attachment 2250 [details]
Wireshark log during connection test

Note: when simulating the failure with a vbox it is not actually necessary to disconnect/reconnect the cable. Deleting the existing connection in MCC is all that is required.

Attached is a Wireshark log. As soon as DHCP is finished, mDNS messages start appearing, but there is no sign of a network test taking place. I suspect the test is simply not starting.
Comment 2 Thierry Vignaud 2012-05-10 20:58:05 CEST
I'm waiting for your patch :-)

CC: (none) => thierry.vignaud
Assignee: bugsquad => mageia

Comment 3 Jin-tong Hu 2012-05-10 22:43:07 CEST
This bug exists both in Mageia 2 RC and in Mageia 2 beta 3. My computer connects to the Internet via PPPoE, so I need to add that connection manually. Whether I add it after I boot into the LiveCD or after I install from the LiveCD, the network test always fails while the network connection is actually up and working.

CC: (none) => piscestong

Comment 4 Derek Jennings 2012-05-10 23:11:20 CEST
It appears there are two bugs causing this test to fail. The first is our old friend bug 5772. If I use this patch in tools.pm, that part of the test passes:

sub connected {
    if ($::isInstall) {
        symlink "$::prefix/etc/resolv.conf", "/etc/resolv.conf" if ! -e "/etc/resolv.conf";
    }
    return scalar grep { /1 received/ } `$::prefix/bin/ping -qc1 www.mageia.org`;
}
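
For readers who do not know Perl, the connectivity check in the patch above can be approximated in shell. This is only a rough sketch, not the drakx-net code: the function names ping_succeeded and is_connected are made up for illustration, and the host is passed as an argument rather than hard-coded.

```shell
# ping_succeeded: reads ping's summary from stdin and succeeds only if
# it contains "1 received", mirroring the /1 received/ match above.
ping_succeeded() {
    grep -q '1 received'
}

# is_connected HOST: sends a single quiet ping (-q -c 1) to HOST.
# HOST is a placeholder; the patch above pings www.mageia.org.
is_connected() {
    ping -q -c 1 "$1" 2>/dev/null | ping_succeeded
}
```

Like the Perl version, this depends on name resolution working, which is why the patch first makes sure /etc/resolv.conf exists during install.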

The second problem is in network::connection::get_status, which is supposed to report whether a gateway is defined by calling network::tools::get_interface_status. Unfortunately, decoding get_interface_status is straining my minuscule Perl skills.
Comment 5 Derek Jennings 2012-05-11 12:55:33 CEST
OK. After having to learn how to use the Perl debugger, I have confirmed this bug is due to a timing issue in network::netconnect around line 315:

                               if (!$::isInstall) {
                                   services::start('network-up');
                               } else {
                                   my $timeout = $connection->get_up_timeout;
                                   while ($timeout--) {
                                       my $status = $connection->get_status;
                                       last if $status;
                                       sleep 1;
                                    }
                               }
                               $success = $connection->get_status();

It tests the connection status without waiting for the network to come up (except during install, when it does wait).

If I comment out the } else { line, so that the wait loop also runs outside the installer, it works perfectly.
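
The wait loop quoted above is a bounded poll: check the status once per second and stop as soon as it succeeds or the timeout runs out. A minimal shell sketch of the same pattern follows. The names wait_for_status and has_default_route are hypothetical, and checking for a default route is only an assumed stand-in for what $connection->get_status actually tests.

```shell
# wait_for_status TIMEOUT CMD [ARGS...]
# Runs CMD once per second, up to TIMEOUT attempts, and returns 0 as soon
# as it succeeds; returns 1 if every attempt fails.
wait_for_status() {
    timeout=$1
    shift
    while [ "$timeout" -gt 0 ]; do
        "$@" && return 0
        timeout=$((timeout - 1))
        sleep 1
    done
    return 1
}

# Hypothetical status check: true once the kernel has a default route.
has_default_route() {
    ip route show default 2>/dev/null | grep -q .
}

# Usage: wait_for_status 20 has_default_route && echo "network is up"
```

The point of the pattern is that a slow network costs at most TIMEOUT seconds, while a fast one returns almost immediately.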
Comment 6 Derek Jennings 2012-05-11 13:08:31 CEST
Sorry, cancel Comment 5
After refreshing my Virtual Box it is failing again  :-(
Comment 7 Derek Jennings 2012-05-11 13:27:04 CEST
Created attachment 2268 [details]
Patch to netconnect.pm

No, cancel my cancel. I was right the first time. I had left the virtual ethernet cable disconnected in my Virtual Box. Doh!

Attached is a patch to netconnect.pm that waits for the network to come up before testing for connectivity. There is no need to do anything to tools.pm.
Comment 8 Thierry Vignaud 2012-05-11 13:35:57 CEST
For the record, $::isInstall is set in the installer but not in drakconnect when it is run as a standalone tool.
So you're making drakconnect run that block.
Maybe we should just drop it, since it's never run by drakconnect but only in drakx, where we have failures.

WDYT Blino?

BTW, use "diff -u" next time in order to have better patches
Comment 9 Derek Jennings 2012-05-11 14:24:39 CEST
Created attachment 2269 [details]
Revised patch for netconnect.pm

This patch is better. The last one screws up the install.

Attachment 2268 is obsolete: 0 => 1

Comment 10 Thierry Vignaud 2012-05-11 15:09:22 CEST
oops. network-auth is a NoOp now.
That might be the real issue.

Keywords: (none) => PATCH
CC: (none) => mageia

Comment 11 Colin Guthrie 2012-05-12 02:04:27 CEST
network-auth has always been a noop. It's just a virtual script that changes ordering. It still has effect when enabled - i.e. it will delay prefdm.service startup until network-up.service is complete.
Comment 12 Derek Jennings 2012-05-14 21:17:13 CEST
I am pretty confident it is a timing issue.
If I run with the perl debugger and pause execution just before $connection->get_status()  then the test passes.

Making it run that block, as in attachment 2269, delays it long enough for the connection to be ready before testing it.
Comment 13 Colin Guthrie 2012-05-15 11:30:39 CEST
Could the cause be that the code is calling services::start("network-up")? If the unit is already logged as running (systemctl status network-up.service), then systemd won't start it again.

e.g. compare:
systemctl start network-up.service

vs

systemctl try-restart network-up.service


the latter takes much longer.

So maybe the fix is simply to use services::restart rather than services::start?

Could you maybe test such a fix?
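
One way to make the start vs. try-restart comparison concrete is to time both commands. The helper below is only an illustrative sketch: elapsed_seconds is a made-up name, and it has one-second resolution because it relies on date +%s.

```shell
# elapsed_seconds CMD [ARGS...]
# Prints how many whole seconds CMD took. CMD's own output and exit
# status are discarded; only the duration is reported.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# Usage on the affected system (as root):
#   systemctl is-active network-up.service
#   elapsed_seconds systemctl start network-up.service
#   elapsed_seconds systemctl try-restart network-up.service
```

If start returns in about 0 seconds while is-active already reports the unit as active, that would be consistent with systemd treating the start request as a no-op.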
Comment 14 Derek Jennings 2012-05-15 14:54:33 CEST
I have tried using services::restart: no luck. It does not work.

We have a working patch in attachment 2269 [details] Why not use it?
Comment 15 Colin Guthrie 2012-05-15 15:14:19 CEST
(In reply to comment #14)
> We have a working patch in attachment 2269 [details] Why not use it?

Because I do not think it solves the problem properly. It's a hacky solution that unconditionally injects a delay without fully analysing the problem itself.


The patch you refer to makes code meant only for the installer run unconditionally, even when the network-up script exists to do this job.

Now in this case, because we start the services in non-blocking mode (for other reasons, on another bug), the actual command executed will be non-blocking....

So we actually need a fix here to use blocking mode in this particular case. (I knew this would come back to bite me TV :p)

You can likely use:

run_program::rooted($::prefix, '/bin/systemctl', 'restart', 'network-up.service');

(rather than the "services::start('network-up')") as a hacky test, but there may be other reasons that this might not work. Probably easier just to add a non-block option to _run_action() and allow restart to use it.

Sadly the initscript doesn't have a "restart" action, so it would need to be modified as well to ensure it works with both sysvinit and systemd.

Not sure this qualifies as a release blocker so it might have to wait until after release for an update.



However, the other option is simply to stop using network-up here at all and only use the code that the installer normally uses (i.e. remove some more code in your patch). This would likely be fine, as I think using network-up here is actually overkill.

TV, WDYT?
Comment 16 Thierry Vignaud 2012-05-15 16:22:00 CEST
Blino WDYT?
Comment 17 Derek Jennings 2012-05-15 21:44:57 CEST
Having looked at how services::start is structured I can see you are probably concerned this may be a bug introduced by the move to systemd.

So I did some experiments.
First of all I forced it to use systemd, then /etc/rc.d/init.d/ : No change

Next I measured the time between executing services::start('network-up') and a gateway address appearing in 'route'. It takes between 1 and 2 seconds for the gateway to appear (limited by the 1-second resolution of my timer loop).

So if you refer to my patch the test succeeds on the third pass of the loop.

Next I removed the --no-block option from systemctl - No change. It still takes 1-2 seconds for route to be updated
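
A measurement like the one described in this comment could be reproduced with a small timer loop. This is a sketch under assumptions, not the script actually used: seconds_until and gateway_in_route are hypothetical names, and the gateway check assumes that 'route -n' lists the default route with destination 0.0.0.0.

```shell
# seconds_until LIMIT CMD [ARGS...]
# Polls CMD once per second and prints how many seconds had passed when
# it first succeeded. Returns 1 (printing nothing) after LIMIT seconds.
seconds_until() {
    limit=$1
    shift
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        if "$@"; then
            echo "$waited"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Hypothetical check: a default gateway line is present in 'route -n'.
gateway_in_route() {
    route -n 2>/dev/null | grep -q '^0\.0\.0\.0'
}

# Usage, right after starting network-up: seconds_until 30 gateway_in_route
```

Because the loop sleeps in whole seconds, a printed value of 1 means the gateway appeared somewhere between 1 and 2 seconds after the first check, which matches the resolution limit mentioned above.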
Comment 18 Marja Van Waes 2012-05-26 13:08:18 CEST
Hi,

This bug was filed against cauldron, but we do not have cauldron at the moment.

Please report whether this bug is still valid for Mageia 2.

Thanks :)

Cheers,
marja

Keywords: (none) => NEEDINFO

Comment 19 Derek Jennings 2012-05-26 13:22:34 CEST
This bug is still valid on Mageia 2

Version: Cauldron => 2

Manuel Hiebel 2012-05-27 17:14:24 CEST

Keywords: NEEDINFO => (none)

Comment 20 Marja Van Waes 2012-07-06 15:04:48 CEST
Please look at the bottom of this mail to see whether you're the assignee of this  bug, if you don't already know whether you are.


If you're the assignee:

We'd like to know for sure whether this bug was assigned correctly. Please change status to ASSIGNED if it is, or put OK on the whiteboard instead.

If you don't have a clue and don't see a way to find out, then please put NEEDHELP on the whiteboard.

Please assign back to Bug Squad or to the correct person to solve this bug if we were wrong to assign it to you, and explain why.

Thanks :)

**************************** 

@ the reporter and persons in the cc of this bug:

If you have any new information that wasn't given before (like this bug being valid for another version of Mageia, too, or it being solved) please tell us.

@ the reporter of this bug

If you didn't reply yet to a request for more information, please do so within two weeks from now.

Thanks all :-D
Comment 21 Thierry Vignaud 2012-09-04 18:03:30 CEST
*** Bug 7335 has been marked as a duplicate of this bug. ***

CC: (none) => davidwhodgins

Thierry Vignaud 2012-09-04 18:06:00 CEST

Attachment 2269 filename: patch-netconnect.pm => patch-netconnect.diff

claire robinson 2012-09-04 18:11:28 CEST

Target Milestone: --- => Mageia 3
Whiteboard: (none) => 3alpha1

Comment 22 Thierry Vignaud 2012-09-04 18:12:40 CEST
Fixed in git.
Since blino didn't answer, and since it's good enough for drakx, it can't hurt in standalone mode.

Status: NEW => RESOLVED
Resolution: (none) => FIXED

Comment 23 Thomas Backlund 2012-09-09 14:23:19 CEST
drakx-net update pushed:
https://wiki.mageia.org/en/Support/Advisories/MGAA-2012-0187

CC: (none) => tmb

