| Summary: | Network Test fails adding first connection after install | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Derek Jennings <derekjenn> |
| Component: | RPM Packages | Assignee: | Olivier Blin <mageia> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | ||
| Priority: | Normal | CC: | davidwhodgins, mageia, piscestong, thierry.vignaud, tmb |
| Version: | 2 | Keywords: | PATCH |
| Target Milestone: | Mageia 3 | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | 3alpha1 | ||
| Source RPM: | drakx-net | CVE: | |
| Status comment: | |||
| Attachments: | Wireshark log during connection test; Patch to netconnect.pm; Revised patch for netconnect.pm | ||
Description
Derek Jennings
2012-05-10 19:30:35 CEST
Created attachment 2250 [details]
Wireshark log during connection test
Note: when simulating the failure with a vbox it is not actually necessary to disconnect/reconnect the cable. Deleting the existing connection in MCC is all that is required.
Attached is a Wireshark log. As soon as DHCP is finished, mDNS messages start appearing, but there is no sign of a network test taking place. I suspect the test is simply not starting.
I'm waiting for your patch :-)
CC: (none) => thierry.vignaud
This bug exists both in Mageia 2 RC and in Mageia 2 beta 3. My computer connects to the Internet via PPPoE, so I need to add that connection manually. Whether I add it after I boot into the LiveCD or after I install from the LiveCD, the network test always fails while the network connection is actually up and working.
CC: (none) => piscestong
It appears there are two bugs causing this test to fail. The first is our old friend bug 5772. If I use this patch in tools.pm, that part of the test passes:
sub connected {
    if ($::isInstall) {
        symlink "$::prefix/etc/resolv.conf", "/etc/resolv.conf" if ! -e "/etc/resolv.conf";
    }
    return scalar grep { /1 received/ } `$::prefix/bin/ping -qc1 www.mageia.org`;
}
The second problem is in network::connection::get_status, which is supposed to return whether a gateway is defined by calling network::tools::get_interface_status. Unfortunately, decoding get_interface_status is straining my minuscule Perl skills.
OK. After having to learn how to use the Perl debugger, I have confirmed this bug is due to a timing issue in network::netconnect around line 315:
if (!$::isInstall) {
    services::start('network-up');
} else {
    my $timeout = $connection->get_up_timeout;
    while ($timeout--) {
        my $status = $connection->get_status;
        last if $status;
        sleep 1;
    }
}
$success = $connection->get_status();
It tests the connection status without waiting for the network to come up (except during install, when it does wait).
If I comment out the } else {
it works perfectly.
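The wait-then-test behaviour described above can be sketched outside of drakx-net in shell. This is only an illustration of the polling idea, not the code in netconnect.pm: wait_until and has_default_route are hypothetical helper names, and checking for a default route with `ip route show default` is an assumption standing in for $connection->get_status.

```shell
#!/bin/sh
# Hypothetical helper: run a check command once per second until it succeeds
# or the timeout (in seconds) is used up. Returns 0 on success, non-zero otherwise.
wait_until() {
    timeout=$1
    shift
    while [ "$timeout" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        timeout=$((timeout - 1))
        sleep 1
    done
    # One final check, mirroring the get_status call made after the loop.
    "$@"
}

# Assumed stand-in for get_status: succeeds when a default route exists.
has_default_route() {
    ip route show default 2>/dev/null | grep -q '^default'
}
```

A call such as `wait_until 10 has_default_route && echo "network is up"` would then only report success once the route has appeared, or give up after roughly ten seconds.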
Sorry, cancel comment 5. After refreshing my Virtual Box it is failing again :-(
Created attachment 2268 [details]
Patch to netconnect.pm
No, cancel my cancel. I was right the first time. I had left the virtual ethernet cable disconnected in my Virtual Box. Doh!
Attached is a patch to netconnect.pm that waits for the network to come up before testing for connectivity. There is no need to do anything to tools.pm.
For the record, $::isInstall is set in the installer but not in drakconnect when it is run as a standalone tool. So you're making drakconnect run that block. Maybe we should just drop it, since it's never run by drakconnect but only in drakx, where we have failures. WDYT Blino?
BTW, use "diff -u" next time in order to have better patches.
Created attachment 2269 [details]
Revised patch for netconnect.pm
This patch is better. The last one screws up the install.
Attachment 2268 is obsolete: 0 => 1
oops. network-auth is a NoOp now. That might be the real issue.
Keywords: (none) => PATCH
network-auth has always been a noop. It's just a virtual script that changes ordering. It still has effect when enabled - i.e. it will delay prefdm.service startup until network-up.service is complete.
I am pretty confident it is a timing issue.
If I run with the perl debugger and pause execution just before $connection->get_status() then the test passes.
Making it do that block in attachment 2269 [details] makes it delay long enough for the connection to be ready before testing it.
Could it be due to the fact that the code is calling service::start("network-up")? If it is already logged as running (systemctl status network-up.service) then systemd won't start it again.
e.g. compare:
systemctl start network-up.service
vs
systemctl try-restart network-up.service
the latter takes much longer.
So maybe the fix is simply to use service::restart rather than service::start?
Could you maybe test such a fix?
I have tried using service::restart: no luck. It does not work.
We have a working patch in attachment 2269 [details]. Why not use it?
(In reply to comment #14)
> We have a working patch in attachment 2269 [details] Why not use it?
Because I do not think it solves the problem properly. It's just a hacky solution that unconditionally injects a delay and doesn't actually analyse the problem itself fully.
The patch you refer to makes code meant only for the installer run unconditionally, even when the network-up script exists to do this job.
Now in this case, because we start the services in non-blocking mode (for other reasons on another bug), the actual command executed will be non-blocking... So we actually need a fix here to use blocking mode in this particular case. (I knew this would come back to bite me TV :p)
You can likely use:
run_program::rooted($::prefix, '/bin/systemctl', 'restart', 'network-up.service');
(rather than the "services::start('network-up')") as a hacky test, but there may be other reasons that this might not work. Probably easier just to add a non-block option to _run_action() and allow restart to use it. Sadly the initscript doesn't have a "restart" action, so it would need to be modified as well to ensure it works with both sysvinit and systemd.
Not sure this qualifies as a release blocker, so it might have to wait until after release for an update.
However, the other option is simply to ditch using network-up at all here and only use the code that the installer normally uses (i.e. remove some more code in your patch). This would likely be fine, as I think using network-up here is actually overkill.
TV, WDYT? Blino, WDYT?
Having looked at how services::start is structured, I can see you are probably concerned this may be a bug introduced by the move to systemd.
So I did some experiments.
First of all I forced it to use systemd, then /etc/rc.d/init.d/ : No change
Next I measured the time between executing services::start('network-up') and a gateway address appearing in 'route'. It takes between 1 and 2 seconds for the gateway to appear (Limited by the 1 second resolution of my timer loop)
So if you refer to my patch the test succeeds on the third pass of the loop.
Next I removed the --no-block option from systemctl - No change. It still takes 1-2 seconds for route to be updated
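A measurement like the one described above can be approximated with a small timer loop. The sketch below is an assumption about how such a loop might look, not the script actually used in this report: seconds_until and gateway_present are hypothetical names, and the gateway check assumes the default route shows up in `route -n` with destination 0.0.0.0. As in the report, the resolution is limited to one second.

```shell
#!/bin/sh
# Hypothetical helper: print how many whole seconds pass before a check
# command succeeds, giving up after a limit (in seconds).
seconds_until() {
    limit=$1
    shift
    elapsed=0
    while [ "$elapsed" -lt "$limit" ]; do
        if "$@"; then
            echo "$elapsed"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timeout"
    return 1
}

# Assumed check: a default gateway is listed in the kernel routing table.
gateway_present() {
    route -n 2>/dev/null | grep -q '^0\.0\.0\.0'
}
```

Running `seconds_until 10 gateway_present` immediately after starting the network-up service would then print the number of passes needed before the gateway appeared.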
Hi,
This bug was filed against cauldron, but we do not have cauldron at the moment. Please report whether this bug is still valid for Mageia 2.
Thanks :)
Cheers,
marja
Keywords: (none) => NEEDINFO
This bug is still valid on Mageia 2
Version: Cauldron => 2
Manuel Hiebel
2012-05-27 17:14:24 CEST
Keywords: NEEDINFO => (none)
Please look at the bottom of this mail to see whether you're the assignee of this bug, if you don't already know whether you are.
If you're the assignee: We'd like to know for sure whether this bug was assigned correctly. Please change status to ASSIGNED if it is, or put OK on the whiteboard instead. If you don't have a clue and don't see a way to find out, then please put NEEDHELP on the whiteboard. Please assign back to Bug Squad or to the correct person to solve this bug if we were wrong to assign it to you, and explain why.
Thanks :)
****************************
@ the reporter and persons in the cc of this bug: If you have any new information that wasn't given before (like this bug being valid for another version of Mageia, too, or it being solved) please tell us.
@ the reporter of this bug: If you didn't reply yet to a request for more information, please do so within two weeks from now.
Thanks all :-D
Thierry Vignaud
2012-09-04 18:06:00 CEST
Attachment 2269 filename: patch-netconnect.pm => patch-netconnect.diff
claire robinson
2012-09-04 18:11:28 CEST
Target Milestone: --- => Mageia 3
Fixed in git. Since blino didn't answer, and since it's good enough for drakx, it can't hurt in standalone mode.
Status: NEW => RESOLVED
drakx-net update pushed: https://wiki.mageia.org/en/Support/Advisories/MGAA-2012-0187
CC: (none) => tmb