Bug 15060 - network in virtualbox is horribly slow (NAT)
Summary: network in virtualbox is horribly slow (NAT)
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 5
Hardware: i586 Linux
Priority: High
Severity: critical
Target Milestone: ---
Assignee: Thomas Backlund
QA Contact:
URL: https://www.virtualbox.org/ticket/15295
Whiteboard: FOR_ERRATA MGA5TOO
Keywords: UPSTREAM
Depends on: 18944
Blocks:
Reported: 2015-01-16 15:09 CET by David Walser
Modified: 2020-08-26 08:47 CEST
CC List: 7 users

See Also:
Source RPM: kernel, virtualbox
CVE:
Status comment:


Description David Walser 2015-01-16 15:09:53 CET
Originally noticed in pre-release Mageia 4 testing, the issue is still present.  I could not find a bug report for it from back then.  Mageia 3 and previous releases did not exhibit this problem.

With a local Cauldron (or Mageia 4) mirror on a host machine running VirtualBox, boot a VM from boot.iso to bootstrap the installer.  The slowness first shows when stage1 downloads stage2.  Then, in stage2, retrieving hdlists or packages is very slow: one of the virtual consoles shows curl downloading them very slowly (speed hovering around 30000).

Reproducible: 

Steps to Reproduce:
Comment 1 Manuel Hiebel 2015-01-20 21:49:25 CET
Well, it has been the same with an installed mga5 VM for some months :/
Bridged mode works well; I use it as a workaround.
Comment 2 Manuel Hiebel 2015-01-20 21:54:21 CET
I forgot: the issue should only occur in NAT mode.
Comment 3 David Walser 2015-01-20 22:27:23 CET
Indeed I am using NAT mode.
Comment 4 David Walser 2015-01-26 01:08:29 CET
OK, in Cauldron, just installing updates, I am now also seeing similar slowness as curl downloads packages.  I just tested loading the Mageia 3 installer through boot.iso, and stage2 loads quickly as it did in the past.  So this isn't simply a VirtualBox issue; some package (maybe the kernel) that's newer in Mageia 4 and Cauldron is causing the problem.

CC: (none) => tmb

Comment 5 David Walser 2015-01-30 23:29:22 CET
I've also confirmed the problem in installed systems in Mageia 4 and Cauldron.  Mageia 3 works fine and can download a 160MB hdlist file in a couple seconds.  Mageia 4 and Cauldron estimate they'll take 1-2 hours.  It's not specific to curl either.
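
As a rough sanity check on those figures, the arithmetic can be sketched in shell. This assumes that the "30000" seen in the installer's curl output is bytes per second and that 160MB means 160,000,000 bytes; neither unit is stated in this report. `estimate_seconds` and `format_hm` are hypothetical helper names.

```shell
# Estimate transfer time in whole seconds for a given size and rate.
# Assumptions (not stated in the report): size in bytes, rate in bytes/second.
estimate_seconds() {
    size_bytes=$1
    rate_bytes_per_sec=$2
    # Integer division, so the result is rounded down to whole seconds.
    echo $(( size_bytes / rate_bytes_per_sec ))
}

# Format a duration given in seconds as "<hours>h <minutes>m".
format_hm() {
    secs=$1
    echo "$(( secs / 3600 ))h $(( (secs % 3600) / 60 ))m"
}
```

Under those assumptions, `format_hm "$(estimate_seconds 160000000 30000)"` prints `1h 28m`, which is in line with the 1-2 hour estimates reported above.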

Severity: normal => critical

Comment 6 David Walser 2015-01-31 23:30:01 CET
I installed the Mageia 3 kernel on the Mageia 4 VM, and downloads are fast like they were on the Mageia 3 VM.  I feel a bisection coming on...

Assignee: ennael1 => tmb
CC: (none) => ennael1
Source RPM: drakx-installer-images-2.15-6.mga5.src.rpm => kernel

Comment 7 David Walser 2015-02-01 02:36:06 CET
Testing various kernel versions, I've found that 3.18 in Cauldron is the worst, as download speeds are consistently awful.  With 3.14 and 3.12 from Mageia 4, as well as 3.11.4 built from the kernel package revision 490924, it's inconsistent.  It usually starts very slow, but then has some periods of going fast.  Sometimes it'll start slow and then zip to the finish, and sometimes it will vary back and forth between fast and slow.  So there's probably a couple of points in the development that contributed to the degraded performance, but the first happened between 3.10 and 3.11, so that's the place to start.
Manuel Hiebel 2015-02-01 16:19:16 CET

Summary: downloading with curl in the installer in virtualbox is horribly slow (stage1 and stage2) => network in virtualbox is horribly slow (NAT)

Comment 8 Manuel Hiebel 2015-02-01 16:20:36 CET
(BTW, the last time I searched I found some related issues on other distros. A new version of VirtualBox was supposed to fix it, but no, it didn't. I have not checked recently.)
Comment 9 David Walser 2015-02-16 03:45:58 CET
(In reply to David Walser from comment #7)
> Testing various kernel versions, I've found that 3.18 in Cauldron is the
> worst, as download speeds are consistently awful.  With 3.14 and 3.12 from
> Mageia 4, as well as 3.11.4 built from the kernel package revision 490924,
> it's inconsistent.  It usually starts very slow, but then has some periods
> of going fast.  Sometimes it'll start slow and then zip to the finish, and
> sometimes it will vary back and forth between fast and slow.  So there's
> probably a couple of points in the development that contributed to the
> degraded performance, but the first happened between 3.10 and 3.11, so
> that's the place to start.

As far as the place between 3.10 and 3.11 where it started to go bad, I did a bisection for that (it took a long time!) and ended up with this commit:
https://github.com/torvalds/linux/commit/30f3a40f9a2a2869a560a9cb9ef488d10c803e14

Unfortunately, I reverted that commit in the 3.14.15 sources that I had lying around, and it didn't fix the problem.

The last git bisect good commit before the bisection ended in a string of bads was this one:
https://github.com/torvalds/linux/commit/aa63e18e3ddad4eb15d4af34ae66e7f4dcc7a6d0

So, I don't understand this bisection thing, because those commits aren't right next to each other.  Those commits are on these pages:
https://github.com/torvalds/linux/commits/v3.11?page=239
https://github.com/torvalds/linux/commits/v3.11?page=236

So maybe the breakage is between them still?  But this doesn't make sense either, as both of those commits are *before* 3.10.0, which was where I started the bisection!

Here is my bisect log:
git bisect start
# bad: [6e4664525b1db28f8c4e1130957f70a94c19213e] Linux 3.11
git bisect bad 6e4664525b1db28f8c4e1130957f70a94c19213e
# good: [8bb495e3f02401ee6f76d1b1d77f3ac9f079e376] Linux 3.10
git bisect good 8bb495e3f02401ee6f76d1b1d77f3ac9f079e376
# good: [8b70a90cabafb6a6e1a0d3f838b38355fe48337e] Merge branch 'for-v3.11' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
git bisect good 8b70a90cabafb6a6e1a0d3f838b38355fe48337e
# bad: [b41e6a51d57e231d2ed237f17f002cc536c0987c] sh_eth: SH_ETH should depend on HAS_DMA
git bisect bad b41e6a51d57e231d2ed237f17f002cc536c0987c
# good: [2e17c5a97e231f3cb426f4b7895eab5be5c5442e] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect good 2e17c5a97e231f3cb426f4b7895eab5be5c5442e
# bad: [07cc61bfc0e5d9da80e353365717d45d29db0081] xen-netback: double free on unload
git bisect bad 07cc61bfc0e5d9da80e353365717d45d29db0081
# good: [0a4db187a999c4a715bf56b8ab6c4705b524e4bb] Merge branch 'll_poll'
git bisect good 0a4db187a999c4a715bf56b8ab6c4705b524e4bb
# good: [aa63e18e3ddad4eb15d4af34ae66e7f4dcc7a6d0] cw1200: Sanity-check arguments in copy_from_user()
git bisect good aa63e18e3ddad4eb15d4af34ae66e7f4dcc7a6d0
# bad: [5d21cb70db0122507cd18f58b4a9112583c1e075] tipc: allow implicit connect for stream sockets
git bisect bad 5d21cb70db0122507cd18f58b4a9112583c1e075
# bad: [4a2e667ac15edd19b02321bc030acb3ebeb22ab6] Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
git bisect bad 4a2e667ac15edd19b02321bc030acb3ebeb22ab6
# bad: [5eaedf31319d5f80eaaee1eec8dd18c0b452f0d1] gianfar: Add backwards compatible Single Queue mode polling
git bisect bad 5eaedf31319d5f80eaaee1eec8dd18c0b452f0d1
# bad: [da5bab079f9b7d90ba234965a14914ace55e45e9] net: udp4: move GSO functions to udp_offload
git bisect bad da5bab079f9b7d90ba234965a14914ace55e45e9
# bad: [da12c90e099789a63073fc82a19542ce54d4efb9] netlink: Add compare function for netlink_table
git bisect bad da12c90e099789a63073fc82a19542ce54d4efb9
# bad: [867a59436fc35593ae0e0efcd56cc6d2f8506586] bridge: Add a flag to control unicast packet flood.
git bisect bad 867a59436fc35593ae0e0efcd56cc6d2f8506586
# bad: [9ba18891f75535eca3ef53138b48970eb60f5255] bridge: Add flag to control mac learning.
git bisect bad 9ba18891f75535eca3ef53138b48970eb60f5255
# bad: [30f3a40f9a2a2869a560a9cb9ef488d10c803e14] net: remove last caller of skb_tail_offset() and itself
git bisect bad 30f3a40f9a2a2869a560a9cb9ef488d10c803e14
# first bad commit: [30f3a40f9a2a2869a560a9cb9ef488d10c803e14] net: remove last caller of skb_tail_offset() and itself
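
One likely explanation for the puzzle above is that kernel history is not linear: a commit can be authored before a release is tagged and still be merged into mainline only afterwards, so a bisection between two tags can legitimately land on commits with earlier dates. Whether a commit is actually contained in a release can be checked with `git merge-base --is-ancestor`. The sketch below assumes a local clone of the kernel tree; `commit_in_release` is a hypothetical helper name.

```shell
# Print "yes" if COMMIT is reachable from (contained in) REF, else "no".
# Any git error (unknown commit, not a repository) also prints "no".
# Usage: commit_in_release <repo-dir> <commit> <ref>
commit_in_release() {
    if git -C "$1" merge-base --is-ancestor "$2" "$3" 2>/dev/null; then
        echo yes
    else
        echo no
    fi
}
```

For example, with a clone in `~/linux` (an assumed path), `commit_in_release ~/linux 30f3a40f9a2a2869a560a9cb9ef488d10c803e14 v3.10` would be expected to print `no` if that commit only reached mainline during the 3.11 merge window, which is what the bisect log suggests.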
Comment 10 David Walser 2015-02-16 03:46:27 CET
At this point, I guess the more fruitful effort would be determining where things got worse between 3.14 and 3.18.
Comment 11 David Walser 2015-02-18 02:16:18 CET
This thing is very hard to pin down.  I did some more testing, and even with 3.18.3, sometimes it'll just stay slow, sometimes it'll start slow for a while and then go fast and finish, and sometimes it'll go fast and then alternate before it finishes.

3.15.2 from our package r640211 seems similar to 3.14, but maybe a bit worse.

3.17.4 from r798117 seems noticeably worse than 3.15.2.

I did some Googling, and found this:
http://superuser.com/questions/850357/how-to-fix-extremely-slow-virtualbox-network-download-speed

I tried booting my Cauldron VM with the virtio adapter and it wouldn't boot all the way, but in single-user mode I could ifup eth0 and downloads were fast again.  I'm not sure why it wouldn't boot all the way in normal mode, but virtio could be a workaround, and this would suggest that any issues were introduced in the e1000 driver.  However, the only commit to that driver between 3.10 and 3.11 is this one:
https://github.com/torvalds/linux/commit/ede23fa8161c1a04aa1b3bf5447812ca14b3fef1#diff-bdebde8d78dd80e7601c1394bbcfe72e

which looks like basically a cosmetic change that shouldn't make any functional difference.

I'm back to thinking the best thing would be to find where the problem actually started, but having tried to do that unsuccessfully already, I'm not sure where to go from here.
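
For reference, the adapter type can also be switched from the host command line rather than the GUI. A minimal sketch, assuming the VM is powered off and that NAT is on adapter 1; `VBOXMANAGE` is an override variable introduced here only so the command can be previewed without running it, and the function names are hypothetical.

```shell
# Switch adapter 1 of a powered-off VM to the paravirtualized virtio NIC.
# Usage: set_nic_virtio <vm-name>
set_nic_virtio() {
    ${VBOXMANAGE:-VBoxManage} modifyvm "$1" --nictype1 virtio
}

# Revert adapter 1 to the emulated Intel PRO/1000 MT Desktop (e1000) NIC.
# Usage: set_nic_e1000 <vm-name>
set_nic_e1000() {
    ${VBOXMANAGE:-VBoxManage} modifyvm "$1" --nictype1 82540EM
}
```

With a VM called "Cauldron" (an assumed name), `VBOXMANAGE=echo set_nic_virtio Cauldron` only prints the arguments that would be passed, which makes it easy to check the command before applying it.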
Thierry Vignaud 2015-02-18 11:30:26 CET

CC: (none) => thierry.vignaud
Priority: Normal => release_blocker

Comment 12 Manuel Hiebel 2015-02-18 12:26:07 CET
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|Normal                      |release_blocker
                 CC|                            |thierry.vignaud@gmail.com

How can this be a blocker when it has been the case since Mageia 4? :)
Comment 13 David Walser 2015-02-18 14:43:25 CET
Yeah, ideally it would be fixed before release, as it makes installing or upgrading VMs in Virtualbox take way too long.  It was the case in mga4, but it's even a bit worse now.  Unfortunately, I don't have much hope that this can be fixed, as I've done all I can and put over three full days into it and gotten basically nowhere.  I'm not even sure what else to do.
William Kenney 2015-03-14 20:49:46 CET

CC: (none) => wilcal.int

Comment 14 Rémi Verschelde 2015-03-19 09:06:34 CET
Thomas, do you have any comment about this bug? Since you're the assignee and it looks like a long-running kernel regression, I leave it up to you to determine if it should stay a release_blocker, or if it's something to work on for a post-release kernel update.

CC: (none) => remi

Comment 15 Rémi Verschelde 2015-03-19 09:07:08 CET
Note that if we don't fix it before the release, it should probably be mentioned in the errata.
Comment 16 Thierry Vignaud 2015-03-19 09:41:29 CET
We could patch VBox to default to virtio for network intf for Linux hosts...
Comment 17 William Kenney 2015-03-25 20:08:28 CET
Testing here against:

http://www.att.com/speedtest/

AT&T is my internet provider. The above site is on their system
so connection from me to them does not go outside the AT&T network.
Their techs use it during an install.

Host system:

M4.1 i7 x86_64
Dn 24.71 Mbps
Up  5.58 Mbps

M4.1 Vbox Client, i586
Dn 24.57 Mbps
Up  4.76 Mbps

M4.1 Vbox Client, x86_64
Dn 24.75 Mbps
Up  4.75 Mbps

M5 Vbox Client, i586
Dn 25.04 Mbps
Up  4.75 Mbps

M5 Vbox Client, x86_64
Dn 24.75 Mbps
Up  4.75 Mbps

Is this more likely to occur on really big files?
Comment 18 William Kenney 2015-03-25 20:18:28 CET
If, using the M5 Vbox Client (x86_64), I attempt to download:

mirrors.kernel.org/mageia/iso/cauldron/Mageia-5-beta3-LiveCD-KDE4-en-i586-CD/Mageia-5-beta3-LiveCD-KDE4-en-i586-CD.iso

That gets horribly slow.
Comment 19 Thierry Vignaud 2015-03-25 21:15:20 CET
You should both report what net driver you're using (virtio or some emulated one) and the host OS...
Comment 20 Thomas Backlund 2015-03-25 21:21:23 CET
(In reply to Thierry Vignaud from comment #16)
> We could patch VBox to default to virtio for network intf for Linux hosts...

Heh, well, I see reports on LKML about virtio getting into trouble on VMs too :)

for those having the problem, try:

ethtool -K NIC gro off

(replace "NIC" with the real name of the nic in the vm)
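
To confirm whether GRO (generic receive offload) is enabled before and after the change, the output of `ethtool -k` (lowercase, which lists the offload settings) can be filtered. A small sketch; `gro_state` is a hypothetical helper and `eth0` an assumed interface name.

```shell
# Read `ethtool -k <nic>` output on stdin and print the GRO state ("on"/"off").
gro_state() {
    awk -F': *' '$1 == "generic-receive-offload" { split($2, w, " "); print w[1] }'
}

# Typical use inside the guest (eth0 is an assumption):
#   ethtool -k eth0 | gro_state      # show the current state
#   ethtool -K eth0 gro off          # the workaround from this comment
#   ethtool -k eth0 | gro_state      # should now print "off"
```

Note that a setting made with `ethtool -K` does not survive a reboot, so it has to be reapplied on each boot unless it is made persistent some other way.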
Comment 21 David Walser 2015-03-26 01:41:07 CET
Your internet connection has no relevance to this bug.  This is only about the guest's network communication with the host itself, i.e., if your local mirror is on the host machine.
Comment 22 William Kenney 2015-03-26 04:09:45 CET
OK, I've got an NFS server holding all the M5RC ISOs. Let's use the
M5RC i586 KDE Live-CD, which is a file of about 700MB. The directory
containing the ISOs is part of an NFS share. So I launch an M5RC i586
KDE client that mounts the NFS share and has FileZilla installed. The
M4.1 Vbox host machine (i7, 16GB, x86_64) transfers big files from the
NFS server at about 11MiB/s. If I launch FileZilla in the Vbox client
and attempt to FTP the Live-CD, about 2/3 of the time it runs at about
11MiB/s. About 1/3 of the time, it gets about 80% of the way through
that 700MB and then slows to a crawl, all but stopping.
But if I open Dolphin and copy the file from the NFS share mounted in
the client, I can copy the file repeatedly at 11MiB/s. No problem.
I hope this helps.
Comment 23 David Walser 2015-03-28 18:21:46 CET
(In reply to Thomas Backlund from comment #20)
> (In reply to Thierry Vignaud from comment #16)
> > We could patch VBox to default to virtio for network intf for Linux hosts...
> 
> Heh, well I see reports on lkml about virtio getting in trouble on vm's too
> :)
> 
> for those having the problem, try:
> 
> ethtool -K NIC gro off
> 
> (replace "NIC" with the real name of the nic in the vm)

OMG that fixes it!!!!!!
Comment 24 Sander Lepik 2015-04-05 23:10:10 CEST
Is this still release critical or can we add the tip in comment #20 into errata and decrease priority?

CC: (none) => mageia

Comment 25 Anne Nicolas 2015-04-05 23:13:09 CEST
I guess so. David, would you be OK with doing it?
Comment 26 David Walser 2015-04-06 00:41:38 CEST
Why add a tip in the errata?  (And how would you even enter that command during stage1?  I don't think it's possible.)  It sounds like tmb figured out the problem; now someone just needs to use that to implement some sort of solution.
Comment 27 Rémi Verschelde 2015-04-29 11:48:58 CEST
Thomas, could you give some more details about why we can't/shouldn't implement the workaround, and what should be put in the errata?

CC'ing docteam to handle the errata part, and decreasing priority.

CC: (none) => doc-bugs
Whiteboard: (none) => ERRATA
Priority: release_blocker => High

Samuel Verschelde 2015-05-31 21:57:26 CEST

Whiteboard: ERRATA => FOR_ERRATA

Comment 28 William Kenney 2015-06-27 18:41:57 CEST
So let's update the status of this. I don't see it mentioned in:

https://wiki.mageia.org/en/Mageia_5_Errata

Also, the fix suggested in comment #20 does in fact work, but is there a way
to put this command into the boot sequence of the client so that you
don't have to enter it after getting to a working desktop?

Another issue is that this also appears to be a problem during the client install,
especially if you're using a boot.iso file. Is there a way to get around that?
Comment 29 Thierry Vignaud 2016-06-18 00:40:16 CEST
Might be fixed with VirtualBox 5.0.22 according to its changelog.
Can you confirm?

Keywords: (none) => NEEDINFO
Source RPM: kernel => kernel, virtualbox

Comment 30 William Kenney 2016-06-18 01:29:09 CEST
Thierry, we have a serious, urgent problem with Vbox right now. Go to:

https://bugs.mageia.org/show_bug.cgi?id=18724
Comment 31 Thierry Vignaud 2016-06-18 05:57:33 CEST
Of which I'm aware (I'm already CC'd) and which is orthogonal to this one, which can be tested with a Cauldron host.
Comment 32 Thierry Vignaud 2016-06-18 15:39:09 CEST
Not an installer bug.

Component: Installer => RPM Packages

Comment 33 David Walser 2016-06-18 19:13:10 CEST
Definitely fixed in 5.0.22 (even if only the host is updated).  Nice catch Thierry!  We can mark this FIXED once 5.0.22 is pushed in Mageia 5.

Version: Cauldron => 5
Keywords: NEEDINFO => (none)

Comment 34 William Kenney 2016-06-18 20:35:18 CEST
(In reply to David Walser from comment #33)

> Definitely fixed in 5.0.22 (even if only the host is updated).  Nice catch
> Thierry!  We can mark this FIXED once 5.0.22 is pushed in Mageia 5.

This will be a big help.
Comment 35 Thomas Backlund 2016-06-18 20:38:03 CEST
(In reply to David Walser from comment #33)
> Definitely fixed in 5.0.22 (even if only the host is updated).

Yes, that's because it's the host-side IO-APIC code that was broken, and it is now finally rewritten / fixed upstream.
Comment 36 Thierry Vignaud 2016-06-21 14:09:34 CEST
(In reply to David Walser from comment #33)
Let's close it now. The fix is there; it's just not yet approved for global distribution as an update (closing it makes bug lists easier to browse).

(In reply to William Kenney from comment #34)
> This will be a big help.

Otherwise, you can use virt-manager.

Status: NEW => RESOLVED
Resolution: (none) => FIXED

David Walser 2016-06-21 20:10:05 CEST

Depends on: (none) => 18727

Comment 37 David Walser 2016-06-29 13:46:35 CEST
Nope, we never released this, and now we won't.  5.0.24 is out, and they have reverted the fix upstream.

Resolution: FIXED => (none)
Status: RESOLVED => REOPENED

Comment 38 Thierry Vignaud 2016-06-29 16:24:09 CEST
@Thomas: maybe we could add the new code back and make its use conditional on the guest type?
Comment 39 David Walser 2016-06-29 16:25:46 CEST
Do we know why they reverted the new io-apic code?  What problems was it causing?

I wonder if it has anything to do with the kernel module loading / Xorg issues.
Comment 40 Thierry Vignaud 2016-06-29 16:27:04 CEST
I.e., depending on whether it's Windows/unknown or Unix.
Or on whether it's Linux or not.

Version: 5 => Cauldron
Whiteboard: FOR_ERRATA => FOR_ERRATA MGA5TOO
Keywords: (none) => UPSTREAM
URL: (none) => https://www.virtualbox.org/ticket/15295
See Also: (none) => https://www.virtualbox.org/ticket/15529

Comment 41 Thierry Vignaud 2016-06-29 16:28:11 CEST
(In reply to David Walser from comment #39)
See the URL I added: https://www.virtualbox.org/ticket/15529
It's Windows guest lockups.
Comment 42 Thierry Vignaud 2016-06-29 16:28:51 CEST
Or else, Thomas, could we automatically disable GRO when a VBox hypervisor is detected?
Comment 43 Thomas Backlund 2016-06-29 16:29:29 CEST
@David, mostly Windows vms.

@Thierry, I will take a look at how intrusive it would be...

Version: Cauldron => 5

Comment 44 Thierry Vignaud 2016-06-29 16:36:53 CEST
It could be done in userspace.
If drakx detects VBox, we install one extra package that runs something like:

exit if systemd-detect-virt doesn't return "oracle"
foreach NIC { ethtool -K NIC gro off }

We could call the same script from within drakx in order to cover installing.
Sadly that wouldn't cover stage1 for those doing net install from within vbox: downloading stage2 would still be slow...
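
The pseudocode above could be fleshed out roughly as follows. This is only a sketch of the idea, not a script that was shipped: it relies on `systemd-detect-virt` printing `oracle` under VirtualBox (as the comment states), enumerates interfaces from `/sys/class/net`, and uses a hypothetical `ETHTOOL` override variable so the commands can be previewed. The function name and its optional arguments are likewise assumptions made for illustration.

```shell
# Disable GRO on every non-loopback NIC, but only inside a VirtualBox guest.
# Usage: disable_gro_on_vbox [virt-type] [sysfs-net-dir]
# Both arguments are optional and exist only to make the function testable.
disable_gro_on_vbox() {
    virt=${1:-$(systemd-detect-virt 2>/dev/null || true)}
    net_dir=${2:-/sys/class/net}

    # Do nothing unless the hypervisor is reported as VirtualBox ("oracle").
    [ "$virt" = "oracle" ] || return 0

    for path in "$net_dir"/*; do
        [ -d "$path" ] || continue      # skip plain files and an unmatched glob
        nic=${path##*/}
        [ "$nic" = "lo" ] && continue   # skip the loopback interface
        ${ETHTOOL:-ethtool} -K "$nic" gro off
    done
}
```

Running `ETHTOOL=echo disable_gro_on_vbox` inside a guest prints the `ethtool` arguments that would be used without changing anything.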
Comment 45 David Walser 2016-06-29 16:38:35 CEST
It also wouldn't help stage1, but for the installed system, couldn't drakx just make use of the ETHTOOL_OPTS in /etc/sysconfig/network-scripts/ifcfg-? to add the gro off option?
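
A sketch of what that could look like. It assumes Mageia's network scripts honour `ETHTOOL_OPTS` the way Red Hat-derived initscripts do, where a value beginning with `-` is passed to `ethtool` as-is; that behaviour is not confirmed in this report. `add_gro_off` is a hypothetical helper.

```shell
# Append an ETHTOOL_OPTS line disabling GRO to an ifcfg file, at most once.
# Usage: add_gro_off <ifcfg-file> <device>
add_gro_off() {
    cfg=$1
    dev=$2
    # Leave the file alone if ETHTOOL_OPTS is already set there.
    if grep -q '^ETHTOOL_OPTS=' "$cfg" 2>/dev/null; then
        return 0
    fi
    printf 'ETHTOOL_OPTS="-K %s gro off"\n' "$dev" >> "$cfg"
}
```

For example, `add_gro_off /etc/sysconfig/network-scripts/ifcfg-eth0 eth0` (with `eth0` as an assumed interface name) would leave the line `ETHTOOL_OPTS="-K eth0 gro off"` at the end of that file.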
Comment 46 Thierry Vignaud 2016-07-08 10:19:34 CEST
5.1beta1 seems to be properly fixed:
https://forums.virtualbox.org/viewtopic.php?f=15&t=77998&utm_source=anzwix
Comment 47 Thierry Vignaud 2016-07-08 10:22:28 CEST
Wrong link, 5.1 RC:
https://forums.virtualbox.org/viewtopic.php?f=15&t=78505
"VMM: enable X2-APIC for Linux guests by default."

Thomas, if you could backport that fix...
David Walser 2016-07-13 12:35:57 CEST

Depends on: 18727 => (none)

Comment 48 Thierry Vignaud 2016-07-14 22:59:16 CEST
VBox 5.1 final is out with that fix:
https://www.virtualbox.org/wiki/Changelog#v24
https://blogs.oracle.com/virtualization/entry/oracle_vm_virtualbox_5_14

We should backport it in mga5...
Thierry Vignaud 2016-07-14 22:59:36 CEST

Depends on: (none) => 18944

Comment 49 David Walser 2016-07-26 21:13:59 CEST
Fixed in 5.1.2, which was just pushed :o)

Resolution: (none) => FIXED
Status: REOPENED => RESOLVED

