Description of problem: My network card is an NVIDIA Corporation MCP61 Ethernet. This computer previously ran Windows. The Internet connection didn't work with the Mageia 7 live DVD, nor with the Ubuntu 18.04 live DVD. It still didn't work after a full install from the Mageia 7 DVD. But when I put the computer to sleep, the connection works when it comes back from sleep.
I fixed the bug like this: I created a file /etc/modprobe.d/forcedeth.conf containing:
options forcedeth msi=0 msix=0
Or it may be these commands that fixed it:
[root@localhost tieno]# ifconfig enp0s7 down
[root@localhost tieno]# modprobe -r forcedeth
[root@localhost tieno]# modprobe forcedeth msi=0 msix=0
[root@localhost tieno]# dhclient enp0s7
RTNETLINK answers: File exists
It's OK for me now, but the MLO forum (French) advised me to create a bug report so that this bug gets fixed for good. Thank you!
Here is some information about my hardware and release:
[tieno@localhost ~]$ lspcidrake -v | grep -i net
forcedeth : NVIDIA Corporation|MCP61 Ethernet [BRIDGE_OTHER] (vendor:10de device:03ef subv:1849 subd:03ef) (rev: a2)
[tieno@localhost ~]$ uname -a
Linux localhost.localdomain 5.4.12-desktop-1.mga7 #1 SMP Tue Jan 14 21:14:55 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
And here is the journal from when the bug was not fixed, covering boot, going to sleep, and returning from sleep:
journalctl -x -b 0 > journal.txt
chown 1000:1000 journal.txt
(Sorry, the text report is too long to paste here; I get an error message when I try to post it.)
Thank you for reporting this. The solutions you have found are very expert, and point to the 'forcedeth' driver.
It is surprising that the journal was so large. Do the sleep/wakeup manipulation soon after booting to keep it short. Whenever information to attach is large, compress the file first. We recommend 'xz':
$ xz <filename> [creates filename.xz]
---
Please post the trimmed output (just the section for the Ethernet controller) from:
$ lspci -v
Then please, from the UNcorrected system with the fault:
1. After booting, before sleep, save just the [dead] Ethernet section (to post) from:
# ifconfig
2. Do the sleep/wakeup manipulation [which you say kick-starts the connection].
3. Save just the [live] Ethernet section (to post) from:
# ifconfig
4. $ dmesg > dmesg.txt
to attach, compressed, to this bug.
5. # journalctl -ab > journal.txt
[as root, gives all messages] to attach, compressed, to this bug.
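Steps 1 and 3 ask for only the Ethernet section of the ifconfig output, and the attachments should be compressed. The minimal Python sketch below does both. It assumes the net-tools ifconfig layout, in which each interface stanza starts in column 0 and its continuation lines are indented. The helper names `interface_section` and `compress_xz` are hypothetical. Unlike plain `xz <filename>`, which removes the original file, this sketch keeps the original, as `xz --keep` does.

```python
import lzma
import shutil
from pathlib import Path


def interface_section(ifconfig_text: str, iface: str) -> str:
    """Return only the stanza for `iface` from ifconfig output.

    Assumes the net-tools layout: each stanza starts in column 0 with
    "<name>: flags=..." and its continuation lines are indented.
    """
    kept = []
    inside = False
    for line in ifconfig_text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new interface stanza.
            inside = line.startswith(iface + ":")
        if inside:
            kept.append(line)
    return "\n".join(kept).rstrip()


def compress_xz(path: str) -> Path:
    """Write <path>.xz next to the original, like `xz --keep <filename>`."""
    src = Path(path)
    dst = src.with_name(src.name + ".xz")
    # lzma.open() writes the .xz container format by default.
    with open(src, "rb") as fin, lzma.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, `compress_xz("journal.txt")` would leave journal.txt in place and produce journal.txt.xz to attach.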
CC: (none) => lewyssmith
Oh, I'm sorry, I didn't see how to attach a file to my post, so I tried to paste all the text into it. Anyway, here are the commands you asked for:
[tieno@localhost ~]$ lspci -v
[...]
00:07.0 Bridge: NVIDIA Corporation MCP61 Ethernet (rev a2)
 Subsystem: ASRock Incorporation 939NF6G-VSTA Board
 Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 28, NUMA node 0
 Memory at edffd000 (32-bit, non-prefetchable) [size=4K]
 I/O ports at d080 [size=8]
 Capabilities: <access denied>
 Kernel driver in use: forcedeth
 Kernel modules: forcedeth
(Same whether the bug is present or not.)
Then, from the uncorrected system:
[root@localhost tieno]# ifconfig
enp0s7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
 ether d0:50:99:82:1a:03 txqueuelen 1000 (Ethernet)
 RX packets 0 bytes 0 (0.0 B)
 RX errors 0 dropped 0 overruns 0 frame 0
 TX packets 13 bytes 3487 (3.4 KiB)
 TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
After sleep/wakeup:
[root@localhost tieno]# ifconfig
enp0s7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
 inet 192.168.0.11 netmask 255.255.255.0 broadcast 192.168.0.255
 ether d0:50:99:82:1a:03 txqueuelen 1000 (Ethernet)
 RX packets 1 bytes 590 (590.0 B)
 RX errors 0 dropped 0 overruns 0 frame 0
 TX packets 25 bytes 5493 (5.3 KiB)
 TX errors 0 dropped 36 overruns 0 carrier 0 collisions 0
Then:
# dmesg > dmesg.txt [see the attachment]
# journalctl -ab > journal.txt [see the attachment]
Good luck!
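To compare the two enp0s7 stanzas side by side, the small Python sketch below pulls out the IPv4 address and the RX/TX packet and dropped counters. The helper name `counters` is hypothetical, and the regular expressions assume the net-tools wording quoted above. The two sample strings reproduce the before and after output from this comment.

```python
import re


def counters(stanza: str) -> dict:
    """Extract the IPv4 address and RX/TX counters from one ifconfig stanza.

    Assumes the net-tools wording shown in the report
    ("inet A.B.C.D", "RX packets N", "TX errors N  dropped N ...").
    """
    info = {}
    m = re.search(r"\binet (\d+\.\d+\.\d+\.\d+)", stanza)
    info["inet"] = m.group(1) if m else None
    for d in ("RX", "TX"):
        m = re.search(d + r" packets (\d+)", stanza)
        info[d + "_packets"] = int(m.group(1)) if m else 0
        m = re.search(d + r" errors \d+\s+dropped (\d+)", stanza)
        info[d + "_dropped"] = int(m.group(1)) if m else 0
    return info


# The two enp0s7 stanzas from the report, before and after sleep/wakeup.
BEFORE = """enp0s7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether d0:50:99:82:1a:03  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 13  bytes 3487 (3.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0"""

AFTER = """enp0s7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.11  netmask 255.255.255.0  broadcast 192.168.0.255
        ether d0:50:99:82:1a:03  txqueuelen 1000  (Ethernet)
        RX packets 1  bytes 590 (590.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 25  bytes 5493 (5.3 KiB)
        TX errors 0  dropped 36 overruns 0  carrier 0  collisions 0"""

before, after = counters(BEFORE), counters(AFTER)
for key in before:
    print(f"{key:12} {before[key]!s:>14} -> {after[key]}")
```

Only the after stanza has an address and a received packet. The TX dropped counter also goes from 0 to 36, which is worth keeping in mind when reading the timeout discussion in the thread.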
Created attachment 11493
command # dmesg
Created attachment 11494
command # journalctl -ab
The critical line in the output appears to be:
forcedeth 0000:00:07.0 enp0s7: Got tx_timeout. irq status: 00000032
As per https://forums.gentoo.org/viewtopic-t-860574-start-0.html try adding pci=nomsi to the kernel command line parameters. The easiest way to do that is using mcc/boot/Set up boot system, on the second screen presented in that function.
If that works, an entry describing the problem/fix should be added to the Mageia 7 errata.
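After rebooting, /proc/cmdline shows the parameters the running kernel was actually booted with. The Python sketch below is one way to confirm that the boot setup really passed pci=nomsi. The helper names are hypothetical. The sketch relies on the fact that pci= takes a comma-separated list of options, for example pci=nomsi,noaer, so it splits each pci= token before comparing.

```python
def has_pci_option(cmdline: str, option: str = "nomsi") -> bool:
    """True if a pci= parameter on the kernel command line includes `option`.

    pci= accepts a comma-separated list (e.g. pci=nomsi,noaer), so each
    pci= token is split on commas before comparing.
    """
    for token in cmdline.split():
        if token.startswith("pci="):
            if option in token[len("pci="):].split(","):
                return True
    return False


def running_kernel_has(option: str = "nomsi", path: str = "/proc/cmdline") -> bool:
    """Check the command line the running kernel was actually booted with."""
    with open(path) as f:
        return has_pci_option(f.read(), option)
```

On the affected machine, `running_kernel_has()` should return True once the change in the boot setup has taken effect.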
CC: (none) => davidwhodgins
OK, it works. I opened the Mageia Control Center (MCC), chose the section Boot / Set up boot system, went to the second screen (just click "Next" on the first one) and added pci=nomsi at the end of the kernel command line parameters. I restarted my computer and the Internet connection is working. Thank you!
Added to errata https://wiki.mageia.org/en/Mageia_7_Errata#Nvidia_corporation_MCP61_ethernet_fails_to_connect Closing the bug report. Please reopen if the problem does show up again.
Resolution: (none) => FIXED
Keywords: (none) => IN_ERRATA7
Status: NEW => RESOLVED
Looking at the journalctl output, it appears that the problem is with your DHCP server during initialization.
If you search for "link beat", you'll see that link beat is detected during initialization (so there isn't a problem with the NIC or the driver). But then it issues a DHCPREQUEST for 192.168.0.11 (because this is the IP it had last) and gets no response. Then it tries DHCPDISCOVER, which means "I'll take any IP", and again gets no response. Then it seems to give up.
Later, when you sleep and wake up, it drops link beat, immediately finds it again and goes through the DHCP process above. Except that now the DHCPREQUEST works on the first try.
So it appears that your DHCP server isn't available during initialization, but becomes available at some later time, and is available when you do the sleep/wake activity.
So there are two issues here. One is why the DHCP server isn't there during initialization. Why it isn't available at exactly the time you boot the machine is a mystery, unless you're running it on this system and it just hasn't started yet. You know where your DHCP server is, and presumably why it's not available during boot but *is* available and working properly later.
The other is why net-applet gives up instead of retrying DHCP until it manages to connect. Possibly something in your dhclient conf, or even in the ifcfg file, shuts off retries after one or two attempts.
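The reading above amounts to walking the journal in order and grouping the DHCP messages that follow each link-beat message. The Python sketch below shows one way to do that. Only the keywords come from the analysis. The exact wording of the surrounding journal lines is an assumption, so the sketch matches case-insensitive substrings. It also assumes the client logs DHCPACK when a reply arrives. Any line mentioning "link beat", whether detected or lost, starts a new episode. The helper names `timeline` and `episodes` are hypothetical.

```python
# Keywords from the analysis above. The surrounding wording of the journal
# lines is an assumption, so lines are matched by case-insensitive substring.
EVENTS = ("link beat", "DHCPDISCOVER", "DHCPREQUEST", "DHCPOFFER", "DHCPACK")


def timeline(lines):
    """Return (line_number, event) pairs for every keyword hit, in log order."""
    hits = []
    for n, line in enumerate(lines, start=1):
        low = line.lower()
        for event in EVENTS:
            if event.lower() in low:
                hits.append((n, event))
    return hits


def episodes(hits):
    """Split the timeline at each link-beat message and record whether a
    DHCPACK followed it, i.e. whether that attempt got an answer."""
    result = []
    for n, event in hits:
        if event == "link beat":
            result.append({"line": n, "events": [], "acked": False})
        elif result:
            result[-1]["events"].append(event)
            if event == "DHCPACK":
                result[-1]["acked"] = True
    return result
```

Usage, assuming the journal was saved as journal.txt: `with open("journal.txt", errors="replace") as f: print(episodes(timeline(f)))`. On the pattern described above, the boot episode would show DHCPREQUEST and DHCPDISCOVER with `acked` False, and the post-wakeup episode would show an acknowledged DHCPREQUEST.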
As I see it, the dhcp lookup fails because the ethernet device is timing out trying to send the lookup packet, which is fixed by using nomsi.
Well solved! Thank you Etienne for all the evidence you provided, and the work behind it; and Dave & Frank for your input.
That Gentoo thread (comment 5) is 9 years old, and looks like a different problem. The suggestion "try booting with pci=nomsi appended to the kernel commandline" was not verified there, but is here in comment 5. The last word there was: "forcedeth did not like my switch being hard coded to 100M full duplex even though the NIC is still hard coded. On a whim I decided to auto-negotiate the port speed and it hasn't happened since".
I was puzzled by the before/after sleep/wakeup differences in the ifconfig output, which agree with Frank's (and my) supposition that the Internet 'box' synchronisation does not happen initially; from comment 2:
BEFORE
enp0s7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 [RUNNING?]
ether d0:50:99:82:1a:03 txqueuelen 1000 (Ethernet)
AFTER
enp0s7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.11 netmask 255.255.255.0 broadcast 192.168.0.255
ether d0:50:99:82:1a:03 txqueuelen 1000 (Ethernet)
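On the "[RUNNING?]" question: flags=4163 is the decimal value of the interface flag bitmask, and the names in angle brackets are simply its decoding. The Python sketch below decodes it against the IFF_* bit values from Linux's <linux/if.h>, limited to the low 13 bits shown here. The value is identical before and after wakeup, so RUNNING was already set before sleep. In these excerpts, the visible difference is the inet line.

```python
# Interface flag bits from Linux <linux/if.h> (IFF_UP, IFF_BROADCAST, ...),
# limited to the low 13 bits.
IFF_BITS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x4: "DEBUG",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x20: "NOTRAILERS",
    0x40: "RUNNING",
    0x80: "NOARP",
    0x100: "PROMISC",
    0x200: "ALLMULTI",
    0x400: "MASTER",
    0x800: "SLAVE",
    0x1000: "MULTICAST",
}


def decode_flags(value: int) -> list:
    """Return the flag names set in `value`, lowest bit first."""
    return [name for bit, name in sorted(IFF_BITS.items()) if value & bit]


print(decode_flags(4163))  # ['UP', 'BROADCAST', 'RUNNING', 'MULTICAST']
```

4163 is 0x1043, that is 0x1000 + 0x40 + 0x2 + 0x1.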
(In reply to Dave Hodgins from comment #9)
> As I see it, the dhcp lookup fails because the ethernet device is timing out
> trying to send the lookup packet, which is fixed by using nomsi.
Dave, I don't dispute that this fixed it, but I don't understand why. MSI, IIRC, is just a replacement for hardware IRQs. What does that have to do with whether the lookup packet gets to the DHCP server? Or does MSI not initialize in time to service dhclient?
CC: (none) => ftg
In the dmesg log, the line
[ 31.994626] NETDEV WATCHDOG: enp0s7 (forcedeth): transmit queue 0 timed out
is the first indication of a problem. The corresponding line in the journal is
févr. 03 18:08:00 localhost.localdomain kernel: NETDEV WATCHDOG: enp0s7 (forcedeth): transmit queue 0 timed out
The DHCP lookup has been passed from the DHCP client to the kernel, but the kernel module fails to transmit it over the PCI bus to the Ethernet device. The Ethernet device never gets the lookup to send over the network, so the DHCP server never sees it.
When the system is recovering from sleep, handing the packet from the kernel to the Ethernet device does not time out, so the lookup is sent to the network and works. Why it works when recovering from sleep and not during a normal boot is not clear to me. It is likely due to the order in which devices are powered up, or to recovering from sleep needing fewer amps than booting.
The kernel option pci=nomsi disables the use of MSI interrupts:
https://en.wikipedia.org/wiki/Message_Signaled_Interrupts
That forces the kernel to fall back to the older (slightly slower) APIC method of handling interrupts. The APIC method is more reliable for some PCI devices, as appears to be the case here.
https://www.tldp.org/HOWTO/Plug-and-Play-HOWTO-7.html
In the transmission chain, when everything works, the steps involved are:
1. the DHCP client program sends the lookup request to the kernel
2. the kernel calls the appropriate module to handle the packet
3. the forcedeth module sends the packet to the Ethernet device over the PCI bus
4. the PCI Ethernet device sends the packet over the network to the server
During boot, step 3 times out (not step 4), but it works when recovering from sleep. Using APIC interrupt handling instead of MSI allows step 3 to work during boot.
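You can check which interrupt method the NIC is actually using in /proc/interrupts, with and without pci=nomsi. The Python sketch below classifies the enp0s7 line. It assumes the x86 layout "IRQ: per-CPU counts, chip name, hwirq/trigger, device names", and the helper name `interrupt_mode` is hypothetical. One would expect the line to move from a PCI-MSI entry to an IO-APIC entry once pci=nomsi is in effect. The exact column text varies between kernel versions, so the sketch only looks for "MSI" or "IO-APIC" among the fields.

```python
def interrupt_mode(proc_interrupts: str, iface: str = "enp0s7") -> str:
    """Report how `iface` receives interrupts according to /proc/interrupts.

    Assumes the x86 layout: "IRQ: per-CPU counts  chip  hwirq-type  devices",
    with device names (possibly comma-separated) in the last columns.
    """
    for line in proc_interrupts.splitlines():
        fields = line.split()
        # Skip lines that do not list this interface as a device.
        if iface not in (f.rstrip(",") for f in fields[1:]):
            continue
        if any("MSI" in f for f in fields):
            return "MSI"
        if "IO-APIC" in fields:
            return "IO-APIC"
        return "other"
    return "not found"
```

Usage: `with open("/proc/interrupts") as f: print(interrupt_mode(f.read()))`.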
It's the timeout message from the network device watchdog, not one from the DHCP client, that indicates it's the PCI bus that is timing out, not the IP network.
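The watchdog message comes from the kernel's generic netdev watchdog, not from dhclient, so picking those lines out of dmesg or the journal separates device-level transmit failures from DHCP-level ones. The Python sketch below parses the exact line quoted earlier in this thread. The helper name `watchdog_events` is hypothetical. The optional leading "[ seconds ]" timestamp is only present in dmesg output, not in journal lines.

```python
import re

# Matches e.g. "NETDEV WATCHDOG: enp0s7 (forcedeth): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)
# dmesg prefixes lines with "[ seconds.micro ]"; journal lines do not.
TIMESTAMP_RE = re.compile(r"^\[\s*(?P<secs>\d+\.\d+)\]")


def watchdog_events(log_text: str) -> list:
    """Return one dict per netdev watchdog transmit timeout in the log."""
    events = []
    for line in log_text.splitlines():
        m = WATCHDOG_RE.search(line)
        if not m:
            continue
        ts = TIMESTAMP_RE.match(line)
        events.append({
            "seconds": float(ts.group("secs")) if ts else None,
            "iface": m.group("iface"),
            "driver": m.group("driver"),
            "queue": int(m.group("queue")),
        })
    return events
```

Run over the attached dmesg.txt, this should report the enp0s7/forcedeth timeout on queue 0 at about 32 seconds after boot.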
So, it sounds like there is a bug here. I get that the DHCP packet never gets sent, so the fault is not in the network. But is the fault in the forcedeth driver timing out too soon, the NETDEV watchdog timing out too soon, or the MSI support not initializing soon enough to service what the driver is asking of it ? Regardless of whether the OP's problem is solved by the nomsi workaround, it seems like somebody's dropping the ball here and we should find out why.
Given that it works when recovering from sleep but not on boot, I think it's more likely to be a problem with the hardware or the firmware on the Ethernet card than with the kernel module. Confirming that would require resources I don't have.