Bug 32148

Summary: no network after reboot; disabled NetworkManager to restore.
Product: Mageia Reporter: Pierre Fortin <pfortin>
Component: RPM Packages Assignee: Kernel and Drivers maintainers <kernel>
Status: NEW --- QA Contact:
Severity: normal    
Priority: Normal CC: davidwhodgins, lewyssmith
Version: Cauldron   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Source RPM: CVE:
Status comment:
Attachments: rpm -qa --last output
Journal from reboot to network up.
journal of boot
journal of boot with comments

Description Pierre Fortin 2023-07-29 04:00:26 CEST
Description of problem:

My previous reboot was:
Mon Jul 3 02:37:47 PM EDT 2023
Mageia release 9 (Cauldron) for x86_64
Kernel 6.3.9-server-2.mga9 on a 20-processor x86_64 / \l
Linux pf.pfortin.com 6.3.9-server-2.mga9 #1 SMP PREEMPT_DYNAMIC Fri Jun 23 08:10:12 UTC 2023 x86_64 GNU/Linux

Reboot just now:
Fri Jul 28 07:58:25 PM EDT 2023
Mageia release 9 (Cauldron) for x86_64
Kernel 6.4.6-server-2.mga9 on a 20-processor x86_64 / \l
Linux pf.pfortin.com 6.4.6-server-2.mga9 #1 SMP PREEMPT_DYNAMIC Tue Jul 25 19:09:39 UTC 2023 x86_64 GNU/Linux

After reboot, I had no network.  Immediately tried re-configuring WiFi, no go.
Configured ethernet and connected wire to router. This worked minimally; DNS was not working:
$ dig cisco.com
;; communications error to ::1#53: connection refused
;; communications error to ::1#53: connection refused
;; communications error to ::1#53: connection refused
;; communications error to 127.0.0.1#53: connection refused

; <<>> DiG 9.18.15 <<>> cisco.com
;; global options: +cmd
;; no servers could be reached

Wireshark showed NO DNS packets.
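
The dig output above only ever tries ::1 and 127.0.0.1, which suggests the resolver was pointed at a local DNS service that was not answering. That would also be consistent with no DNS packets appearing on the wire. A minimal Python sketch of that check follows, assuming resolv.conf-style input. The real file contents were not captured in this report, so the sample text and the function names are illustrative only.

```python
import ipaddress


def parse_nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def only_loopback(servers):
    """True if every listed server is a loopback address.

    An empty list also counts, since the C library then falls back to
    the local machine.
    """
    for server in servers:
        try:
            # Drop an IPv6 zone suffix such as '%eth0' before parsing.
            address = ipaddress.ip_address(server.split("%", 1)[0])
        except ValueError:
            return False  # not an IP literal; cannot be judged here
        if not address.is_loopback:
            return False
    return True


# Hypothetical contents that would produce the dig behaviour shown above.
sample = "nameserver ::1\nnameserver 127.0.0.1\n"
servers = parse_nameservers(sample)
print("nameservers:", servers)
if only_loopback(servers):
    print("All lookups go to this machine; a local resolver must be running.")
```

If the check reports loopback-only servers, the next thing to look at would be whether the local DNS service actually started.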

$ systemctl start network.service 
Job for network.service failed because the control process exited with error code.
See "systemctl status network.service" and "journalctl -xeu network.service" for details.

$ systemctl status network.service 
× network.service - LSB: Bring up/down networking
     Loaded: loaded (/etc/rc.d/init.d/network; generated)
     Active: failed (Result: exit-code) since Fri 2023-07-28 21:04:20 EDT; 29s ago
       Docs: man:systemd-sysv-generator(8)
    Process: 948426 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=1/FAILURE)
      Tasks: 2 (limit: 154182)
     Memory: 1.1M
        CPU: 1.984s
     CGroup: /system.slice/network.service
             ├─948622 /sbin/ifplugd -I -b -i docker0
             └─948759 /sbin/ifplugd -I -b -i p5p1

Jul 28 21:04:20 network[948887]: RTNETLINK answers: File exists
Jul 28 21:04:20 network[948888]: RTNETLINK answers: File exists
Jul 28 21:04:20 network[948889]: RTNETLINK answers: File exists
Jul 28 21:04:20 network[948890]: RTNETLINK answers: File exists
Jul 28 21:04:20 systemd[1]: network.service: Control process exited, code=exited, status=1/FAILURE
Jul 28 21:04:20 systemd[1]: network.service: Failed with result 'exit-code'.
Jul 28 21:04:20 systemd[1]: network.service: Unit process 948622 (ifplugd) remains running after unit stopped.
Jul 28 21:04:20 systemd[1]: network.service: Unit process 948759 (ifplugd) remains running after unit stopped.
Jul 28 21:04:20 systemd[1]: Failed to start network.service.
Jul 28 21:04:20 systemd[1]: network.service: Consumed 1.981s CPU time.




Version-Release number of selected component (if applicable): see 'rpm -q --last' output


How reproducible: Sorry, I have several services that I need to keep running, so no time to reproduce.  Once I got network up, I stopped messing with it...


Steps to Reproduce:
1. Applied updates from July 7 to present. (will attach 'rpm -qa --last' output) 
2. Rebooted.
3. No network...

After some quick debugging, went to mcc System services.
Enabled DNS; no change.

Disabled NetworkManager, NetworkManager-dispatcher, NetworkManager-wait-online and re-configured WiFi; the network then came up.
Comment 1 Pierre Fortin 2023-07-29 04:02:02 CEST
Created attachment 13928 [details]
rpm -qa --last output

these are all the updates applied since last reboot.
Comment 2 Pierre Fortin 2023-07-29 04:02:40 CEST
Created attachment 13929 [details]
Journal from reboot to network up.
Comment 3 Lewis Smith 2023-07-29 20:56:19 CEST
Thank you for the report with its full details.
(In reply to Pierre Fortin from comment #0)
> Disabled NetworkManager, NetworkManager-dispatcher,
> NetworkManager-wait-online and re-configured WiFi and network up.
Can you please describe briefly what your network *is*. A single WiFi link?
What re-configuration of WiFi was necessary? Had previous details gone?

CC: (none) => lewyssmith

Comment 4 Pierre Fortin 2023-07-29 21:38:32 CEST
My network has always been a simple WiFi connection to a Linksys WRT3200ACM running DD-WRT firmware, which connects to a DSL dual-link modem in bridge mode.

After this reboot, with NetworkManager active (I don't recall ever setting it up or using it), I had no connectivity over WiFi.  Whenever I have network issues, I usually just go through the mcc "Set up a new network interface (LAN, ISDN, ADSL, ...)" without making changes (details still as expected) to get the network to come up again -- it's faster than trying to diagnose issues and manually correct whatever is wrong.  HTH
Comment 5 Dave Hodgins 2023-07-29 22:45:17 CEST
I can't confirm from the info in the log, but I suspect the network interface
changed names causing shorewall to reject the packets due to the new
nic not being listed in /etc/shorewall/interfaces

Using mcc to get the network up again would have fixed that.

Is this a usb wifi nic? Could it have been moved to a different usb plug?
That would cause the nic to change names.
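
The hypothesis above can be checked by comparing the interface names in use with the ones listed in /etc/shorewall/interfaces. A rough Python sketch follows. It assumes the interface name is the second column of each non-comment line and that a trailing "+" acts as a wildcard; both assumptions should be verified against the installed Shorewall version. The sample file contents are hypothetical, since the real file was not attached.

```python
def listed_interfaces(interfaces_text):
    """Collect interface names from Shorewall 'interfaces' file text."""
    names = set()
    for line in interfaces_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # Skip blanks, comments and directives such as '?FORMAT 2'.
        if len(fields) < 2 or fields[0].startswith("?"):
            continue
        names.add(fields[1])
    return names


def is_listed(nic, names):
    """True if nic matches a listed name; a trailing '+' is a wildcard."""
    for name in names:
        if name.endswith("+"):
            if nic.startswith(name[:-1]):
                return True
        elif nic == name:
            return True
    return False


# Hypothetical file contents; the real file was not attached to this report.
sample = """\
?FORMAT 2
#ZONE   INTERFACE   OPTIONS
net     wlp10s0     dhcp
net     ppp+
"""
names = listed_interfaces(sample)
for nic in ("wlp10s0", "p5p1", "ppp0"):
    print(nic, "listed" if is_listed(nic, names) else "NOT listed")
```

An interface reported as not listed would be a candidate for the packet rejection described above.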

CC: (none) => davidwhodgins

Comment 6 Pierre Fortin 2023-07-30 01:42:16 CEST
The nic name changed only once, back in Oct/22 (see bug 30965); no name change this time.

ifconfig
lp10s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.46  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::46e5:17ff:fefd:1187  prefixlen 64  scopeid 0x20<link>
        ether 44:e5:17:fd:11:87  txqueuelen 1000  (Ethernet)
        RX packets 6545788  bytes 6601378343 (6.1 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3127160  bytes 3386140413 (3.1 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
uptime
 19:40:57 up 23:43, 12 users,  load average: 2.54, 2.43, 2.46


lspci -v
0000:0a:00.0 Network controller: Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz (rev 1a)
        Subsystem: Rivet Networks Device 1674
        Flags: bus master, fast devsel, latency 0, IRQ 19
        Memory at 84500000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [40] Express Endpoint, MSI 00
        Capabilities: [80] MSI-X: Enable+ Count=16 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [14c] Latency Tolerance Reporting
        Capabilities: [154] L1 PM Substates
        Kernel driver in use: iwlwifi
        Kernel modules: iwlwifi
Comment 7 Pierre Fortin 2023-07-30 01:44:52 CEST
copy/paste missed the 'w' wlp10s0.
Comment 8 Lewis Smith 2023-08-02 20:35:07 CEST
Sorry to have left you.
A basic point: is this reproducible, or did it just happen once? I appreciate that you would need to re-boot to try it...
Comment 9 Pierre Fortin 2023-08-03 16:33:42 CEST
Only when I rebooted.  I included the differences that led up to this from the previous reboot.  My guess is that it was a one-time occurrence; but if that was to happen to others, they may not have the skills to recover -- without a network, it's not possible to search for solutions.  I don't have a reboot scheduled at this time; but I see there's a new kernel, so I may update and reboot late tonight if time permits...
Comment 10 Lewis Smith 2023-08-05 21:32:48 CEST
If this was a one-off incident, there is no hope for a resolution.

If it happens again after another re-boot, please attach (yes again, sorry, another example in the hope it reveals something more) the compressed system journal.
[Just 'xz' the text journal extract].
We shall then investigate the issue more fully.
Comment 11 Pierre Fortin 2023-08-06 02:00:40 CEST
Created attachment 13933 [details]
journal of boot

My gut tells me we may be on the forefront of a race condition...

This time, WiFi came up; but wireless keyboard (Logitech K350) did not come up until I:
- plugged in a wired keyboard to be able to login
- disconnected/connected USB Unifying Receiver
- the wireless mouse (Logitech MX Master 3S) came up OK (USB Bolt Receiver)
Comment 12 Pierre Fortin 2023-08-06 02:03:24 CEST
Created attachment 13934 [details]
journal of boot with comments

Oops.. forgot to save journal with comments before zipping...
Pierre Fortin 2023-08-06 02:03:50 CEST

Attachment 13933 is obsolete: 0 => 1

Comment 13 Dave Hodgins 2023-08-06 04:31:12 CEST
$ grep 'Logitech USB Receiver as' journal 
Aug 05 18:35:34 kernel: input: Logitech USB Receiver as /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.4/1-5.4.3/1-5.4.3:1.0/0003:046D:C548.0002/input/input17
Aug 05 18:35:34 kernel: input: Logitech USB Receiver as /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.4/1-5.4.4/1-5.4.4:1.0/0003:046D:C52B.0005/input/input21
Aug 05 18:41:59 kernel: input: Logitech USB Receiver as /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.4/1-5.4.3/1-5.4.3:1.0/0003:046D:C548.000A/input/input28
Aug 05 18:43:01 kernel: input: Logitech USB Receiver as /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.4/1-5.4.3/1-5.4.3:1.0/0003:046D:C548.0016/input/input39

Is there more than one Receiver?
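
One way to answer that from the journal alone is to group the lines above by USB port and vendor:product ID. The following is a Python sketch under the assumption that the sysfs path always has the shape shown in the grep output; the regular expression and function name are illustrative, not taken from any tool.

```python
import re
from collections import Counter

# Captures the USB port (e.g. 1-5.4.3) and the vendor/product IDs from the
# sysfs path that follows "Logitech USB Receiver as".
RECEIVER = re.compile(
    r"input: Logitech USB Receiver as \S*?"
    r"/(\d+-[\d.]+):\d+\.\d+"
    r"/[0-9A-Fa-f]{4}:([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4})\.[0-9A-Fa-f]{4}/"
)


def count_receivers(journal_lines):
    """Count registrations per (USB port, 'VENDOR:PRODUCT') pair."""
    seen = Counter()
    for line in journal_lines:
        match = RECEIVER.search(line)
        if match:
            port, vendor, product = match.groups()
            seen[(port, (vendor + ":" + product).upper())] += 1
    return seen


# The four journal lines quoted above.
lines = [
    "Aug 05 18:35:34 kernel: input: Logitech USB Receiver as /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.4/1-5.4.3/1-5.4.3:1.0/0003:046D:C548.0002/input/input17",
    "Aug 05 18:35:34 kernel: input: Logitech USB Receiver as /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.4/1-5.4.4/1-5.4.4:1.0/0003:046D:C52B.0005/input/input21",
    "Aug 05 18:41:59 kernel: input: Logitech USB Receiver as /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.4/1-5.4.3/1-5.4.3:1.0/0003:046D:C548.000A/input/input28",
    "Aug 05 18:43:01 kernel: input: Logitech USB Receiver as /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.4/1-5.4.3/1-5.4.3:1.0/0003:046D:C548.0016/input/input39",
]
for (port, device_id), count in sorted(count_receivers(lines).items()):
    print(port, device_id, "registered", count, "time(s)")
```

Two distinct product IDs on two ports would point to two receivers, and repeated registrations on one port would show which one was re-enumerated.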
Comment 14 Dave Hodgins 2023-08-06 04:39:05 CEST
Also, this doesn't look good ...
$ grep usb journal |grep error
Aug 05 18:35:32 kernel: usb 1-5.4.1: device descriptor read/64, error -71
Aug 05 18:35:32 kernel: usb 1-5.4.1: device descriptor read/64, error -71
Aug 05 18:35:32 kernel: usb 1-5.4.1: device descriptor read/64, error -71
Aug 05 18:35:32 kernel: usb 1-5.4.1: device descriptor read/64, error -71
Aug 05 18:35:33 kernel: usb 1-5.4.1: device not accepting address 9, error -71
Aug 05 18:35:33 kernel: usb 1-5.4.1: device not accepting address 10, error -71
Aug 05 18:42:13 kernel: usb 1-5.4.1: device not accepting address 23, error -71
Aug 05 18:42:13 kernel: usb 1-5.4.1: device descriptor read/all, error -71
Aug 05 18:42:15 kernel: usb 1-5.4.1: device not accepting address 26, error -71
Aug 05 18:42:16 kernel: usb 1-5.4.1: can't set config #1, error -71
Aug 05 18:42:18 kernel: usb 1-5.4.1: device descriptor read/64, error -71
Aug 05 18:42:18 kernel: usb 1-5.4.1: can't set config #1, error -71
Aug 05 18:42:19 kernel: usb 1-5.4.1: device descriptor read/64, error -71
Aug 05 18:42:20 kernel: usb 1-5.4.1: device descriptor read/64, error -71
Aug 05 18:42:23 kernel: usb 1-5.4.1: device descriptor read/64, error -71
Aug 05 18:42:23 kernel: usb 1-5.4.1: device descriptor read/64, error -71
Aug 05 18:42:24 kernel: usb 1-5.1: device descriptor read/64, error -71
Aug 05 18:42:35 kernel: usb 1-5.4.1: device not accepting address 33, error -62
Aug 05 18:42:36 kernel: usb 1-5.4.1: device not accepting address 34, error -71

https://ubuntuforums.org/showthread.php?t=797789 might be relevant.
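
To see at a glance which port the errors cluster on, the grep output above can be tallied per port and error code. Below is a small Python sketch; the errno names are the Linux values for the two codes seen here (71 is EPROTO, 62 is ETIME), while the regular expression and function name are illustrative assumptions.

```python
import re
from collections import Counter

# Linux errno names for the two codes that appear in this journal.
ERRNO_NAMES = {62: "ETIME", 71: "EPROTO"}

# Captures the USB port, the message text and the (negative) error code.
USB_ERROR = re.compile(r"usb ([\d.-]+): (.+), error -(\d+)")


def tally_usb_errors(journal_lines):
    """Count kernel USB error lines per (port, errno name)."""
    tally = Counter()
    for line in journal_lines:
        match = USB_ERROR.search(line)
        if match:
            port, _message, code = match.groups()
            name = ERRNO_NAMES.get(int(code), "errno " + code)
            tally[(port, name)] += 1
    return tally


# A few of the lines quoted above; feed the whole journal for a full picture.
lines = [
    "Aug 05 18:35:32 kernel: usb 1-5.4.1: device descriptor read/64, error -71",
    "Aug 05 18:42:16 kernel: usb 1-5.4.1: can't set config #1, error -71",
    "Aug 05 18:42:24 kernel: usb 1-5.1: device descriptor read/64, error -71",
    "Aug 05 18:42:35 kernel: usb 1-5.4.1: device not accepting address 33, error -62",
]
for (port, name), count in sorted(tally_usb_errors(lines).items()):
    print(port, name, count)
```

A tally dominated by a single port, as the excerpt suggests for 1-5.4.1, would narrow the problem to whatever is attached there.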
Comment 16 Dave Hodgins 2023-08-06 04:42:36 CEST
https://stackoverflow.com/questions/9544557/debian-device-descriptor-read-64-error-71
explains the most likely causes.
Comment 17 Pierre Fortin 2023-08-06 05:07:36 CEST
(In reply to Dave Hodgins from comment #13)

> 
> Is there more than one Receiver?

Yes, all Logitech:
* Bolt Receiver for MX Master 3S mouse
* Unifying Receiver for K350 keyboard (can handle 6 devices except above keyboard)

See https://support.logi.com/hc/en-us/articles/1500012483162-What-is-the-difference-between-Bolt-and-Unifying-receivers-
Comment 18 Pierre Fortin 2023-08-06 05:30:33 CEST
(In reply to Dave Hodgins from comment #16)
> https://stackoverflow.com/questions/9544557/debian-device-descriptor-read-64-error-71
> explains the most likely causes.

I disagree with the comments in that post.  All hardware works great here; this is looking more like a kernel race condition or faulty code.  Mouse and keyboard have been connected and running great for months; including right now...  This started with WiFi, now the keyboard; next reboot could be more interesting... :)

In case it wasn't obvious, this time all I did to "fix" this keyboard issue was disconnect/reconnect the Unifying Receiver for the kernel to now see the keyboard that got missed on boot for the first time ever...
Comment 19 Dave Hodgins 2023-08-06 16:20:27 CEST
Assigning to the kernel and drivers team.

Assignee: bugsquad => kernel