Description of problem:
Reporting this in case more happen...
Rebooted for recent updates. System stayed up for a few moments; then locked up. Using old laptop, could only ping this machine; ssh did not respond.
Hard power down, and reboot. Stayed up a little longer, and locked up.
Hard power down, and reboot. Still up as I write this...

Version-Release number of selected component (if applicable):

How reproducible:
Happened twice

Steps to Reproduce:
unknown. No commonality I could discern.
Created attachment 13240 [details] Entire journal for first lockup
Created attachment 13241 [details]
Entire journal of second lockup

CORRECTION: system does not respond to pings. I accidentally typed "ping 192168.1.46" (note missing dot) to which my router responds.

As I was about to upload this attachment, system locked up again...

This log ends with:
May 09 19:15:49 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 09 19:15:49 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 09 19:17:00 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 09 19:17:00 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 09 19:17:50 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 09 19:17:50 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 09 19:17:50 pf.pfortin.com kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000

Besides the kernel bug, I think the bluetooth timeouts are new... will try to check if the system will stay up long enough...
Had laptop connected via BT; disconnected and I may be avoiding lockup... the uptime before lockup appears random based on a sample of 3 lockups.
Summary: FYI: System lockup => System lockups
Created attachment 13242 [details]
dmesg errors on boot

With a total lockup, I can only guess at where to look. so will add oddities as I find them... Here, various services report "lacks a native systemd unit file"
Created attachment 13243 [details]
latest lockup

Did something in systemsettings5 get clobbered?
Could you run a RAM check, using all cores, and let it run overnight?
(I had peculiar faults a while ago when a core in my CPU went bad; I saw it when running a RAM check.)

There is a RAM check option on our install media.
CC: (none) => fri
Created attachment 13244 [details]
htop showing 7 procs pegged at 100%

A photo was the only way to get a screenshot when system hung this morning.

Will try to run memory checks this evening... I have a Zoom meeting shortly -- hopefully, I can get through it.

Now running on
$ uname -a
Linux pf.pfortin.com 5.17.6-server-1.mga9 #1 SMP PREEMPT Mon May 9 18:34:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
and leaving "journalctl -f" on screen in case something appears there that doesn't make it to disk... amazingly, system ran all night; until...

woke up to bluetooth at 100% and SIX:
/usr/lib64/sa/sadc -F -L 600 6 /var/log/sa
(System Activity Data Collector -- did not know of this) running at 100%; they appear to have started one each hour, +/- 1 second.

According to htop, memory: 57.7G/62.5G (swap 0K/4.00G) and curiously, processors 2, 4, 6, 8, 10, 12, 14 (of 20) pegged at 100% (matches BT + 6x sadc at 100%) -- only even numbered procs... :?

$ ll /var/log/sa
total 3100
-rw-r--r-- 1 root root  39244 May  6 23:51 sa06
-rw-r--r-- 1 root root 460060 May  7 23:51 sa07
-rw-r--r-- 1 root root 460060 May  8 23:51 sa08
-rw-r--r-- 1 root root 399824 May  9 23:51 sa09
-rw-r--r-- 1 root root  45788 May 10 09:01 sa10
-rw-r--r-- 1 root root  70464 May  7 04:02 sar06
-rw-r--r-- 1 root root 829128 May  8 04:02 sar07
-rw-r--r-- 1 root root 829128 May  9 04:02 sar08

The sarNN files are human readable; saNN are binary -- any utility to read those?

In case it's related: all this started after I reloaded to bring in latest glibc and current running kernel.
(In reply to Pierre Fortin from comment #4)
> Created attachment 13242 [details]
> dmesg errors on boot
>
> With a total lockup, I can only guess at where to look. so will add oddities
> as I find them... Here, various services report "lacks a native systemd
> unit file"

Those are only informational messages for some packages not yet converted from init scripts...
(In reply to Pierre Fortin from comment #7)
> Created attachment 13244 [details]
> htop showing 7 procs pegged at 100%
>

Looks like sysstat is overloading your system. If you remove that package, does your system work OK then?
(In reply to Pierre Fortin from comment #2)
> May 09 19:17:50 pf.pfortin.com kernel: BUG: kernel NULL pointer dereference,
> address: 0000000000000000
>

This is a kernel crash, but since the rest of the crash info is missing, one can only guess...
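
One possible way to look for the rest of the crash info after the next reboot is sketched below. It assumes a persistent systemd journal and that the kernel managed to write the trace before the machine froze, which a hard lockup does not guarantee. The helper names `crash_context` and `previous_boot_kernel_log` are made up for this illustration.

```python
import subprocess


def crash_context(lines, marker="BUG:", after=40):
    """Return the first line containing `marker` plus up to `after` lines that follow it."""
    for i, line in enumerate(lines):
        if marker in line:
            return lines[i:i + 1 + after]
    return []  # no crash marker found in this extract


def previous_boot_kernel_log():
    """Kernel messages from the previous boot (assumes journalctl and a persistent journal)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Calling `crash_context(previous_boot_kernel_log())` after a reboot would return the BUG line and whatever followed it in the journal, if anything was saved.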
This is a super-weird situation... the lockups started out of the blue and hit me every few minutes. Now, I've been up for 7 hours and all looks good.

The kernel crash was the only one I saw; the others just locked up tight, screens frozen. A picture was the only option; reminded me of the camera dumps of neon bulb control panels on NORAD SAGE computers in the 1960s LOL

Just removed sysstat and killed sadc...
Good morning! Feeling confident the lockups are resolved with the 5.17.6 kernel. Thanks, all!!
Nope... just had another lockup after being up 50 hours and a couple minutes. I had walked away to chat with a visitor; came back to the system repeating a 1 second audio clip over and over. This was from a news stream. Could not ssh into or ping the system.

Sadly, other than seeing the times in the journal to discern uptime, there's nothing abnormal therein. :(

Operating System: Mageia 9
KDE Plasma Version: 5.24.4
KDE Frameworks Version: 5.93.0
Qt Version: 5.15.2
Kernel Version: 5.17.6-server-2.mga9 (64-bit)   # latest kernel via mcc
Graphics Platform: X11
Processors: 20 × 12th Gen Intel® Core™ i7-12700K
Memory: 62.5 GiB of RAM
Graphics Processor: AMD Radeon RX 6600 XT
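
Reading uptime off the journal times can be scripted. A minimal sketch, assuming short-format journal lines, a single boot per extract, and an English locale for month names. The two sample lines are copied from journal excerpts quoted elsewhere in this report (the first one abbreviated); the function names and the `year` parameter are made up for this illustration, since the journal's short format carries no year.

```python
from datetime import datetime


def journal_time(line, year):
    """Parse the 'May 12 18:33:19' prefix of a short-format journal line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def uptime_before_lockup(lines, year=2022):
    """Time between the first and last line of one boot's journal extract."""
    return journal_time(lines[-1], year) - journal_time(lines[0], year)


boot_journal = [
    "May 12 11:01:05 pf.pfortin.com kernel: Linux version 5.17.6-server-2.mga9",
    "May 12 18:33:19 pf.pfortin.com kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000",
]
print(uptime_before_lockup(boot_journal))  # prints 7:32:14
```

Running this over each boot's journal would give one uptime per lockup, which makes it easier to judge whether the intervals really are random.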
Another kernel NULL pointer dereference... (5.17.6-server-2.mga9)

This is everything at the end of this journal:
May 12 18:28:42 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 12 18:28:42 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 12 18:29:32 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 12 18:29:32 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 12 18:30:38 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 12 18:30:38 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 12 18:32:04 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 12 18:32:04 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 12 18:33:15 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 12 18:33:15 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 12 18:33:19 pf.pfortin.com kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000

While I had just started typing here, got another... (next)
Should have included this on comment 14:
May 12 11:01:05 pf.pfortin.com kernel: Linux version 5.17.6-server-2.mga9 (iurt@rabbit.mageia.org) (gcc (Mageia 12.1.1-0.20220507.1.mga9) 12.1.1 20220507, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT Tue May 10 16:14:21 UTC 2022
May 12 11:01:05 pf.pfortin.com kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.17.6-server-2.mga9 root=UUID=957cd552-a8ad-4d8a-a90d-1c6eb3871ebd ro splash quiet noiswmd resume=UUID=63b04ef6-fb15-4f03-b7af-6573fb6070ec audit=0
May 12 11:01:05 pf.pfortin.com kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks

New lockup... Starting to suspect some patterns...
* Bluetooth errors just before the lockup.
* kernel NULL pointer dereference -- sometimes in journal
* lockup occurs very near a screen power save - just disabled Screen Energy Saving...

May 12 18:44:03 pf.pfortin.com kernel: microcode: microcode updated early to revision 0x1f, date = 2022-03-03
May 12 18:44:03 pf.pfortin.com kernel: Linux version 5.17.7-server-1.mga9 (iurt@ecosse.mageia.org) (gcc (Mageia 12.1.1-0.20220507.1.mga9) 12.1.1 20220507, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT Thu May 12 12:54:42 UTC 2022
May 12 18:44:03 pf.pfortin.com kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.17.7-server-1.mga9 root=UUID=957cd552-a8ad-4d8a-a90d-1c6eb3871ebd ro splash quiet noiswmd resume=UUID=63b04ef6-fb15-4f03-b7af-6573fb6070ec audit=0
May 12 18:44:03 pf.pfortin.com kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
[snip]
May 12 21:06:58 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 12 21:06:58 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 12 21:07:58 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 12 21:07:58 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 12 21:09:20 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 12 21:09:20 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 12 21:12:24 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 12 21:12:24 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 12 21:14:10 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 12 21:14:10 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
May 12 21:16:54 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout
May 12 21:16:54 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75
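
The suspected pattern (Bluetooth timeouts shortly before a kernel BUG) could be checked across the saved journals with something like the sketch below. It is only an illustration under stated assumptions: one boot's journal per call, short-format lines as pasted in this report, an English locale, and a made-up helper name `timeouts_before_bugs`. The sample lines are copied from the 18:33 crash quoted in an earlier comment.

```python
import re
from datetime import datetime

# Matches short-format kernel lines: "May 12 18:33:15 host kernel: message"
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ kernel: (.*)$")


def timeouts_before_bugs(lines, year=2022):
    """For each kernel BUG line, report (time, timeouts seen so far, seconds since last timeout)."""
    timeouts, report = [], []
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # not a kernel message in the expected format
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        message = m.group(2)
        if "link tx timeout" in message:
            timeouts.append(when)
        elif message.startswith("BUG:"):
            gap = (when - timeouts[-1]).total_seconds() if timeouts else None
            report.append((when.strftime("%H:%M:%S"), len(timeouts), gap))
    return report


sample = [
    "May 12 18:32:04 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout",
    "May 12 18:32:04 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75",
    "May 12 18:33:15 pf.pfortin.com kernel: Bluetooth: hci0: link tx timeout",
    "May 12 18:33:15 pf.pfortin.com kernel: Bluetooth: hci0: killing stalled connection a0:a8:cd:ad:3b:75",
    "May 12 18:33:19 pf.pfortin.com kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000",
]
print(timeouts_before_bugs(sample))  # prints [('18:33:19', 2, 4.0)]
```

A short gap on every crash would support the Bluetooth suspicion; an empty report for a boot that still locked up would match the lockups that left no BUG line in the journal.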
Another lockup: May 12 23:26:37

Bluetooth messages starting to bother me. This machine and the old laptop (mga8) INSIST on staying connected. Kinda like the two computers in the movie "The Forbin Project" -- one letter off from my name, so it's easy for me to remember that title.

The machines are so aggressive in staying connected that I finally disabled BT in this machine. I don't see a BT disable in mga8 settings.
Had this machine less than 70 days. I ALWAYS check for new BIOS when working on a new machine (mine or friends')... BIOS was up to date at 1.0.8... Just discovered BIOS is now at 1.0.13...!!! In two months! Looks like 1.0.12 is the one that may resolve my issues; 1.0.13 it is! A friend informed me he got an email from Dell about the BIOS. He got his similar machine a week or two before me.
Looks like this may have been cleared up with BIOS 1.0.13... Closing for now.
Resolution: (none) => WORKSFORME
Status: NEW => RESOLVED
New BIOS released:
File Name: XPS_8950_1.2.1_x64.exe
File Size: 8.25 MB
Importance: Urgent
Fixes & Enhancements
- Firmware updates to address security vulnerabilities including (Common Vulnerabilities and Exposures - CVE) such as CVE-2021-3712, CVE-2019-14584, CVE-2021-28210, and CVE-2021-28211.

Will update this weekend. No lockups lately.