Bug 32530 - 32-bit server kernel 6.4.16-3 doesn't boot reliably
Summary: 32-bit server kernel 6.4.16-3 doesn't boot reliably
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 9
Hardware: All
OS: Linux
Priority: Normal
Severity: major
Target Milestone: ---
Assignee: Giuseppe Ghibò
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 32482
 
Reported: 2023-11-16 00:19 CET by Thomas Andrews
Modified: 2025-03-27 19:04 CET
CC List: 1 user

See Also:
Source RPM:
CVE:
Status comment:


Description Thomas Andrews 2023-11-16 00:19:39 CET
Description of problem:
Affected hardware: HP Probook 6550b, i3 350M, 8GB DDR3 RAM, Intel graphics, TeamGroup SSD for / and /home partitions, Hitachi rust drive for data. MGA9-32, using Xfce. This system is a clean install (retaining /home) using the netinstall iso.

The 32-bit server kernel 6.4.16-3 only boots successfully once in a while on this system. Most of the time I see rapidly-scrolling text, ending with a notice about a kernel panic because of an interrupt problem. 

When the boot is successful, I do not see any obvious problems with system operation, but I have not used it extensively yet.

I attempted to get a journal of one of the failed boots, but apparently it never gets that far. When I get it to boot, the previous journal shows the last boot that worked, not the failed boot that actually precedes it.

The kernel-server 6.4.9-4 that shipped with MGA9 does not do this, but the kernel-server 6.4.16-5 currently under test in bug 32482 does.

Another set of hardware, a desktop with AMD Phenom II X4 910, 8GB DDR3 RAM, AMD HD 8490 graphics, Rust drives only, 32-bit Plasma system, is unaffected. This system was upgraded from a working MGA8 system using the MGAapplet and urpmi.

How reproducible: The affected system boots successfully maybe one time out of four, perhaps less often.


Steps to Reproduce:
1. Install or update to kernel-server 6.4.16-3 or kernel-server 6.4.16-5.
2. Attempt to boot. Affected hardware will probably fail; unaffected hardware will boot successfully.
Comment 1 katnatek 2023-11-16 00:27:05 CET
I see similar behaviour on this computer:

inxi -F
System:
  Host: cefiro Kernel: 6.4.16-desktop-5.mga9 arch: i686 bits: 32 Desktop: LXQt
    v: 1.3.0 Distro: Mageia 9
Machine:
  Type: Laptop System: Hewlett-Packard product: Compaq Presario C700 Notebook
    PC v: F.34 serial: CND8452P36
  Mobo: Hewlett-Packard model: 30D9 v: 83.21 serial: CND8452P36
    BIOS: Hewlett-Packard v: F.34 date: 09/25/2008
Battery:
  ID-1: BAT0 charge: 4.8 Wh (100.0%) condition: 4.8/4.8 Wh (100.0%)
    volts: 10.8 min: 10.8
CPU:
  Info: dual core model: Intel Pentium Dual T2370 bits: 64 type: MCP cache:
    L2: 1024 KiB
  Speed (MHz): avg: 800 min/max: 800/1733 cores: 1: 800 2: 800
Graphics:
  Device-1: Intel Mobile GM965/GL960 Integrated Graphics driver: i915
    v: kernel
  Device-2: Chicony integrated USB webcam type: USB driver: uvcvideo
  Display: x11 server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9 driver: X:
    loaded: intel,v4l dri: i965 gpu: i915 resolution: 1: N/A 2: 1366x768~60Hz
  API: OpenGL v: 2.1 Mesa 23.1.7 renderer: Mesa Intel 965GM (CL)
Audio:
  Device-1: Intel 82801H HD Audio driver: snd_hda_intel
  API: ALSA v: k6.4.16-desktop-5.mga9 status: kernel-api
  Server-1: PipeWire v: 0.3.71 status: active (process)
  Server-2: PulseAudio v: 16.1 status: active (root, process)
Network:
  Device-1: Qualcomm Atheros AR242x / AR542x Wireless Network Adapter
    driver: ath5k
  IF: wlp1s0 state: up mac: 00:23:4e:4f:6d:6a
  Device-2: Realtek RTL-8100/8101L/8139 PCI Fast Ethernet Adapter
    driver: 8139too
  IF: enp2s1 state: down mac: 00:1e:ec:eb:b4:39
Drives:
  Local Storage: total: 465.76 GiB used: 114.18 GiB (24.5%)
  ID-1: /dev/sda vendor: Toshiba model: MK5076GSX size: 465.76 GiB
Partition:
  ID-1: / size: 48.91 GiB used: 9.56 GiB (19.5%) fs: ext4 dev: /dev/sda1
  ID-2: /home size: 411.55 GiB used: 104.61 GiB (25.4%) fs: xfs
    dev: /dev/sda3
Swap:
  ID-1: swap-1 type: partition size: 4 GiB used: 3.8 MiB (0.1%) dev: /dev/sda5
Sensors:
  System Temperatures: cpu: 56.0 C mobo: N/A
  Fan Speeds (RPM): N/A
Info:
  Processes: 189 Uptime: 25m Memory: 1.96 GiB used: 1.33 GiB (67.9%)
  Shell: Bash inxi: 3.3.26
Comment 2 Thomas Andrews 2023-11-16 00:45:27 CET
Kernel-desktop is also affected, but not nearly as often. Once in a while it fails to boot, but with a blank screen rather than a kernel panic report.

Kernel-linus appears to be unaffected - so far.
Comment 3 Giuseppe Ghibò 2023-11-16 00:50:04 CET
Do you see the same problem with 6.5.11-1.mga9? You could install just: 

https://distrib-coffee.ipsl.jussieu.fr/pub/linux/Mageia/distrib/9/i586/media/core/backports_testing/kernel-server-6.5.11-1.mga9-1-1.mga9.i586.rpm

CC: (none) => ghibomgx

Comment 4 Thomas Andrews 2023-11-16 02:34:49 CET
Installed i586 6.5.11-1, both server and desktop. Restarted each four times, all boots were normal.
Comment 5 Giuseppe Ghibò 2023-11-16 02:41:03 CET
(In reply to Thomas Andrews from comment #4)

> Installed i586 6.5.11-1, both server and desktop. Restarted each four times,
> all boots were normal.

586 works too? 

OK, so it doesn't show the same problem as 6.4.16-5. A 6.5.11-2.mga9 build was in progress, but the build system seems to be having problems: it was failing to build every package.
Comment 6 Giuseppe Ghibò 2023-11-16 02:44:36 CET
> 
> 586 works too? 

i.e. kernel-desktop586?
Comment 7 Thomas Andrews 2023-11-16 15:25:17 CET
I've never tried that one on this machine that I recall, though possibly once or twice, long ago, when I tested installing from the 32-bit Live iso. Certainly not lately.

I have real 32-bit hardware for testing desktop586, though actually it has a P4 and can use kernel-desktop. That's what I usually use.

I can give it a try, later today. For now, duty to real Life calls once again...
Comment 8 Giuseppe Ghibò 2023-11-16 17:42:40 CET
(In reply to Thomas Andrews from comment #7)

> I've never tried that one on this machine that I recall, though possibly
> once or twice, long ago, when I tested installing from the 32-bit Live iso.
> Certainly not lately.
> 
> I have real 32-bit hardware for testing desktop586, though actually it has a
> P4 and can use kernel-desktop. That's what I usually use.
> 
> I can give it a try, later today. For now, duty to real Life calls once
> again...

I tried to reproduce it myself in a 32-bit QEMU VM, using Penryn as the CPU model and 18GB of virtual RAM: kernel-server 6.4.16-5.mga9 panics, usually on a cold boot; a warm boot is less prone to panic. I dug a bit further with several local builds, even stripping every mga patch, including all of those for CVEs, which basically produces a sort of kernel-linus with the 64GB memory scheme, but it still panics, while 6.5.11-2 doesn't.
Comment 9 Lewis Smith 2023-11-16 20:23:44 CET
To summarise:
* 32-bit problem
* both desktop & server kernels
* 6.4.9-4 as released works
* so does 6.5.11-2
* 6.4.16-3 (under test with QA) does not work
So this blocks the latest update from passing QA. I have put a block on bug 32482.

TJ's and katnatek's laptops are different (though both are HP), but the previous comment reports the problem in a VM, so it is likely not machine-dependent.

Giuseppe, since you are already on to this, assigning it to you. Re-assign it if you wish (say to kernel/drivers).

Blocks: (none) => 32482
Summary: 32-bit server kernel 6.4.16-3 doesn't boot reliably on HP Probook 6550b => 32-bit server kernel 6.4.16-3 doesn't boot reliably
CC: ghibomgx => (none)
Assignee: bugsquad => ghibomgx

Comment 10 katnatek 2023-11-16 20:51:43 CET
Rebooted 2 times selecting kernel-server; I didn't see any panic.

uname -a
Linux cefiro 6.5.11-server-2.mga9 #1 SMP PREEMPT_DYNAMIC Thu Nov 16 05:31:26 UTC 2023 i686 GNU/Linux

Just one unexpected side effect when installing the packages: the display mode was reset, making the laptop's integrated display the main screen. I fixed that with a custom desktop file that runs the needed xrandr command.

I will test the other i586 kernels
Comment 11 katnatek 2023-11-16 21:43:21 CET
Tested kernel-desktop and kernel-desktop586 6.5.11-2, nothing weird so far.
I understand a new test will be needed when Giuseppe moves kernel 6.5 to updates_testing.
Comment 12 Giuseppe Ghibò 2023-11-18 19:15:11 CET
Apparently version 6.5.11-3 in updates_testing panics too, not only on i586 but also on x86_64. Investigating. In the meantime, don't use it.
Comment 13 katnatek 2023-11-18 20:44:21 CET
(In reply to Giuseppe Ghibò from comment #12)
> Apparently with version 6.5.11-3 in updates_testing there is panic too. Not
> only in i586 but also x86_64. Investigating. In the meanwhile don't use it.

Is there a BuildRequires in backports that could be affecting the package positively?
Comment 14 Giuseppe Ghibò 2023-11-19 00:04:16 CET
(In reply to katnatek from comment #13)
> (In reply to Giuseppe Ghibò from comment #12)
> > Apparently with version 6.5.11-3 in updates_testing there is panic too. Not
> > only in i586 but also x86_64. Investigating. In the meanwhile don't use it.
> 
> Exist in backport a buildrequire that can affect in positive way to the
> package?

It could, but I don't think it's related to some missed or additional BuildRequires.

The same version is currently building in both backports_testing (6.5.11-4.1) and updates_testing (6.5.11-4): same codebase, same config options, same patchset, just a different naming scheme.

From a latest test, it seems that the initrd /boot/initrd-6.5.11-*server*.img is not generated automatically during package installation, whereas the backports_testing build (which uses the old naming scheme) is not affected and does generate it. Was the initrd image generated on your first installation?
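One way to answer that question on a given machine is to check whether every installed kernel image has a matching initrd. The sketch below is a hypothetical helper (`missing_initrds` is not a Mageia tool); it assumes the `/boot/vmlinuz-<version>` and `/boot/initrd-<version>.img` naming seen in the path above, and it skips symlinks on the assumption that they are only aliases to real images.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mageia): print the version of every kernel
# in a boot directory that has no matching, non-empty initrd image.
# Assumes the naming vmlinuz-<version> / initrd-<version>.img.
missing_initrds() {
    bootdir="${1:-/boot}"
    for kernel in "$bootdir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue   # glob matched nothing (or dangling link)
        [ -L "$kernel" ] && continue   # skip symlinks (assumed to be aliases)
        version="${kernel##*/vmlinuz-}"
        # -s is true only if the file exists and is larger than zero bytes
        if [ ! -s "$bootdir/initrd-$version.img" ]; then
            echo "$version"
        fi
    done
}
```

For example, `missing_initrds /boot` prints one line per affected kernel version. Assuming dracut is the initrd generator on the system, an image for a listed version could then be rebuilt by hand with `dracut -f /boot/initrd-<version>.img <version>`.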
Comment 15 Giuseppe Ghibò 2023-11-19 00:30:54 CET
OK, found. Apparently the problem is in a macro expansion in the SPEC file when the initrd image is generated.
Comment 16 Giuseppe Ghibò 2023-11-19 00:52:07 CET
(In reply to Giuseppe Ghibò from comment #15)

> OK, found. Apparently the problem is in a macro expansion in the SPEC file
> when the initrd image is generated.

Should be fixed in 6.5.11-5.mga9/6.5.11-5.1.mga9.
Comment 17 Morgan Leijström 2023-11-19 14:04:56 CET
Great
My main system is now on desktop 6.5.11-5.1 :)
Please create an update bug and I will report there.

CC: (none) => fri

Comment 18 Giuseppe Ghibò 2023-11-19 14:38:40 CET
Please also test 6.5.11-5 in updates_testing (same codebase, different naming scheme).
Comment 19 Morgan Leijström 2023-11-19 18:08:01 CET
6.5.11-5 desktop 64-bit is working OK on my main system and also on my Dell Precision M6300 laptop.

Also Frédéric runs it with success at https://bugs.mageia.org/show_bug.cgi?id=32533#c4

---

I mentioned in another bug that on my main system (with the Nvidia driver), the 6.5.11 desktop kernel resumes from suspend better than the 6.4 desktop kernels: 7 of 8 tries were OK. But today 6.5.11-5 desktop failed (I had to power-cycle the monitor) in half of the tries. It may just be random, but it also seems to depend on how long it sleeps (e.g. a long lunch). Perhaps the GPU or monitor enters a deeper sleep on an internal timer? Strange.

Our linus 6.5.11-2 works OK in this regard (as do linus 6.4.*).
Will there be a new 6.5.11 linus (later than -2)?
Comment 20 katnatek 2023-11-19 21:28:39 CET
2 reboots without issues with kernel server :D

Tested on real hardware with Mageia 9 i586

uname -a
Linux cefiro 6.5.11-server-5.mga9 #1 SMP PREEMPT_DYNAMIC Sun Nov 19 01:37:01 UTC 2023 i686 GNU/Linux
Comment 21 katnatek 2023-11-19 21:41:44 CET
Packages in 9/core/updates_testing:

bpftool-6.5.11-5.mga9
cpupower-6.5.11-5.mga9
cpupower-devel-6.5.11-5.mga9
kernel-desktop-6.5.11-5.mga9
kernel-desktop-devel-6.5.11-5.mga9
kernel-desktop-devel-latest-6.5.11-5.mga9
kernel-desktop-latest-6.5.11-5.mga9
kernel-doc-6.5.11-5.mga9.noarch.rpm
kernel-server-6.5.11-5.mga9
kernel-server-devel-6.5.11-5.mga9
kernel-server-devel-latest-6.5.11-5.mga9
kernel-server-latest-6.5.11-5.mga9
kernel-source-6.5.11-5.mga9.noarch.rpm
kernel-userspace-headers-6.5.11-5.mga9
lib(64)bpf-devel-6.5.11-5.mga9
lib(64)bpf1-6.5.11-5.mga9
perf-6.5.11-5.mga9

i586 only:
kernel-desktop586-6.5.11-5.mga9
kernel-desktop586-devel-6.5.11-5.mga9
kernel-desktop586-devel-latest-6.5.11-5.mga9
kernel-desktop586-latest-6.5.11-5.mga9

From SRPM:
kernel-6.5.11-5.mga9.src.rpm
Comment 22 Morgan Leijström 2023-11-19 21:46:14 CET
We need a fresh bug, and full set of packages, example see
https://bugs.mageia.org/show_bug.cgi?id=32482#c57
Comment 23 katnatek 2023-11-19 22:03:47 CET
(In reply to Morgan Leijström from comment #22)
> We need a fresh bug, and full set of packages, example see
> https://bugs.mageia.org/show_bug.cgi?id=32482#c57

You can reuse the list in comment #21; it just needs the VirtualBox packages added, though some of them need to be rebuilt for this kernel.
Comment 24 Giuseppe Ghibò 2023-11-19 22:28:45 CET
(In reply to Morgan Leijström from comment #19)
> 6.5.11-5 desktop 64 bit working OK on my main system and also laptop dell
> precision M6300.
> 
> Also Frédéric runs it with success at
> https://bugs.mageia.org/show_bug.cgi?id=32533#c4
> 
> ---
> 
> I told in another bug that on my main system, 6.5.11 desktop works better at

Which comment # was that?

> returning my main system with nvidia driver from suspend than 6.4 desktop
> kernels, 7 of 8 tries was OK, but today 6.5.11-5 desktop fail (i have to
> power cycle the monitor) in half of the tries - it may be just random... but
> it also seem to depend on how long it sleeps (i.e long lunch) (GPU or
> monitor in deeper sleep by internal timer? Strange.)
> 
> Our linus 6.5.11-2 works OK in this regard (as do linus 6.4.* )
> Will there be a new 6.5.11 linus (later than -2)?

So 6.5.11 (which one, 6.5.11-1 or 6.5.11-5?) resumes from suspend better than 6.4.16-3, while both kernel-linus 6.4.16 and 6.5.11 work better every time? Which version of the Nvidia driver? Again there is the triplet 535.129.03, 470.223.02, and also newfeature 545.29.02.
Comment 25 Giuseppe Ghibò 2023-11-19 22:34:25 CET
(In reply to Morgan Leijström from comment #19)

> 
> Our linus 6.5.11-2 works OK in this regard (as do linus 6.4.* )
> Will there be a new 6.5.11 linus (later than -2)?

Not planned, as it generally doesn't carry extra patches (6.4.16 was an exception due to security bugs). So the next one should be 6.5.12.
Comment 26 Giuseppe Ghibò 2023-11-19 23:14:22 CET
(In reply to Morgan Leijström from comment #22)
> We need a fresh bug, and full set of packages, example see
> https://bugs.mageia.org/show_bug.cgi?id=32482#c57

https://bugs.mageia.org/show_bug.cgi?id=32537
Comment 27 Morgan Leijström 2023-11-20 10:35:38 CET
(In reply to Giuseppe Ghibò from comment #24)
> (In reply to Morgan Leijström from comment #19)
> > 6.5.11-5 desktop 64 bit working OK on my main system
...
> > I told in another bug that on my main system, 6.5.11 desktop works better at
> 
> which was the #?

https://bugs.mageia.org/show_bug.cgi?id=32482#c102

> 
> > returning my main system with nvidia driver from suspend than 6.4 desktop
> > kernels, 7 of 8 tries was OK, but today 6.5.11-5 desktop fail (i have to
> > power cycle the monitor) in half of the tries - it may be just random... but
> > it also seem to depend on how long it sleeps (i.e long lunch) (GPU or
> > monitor in deeper sleep by internal timer? Strange.)
> > 
> > Our linus 6.5.11-2 works OK in this regard (as do linus 6.4.* )
> > Will there be a new 6.5.11 linus (later than -2)?
> 
> So 6.5.11 (which one 6.5.11-1? 6.5.11-5) 

Tested desktop 6.5.11-2 & 6.5.11-5

> works better 

I think so, but the randomness is a problem; I cannot say for sure.

> at returning from
> suspend than 6.4.16-3, 

and 6.4.16-5

> while either kernel-linus-6.4.16 or 6.5.11 works
> better all the times? 

Absolutely, no doubt.

> Which version of nvidia driver?

Every proprietary Nvidia driver I have tried: 470, 525 (some months ago) and 535.
470 is my daily driver; I have an unmeasurable feeling that it works slightly better than 535 on my GTX 750 Ti.

With the Xorg modesetting driver there is no problem with either the desktop or the linus kernel, just somewhat slower performance.

I need the best possible option for this old, weak card on a 4K screen, so I opt for the proprietary driver and power-cycle the monitor whenever it fails to wake up.

> There is again the
> triplet 535.129.03, 470.223.02 and also newfeature 545.29.02.

I am on 470.223.02 since it appeared.

Will try newfeature.
Comment 28 Morgan Leijström 2023-11-20 13:16:43 CET
Newfeature is working OK, will see about resume reliability.
I will report back later in bug 32537.

I did not find bugs for the Nvidia drivers.  
Please open one for each and set to QA if you want them officially tested.
Comment 29 Giuseppe Ghibò 2023-11-20 16:37:37 CET
(In reply to Morgan Leijström from comment #27)

> 
> I need the best possible option for this old weak card on 4K screen so I opt
> for proprietary and power cycle monitor whenever it fail to wake up.
> 

750 Ti cards are still sold nowadays as new products on the market...

The problem is driving everything at 4K, which may require more power.
Comment 30 Thomas Andrews 2025-03-27 19:04:35 CET
The original issue for this bug appears to have been resolved in the "fresh" bug 32537, but this bug was never updated accordingly. I had removed the 32-bit server kernel from the machine that had the original problem, but to confirm that the issue is gone I just installed server kernel 6.6.83-1 and booted into it with no issues.

Since we moved on from the 6.4 and 6.5 series kernels some time ago, and there have been no reports of the issue resurfacing, I'm closing this bug as FIXED.

Resolution: (none) => FIXED
Status: NEW => RESOLVED

