Bug 32082 - The laptop crash when sleep or boot sometimes with newer 6.3/4 kernels, but not older 6.1.
Summary: The laptop crash when sleep or boot sometimes with newer 6.3/4 kernels, but n...
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All Linux
Priority: High critical
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard: MGA9TOO
Keywords:
Depends on: 32537
Blocks: 31984
 
Reported: 2023-07-06 11:55 CEST by Jose Manuel López
Modified: 2023-11-29 21:34 CET

See Also:
Source RPM: kernel 6.4.16-desktop-1.mga9
CVE:
Status comment:


Attachments
Journal.txt when bug happens (181.15 KB, text/plain)
2023-07-06 11:56 CEST, Jose Manuel López
Journalctl with a boot with kernel 6.1.6 (101.32 KB, text/plain)
2023-07-11 10:22 CEST, Jose Manuel López
Journa log (195.41 KB, text/plain)
2023-07-20 22:36 CEST, Jose Manuel López
Details of my hardware (17.41 KB, text/plain)
2023-10-15 18:05 CEST, Jose Manuel López
Option package (353.20 KB, image/jpeg)
2023-10-17 17:21 CEST, Jose Manuel López
Warnings with kernel 6.5.8-3 (132.81 KB, image/jpeg)
2023-10-24 15:45 CEST, Jose Manuel López

Description Jose Manuel López 2023-07-06 11:55:14 CEST
Description of problem: For the last several kernel versions, I have noticed that when my computer goes to sleep, it locks up and the Shift Lock key flashes. After this, I cannot do anything, and I can only power off the laptop with the power button.

I have not been able to downgrade the kernel because there is no previous version in the repositories.

I am attaching the journal from when this happens.

My laptop is a Slimbook ProX15 AMD 4800H with SSD disk and Mageia 9 Plasma KDE x86_64.


Version-Release number of selected component (if applicable): Mageia 9 Kernel 6.3.9 and 6.4


How reproducible: Boot laptop and after sleep.
Comment 1 Jose Manuel López 2023-07-06 11:56:38 CEST
Created attachment 13905 [details]
Journal.txt when bug happens

This is the journal file from when the bug happens.
Comment 2 Jose Manuel López 2023-07-06 11:57:43 CEST
I have tried installing Endeavour Os to test this, and it happens there too.
Jose Manuel López 2023-07-08 00:06:42 CEST

Priority: Normal => release_blocker

Comment 3 Jose Manuel López 2023-07-10 16:47:01 CEST
Reported here too: https://gitlab.freedesktop.org/drm/amd/-/issues/2691
Comment 4 Jose Manuel López 2023-07-11 08:09:30 CEST
Please, I need to downgrade to kernel 6.1.14 in order to answer a comment in the issue linked in comment 3.

I have searched madb.mageia.org, but the rpm file does not exist.
Comment 5 Jose Manuel López 2023-07-11 10:21:55 CEST
Hi all, kernel 6.1.6 works fine for me, with no sleep issues. I am attaching the journalctl output from it.
Comment 6 Jose Manuel López 2023-07-11 10:22:27 CEST
Created attachment 13919 [details]
Journalctl with a boot with kernel 6.1.6
Comment 7 Jose Manuel López 2023-07-11 10:34:12 CEST
The same result with 6.1.34. Works fine for me.
Comment 8 Lewis Smith 2023-07-15 20:36:57 CEST
Thank you for the comparisons with older kernels.
Since you do not have the problem with 6.1.x, but it happens with M9 kernels 6.3|4.x, assigning this to the kernel team.

Assignee: bugsquad => kernel
Source RPM: Kernel, acpi, power, sleep and boot => kernel 6.4.3-desktop-1.mga9
Summary: The laptop crash when sleep or boot sometimes. => The laptop crash when sleep or boot sometimes with newer 6.3/4 kernels, but not older 6.1.

Comment 9 Jose Manuel López 2023-07-20 22:35:36 CEST
Hi all,

I have tried this with the latest kernel version, 6.4.4. The issue is still present.

I am attaching the journal log from 4 minutes ago with this kernel.
Comment 10 Jose Manuel López 2023-07-20 22:36:27 CEST
Created attachment 13927 [details]
Journa log

This is the journal log with kernel 6.4.4.
Comment 11 Jose Manuel López 2023-07-24 11:21:21 CEST
Hi all,

I have tried this with the latest kernel version, 6.4.5. The issue is still present.
Comment 12 Jose Manuel López 2023-09-20 14:49:32 CEST
Hi all,

Something must be wrong in Mageia 9 with regard to how the kernel handles my computer. I have tried version 6.5.3 and the same thing keeps happening.

On the other hand, since the error has now persisted across many kernel versions, I installed other distros on the real hardware to see whether it reproduces there, in order to rule things out.

- Slimbook Os, kernel 6.2, works fine.
- Endeavour Os, kernel 6.5.3 works fine.
- Kdeneon Os, kernel 6.2.3 works fine.

So I think the bug must be somewhere in the kernel compilation in Mageia from version 6.2 onwards (which started to crash on me...).

Greetings!
Comment 13 Jose Manuel López 2023-10-02 21:52:46 CEST
I still have the same problem with kernel 6.4.16 from the testing repos.
Comment 14 christian barranco 2023-10-02 21:59:22 CEST
Should we try CONFIG_AMD_PMC=y in our kernel defconfig file?
Comment 15 Jose Manuel López 2023-10-02 22:22:51 CEST
I couldn't say how the distros I named above have it configured.
Comment 16 r howard 2023-10-02 22:51:29 CEST
(In reply to Jose Manuel López from comment #15)
> I couldn't say how the distros I named above have it configured.

Can you take a look at those distros' source control systems and see if it is in their kernel config?
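
As an alternative to browsing each distro's source control, the value can be read from the config file of an installed kernel. The following is only a minimal sketch: the helper name `kconfig_value` is hypothetical, and it assumes the distro ships its kernel config as a plain-text file (commonly `/boot/config-<release>`, but this varies by distro).

```shell
#!/bin/sh
# Hypothetical helper: print how OPTION is set in a kernel config file.
# Usage: kconfig_value CONFIG_FILE OPTION
kconfig_value() {
    file="$1"
    option="$2"
    # An enabled option looks like "CONFIG_FOO=y" or "CONFIG_FOO=m";
    # a disabled one looks like "# CONFIG_FOO is not set".
    grep -E "^(${option}=|# ${option} is not set)" "$file" \
        || echo "${option}: not found in ${file}"
}

# Example (assumes the config is installed under /boot, which varies by distro):
# kconfig_value "/boot/config-$(uname -r)" CONFIG_AMD_PMC
```

Running the same check on each distro's installed kernel would show whether the option is built in (`=y`), built as a module (`=m`), or disabled.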

CC: (none) => rihoward1

Comment 17 Giuseppe Ghibò 2023-10-02 22:59:36 CEST
Most of them have CONFIG_AMD_PMC=m, like us; you might try ensuring the amd_pmc module is loaded beforehand, using 'modprobe amc_pmc'.

CC: (none) => ghibomgx

Comment 18 Giuseppe Ghibò 2023-10-02 23:01:04 CEST
(In reply to Giuseppe Ghibò from comment #17)
> mostly most of them have CONFIG_AMD_PMC=m, like us; you might try ensuring
> amd_pmc module is loaded before, using 'modprobe amc_pmc'.

amd_pmc not amc_pmc.
Comment 19 Jose Manuel López 2023-10-03 06:34:50 CEST
The module is loaded into the kernel


[root@localhost ~]# modprobe amd_pmc
[root@localhost ~]# modprobe amd_pmc --first-time
modprobe: ERROR: could not insert 'amd_pmc': Module already in kernel
[root@localhost ~]# lsmod | grep amd_pmc
amd_pmc                28672  0
Comment 20 Jose Manuel López 2023-10-03 06:41:14 CEST
Same issue with module loaded.
Comment 21 Jose Manuel López 2023-10-03 06:44:00 CEST
Is there any log, apart from the journal, that can show you the error?
Comment 22 Jose Manuel López 2023-10-12 09:41:15 CEST
Hello everyone,

Today I tried to install the "kernel-linus" version on my computer, and the suspension problem was solved, so I deduce that there is something in the packaging of our kernel-desktop that causes my computer to crash.

But how can we check where the difference is?

Greetings!
Comment 23 Giuseppe Ghibò 2023-10-14 18:11:21 CEST
(In reply to Jose Manuel López from comment #22)
> Hello everyone,
> 
> Today I tried to install the "kernel-linus" version on my computer, and the
> suspension problem was solved, so I deduce that there is something in the
> packaging of our kernel-desktop that causes my computer to crash.
> 
> But how can we check where the difference is?
> 
> Greetings!

Are you using the same version for both the standard and linus kernels, i.e.:

kernel-desktop-6.4.16-3.mga9
kernel-linus-6.4.16-3.mga9

?

Can you try also with kernel-server-6.4.16-3.mga9?
Comment 24 Jose Manuel López 2023-10-15 18:03:30 CEST
Something must be wrong in the kernels packaged by Mageia (desktop and server)

With kernel-server 6.4.9 installed, I get the same suspend problems as with the desktop series.

I am continuing with kernel 6.4.9 from the stable repos (at least, that is the one offered to me after refreshing the repos several times...)

I also tried 6.4.16 and the same thing happened to me, so I guess something is wrong in a patch or in the kernel packaging that affects my hardware:

Slimbook ProX 15 AMD
AMD 4 Ryzen 7 4800H
Mageia Plasma x86_64

Tested with other distros successfully without errors.
Comment 25 Jose Manuel López 2023-10-15 18:05:10 CEST
Created attachment 14061 [details]
Details of my hardware

I am attaching the konsole output of the dmidecode command so you can see the details of my hardware.
Comment 26 Morgan Leijström 2023-10-15 19:50:47 CEST
Thank you for testing.

Added kernel-linus as tip at
https://wiki.mageia.org/en/Setup_the_graphical_server#If_problems

Priority: release_blocker => High
CC: (none) => fri
Source RPM: kernel 6.4.3-desktop-1.mga9 => kernel 6.4.16-desktop-1.mga9
Whiteboard: (none) => MGA9TOO

Comment 27 Giuseppe Ghibò 2023-10-15 23:10:06 CEST
(In reply to Jose Manuel López from comment #24)

> Something must be wrong in the kernels packaged by Mageia (desktop and
> server)
> 
> Installed kernel-server 6.4.9 I get the same suspension problems as with the
> desktop series.
> 
> I continue with the kernel from the stable 6.4.9 repos (or is it the one
> that appears to me after refreshing the repos several times...)
> 
> I also tried 6.4.16 and the same thing happened to me, so I guess something
> is wrong in a patch or in the kernel packaging that affects my hardware:
> 
> Slimbook ProX 15 AMD
> AMD 4 Ryzen 7 4800H
> Mageia Plasma x86_64
> 
> Tested with other distros successfully without errors.

If kernel-linus works every time, it means that some extra patch in our stock kernel interferes with your hardware (or needs to be completed with some further patch from newer kernels).

Since you've learned packaging, I think you can run some tests by compiling kernel packages locally yourself to find which patch is causing the problem. It's not difficult; you might follow this:

a) get the current mga9 kernel package from svn (in updates/9/kernel).

b) edit the spec file and disable the building of some subpackages, to speed up the build, e.g.:

%define build_server 0
%define build_doc 0
%define build_cpupower 0
%define build_perf 0
%define build_bpftool 0
%define build_libbpf 0

c) then disable some patches in the spec file which might be related to the AMD CPU or chipset, e.g. those from 1020 to 1050 (just comment out the PatchXXXX line), 1520, 1760, etc.

d) then build the modified rpm package, bumping the release number (e.g. 6.4.16-3.0, 3.1, etc.).

e) install just the kernel-desktop rpm package (and kernel-devel as well if you have dkms-related modules).

f) if you find that a kernel works correctly on your hardware, re-enable the patches a few at a time until it breaks again, so as to detect which patch was causing the problem.

g) if it still fails, continue from c), disabling more patches, and so on.
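
Step c) can be sketched as a small shell helper. This is only an illustration under stated assumptions: the name `disable_patches` is hypothetical, and it assumes patch declarations in the spec file are lines beginning with `PatchNNNN:`. If the spec also applies patches through explicit per-patch lines, those would need the same treatment by hand.

```shell
#!/bin/sh
# Hypothetical helper: comment out every "PatchNNNN:" declaration whose
# number lies in the inclusive range FIRST..LAST.
# Usage: disable_patches SPEC_FILE FIRST LAST
disable_patches() {
    spec="$1"
    first="$2"
    last="$3"
    awk -v first="$first" -v last="$last" '
        /^Patch[0-9]+:/ {
            n = $0
            sub(/^Patch/, "", n)   # drop the "Patch" prefix
            sub(/:.*/, "", n)      # drop everything from the colon on
            if (n + 0 >= first + 0 && n + 0 <= last + 0) {
                print "#" $0       # comment the declaration out
                next
            }
        }
        { print }                  # every other line passes through unchanged
    ' "$spec" > "$spec.tmp" && mv "$spec.tmp" "$spec"
}

# Example: disable patches 1020 through 1050 in a local copy of the spec.
# disable_patches kernel.spec 1020 1050
```

Narrowing the range on each rebuild (for instance halving it) keeps the number of kernel builds needed for steps f) and g) small.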
Comment 28 Jose Manuel López 2023-10-16 13:03:48 CEST
Well, I have followed the instructions in comment 27. It seems that the patch causing the error is 1050. If I comment out this patch, the machine works perfectly with kernel 6.5.7.4, which is the one I'm working with right now.

So I think that whoever is responsible should check this patch in case there is a bug.

Thanks for the directions to find it, I hope it can be fixed soon in the stable branch to leave the machine with the stable kernel.

Greetings!!
Comment 29 Giuseppe Ghibò 2023-10-16 13:13:25 CEST
(In reply to Jose Manuel López from comment #28)
> Well, I have followed the indications in comment 27. It seems that the patch
> that is generating the error is 1050. If I comment this patch the machine
> works perfectly with kernel 6.5.7.4 which is the one I'm working with right
> now.
> 
> So I think that whoever is responsible should check this patch in case there
> is a bug.
> 
> Thanks for the directions to find it, I hope it can be fixed soon in the
> stable branch to leave the machine with the stable kernel.
> 
> Greetings!!

You said kernel 6.5.7.4? Where does it come from? The latest in cauldron/updates_testing is 6.5.7-3.mga10 (which also works on mga9).

What about kernel-6.4.16-3.mga9 with the patch1050 removed?
Comment 30 Jose Manuel López 2023-10-16 13:25:12 CEST
Sorry, the kernel is 6.5.7.3, but I changed the version to differentiate it.

I'm going to see if I can download the 6.4.16 srpm and check it out.
Comment 31 Giuseppe Ghibò 2023-10-16 13:27:24 CEST
(In reply to Jose Manuel López from comment #30)

> I'm going to see if I can download the 6.4.16 srpm and check it out.

Just use "mgarepo co 9/kernel" or "mgarepo co svn://svn.mageia.org/packages/updates/9/kernel" (as anonymous).
Comment 32 Jose Manuel López 2023-10-17 15:58:31 CEST
Yesterday I spent the whole day testing this.

With kernel 6.4.16-3, I commented out patches 1030 and 1050, and with both disabled my laptop works fine.

If I leave patch 1030 enabled and comment out 1050, or vice versa, the error appears again.

So I cannot leave either one of them enabled.

I hope this helps clarify the error.

Greetings!
Comment 33 Jose Manuel López 2023-10-17 17:21:20 CEST
Created attachment 14067 [details]
Option package

I have attached a screenshot of the option requested when packaging, which I have marked as "y". The resulting kernel version, as I said above, does not give me any sleep problems.
Comment 34 Giuseppe Ghibò 2023-10-17 18:43:42 CEST
(In reply to Jose Manuel López from comment #32)
> Yesterday, I have tried this during all day.
> 
> With kernel 6.4.16-3 I have comment the patchs 1030 and 1050 so my laptop
> works fine.
> 
> Both, if I leave patch 1030 and comment on 1050, or vice versa, the error
> appears again.
> 
> So I can't leave either one activated or the other.
> 
> I hope this helps clarify the error.
> 
> Greetings!

Patch1050 was apparently backported from what is currently kernel-6.6rcX upstream, so probably something further changed in 6.6, or it requires some further patches to be backported to 6.4 to work properly. I think for now Patch1050 (i.e. x86-fpu-xstate-Fix-PKRU-covert-channel) could be removed from both 6.4 and 6.5 in the next build.

As for Patch1030, i.e. "sched-fair-Multi-LLC-select_idle_sibling", it seems it was introduced in our kernel between kernel-6.3.5 and kernel-6.3.6, apparently to better support Zen2 CPUs with only 3 cores, which were idling too long. I have not seen that code integrated upstream in kernel 6.5 or 6.6rcX.
Comment 35 Morgan Leijström 2023-10-18 11:08:52 CEST
This is very interesting!

kernel-linus works flawlessly on my system.
It is the first kernel I have tried that is fully reliable when resuming from suspend with nvidia drivers.

With the desktop kernel, on resume the screen only turns on to say "no signal", and I have to switch the screen off and on before it works. (At first I thought the computer had crashed and rebooted it by issuing the RE part of REISUB; the screen woke up at the DM login and I rebooted from there.) This is with nvidia 470 and 535; the modesetting driver was working OK (as does nouveau, but it is very slow here).

GPU Nvidia GTX750Ti
CPU Intel i7-870
Chipset Intel P55

I also tried an AMD RX6400, but it was worse: the screen never woke up until after a full reboot (I used the REIS part of REISUB, if I remember correctly). I never thought of testing the linus kernel then; this was with the desktop kernel.

I will soon try that card with linus kernel.
Comment 36 Morgan Leijström 2023-10-18 22:20:47 CEST
No, with the RX6400 this system hangs hard on resuming from suspend with both the linus and desktop kernels; not even REISUB has any effect, and I have to cut power.
I don't have time for this, so I swapped my nvidia GTX750 back in.

As with nvidia470, kernel-linus works perfectly with the latest 535 version from nonfree updates testing.
BTW, what about that driver? It has been sitting there for almost a month with no bug filed for it.
Comment 37 Jose Manuel López 2023-10-19 11:29:20 CEST
What about what was seen in bug 32082?

Greetings!
Comment 38 Morgan Leijström 2023-10-19 11:37:16 CEST
This is 32082.  More precisely what do you mean?
Comment 39 Giuseppe Ghibò 2023-10-19 11:46:47 CEST
(In reply to Morgan Leijström from comment #38)
> This is 32082.  More precisely what do you mean?

I think the comment was a carbon copy from bug https://bugs.mageia.org/show_bug.cgi?id=32296#c95
Comment 40 Jose Manuel López 2023-10-19 13:58:22 CEST
Yeah, I got confused..:(
Comment 41 Giuseppe Ghibò 2023-10-24 00:14:49 CEST
(In reply to Jose Manuel López from comment #37)
> What about what was seen in bug 32082?
> 
> Greetings!

6.4.16-4.mga9 is in the build queue.

BTW, there is also 6.5.8-3.mga9 in core/backports_testing.
Comment 42 Jose Manuel López 2023-10-24 11:40:24 CEST
OK, I will wait for 6.4.16-4, and meanwhile I will try 6.5.8-3.
Comment 43 Jose Manuel López 2023-10-24 15:43:45 CEST
I have installed kernel 6.5.8-3 from the backport repos. Works fine for me. I no longer have sleep problems with this kernel either. But some warnings appear at the beginning. Attachment.
Comment 44 Jose Manuel López 2023-10-24 15:45:21 CEST
Created attachment 14089 [details]
Warnings with kernel 6.5.8-3

These warnings appear at startup with kernel 6.5.8-3 from the backport repositories.
Comment 45 Giuseppe Ghibò 2023-10-24 21:48:26 CEST
(In reply to Jose Manuel López from comment #43)

6.4.16-4.mga9 is available now in updates_testing.

> I no longer have sleep problems with this kernel either. But some warnings
> appear at the beginning. Attachment.

Hints?

Maybe kernel-firmware/kernel-firmware-nonfree and radeon-firmware also need to be updated to a newer release.
Comment 46 Jose Manuel López 2023-10-24 22:54:22 CEST
I have installed kernel 6.4.16-4 from the testing repos. It works OK for me. No sleep issues here.

Good work, guys!!

Greetings!
Comment 47 Frédéric "LpSolit" Buclin 2023-10-24 23:48:13 CEST
(In reply to Giuseppe Ghibò from comment #41)
> BTW, there is also 6.5.8-3.mga9 in core/backports_testing.

Why is 6.5.8 in backports_testing instead of updates_testing? 6.4.x is no longer maintained upstream, so how will users who didn't enable Backports get newer kernels?
Comment 48 Giuseppe Ghibò 2023-10-25 00:03:35 CEST
(In reply to Frédéric "LpSolit" Buclin from comment #47)
> (In reply to Giuseppe Ghibò from comment #41)
> > BTW, there is also 6.5.8-3.mga9 in core/backports_testing.
> 
> Why is 6.5.8 in backports_testing instead of updates_testing? 6.4.x is no
> longer maintained upstream, so how will users who didn't enable Backports
> get newer kernels?

Because there will still be one more release of 6.4.16-4(5).mga9, which also fixes a couple of CVEs, before moving to 6.5.8(9).
Comment 49 Jose Manuel López 2023-10-31 12:46:35 CET
Since this bug was fixed in kernel 6.4.16-4, we can close it. If it appears again, I will open a new bug.
Comment 50 Jose Manuel López 2023-10-31 12:47:07 CET
As this bug is fixed in kernel 6.4.16-4, we can close it. If it appears again, I will open a new bug.
Comment 51 Morgan Leijström 2023-10-31 19:34:31 CET
Let's hold off closing until we actually release something to updates.

I think you should try kernel-6.4.16-5.mga9 now in updates testing.
Morgan Leijström 2023-10-31 19:47:09 CET

Blocks: (none) => 31984

Lewis Smith 2023-11-02 20:52:49 CET

See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=32482

Comment 52 Jose Manuel López 2023-11-03 10:38:41 CET
(In reply to Morgan Leijström from comment #51)
> Let's hold off closing until we actually release something to updates.
> 
> I think you should try kernel-6.4.16-5.mga9 now in updates testing.

I have tried this version (6.4.16-5), and it works fine for me. All sleep issues have disappeared.
Marja Van Waes 2023-11-03 16:10:07 CET

Depends on: (none) => 32482
CC: (none) => marja11

Comment 53 Jose Manuel López 2023-11-04 07:49:39 CET
I am currently working on my computer with this kernel version, as I mentioned in two other bugs (I only remember 32082), it works well for me at the moment.

I have had no sleep issues and everything works as expected, applications, audio, video, restart, sleep.
Marja Van Waes 2023-11-04 12:36:53 CET

See Also: https://bugs.mageia.org/show_bug.cgi?id=32482 => (none)

Comment 54 Morgan Leijström 2023-11-17 19:51:22 CET
It has now been decided that 6.4.16-5 and later 6.4 kernels will not be released.
We are shifting to 6.5, which is already in backports.
Before closing this bug, have you checked whether the latest 6.5 in backports works?
Comment 55 Giuseppe Ghibò 2023-11-17 21:29:47 CET
Wait for 6.5.11(12) in updates_testing before closing.
Comment 56 Jose Manuel López 2023-11-20 07:16:04 CET
Hi,

I am testing 6.5.11-5. Works fine for now. Sleep ok.

Greetings!
Marja Van Waes 2023-11-20 23:19:13 CET

Depends on: 32482 => 32537

Comment 57 Jose Manuel López 2023-11-21 10:26:50 CET
I think we can close this bug.
Comment 58 Marja Van Waes 2023-11-29 21:34:04 CET
(In reply to Jose Manuel López from comment #57)
> I think we can close this bug.

Indeed, thanks for mentioning it.

Status: NEW => RESOLVED
Resolution: (none) => FIXED

