Bug 29701 - AMD graphic card fans don't rotate at all with kernel 5.15.4
Summary: AMD graphic card fans don't rotate at all with kernel 5.15.4
Status: RESOLVED WONTFIX
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 8
Hardware: All, OS: Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-11-26 13:02 CET by christian barranco
Modified: 2023-08-27 20:21 CEST
CC List: 1 user

See Also:
Source RPM: kernel-desktop-latest-5.15.4-1.src.mga8
CVE:
Status comment:


Attachments
GPU fan speed and temperature with kernel 5.15.4 (459.90 KB, image/jpeg)
2021-11-26 13:05 CET, christian barranco
video of the fan not working with kernel 5.15.4 (658.25 KB, video/mp4)
2021-11-26 13:09 CET, christian barranco
GPU fan speed and temperature with kernel 5.10.78 (476.41 KB, image/jpeg)
2021-11-26 13:12 CET, christian barranco
video of the fan spinning with kernel 5.10.78 (464.03 KB, video/mp4)
2021-11-26 13:15 CET, christian barranco
journalctl of the boot and login with kernel 5.15 (216.21 KB, text/plain)
2021-11-26 14:59 CET, christian barranco
5.10.78 boot journal (218.01 KB, text/plain)
2021-11-26 16:33 CET, christian barranco
fan starts at some point when the GPU temp increases (96.35 KB, image/jpeg)
2021-11-26 20:33 CET, christian barranco

Description christian barranco 2021-11-26 13:02:34 CET
Description of problem:
After updating the kernel to 5.15.4, the fans of my AMD RX570 graphics card no longer spin. They do with 5.10.78.

I was first alerted by psensor, which showed the fan speed stuck at 1791 rpm.
I restarted with kernel 5.10.78 and the speed was around 930 rpm.
Then, to make sure psensor was not reporting wrong information, I opened the side of the case, expecting to see the fans spinning like crazy. In fact, the fans were not rotating at all!

I will attach psensor graphs and pictures of the fan.
My DE is Plasma, but I am not sure it matters.

Version-Release number of selected component (if applicable):
kernel-userspace-headers-5.15.4-1

How reproducible: always


Steps to Reproduce:
1. Update the kernel to the latest 5.15.4
2. Log in and look at the fan

Information on my hardware:
Machine:   Type: Desktop System: ASUS product: N/A v: N/A serial: <superuser required> 
           Mobo: ASUSTeK model: TUF GAMING B550M-PLUS v: Rev X.0x serial: <superuser required> UEFI: American Megatrends 
           v: 2423 date: 08/10/2021 
CPU:       Info: 12-Core AMD Ryzen 9 5900X [MT MCP] speed: 4529 MHz min/max: 2200/3700 MHz 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] driver: amdgpu 
           v: kernel 
           Display: x11 server: Mageia X.org 1.20.12 driver: amdgpu,v4l resolution: 2560x1440~60Hz 
           OpenGL: renderer: Radeon RX 570 Series (POLARIS10 DRM 3.40.0 5.10.78-desktop-1.mga8 LLVM 11.0.1) v: 4.6 Mesa 21.2.4 

I know this motherboard is affected by a kernel bug: https://bugzilla.kernel.org/show_bug.cgi?id=204807
Maybe it gives some hints?

GRUB options, as set in the Mageia Control Center: 
splash quiet noiswmd resume=UUID=5da5b07f-0b10-431c-9012-1562f1bb3dfb audit=0
Comment 1 christian barranco 2021-11-26 13:05:30 CET
Created attachment 13006 [details]
GPU fan speed and temperature with kernel 5.15.4

Same CPU load. Same applications running.
It looks as if a speed request is sent to the graphics card but the fans don't react?
Comment 2 christian barranco 2021-11-26 13:09:51 CET
Created attachment 13007 [details]
video of the fan not working with kernel 5.15.4

Clearly, the fan doesn't rotate.
Comment 3 christian barranco 2021-11-26 13:12:04 CET
Created attachment 13008 [details]
GPU fan speed and temperature with kernel 5.10.78

The GPU fan speed (purple curve) is at its usual value of around 930 rpm when not heavily loaded.
Comment 4 christian barranco 2021-11-26 13:15:18 CET
Created attachment 13009 [details]
video of the fan spinning with kernel 5.10.78

With kernel 5.10.78, under the same load, the fan is clearly spinning, as intended.
Comment 5 Morgan Leijström 2021-11-26 14:06:52 CET
Do you see anything in the system journal that may be relevant?

FWIW, I see kernel 5.15.5 in the testing repo.

Assigning to maintainers

Assignee: bugsquad => kernel
CC: (none) => fri

Comment 6 christian barranco 2021-11-26 14:59:29 CET
Created attachment 13010 [details]
journalctl of the boot and login with kernel 5.15

Nothing strikes me, but I might not know enough about what to look for.
christian barranco 2021-11-26 15:01:14 CET

Source RPM: kernel-userspace-headers-5.15.4-1.src.mga8 => kernel-desktop-latest-5.15.4-1.src.mga8

Comment 7 christian barranco 2021-11-26 16:14:14 CET
Hi,
I just tested the 5.15.5 kernel from the Testing repo; same issue, sadly.
Comment 8 Thomas Backlund 2021-11-26 16:22:55 CET
Please provide the journal from booting with the 5.10 series too.
Comment 9 christian barranco 2021-11-26 16:33:38 CET
Created attachment 13011 [details]
5.10.78 boot journal

journalctl output while booting with 5.10.78, per Thomas' request.
Comment 10 Thomas Backlund 2021-11-26 17:37:22 CET
Are you sure this is actually a problem?

One difference between 5.10 and 5.15 is that 5.15 does runtime power management, so technically the fans don't need to spin unless the load and temperature rise high enough.

Does the fan spin if you add amdgpu.runpm=0 to the kernel command line?

As for the referenced motherboard "issue", I will backport the support that landed in 5.16-rc2 to the next kernel build.
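To confirm that a parameter such as amdgpu.runpm=0 actually reached the running kernel, one can read /proc/cmdline. A minimal Python sketch, with hypothetical helper names; it splits on whitespace with shell-style quoting and, unlike the kernel, does not treat '-' and '_' in parameter names as equivalent:

```python
import shlex
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {param: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in shlex.split(cmdline):
        # Split only on the first '=', so "resume=UUID=..." keeps its full value.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_running_cmdline(path: str = "/proc/cmdline") -> Dict[str, Optional[str]]:
    """Parse the command line the currently running kernel was booted with."""
    return parse_cmdline(Path(path).read_text())
```

After rebooting, read_running_cmdline().get("amdgpu.runpm") should return "0" if the option was applied. If the option is set elsewhere (for example in a modprobe configuration), it will not appear here, and the effective value may instead be visible under /sys/module/amdgpu/parameters/ if the driver exposes it.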
Comment 11 christian barranco 2021-11-26 18:11:14 CET
(In reply to Thomas Backlund from comment #10)
> are you sure this is actually a problem ?
> 
> one difference between 5.10 and 5.15 is that 5.15 does runtime power
> management so technically the fans dont need to spin unless the load and
> temperature rises high enough
> 
> does the fan spin if you add amdgpu.runpm=0 on kernel command line ?
> 
> as for the referenced motherboard "issue", I will backport the support that
> has landed in 5.16-rc2 to next kernel build..

* amdgpu.runpm=0 doesn't change anything: the fan spins with 5.10 and doesn't with 5.15.

* Regarding power management, what I can't explain then is why the fan speed shows ~1800 rpm while nothing happens.
The psensor snapshot also shows the GPU temperature. It starts at 35C and keeps rising to 40C (when I took the snapshot). Still no reaction from the fan. Where could I find the temperature at which the fan should actually start cranking up, to find out whether this could be a power-management thing?
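One way to find out when the fan really starts, independently of psensor, is to log amdgpu's own hwmon files while the GPU heats up. A minimal Python sketch, assuming the standard amdgpu hwmon attribute names (temp1_input in millidegrees Celsius, fan1_input in RPM, pwm1 on a 0-255 scale, pwm1_enable for the control mode); the function names are hypothetical. These files report the current state and may not expose the start threshold itself, so the threshold has to be found empirically:

```python
from pathlib import Path
from typing import Dict, Optional


def find_amdgpu_hwmon(root: str = "/sys/class/hwmon") -> Optional[Path]:
    """Return the first hwmon directory whose 'name' file reads 'amdgpu'."""
    for hwmon in sorted(Path(root).glob("hwmon*")):
        name_file = hwmon / "name"
        if name_file.is_file() and name_file.read_text().strip() == "amdgpu":
            return hwmon
    return None


def _read_int(path: Path) -> Optional[int]:
    """Read an integer sysfs attribute; None if missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def read_fan_state(hwmon: Path) -> Dict[str, Optional[float]]:
    """Snapshot of GPU temperature and fan attributes from an amdgpu hwmon dir."""
    temp_milli = _read_int(hwmon / "temp1_input")  # millidegrees Celsius
    return {
        "temp_c": None if temp_milli is None else temp_milli / 1000.0,
        "fan_rpm": _read_int(hwmon / "fan1_input"),      # tachometer reading, RPM
        "pwm": _read_int(hwmon / "pwm1"),                # duty cycle, 0-255
        "pwm_enable": _read_int(hwmon / "pwm1_enable"),  # 0 none, 1 manual, 2 auto
    }
```

Calling read_fan_state(find_amdgpu_hwmon()) about once a second under a GPU load, and noting the temperature at which pwm changes and the physical fan starts, gives the threshold without trusting the RPM readout.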
Comment 12 Morgan Leijström 2021-11-26 18:16:12 CET
Is it addressing the wrong fan?
Comment 13 Thomas Backlund 2021-11-26 18:23:52 CET
(In reply to christian barranco from comment #11)

> * amdgpu.runpm=0 doesn't change anything. Fan spins with 5.10 and doesn't
> with 5.15
> 

ok...

> * regarding the power management, what I don't explain then is why the fan
> speed shows ~1800rpm and nothing happens. 

That might simply be the sensor not being able to read the correct value, so it shows some default value...

> The psensor snapshot shows as well the GPU temperature. It starts at 35C and
> keeps increasing till 40C (I took the snapshot at that time). Still no
> reaction from the fan; where could I find when the fan should really then
> start cranking up to find out whether it could be a power management thing?

For example, on my MSI GPU the fans by design don't start until the GPU hits 60C, in order to provide a "silent mode / experience".
Comment 14 christian barranco 2021-11-26 19:12:20 CET
(In reply to Morgan Leijström from comment #12)
> Is it addressing the wrong fan?

(In reply to Thomas Backlund from comment #13)
>That might simply be the sensor not being to read correct value, so it shows some default value...
The sensor code is the same between 5.10 and 5.15.
In that case, something would have changed in 5.15 in the "directory" mapping the fan to an entry.
I re-ran sensors-detect with 5.15, so I assume it is the right fan; I don't have any other suggestion anyway.

>For example on my MSI GPU, by design the fans dont start until the gpu hits 60C  in order to provide a "silent mode / experience"
In that case, does it mean kernel 5.10 was overruling the hardware logic and starting the fan?
Comment 15 christian barranco 2021-11-26 20:32:13 CET
I just did another test, stressing my GPU.
Actually, the fans started to spin when the GPU temp reached about 52C.
They stopped again when the temperature went down to about 43C.
I did the test twice; you will see both runs in the picture I am attaching.

However, the fan speed reading is completely off and remains incoherent.
Could it be connected to the bug you will patch?

Note: by the way, thank you so much, Thomas, for applying this patch in the next kernel release! :)
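The start at about 52C and stop at about 43C describe a hysteresis band, which keeps the fan from toggling on and off around a single threshold. The card's actual firmware logic is not visible here; this minimal Python sketch only models the observed behavior, using the two temperatures reported in this comment as assumed thresholds (the class name is hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ZeroRpmHysteresis:
    start_c: float = 52.0  # observed start temperature in this report (assumed)
    stop_c: float = 43.0   # observed stop temperature in this report (assumed)
    fan_on: bool = False

    def update(self, temp_c: float) -> bool:
        """Return whether the fan should run at this temperature."""
        if not self.fan_on and temp_c >= self.start_c:
            self.fan_on = True   # crossed the upper threshold: start
        elif self.fan_on and temp_c <= self.stop_c:
            self.fan_on = False  # cooled below the lower threshold: stop
        return self.fan_on
```

Feeding it 35, 40, 48, 52, 50, 45, 43, 40 keeps the fan off until 52C and on until it cools back to 43C; between the two thresholds the result depends on the previous state.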
Comment 16 christian barranco 2021-11-26 20:33:01 CET
Created attachment 13012 [details]
fan starts at some point when the GPU temp increases
Comment 17 Morgan Leijström 2021-11-26 20:41:48 CET
My guess is that the fan runs autonomously when the software control does not work.

Interestingly, the GPU fan rpm graph reacts when the fan goes on/off. It may show the correct speed when the fan is on, but some fake value when it is off.
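This hypothesis can be checked against logged RPM samples: a readout that sits at exactly the same nonzero value for many samples (like the 1791 rpm seen here) is suspicious. A minimal heuristic sketch in Python; the function name and the five-sample window are arbitrary assumptions, and a constant value is only a hint, not proof, since a fan held at a fixed duty cycle may also read steadily:

```python
from typing import Sequence


def looks_stuck(readings: Sequence[int], min_samples: int = 5) -> bool:
    """Heuristic: True if the last `min_samples` readings are identical and nonzero."""
    if len(readings) < min_samples:
        return False  # not enough data to judge
    tail = readings[-min_samples:]
    return tail[0] != 0 and all(r == tail[0] for r in tail)
```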
Comment 18 Thomas Backlund 2021-11-26 21:15:23 CET
(In reply to christian barranco from comment #14)
> (In reply to Morgan Leijström from comment #12)
> > Is it addressing the wrong fan?
> 
> (In reply to Thomas Backlund from comment #13)
> >That might simply be the sensor not being to read correct value, so it shows some default value...
> It is the same sensor code between 5.10 and 5.15
> In that case, something would have changed with 5.15 in the "directory"
> connecting the fan to an entry. 
> I re-ran sensors-detect with 5.15. So, I assume the fan should be the right
> one and I don't have any other proposal, anyway.
> 


It's probably because ACPI is getting stricter, so the sensor code can't access it unless you boot with "acpi_enforce_resources=lax".


> >For example on my MSI GPU, by design the fans dont start until the gpu hits 60C  in order to provide a "silent mode / experience"
> In that case, does it mean kernel 5.10 would be overulling the hardware
> logic and be starting the fan?


It basically means amdgpu in 5.10 did not fully support your hardware's runtime PM, and in that case it runs the fans all the time, since otherwise it could fry the hardware if the fans never started when really needed...
Comment 19 sturmvogel 2021-11-26 21:23:52 CET
(In reply to Thomas Backlund from comment #18)

> It basically means the amdgpu in 5.10 did not fully support your hw
> regarding runtime pm, and in that case it runs the fans all the time, as
> otherwise it would fry the hw if fans would never start when really needed...

This is backed by the following report that kernel 5.15 provides better support for several AMD products:
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.15-AMD

Having the fans run all the time on a GPU that supports switching its fans off below a defined temperature looks like poor driver/kernel support. That means that with kernel 5.15, the fan control on this graphics card now works as it should.
Comment 20 Morgan Leijström 2021-11-26 22:53:43 CET
The control seems OK, but not the rpm readout.
Comment 21 christian barranco 2021-11-27 08:30:55 CET
(In reply to Morgan Leijström from comment #20)
> The control seem OK, but not the rpm readout.

Yes, I agree.
I will watch for the release of the kernel with the patch.
Should this report be renamed "wrong sensor value readout with kernel 5.15"?
Comment 22 Thomas Backlund 2021-11-27 08:54:41 CET
There is now a kernel-5.15.5-2.mga8 in updates testing with the added nct6775 patches.
Comment 23 christian barranco 2021-11-27 13:43:19 CET
(In reply to Thomas Backlund from comment #22)
> There is now a kernel-5.15.5-2.mga8 in updates testing with the added
> nct6775 patches

Thanks, you rock!

I now have the system fan speeds and other motherboard temperatures I was missing with 5.10.x.
Unfortunately, the AMD GPU fan speed still exhibits the same odd behavior…
christian barranco 2021-11-27 14:33:18 CET

Severity: major => normal

Comment 24 christian barranco 2023-03-05 22:14:08 CET
I guess this report can be closed now?
Comment 25 christian barranco 2023-08-27 20:21:51 CEST
Normal behavior. The inaccurate fan speed readout is another story.

Resolution: (none) => WONTFIX
Status: NEW => RESOLVED

