Bug 23804

Summary: AMD-A10 7860 Kaveri; amdgpu; VCE not responding (kernel-4.19* should solve it)
Product: Mageia Reporter: Richard Walker <richard.j.walker>
Component: RPM Packages Assignee: Kernel and Drivers maintainers <kernel>
Status: RESOLVED OLD QA Contact:
Severity: normal    
Priority: Normal CC: marja11, thierry.vignaud
Version: Cauldron   
Target Milestone: ---   
Hardware: x86_64   
OS: Linux   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=107910
Whiteboard:
Source RPM: kernel-desktop-4.18.16-2.mga7 CVE:
Status comment:
Attachments: output from journalctl -b
boot log from kernel 4.19

Description Richard Walker 2018-11-04 14:05:28 CET
Description of problem:
The system log first reports that VCE is initialised successfully, then reports ten failures to get a response from it. It tries to reset the ECPU and then gives up.

There is no record in the log that /usr/lib/firmware/amdgpu/kaveri_vce.bin has been sought or loaded. No other firmware file is mentioned in the log. Available firmware files are:

/usr/lib/firmware/amdgpu/kaveri_ce.bin
/usr/lib/firmware/amdgpu/kaveri_me.bin
/usr/lib/firmware/amdgpu/kaveri_mec.bin
/usr/lib/firmware/amdgpu/kaveri_mec2.bin
/usr/lib/firmware/amdgpu/kaveri_pfp.bin
/usr/lib/firmware/amdgpu/kaveri_rlc.bin
/usr/lib/firmware/amdgpu/kaveri_sdma.bin
/usr/lib/firmware/amdgpu/kaveri_sdma1.bin
/usr/lib/firmware/amdgpu/kaveri_uvd.bin
/usr/lib/firmware/amdgpu/kaveri_vce.bin

The system log also reports that sensord can no longer get the APU temperature data.

These issues appear to be common to other distributions using the 4.18 kernel. Kernel version 4.19 is reported to resolve them.

Version-Release number of selected component (if applicable):
kernel-desktop-4.18.16-2.mga7

How reproducible:
just boot.

Comment 1 Richard Walker 2018-11-04 14:09:12 CET
Created attachment 10449
output from journalctl -b

The boot log shows all amdgpu and other system errors. The boot command line is not fully recorded in it.

These are the boot options used:
splash quiet noiswmd noresume pti=off radeon.si_support=0 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 amdgpu.fw_load_type=-1 amdgpu.exp_hw_support=1 amdgpu.dpm=1 amdgpu.dc=1 amdgpu.dc_log=1 drm.debug=0x4 nvidia_drm.modeset=0
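
Since the log does not record the full command line, one way to confirm which of these options the running kernel actually received is to read /proc/cmdline. The following is only a minimal sketch, assuming Python 3 is available; the function name parse_cmdline is hypothetical, and the parser deliberately ignores quoted values that contain spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: value}.

    Bare flags such as "quiet" map to None. Simplification (assumption):
    quoted values containing spaces are not handled.
    """
    options = {}
    for token in cmdline.split():
        # "amdgpu.dc=1" -> ("amdgpu.dc", "=", "1"); "quiet" -> ("quiet", "", "")
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def read_running_cmdline(path="/proc/cmdline"):
    """Parse the command line the running kernel was booted with."""
    with open(path) as handle:
        return parse_cmdline(handle.read())
```

For example, read_running_cmdline().get("amdgpu.cik_support") should return "1" if the option above reached the kernel, and None if it did not.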
Comment 2 Richard Walker 2018-11-05 18:40:52 CET
It looks like I made a schoolboy error in talking about missing firmware references without checking the log I submitted above. Last week, when I was exploring the problem in the log, I found no reference to the string "firmware". Now I see that /usr/lib/firmware/amdgpu/kaveri_uvd.bin and /usr/lib/firmware/amdgpu/kaveri_vce.bin are indeed found, and presumably loaded.

Sorry for the red herring about firmware.
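
A check like this can be repeated on a saved copy of the journalctl -b output. The following is a rough sketch, assuming Python 3; the function name matching_lines and the default keywords are illustrative choices, not anything taken from the attached log. The search ignores case, so "Firmware" and "firmware" both match.

```python
def matching_lines(log_text, keywords=("firmware", "vce")):
    """Return the log lines that contain any keyword, ignoring case."""
    wanted = [keyword.lower() for keyword in keywords]
    return [
        line
        for line in log_text.splitlines()
        if any(keyword in line.lower() for keyword in wanted)
    ]
```

To use it, save the log to a file (for instance with journalctl -b > boot.log, where boot.log is a hypothetical name), read the file into a string, and pass that string to matching_lines.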
Comment 3 Marja Van Waes 2018-11-06 12:24:24 CET
(In reply to Richard Walker from comment #0)

> 
> These issues appear to be common to other distributions using the 4.18
> kernel. Kernel version 4.19 is reported to resolve them.
> 

You can put a link to a bug report from another distribution about those issues in the "See Also:" field ;-)


(In reply to Richard Walker from comment #2)
>  Now I see that /usr/lib/firmware/amdgpu/kaveri_uvd.bin and
> /usr/lib/firmware/amdgpu/kaveri_vce.bin are indeed found, and presumably
> loaded.
> 
> Sorry for the red herring about firmware.

No problem. Thanks for the correction!

Summary: AMD-A10 7860 Kaveri; amdgpu; VCE not responding => AMD-A10 7860 Kaveri; amdgpu; VCE not responding (kernel-4.19* should solve it)
CC: (none) => marja11
Assignee: bugsquad => kernel

Comment 4 Richard Walker 2018-11-06 20:17:03 CET
(In reply to Marja Van Waes from comment #3)

Thank you for pointing out the "See Also" field. I have added a reference to a bug report on Fedora 28, which indicates that the problem emerged for that user after updating from kernel 4.17 to 4.18.

See Also: (none) => https://bugs.freedesktop.org/show_bug.cgi?id=107910

Comment 5 Richard Walker 2018-11-08 03:14:39 CET
Created attachment 10456
boot log from kernel 4.19

I am a bit slow. Must be old age. I have just discovered the kernel-linus-4.19 package. I installed it and got a much nicer result. A quick glance through the log suggests all the amdgpu errors have gone away. I'll take a closer look on Thursday evening.
Comment 6 Richard Walker 2018-11-09 01:47:02 CET
I have been using kernel-linus-4.19 most of this evening and it has performed very well. The single most obvious improvement is how quickly my system now boots. I hadn't realised how much those amdgpu errors were slowing things down.

The only tiny downsides are possibly imaginary. There seems to be an error related to HDMI audio, which I don't use, but which may be the reason my auto-starting qasmixer won't start in the LXDE system tray. It displays on the screen instead, and its "start in system tray" setting must be turned off and on again.

The other trivial issue is that nvidia-persistenced is failing, but that's OK. I can just modprobe nvidia-current when I need the GPU for Blender or GIMP.
Comment 7 Richard Walker 2018-11-12 00:56:24 CET
The 4.18 kernel has been replaced by 4.19.1, so this bug is no longer an issue for Mageia 7 Cauldron.

Status: NEW => RESOLVED
Resolution: (none) => INVALID

Comment 8 Thierry Vignaud 2018-12-21 12:16:45 CET
OLD is a better resolution, I think :-)

Resolution: INVALID => OLD
CC: (none) => thierry.vignaud