Bug 32066 - [radeon HD3470 rv620 1002:95c0][regression] GPU lockup, ring 0 stalled using modesetting DIX display driver; radeon: The kernel rejected CS
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal (Severity: normal)
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-05 07:31 CEST by Felix Miata
Modified: 2023-08-14 14:21 CEST
CC List: 2 users

See Also:
Source RPM: x11-server-21.1.8-7.mga9.src.rpm
CVE:
Status comment:


Attachments
.zip of dmesg, journal & Xorg.0.log* (81.19 KB, application/octet-stream)
2023-07-05 07:31 CEST, Felix Miata

Description Felix Miata 2023-07-05 07:31:12 CEST
Created attachment 13904
.zip of dmesg, journal & Xorg.0.log*

Original Summary:
[radeon HD3470 rv620 1002:95c0][regression] GPU lockup, ring 0 stalled using modesetting DIX display driver; radeon: The kernel rejected CS

Description of problem:
When using the modesetting DIX with IceWM, SDDM or Plasma, the display may be completely corrupted, or sessions may appear to start normally while the UI is unresponsive. dmesg is flooded with GPU lockup messages, e.g.:
kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000000e last fence id 0x0000000000000013 on ring 0).
Journal is flooded with:
Jul 04 01:23:55 big31 plasmashell[6549]: radeon: The kernel rejected CS, see dmesg for more information (-16).

Version-Release number of selected component (if applicable):
x11-server-common-21.1.8-7.mga9

Steps to Reproduce:
1. Uninstall x11-driver-video-ati and/or configure use of modesetting_drv.so via /etc/X11/xorg.con*.
2. Try to run Xorg.
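For step 1, a minimal sketch of forcing the modesetting driver without uninstalling x11-driver-video-ati, assuming the X server reads drop-in files from /etc/X11/xorg.conf.d/ as stock Xorg builds do. The file name 20-modesetting.conf and the Identifier value "RV620" are arbitrary, hypothetical choices; only the Driver line matters here.

```
# /etc/X11/xorg.conf.d/20-modesetting.conf (hypothetical file name)
Section "Device"
    # Arbitrary label for this device section
    Identifier "RV620"
    # Load modesetting_drv.so instead of the radeon DDX
    Driver     "modesetting"
EndSection
```

After restarting the display manager, the "loaded:" list in `inxi -G` output should show modesetting rather than radeon.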

# inxi -SGaz
System:
  Kernel: 6.3.9-desktop-2.mga9 arch: x86_64 bits: 64 compiler: gcc v: 12.3.0
    parameters: root=LABEL=p096mga9 audit=0 ipv6.disable=1 net.ifnames=0
    noresume plymouth.enable=0 consoleblank=0 preempt=full mitigations=off
  Desktop: KDE Plasma v: 5.27.5 tk: Qt v: 5.15.7 wm: kwin_x11 vt: 1 dm: SDDM
    Distro: Mageia 9
Graphics:
  Device-1: AMD RV620 PRO [Radeon HD 3470] vendor: Dell C120D driver: radeon
    v: kernel alternate: amdgpu arch: TeraScale code: R6xx/RV6xx/RV7xx
    process: TSMC 55-65nm built: 2005-13 pcie: gen: 1 speed: 2.5 GT/s
    lanes: 16 ports: active: DP-1,DP-2 empty: none bus-ID: 01:00.0
    chip-ID: 1002:95c0 class-ID: 0300 temp: 81.0 C
  Display: x11 server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9
    compositor: kwin_x11 driver: X: loaded: radeon
    unloaded: fbdev,modesetting,vesa dri: swrast gpu: radeon display-ID: :0
    screens: 1
  Screen-1: 0 s-res: 4240x1440 s-dpi: 120 s-size: 897x304mm (35.31x11.97")
    s-diag: 947mm (37.29")
  Monitor-1: DP-1 mapped: DisplayPort-0 pos: primary,left
    model: Acer K272HUL serial: <filter> built: 2018 res: 2560x1440 hz: 60
    dpi: 109 gamma: 1.2 size: 598x336mm (23.54x13.23") diag: 686mm (27")
    ratio: 16:9 modes: max: 2560x1440 min: 720x400
  Monitor-2: DP-2 mapped: DisplayPort-1 pos: right model: Dell P2213
    serial: <filter> built: 2012 res: 1680x1050 hz: 60 dpi: 90 gamma: 1.2
    size: 473x296mm (18.62x11.65") diag: 558mm (22") ratio: 16:10 modes:
    max: 1680x1050 min: 720x400
  API: OpenGL v: 4.5 Mesa 23.1.3 renderer: llvmpipe (LLVM 15.0.6 128 bits)
    direct-render: Yes

I attempted to reproduce this on a Radeon HD 4650 (rv730, 1002:9498) in a different PC, but could not. Replacing the HD 3470 with an HD 2400 (rv610, 1002:94c3) did reproduce the error messages in both dmesg and the journal, although SDDM and Plasma were marginally functional given enough patience.

Fedora 38 k6.3.11 plasma 5.27.6 KDM on same PC has vaguely similar trouble regardless of kernel, but openSUSE TW k6.3.9 plasma 5.27.6 KDM3 does not.

I've been using the modesetting DIX on most hardware it supports for several years, and this is a rare failure in my experience. The last use without failure was on 29 May, with kernel 6.2.12 and Xorg 1.21.1.8. The PC was not used with mga9 until auto-selecting it on 3 July. Behavior is roughly the same with the 6.2.12, 6.3.9 and 6.4.1 kernel-desktop builds.

When the modesetting DIX worked as expected, SDDM's login window appeared only on the primary display. With the radeon DDX, the SDDM login window and calendar unexpectedly appear on both displays, while I/O appears only on the primary.
Comment 1 Morgan Leijström 2023-07-05 11:43:03 CEST
Thank you for the detailed report.

Setting this directly to kernel and drivers.

Assignee: bugsquad => kernel

Comment 2 Felix Miata 2023-07-06 07:23:28 CEST
(In reply to Felix Miata from comment #0)
> Fedora 38 k6.3.11 plasma 5.27.6 KDM on same PC has vaguely similar trouble
> regardless of kernel, but openSUSE TW k6.3.9 plasma 5.27.6 KDM3 does not.

On review, F38 differs little, so I reported it there too:
https://bugzilla.redhat.com/show_bug.cgi?id=2220717
while Tumbleweed is OK:
# inxi -SGaz
System:
  Kernel: 6.3.9-1-default arch: x86_64 bits: 64 compiler: gcc v: 13.1.1
    parameters: root=LABEL=p096stw5 ipv6.disable=1 net.ifnames=0 noresume
    consoleblank=0 preempt=full mitigations=off
  Desktop: KDE Plasma v: 5.27.6 tk: Qt v: 5.15.10 wm: kwin_x11 vt: 7 dm:
    1: KDM 2: XDM Distro: openSUSE Tumbleweed 20230629
Graphics:
  Device-1: AMD RV620 PRO [Radeon HD 3470] vendor: Dell C120D driver: radeon
    v: kernel alternate: amdgpu arch: TeraScale code: R6xx/RV6xx/RV7xx
    process: TSMC 55-65nm built: 2005-13 pcie: gen: 1 speed: 2.5 GT/s
    lanes: 16 ports: active: DP-1,DP-2 empty: none bus-ID: 01:00.0
    chip-ID: 1002:95c0 class-ID: 0300 temp: 81.0 C
  Display: x11 server: X.Org v: 21.1.8 with: Xwayland v: 23.1.2
    compositor: kwin_x11 driver: X: loaded: modesetting unloaded: fbdev,vesa
    gpu: radeon display-ID: :0 screens: 1
  Screen-1: 0 s-res: 4240x1440 s-dpi: 120 s-size: 897x304mm (35.31x11.97")
    s-diag: 947mm (37.29")
  Monitor-1: DP-1 pos: primary,left model: Acer K272HUL serial: <filter>
    built: 2018 res: 2560x1440 hz: 60 dpi: 109 gamma: 1.2
    size: 598x336mm (23.54x13.23") diag: 686mm (27") ratio: 16:9 modes:
    max: 2560x1440 min: 720x400
  Monitor-2: DP-2 pos: right model: Dell P2213 serial: <filter> built: 2012
    res: 1680x1050 hz: 60 dpi: 90 gamma: 1.2 size: 473x296mm (18.62x11.65")
    diag: 558mm (22") ratio: 16:10 modes: max: 1680x1050 min: 720x400
  API: OpenGL v: N/A renderer: N/A direct-render: N/A

So is Bookworm:
# inxi -GS
System:
  Host: big31 Kernel: 6.1.0-9-amd64 arch: x86_64 bits: 64 Desktop: Trinity
    v: R14.1.0 Distro: Debian GNU/Linux 12 (bookworm)
Graphics:
  Device-1: AMD RV620 PRO [Radeon HD 3470] driver: radeon v: kernel
  Display: x11 server: X.Org v: 1.21.1.7 driver: X: loaded: modesetting
    dri: r600 gpu: radeon resolution: 1: 2560x1440~60Hz 2: 1680x1050~60Hz
  API: OpenGL v: 3.3 Mesa 22.3.6 renderer: AMD RV620 (DRM 2.50.0 /
    6.1.0-9-amd64 LLVM 15.0.6)
Comment 3 Guillaume Bedot 2023-07-30 17:16:05 CEST
The latest Mesa fixes some r600 regressions.

Now I can use GNOME on X11, but I still can't start a GNOME on Wayland session:

Jul 30 16:55:39 <hostname> gnome-shell[5018]: (EE) could not connect to wayland server
Jul 30 16:55:39 <hostname> kernel: traps: gnome-shell[4723] general protection fault ip:7f5dab9f104a sp:7ffdddc5f850 error:0 in libgobject-2.0.so.0.7600.3[7f5dab9c7000+32000]

CC: (none) => guillaume.bedot

Comment 4 Felix Miata 2023-07-30 20:51:03 CEST
No faults found any longer with the RV620:
# dmesg | grep lockup
# dmesg | grep stalled
# journalctl -b | grep rejected
# rpm -qa | grep mesa
lib64mesaegl1-23.1.4-2.mga9
lib64mesagl1-23.1.4-2.mga9
lib64mesaglu1-9.0.2-3.mga9
lib64mesavulkan-drivers-23.1.4-2.mga9
mesa-23.1.4-2.mga9
# uname -r
6.4.7-desktop-3.mga9
Comment 5 Guillaume Bedot 2023-08-14 13:34:35 CEST
With Mesa 23.1.5 (and a Radeon HD 3200), it works now.
Comment 6 Thomas Backlund 2023-08-14 14:21:51 CEST
.

Resolution: (none) => FIXED
Status: NEW => RESOLVED

