Bug 33128 - Desktop / kernel driver i915 crashes when Composite is on and GPU under load
Summary: Desktop / kernel driver i915 crashes when Composite is on and GPU under load
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 9
Hardware: All Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
QA Contact:
URL: https://alt.os.linux.mageia.narkive.c...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-04-22 16:44 CEST by Markus Robert Keßler
Modified: 2025-05-29 11:00 CEST

See Also:
Source RPM: kernel-6.6.22-1.mga9.src.rpm
CVE:
Status comment:


Description Markus Robert Keßler 2024-04-22 16:44:35 CEST
Symptoms:

- Sporadic crashes during viewing of mp4 in firefox, any version
- Sporadic crashes during Teams(tm) conferences in chromium, any version
- Sporadic crashes when xscreensaver starts
- Reproducible crashes in Supertuxkart under load, esp. track "black forest"

==> In Supertuxkart desktop freezes, clock freezes, but music keeps playing
==> no control over machine, reboot by power-off needed


Machine according to neofetch:

OS: Mageia 9 x86_64
Host: LIFEBOOK E8310
Kernel: 6.6.22-desktop-1.mga9
Packages: 2974 (rpm)
Shell: bash 5.2.15
Resolution: 1920x1080
DE: Xfce 4.18
WM: Xfwm4
WM Theme: Silverado
Theme: Adwaita [GTK2/3]
Icons: Adwaita [GTK2/3]
CPU: Intel Core 2 Duo T7100 (2) @ 1.801GHz
GPU: Intel Mobile GM965/GL960
Memory: 1519MiB / 3919MiB

Kernel driver, according to lspci:

$ lspci -k
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 03)
        Subsystem: Fujitsu Limited. Device 13f2
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) (rev 03)
        Subsystem: Fujitsu Limited. Device 13f5
        Kernel driver in use: i915
        Kernel modules: i915, intelfb
00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (secondary) (rev 03)
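
To pull just the graphics driver name out of output like the above, a minimal sketch follows. The function name `vga_driver` is hypothetical, and it assumes the `lspci -k` layout shown here (unindented device lines, indented detail lines).

```shell
# Read `lspci -k` output on stdin and print the kernel driver bound to
# each "VGA compatible controller" device.
vga_driver() {
    awk '
        # Unindented lines start a new device; remember whether it is a VGA one.
        /^[0-9a-f]/ { in_vga = ($0 ~ /VGA compatible controller/) }
        # Inside a VGA device block, print the last field of the driver line.
        in_vga && /Kernel driver in use:/ { print $NF }
    '
}
```

Usage would be `lspci -k | vga_driver`, which on the machine described here should report the driver named on the "Kernel driver in use" line.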

Additional info:

1. Tested on 2 machines with exactly the same hardware.
   Same outcome on both, so a hardware defect is unlikely

2. When Composite is turned off in Xfce4, no crashes have occurred (up to now)

3. Since Xfce4 and all its settings are user side (UID!=0),
   it looks like the kernel driver does not support the given hardware properly

4. Following advice from David H., I am filing this bug report against the kernel

If further info would be helpful, please drop me a line.

Thanks in advance,
best regards,

Markus
Comment 1 katnatek 2024-04-22 19:57:59 CEST
If you think this is a kernel issue, perhaps you could try kernel 6.6.28 in testing https://bugs.mageia.org/show_bug.cgi?id=33107 and/or the mesa packages
Comment 2 katnatek 2024-04-22 19:58:26 CEST
(In reply to katnatek from comment #1)
> If you think this is a kernel issue, perhaps you could try kernel 6.6.28 in testing
> https://bugs.mageia.org/show_bug.cgi?id=33107 and/or the mesa packages

https://bugs.mageia.org/show_bug.cgi?id=33108
Comment 3 Markus Robert Keßler 2024-04-22 20:42:33 CEST
Hi, I installed 6.6.28 and the crash occurred there too.
So this issue is not yet solved in that release.
Anyway, thanks!
Comment 4 Lewis Smith 2024-04-22 21:36:01 CEST
Thanks to you both for the advice & test.

@Markus
Can you clarify whether these problems arose with the kernel version you cite, but not with the previous kernel (kernel-desktop-6.6.18-1.mga9 I think)? You should still be able to choose the older kernel at boot time to check this.

CC: (none) => lewyssmith

Comment 5 Markus Robert Keßler 2024-04-22 23:02:33 CEST
Thanks!

Short: Issue started from first MGA9 kernel.

Regarding the history:
Until last summer, I worked with MGA7-x64 on same hardware. Composite did not crash, but desktop recording as well as desktop sharing were heavily buggy.
So, I upgraded to MGA9-x64.

One of the first kernel versions was 6.4.x and the crashes were there from the beginning.

At first I expected that these issues would get fixed soon, and in the meantime I saw that there is no such crash on different hardware. See the narkive link:
One of my other machines, an ESPRIMO Mobile with GPU: Intel Mobile 4 Series, uses the same kernel driver and does not crash.

So, i915 seems to have trouble with GPU: Intel Mobile GM965/GL960.
B.t.w., at least, desktop recording as well as desktop sharing are working now.

Best regards
Comment 6 Morgan Leijström 2024-04-22 23:12:31 CEST
Markus, which is the latest desktop kernel where this worked?
6.6.18 ?

Please also try the server and linus flavours of the kernels.

Once we have the test results, let's assign this to the kernel & driver maintainers.

CC: (none) => fri

Comment 7 Markus Robert Keßler 2024-04-22 23:32:42 CEST
@Morgan:
Since I skipped MGA8, I think it was 5.10.46 or so.
Comment 8 Lewis Smith 2024-04-24 21:06:42 CEST
"i915 seems to have trouble with GPU: Intel Mobile GM965/GL960"
looks a good conclusion.

But please do try another kernel flavour if you can - especially 'kernel-linus', which is closer to the native thing.

Assigning anyway to kernel/drivers.

CC: lewyssmith => (none)
Assignee: bugsquad => kernel

Comment 9 Markus Robert Keßler 2024-04-25 16:22:57 CEST
Hi, tried 'kernel-linus' (which shows up as 6.6.28-1) with same result:
As soon as compositing is turned on, the OS will crash during the next lap of Supertuxkart. Anyway, it was worth a try, thanks!
Comment 10 Markus Robert Keßler 2024-05-04 08:48:14 CEST
Meanwhile I switched to a non-compositing view, and hence there has been no crash in the past two weeks :-) But any user-side process with UID>0 can still crash the OS at any time. So this is just a workaround and not a solution.

Can I do some debugging, tracing, or something like that in the meantime?
Maybe someone knows how to "look into the GPU" to access the registers etc.?

Thanks!
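
One possible starting point for the debugging asked about here, as a minimal sketch only: filter the kernel log for graphics-related lines. The function name `gpu_log_lines` and the pattern list are assumptions, not an established procedure for this bug.

```shell
# Filter kernel log text on stdin down to lines that mention the GPU driver.
# The pattern list is an assumption; extend it as needed.
gpu_log_lines() {
    grep -iE 'i915|drm|gpu hang'
}
```

After a forced reboot, `journalctl -k -b -1 --no-pager | gpu_log_lines` would show matching kernel messages from the previous boot, provided the journal is stored persistently; `dmesg | gpu_log_lines` covers the current boot. The i915 driver may also expose a captured error state under /sys/class/drm/card0/error (the card index may differ), which is worth attaching if it is not empty.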
Comment 11 Markus Robert Keßler 2024-06-04 14:16:40 CEST
Any update so far?
Comment 12 sturmvogel 2024-06-04 22:27:02 CEST
According to this information, kernel support for this hardware ended with kernel 5.17
https://linux-hardware.org/?id=pci:8086-2a03-17aa-20b5&page=2
https://cateee.net/lkddb/

The Intel support for this ancient hardware already ended in 2016
https://www.intel.com/content/www/us/en/support/articles/000005733/graphics.html

As compositing needs some system resources (current OSes have higher requirements than ancient ones), it is quite possible that this 18-year-old hardware can no longer satisfy the requirements of modern compositors/software/OSes...
Comment 13 Markus Robert Keßler 2024-06-04 22:37:19 CEST
This hardware may be old, but in Xfce4 desktop compositing is enabled by default (!). Besides this, compositing never worked on this hardware. Even in MGA7, video conferencing, screen recording etc. never worked properly.

Now in MGA9, simplescreenrecorder works, but the OS crashes under load.
Comment 14 sturmvogel 2024-06-04 22:45:29 CEST
(In reply to Markus Robert Keßler from comment #0)
> Symptoms:
> - Reproducible crashes in Supertuxkart under load, esp. track "black forest"
> 
> ==> In Supertuxkart desktop freezes, clock freezes, but music keeps playing
> ==> no control over machine, reboot by power-off needed


The Intel Mobile GM965/GL960 does not even come near the minimum hardware requirements for Supertuxkart.

So it seems quite logical that the system locks up/crashes when modern applications/features, which heavily depend on the GPU hardware, are involved. Browsers, games, video conferencing tools nowadays depend on features which are not available on 18-year-old hardware...
Comment 15 Morgan Leijström 2024-06-04 22:54:35 CEST
Side note, on a higher level:
It is still bad that the system, even with current hardware and software, cannot handle this type of overload gracefully. For example, on my workstation with a GTX 750, nvidia drivers and a 4K screen: when I have some applications open, launch teapot, and drag and resize it to full screen and back, the Plasma desktop manager sometimes dies. After that I can only use the applications on the visible desktop and cannot, for example, switch between virtual desktops. It has always been like that on this system...
Comment 16 Morgan Leijström 2024-06-04 22:56:16 CEST
...and we cannot possibly fix that.

Status: NEW => RESOLVED
Resolution: (none) => WONTFIX

Comment 17 Markus Robert Keßler 2024-06-06 10:19:29 CEST
Hm, it is not only about Supertuxkart.

As stated some time ago, even firefox crashes the OS when playing mpeg movies, and M$ Teams meetings in the chromium web browser can also crash the OS when compositing is (accidentally) left active (which is the default in Xfce4).

So, if compositing is the culprit, then (in my opinion) it would be best to disable this feature in the kernel driver itself.

Any better idea?
Comment 18 sturmvogel 2024-06-07 06:52:59 CEST
That would mean punishing 95% of users with decent hardware…

As software development evolves, hardware requirements increase. If you kept backwards compatibility forever, there wouldn't be any actual development. New software depends on features which are not available on museum pieces…
Comment 19 Markus Robert Keßler 2024-06-07 08:52:26 CEST
> museum pieces

LOL -- I am doing my daily business with machines like that.

Always keep in mind: One of the reasons for deploying Linux rather than M$ is that one does NOT have to replace the whole IT every 2 years
Comment 20 Markus Robert Keßler 2024-06-08 21:34:55 CEST
Reopened because this is a severe bug which has the potential to crash the whole OS if any user has switched on compositing. Xfce4 for example has compositing on as default.
This must not be.

At the very least, if no other solution is feasible, compositing should be disabled for exactly this (hardware) GPU. Please try to fix this, or at least try to implement such a workaround.

Thanks!

Resolution: WONTFIX => (none)
Status: RESOLVED => REOPENED

Comment 21 Morgan Leijström 2024-06-08 22:08:38 CEST
Old is good, yes: my bookkeeping laptop is from 2007, and my "music player" laptop is of similar age.

I think we could consider an errata entry for mga9.

This is not my cup of tea, but it seems to me that it is not our tools that should try to anticipate when systems break; rather, the kernel/driver should not break like this under load.
Comment 22 sturmvogel 2024-06-08 23:50:02 CEST
(In reply to Markus Robert Keßler from comment #20)
> At the very least, if no other solution is feasible, compositing should be
> disabled for exactly this (hardware) GPU. Please try to fix this, or at
> least try to implement such a workaround.
Fixing no-longer-supported hardware downstream is no solution. If you want such a change, you need to open upstream bug reports at the relevant projects: XFCE, kernel, ...

This is a waste of time of the scarce resources left at Mageia.
Comment 23 sturmvogel 2024-06-09 00:00:21 CEST
Additionally, there is no "fix" needed to do this. There is enough documentation available on how to disable the compositor on Xfce by default. So you only need to change a setting and you should be able to run this ancient hardware without a compositor on Xfce...
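
As a minimal sketch of the setting change referred to here, assuming the stock xfwm4 window manager and its `xfconf-query` tool: the wrapper name `set_xfwm4_compositing` and the `XFCONF` override variable are hypothetical and only there to allow a dry run.

```shell
# Switch xfwm4 compositing on or off for the current user.
# XFCONF is a hypothetical override hook (not part of Xfce): set XFCONF=echo
# to print the arguments instead of changing anything.
set_xfwm4_compositing() {
    # $1 is "true" or "false"
    "${XFCONF:-xfconf-query}" -c xfwm4 -p /general/use_compositing -s "$1"
}
```

Running `set_xfwm4_compositing false` in the affected user's session should turn the compositor off; this changes only that user's setting, not the distribution default.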
Comment 24 Markus Robert Keßler 2024-06-09 09:20:03 CEST
Yes, it is possible to tweak the defaults to prevent such crashes. But to put it plainly:

Any OS has to be inherently stable, without intervention from the user side.
Comment 25 Markus Robert Keßler 2025-05-28 13:39:20 CEST
Hello all,

After a recent update about 3 to 4 days ago, I can no longer make my mga9x64 machines crash: Supertuxkart runs at the same speed as before the update and without crashing, and when Composite is on, i.e. window elements are transparent, the desktop is slow but stable.

Config:

$ uname -a
Linux mga9x64-lb1 6.6.88-desktop-3.mga9 #1 SMP PREEMPT_DYNAMIC Sat Apr 26 22:17:20 UTC 2025 x86_64 GNU/Linux
$ rpm -qa | grep -i mesa
lib64mesaglu1-9.0.2-3.mga9
libmesagl1-23.1.5-2.mga9
lib64mesavulkan-drivers-25.0.6-2.mga9.tainted
lib64mesagl1-25.0.6-2.mga9.tainted
mesa-25.0.6-2.mga9.tainted
lib64mesaegl1-25.0.6-2.mga9.tainted
lib64osmesa8-25.0.6-2.mga9.tainted

I don't know whether a kernel-related issue has been fixed or whether some of the mesa drivers have been repaired. The issue seems to be gone anyway.

Thanks!
Comment 26 Markus Robert Keßler 2025-05-29 11:00:57 CEST
--

Resolution: (none) => FIXED
Status: REOPENED => RESOLVED

