Bug 29704 - Kernel 5.15.4 + Nouveau = flickering Plasma DE
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 8
Hardware: All Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
URL: https://bbs.archlinux.org/viewtopic.p...
Depends on: 29777
Reported: 2021-11-27 01:07 CET by John L. ten Wolde
Modified: 2021-12-22 00:30 CET

Source RPM: kernel-5.15.4-1.mga8.src.rpm

Attachments
'kscreen' grep'ed from journal (14.93 KB, text/plain)
2021-12-09 07:29 CET, John L. ten Wolde

Description John L. ten Wolde 2021-11-27 01:07:25 CET
Hey all.  Here's a heads-up that the new Kernel 5.15.x series doesn't play nice with our current* Nouveau driver.

TL;DR WORKAROUND:  For now, revert back to using Kernel 5.10.78.

Kernel 5.15.x (presently up to 5.15.7 according to info on the 'Net) interacts badly with the Nouveau driver leading to desktop flickering on mouse-over of various widgets (scroll bars, spin boxes, drop-down menus etc.), when the mouse cursor morphs (e.g. from the arrow to the hand), while tab switching between applications or desktops; and with constant-motion full-screen apps, such as video players or games.

In my experience the flickering takes one of two forms:  either a split-second blink to complete blackness, or the top-most (active) window (especially one that's full-screen) becoming momentarily (semi-)transparent so that it reveals the other windows and the desktop beneath it.  It's not a crippling situation but definitely annoying.

I've included a link to an Arch forum thread which, from the descriptions of the symptoms given, most closely matches my own experience.

What I've also gathered from various discussions at other distros' forums is that this issue is Plasma, GL, and/or compositor related; it adversely affects X11 but Wayland appears exempt; and that users with Intel GPUs might also be adversely affected, specifically those with an i915 and *especially* those with NVidia+Intel hybrid laptops.

According to other reports, this problem first manifested with a change to the 5.12.x series of kernels; 5.11.x series kernels and earlier are exempt.  The advice given universally is to revert back to a pre-5.12 kernel.  In our case, as already stated above, that's to continue using Kernel 5.10.78 until (presumably) an update to the Nouveau drivers becomes available.
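
For anyone wanting to check a machine against the version boundary described above, here is a minimal sketch.  It only encodes what the forum reports say (5.12.x and later flicker, 5.11.x and earlier don't), which is an assumption rather than a confirmed kernel changelog, and the names `parse_kernel_version` and `is_reported_affected` are made up for illustration.

```python
import re

# Boundary taken from the forum reports summarized above (an assumption,
# not a confirmed kernel changelog): 5.12.x and later flicker,
# 5.11.x and earlier do not.
FIRST_AFFECTED = (5, 12)


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.4-desktop-1.mga8'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_reported_affected(release: str) -> bool:
    """Return True if the release falls in the range reported as flickering."""
    return parse_kernel_version(release) >= FIRST_AFFECTED
```

Passing the output of `platform.release()` to `is_reported_affected` would check the running kernel.  The check knows nothing about patched builds, so it can only say whether a release falls in the reported range, not whether that particular package still flickers.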


Well, that's it.  I hope this report was informative or useful.  Thanks to everyone for all their hard work.



----
* "current" Nouveau packages I had installed at the time I reported this bug:

    lib64drm_nouveau2-2.4.107-3.mga8
    libdrm_nouveau2-2.4.107-3.mga8
    lib64vdpau-driver-nouveau-21.2.4-2.mga8
    x11-driver-video-nouveau-1.0.17-1.mga8
John L. ten Wolde 2021-11-27 01:41:39 CET

Summary: Kernel 5.15 + Nouveau = flickering Plasma DE => Kernel 5.15.4 + Nouveau = flickering Plasma DE

Comment 1 Dimitrios Glentadakis 2021-11-27 08:28:56 CET
I have the same problem with LXQt + kwin

CC: (none) => dglent

Comment 2 Thomas Backlund 2021-11-27 10:02:59 CET
does it work better if you switch from nouveau to modesetting driver ?
Comment 3 John L. ten Wolde 2021-11-27 23:48:59 CET
(In reply to Thomas Backlund from comment #2)
> does it work better if you switch from nouveau to modesetting driver ?

Somewhat.

I gave modesetting a try for a few hours (something I'd never done before).  Its behaviour is significantly more stable, but not completely.  Full-screen apps and videos no longer flicker, but certain other apps and the desktop still do.  When it happens in a non-maximized window the flicker goes to all black (rather than transparent) and remains entirely confined *within* the window's borders as opposed to flashing across the whole screen as happens with nouveau.  Strangely, the worst culprit I could find among all the flickering non-maximized apps was KDE's own System Settings.

It seems to me that when using nouveau the flicker is caused by *any* sudden update to any part of the display (so shifting frames in a video or game for example).  But when using modesetting it's caused almost exclusively by mouse-over: when the mouse morphs; when sub-menus or sub-windows spawn (I'm using Folder View mode for my desktop, so an example is when sub-folders open by clicking on the parent folder's icon); and especially when a tooltip or preview is raised.  With either driver, it seems that the more windows are stacked over top of one another the greater the likelihood of flicker, but the odds are far higher with nouveau than with modesetting.

Also, with modesetting, I noticed that most of the flickering only happens once:  the first time a tooltip or sub-window was spawned but not if it had to respawn again.  Perhaps because it was already in the frame buffer or VRAM (or however this stuff works)?  The glaring exception to this was the System Settings window, which continued to flicker no matter what.  This might be true using nouveau as well, but the overall flickering is so much worse that I probably didn't notice.

I should perhaps mention that the hardware I'm dealing with here is pretty old:  a GeForce 9600M (NV96 Tesla) from circa 2007 and that I experience a fairly perceptible performance hit by switching to modesetting.  The mouse cursor seems sluggish, and panning large, hi-resolution images in Gwenview was incredibly choppy (slow redraw/refresh with tearing).  I wonder if the overall lag with modesetting only makes it *appear* more stable than nouveau but it isn't actually any better.

Given that switching didn't eliminate the flicker, plus the performance hit, I've reverted the kernel and switched back to nouveau for Xorg again.
John L. ten Wolde 2021-11-27 23:49:16 CET

CC: (none) => johnltw

Comment 4 Dimitrios Glentadakis 2021-11-28 08:11:45 CET
With modesetting I didn't have the flickering (LXQt DE).
I noticed some slowness in YouTube videos and in Gwenview too.
Comment 5 Frank Griffin 2021-11-29 16:31:06 CET
I've seen this recently in Plasma, but using the Intel driver, not nouveau.  It's extremely sporadic, and can be remedied by switching to a tty and back again. 

Assigning to kernel group to start.

CC: (none) => ftg
Assignee: bugsquad => kernel

Comment 6 rexy 2021-12-02 09:37:31 CET
Hi,
Same problem detected on all Lenovo laptops with the i915 video driver. It works fine when booting the previous kernel (5.10.78-desktop-1).

journalctl shows two events:
09:18:18 localhost.localdomain kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[1752]:9ce0 timed out (hint:intel_atomic_commit_ready [i915])
09:18:18 localhost.localdomain kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[1752]:9ce0 timed out (hint:intel_atomic_commit_ready [i915])
09:18:23 localhost.localdomain kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in plasmashell [3973]
09:18:23 localhost.localdomain kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
09:18:23 localhost.localdomain kernel: i915 0000:00:02.0: [drm] plasmashell[3973] context reset due to GPU hang
09:18:38 localhost.localdomain plasma_waitforname[4015]: org.kde.knotifications: WaitForName: Service was not registered within timeout


09:19:52 localhost.localdomain kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[1752]:caa8 timed out (hint:intel_atomic_commit_ready [i915])
09:19:52 localhost.localdomain kernel: Asynchronous wait on fence 0000:00:02.0:Xorg[1752]:caa8 timed out (hint:intel_atomic_commit_ready [i915])
09:19:57 localhost.localdomain kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in plasmashell [3973]
09:19:57 localhost.localdomain kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for stopped heartbeat on rcs0
09:19:57 localhost.localdomain kernel: i915 0000:00:02.0: [drm] plasmashell[3973] context reset due to GPU hang
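
To pull just the hang events out of a longer journal dump, something like the following sketch could be used.  The regular expression is modelled only on the "GPU HANG" lines quoted above, so other kernels or drivers may format the message differently, and `find_gpu_hangs` is a hypothetical helper name.

```python
import re

# Pattern modelled on the "GPU HANG" lines quoted above; other kernels or
# drivers may word the message differently.
GPU_HANG = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) .*\[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\]"
)


def find_gpu_hangs(lines):
    """Return (time, process, pid, ecode) for each GPU HANG line found."""
    hangs = []
    for line in lines:
        match = GPU_HANG.search(line)
        if match:
            hangs.append(
                (match["time"], match["process"], int(match["pid"]), match["ecode"])
            )
    return hangs
```

Run over the two events above, this would report one hang per event, both attributed to plasmashell.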

CC: (none) => richard

Comment 7 John L. ten Wolde 2021-12-04 03:01:18 CET
I'm getting the impression that the symptoms being seen with the Intel (i915) driver (Frank in Comment 5 and rexy in Comment 6) are *very* different from what I'm experiencing and might be better off placed into their own dedicated bug.

I tried Frank's suggestion of switching to a TTY and back (Comment 5) not long after he posted it, but it did nothing for my situation.  That's too bad really, because I remember exactly that being the workaround to a graphics glitch on this same machine about a decade ago when both it and KDE4 were new.

Anyway, I've been running on Kernel 5.15.4 -- putting up with the flickering -- exclusively since Frank posted his suggestion to see if I could get a better understanding of what's going on with my machine, and over the last 3 or 4 days have collected a raft of new observations.

First off, after having used it for several days straight, it seems to me that, despite the glitchiness, nouveau is much faster and more responsive now than with the older kernels.  I think that once this bug gets resolved, I'm going to be very pleased with the results. :)

@Thomas Backlund :  It turns out that the behaviour of the flickering I see with nouveau is, in fact, *identical* to what I described in Comment 3 with the modesetting driver.  The only difference is that with nouveau everything is so much faster that it gave me the erroneous impression it was behaving differently. For example, it turns out the flickering is *always* confined to the inside of the active window.

With nouveau, just as with the modesetting driver, the worst culprit for the flickering is KDE's own System Settings window.  Exploring this further, I began to realize that the only apps that flicker are (almost exclusively) Qt5 ones. Yet, many Qt5 apps *don't* flicker at all.  Okular, for example is perfectly usable.

Videos.  The only videos that flicker during playback are ones larger than my monitor's native resolution.  And not even that is consistent.  For fun, I tried playing a 4K video.  4K is *way* beyond the capabilities of this old laptop, and yet that video ran faster and with less choppiness than under any previous kernel version.  I'm completely baffled.  (These experiments were conducted using mplayer from the command line.)

Games.  The link to the Arch Linux forums I provided above leads, via another link, to a complaint that Sauerbraten suffers from this same bug.  Well, I don't have that game installed, but I do have Minetest handy, so I thought, a 3D game is a 3D game.  As it turns out, Minetest runs *perfectly*.  And more smoothly.  And faster than I've ever seen it run before.  Weird.

So then I fired up Endless-Sky (a simple 2D, top-down space trucker game).  It should run fine and faster too, right?  Wrong!  This one flickers to the point of being completely unplayable (and potentially seizure inducing).  Normally, when you jump your ship into hyperspace, the star field in the background blurs, there's a bright flash and the sound of a thunder clap.  Currently, when it comes time for the bright flash, the screen completely destabilizes -- flickering white-black-white-black repeatedly.  From this point onward, the game's UI components flicker to black with *every* slight movement of the mouse ultimately rendering the game completely unplayable.  Sad. :(

I'm really lost on what the common factor with the flickering is, but I'm still leaning toward believing it involves layered graphics (stacked windows, interface components, etc.).  I said above that the flickering appears to affect Qt5 apps *almost exclusively*.  I say "almost" because Firefox (GTK3) never flickers except on certain sites that also use layered components.  The most consistent culprit I've noticed so far has been Reddit threads that open and hover above the list of other topics in the background.  If the hovering thread contains an image, quickly mouse-wheeling the thread up and down will cause the inside of Firefox's window to flicker to white when the edge of the image first comes into view.  Like I said in Comment 3 regarding tool-tips and sub-windows, once the image has loaded into VRAM (or whatever) the flickering stops and won't happen again.  And I'm guessing Firefox flickers to white because that's its default background colour, while KDE's Plasma tools flicker to black because that's the background colour my desktop is set to (?).

But wait!  I haven't even touched on the "funnest" of the bad news yet.  As it turns out, I can no longer allow this machine to go to screensaver or use the screen locker.  The moment the monitor powers down to standby, something horrible happens to plasmashell.  What it is that happens exactly, I'm not sure, but when the monitor powers back up, there's no screen locker and no longer any desktop.  Just a black background and my (working) mouse cursor.  X11 is clearly still running, and according to top and ps, so is plasmashell.  And as far as I can tell, there's nothing out of the ordinary reported in the journal or dmesg about any of this.  At this point, my only recourse is to kill X and log in again for a new session.

The last craziest thing I've experienced so far was yesterday when Mageia was performing updates (perl and then bluez).  I had multiple apps open on multiple desktops.  I was selecting some text to edit with the shift+arrow keys when suddenly *BOOM!* -- the X server died and I was thrown back to the login screen.  I have no idea what prompted the crash but the journal from that boot is just weird.  Within a span of two seconds leading up to it, a myriad of KDE apps (from kaccess to krunner) all began repeating these same complaints:

  ┌────
  │ Qt: failed to retrieve the virtual modifier names from XKB
  │ Qt: failed to retrieve the virtual modifier map from XKB
  │ qt.qpa.xkeyboard: failed to compile a keymap
  │ The X11 connection broke: I/O error (code 1)
  │ The X11 connection broke (error 1). Did the X11 server die?
  └────

Then there's a red line with what looks like kded5 complaining my Compose Key couldn't be assigned to my Right Alt:

  ┌────
  │ kded5[2075]: org.kde.kcm_keyboard: Failed to run "/usr/bin/setxkbmap \
  │                -layout us,gb -option -option compose:ralt" return code: 1
  └────

This all appears to culminate with:

  ┌────
  │ kdeinit5[2058]: kdeinit5: Fatal IO error: client killed
  │ kdeinit5[2058]: kdeinit5: sending SIGHUP to children.
  │ kdeinit5[2058]: kdeinit5: sending SIGTERM to children.
  │ kdeinit5[2058]: kdeinit5: Exit.
  └────

So my keyboard "broke"?  WTF?

I think the above was just a weird coincidence and had nothing to do with the nouveau flickering, but it's the first time I've ever experienced X crapping out this way.  Oddly, for that same boot, but hours later, the journal also records this:

  ┌────
  │ kernel: nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 5 [Xorg[6333]] \
  │           subc 0 mthd 0060 data beef0201
  └────

I don't recall noticing anything out of the ordinary at the time this occurred, and it's the *only* unusual "red line" error concerning the nouveau driver to appear in my journals so far since 5.15.4 arrived.

Weird kernel bugs are just weird...
Comment 8 John L. ten Wolde 2021-12-05 02:09:21 CET
Aha!  A workaround to stop the paralyzing flickering in Endless-Sky is to block compositing, either with a flag in its specific Application Settings or by activating "Allow applications to block compositing" in System Settings » Display and Monitor » Compositing.
Comment 9 John L. ten Wolde 2021-12-06 01:05:23 CET
I just rebooted after updating to Kernel 5.15.6 along with all its accompanying goodies.  I was so hopeful, but unfortunately there's no improvement.  The flickering continues.
Comment 10 Dimitrios Glentadakis 2021-12-07 05:20:43 CET
Tried with kernel 5.15.6 and LXQt - Openbox - Picom and the flickering is also reproducible.
Comment 11 rexy 2021-12-07 08:35:13 CET
Hi,
It's ok for me, no more untimely blockages with the kernel 5.16.6 (i915 driver)
Thx
Comment 12 Thomas Backlund 2021-12-07 18:08:32 CET

I've submitted a kernel-5.15.6-3.mga8 (currently building) to Mga8 Core Updates Testing with a revert of a commit that is suspected to cause this regression...

Please test when it's available (in about 3-5 hours or so...)
Comment 13 John L. ten Wolde 2021-12-08 02:08:43 CET
(In reply to Thomas Backlund from comment #12)
> 
> I've submitted a kernel-5.15.6-3.mga8 (currently building) to Mga8 Core
> Updates Testing with a revert of a commit that is suspected to cause this
> regression...
> 
> Please test when it's available (in about 3-5 hours or so...)


Partial success.

The good:  The flickering is gone!  At least I can no longer wilfully reproduce it.  I removed the compositor block from Endless-Sky and it runs as it did pre 5.15.x again.  So far so good.

The bad:  I still can't go to screensaver or activate the screen locker.  After the monitor powers back up after having gone to standby, plasmashell still goes into that weird limbo leaving nothing but a black screen but working mouse cursor.  Killing X and logging in with a new session is the only way around it.  Perhaps this is an unrelated issue?

Also, though I've only played around with this test kernel for at most a quarter hour so far, I'm a bit worried that with the good-riddance to the flickering I've also lost my newly acquired graphics hyper-acceleration.  Seriously, these last few days have been like having a shiny new GPU in this old machine.  I'm understandably curious, what was the commit that causes the flickering intended to fix or improve?

Anyway, thank you for your continued efforts working toward resolving this bug.
Comment 14 Dimitrios Glentadakis 2021-12-08 07:19:03 CET
Yes it is ok now with this kernel, no flickering any more
Comment 15 Thomas Backlund 2021-12-08 21:34:33 CET
please also test with kernel-5.15.6-4.mga8 (currently building), as it has an alternative fix for this issue
Comment 16 John L. ten Wolde 2021-12-09 07:29:23 CET
Created attachment 13029
'kscreen' grep'ed from journal

(In reply to Thomas Backlund from comment #15)
> please also test with kernel-5.15.6-4.mga8 (currently building), as it has
> an alternative fix for this issue

From my perspective 6-4 behaves *identically* to 6-3.  No flickering, but plasmashell still craps out the second the screen locker powers down the monitor.

I don't know if this will be helpful, but I've grep'ed the journal of the current boot (test-kernel 6-4) for 'kscreen' and attached the results.  I have PowerDevil set to power the monitor down after 3 minutes.  If I read my own results correctly, I locked the screen at 22:42:52 and woke the monitor up again at 22:48:27.  The first thing asked is indeed, "Did the X11 server die?"
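
For anyone wanting to repeat the same extraction on their own machine, here is a rough sketch of the filtering step.  It assumes `journalctl` is available and that the user may read the journal; the helper names `filter_lines` and `current_boot_journal` are made up for illustration, and the match is a plain substring test like an unadorned grep.

```python
import subprocess


def filter_lines(lines, needle="kscreen"):
    """Keep only lines containing `needle` (case-sensitive, like plain grep)."""
    return [line for line in lines if needle in line]


def current_boot_journal():
    """Read the current boot's journal as a list of lines (requires journalctl)."""
    result = subprocess.run(
        ["journalctl", "-b", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `filter_lines(current_boot_journal())` should give roughly what the attachment contains for the current boot.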
Comment 17 Dimitrios Glentadakis 2021-12-09 17:14:33 CET
No flickering with the 5.15.6-desktop-4.mga8 kernel.
I don't have a use case to test the other things John has noticed.
I am in an LXQt DE with Openbox and the picom compositor.
Thomas Backlund 2021-12-17 20:55:16 CET

Depends on: (none) => 29777

Comment 18 Thomas Backlund 2021-12-22 00:30:21 CET
An update for this issue has been pushed to the Mageia Updates repository.

https://advisories.mageia.org/MGASA-2021-0574.html

Resolution: (none) => FIXED
Status: NEW => RESOLVED

