Bug 32930

Summary: kernel-6.6.18-1.mga9 causes cores to hit 100%
Product: Mageia Reporter: aguador <waterbearer54>
Component: RPM Packages Assignee: Kernel and Drivers maintainers <kernel>
Status: RESOLVED INVALID QA Contact:
Severity: normal    
Priority: Normal CC: fri, ghibomgx, kde, lewyssmith
Version: 9   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Source RPM: kernel-6.6.18-1.mga9.src.rpm CVE:
Status comment:
Attachments: Results of kernel tests
CPU_frequency_tests
System xorg.conf
System information

Description aguador 2024-03-05 10:56:19 CET
This kernel, like 6.6.14-2, sends at least one core to 100% use (visible in a conky monitor) from time to time.

Both kernels appeared to work well in Cauldron on my 2nd gen i5 laptop, but will not yield a usable system on my 10th gen i7 machine as it causes "freezes" so frequently. I have also tested 6.6.14-2 on a Pentium 4200 Mga9 system and after running a long time it too will cause the system to freeze for several seconds from time to time, even with only one program active.

Here is the system information on the i7 machine when running the 6.5.13 kernel (which seems to run flawlessly):

$ inxi -v 1
System:
  Host: localhost Kernel: 6.5.13-desktop-6.mga9 arch: x86_64 bits: 64
    Desktop: Enlightenment v: 0.25.4 Distro: Mageia 9
CPU:
  Info: quad core Intel Core i7-10510U [MT MCP] speed (MHz): avg: 802
    min/max: 400/4900
Graphics:
  Device-1: Intel CometLake-U GT2 [UHD Graphics] driver: i915 v: kernel
  Device-2: NVIDIA GP108M [GeForce MX250] driver: nouveau v: kernel
  Device-3: Chicony HD Webcam type: USB driver: uvcvideo
  Display: x11 server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9 driver: X:
    loaded: intel,v4l dri: i965 gpu: i915 resolution: 1920x1080~60Hz
  API: OpenGL v: 4.6 Mesa 23.3.5 renderer: Mesa Intel UHD Graphics (CML GT2)
Drives:
  Local Storage: total: 953.88 GiB used: 220.84 GiB (23.2%)
Info:
  Processes: 240 Uptime: 5m Memory: 7.63 GiB used: 1.23 GiB (16.1%)
  Shell: Bash inxi: 3.3.26

Note that at present I am using only Intel graphics.
Comment 1 Morgan Leijström 2024-03-05 11:22:45 CET
Thank you for reporting

Can you see which process seems to be using the core at 100%?

Do you see something interesting in journal?

CC kernel maintainer

CC: (none) => fri, ghibomgx

Comment 2 Morgan Leijström 2024-03-05 11:23:58 CET
You could try our less patched linus kernel flavour and see if that makes a difference.
Comment 3 aguador 2024-03-05 14:46:04 CET
1) Linus kernel has the same issue
2) kworker/7:2pm is grabbing the processor (gst-plugins-scanner does in the early phases of startup on this kernel, not the 6.5.13 one)

Two more observations:

In the journal there is a note about enabling i915 for the GPU as I do not have backlight control on this machine. Questions:

  How?
  Could this be the problem or part of it?

(Of my 3 machines, this is the only CPU with UHD graphics.)

Also noted: neither conky nor E's evisum monitor always captures when a process has grabbed the CPU, although it does show in top.

Even when no monitor shows the CPU at 100%, the touchpad does not function, although the keyboard does seem to respond all/most of the time.
Comment 4 Giuseppe Ghibò 2024-03-05 15:37:06 CET
The i7-10510U seems to be one of those CPUs found in mini-PC or NUC-based hardware.

Try to disable hyperthreading booting with 'nosmt'.

For the backlight, the relevant keywords might be

i915.enable_dpcd_backlight=1

and/or

acpi_osi=...
acpi_backlight=linux|vendor|native

There is a pre-test 6.6.20-1.mga9 kernel in copr. You might want to see whether it shows the same problems.
Comment 5 aguador 2024-03-05 17:13:07 CET
Thanks. Disabling threading did not help. In reality I DID have backlight control after login, so that is not the problem. I may give the kernel in copr a try...tomorrow.

I am on the 6.6.18 kernel at the moment and after about 5 minutes, the system begins to behave normally. I will need to try a long session later to see if this holds. Just maybe this is a problem with a very slow loading of services.
Comment 6 Lewis Smith 2024-03-05 20:54:43 CET
Thanks, everyone, for being helpful.
As Giuseppe is already looking at this, assigning to kernel team (which I would have done anyway), and removing G CC. Re-assign if you see fit.

(In reply to aguador from comment #3)
> 2) kworker/7:2pm is grabbing the processor (gst-plugins-scanner does in the
> early phases of startup on this kernel, not the 6.5.13 one)
kworker; hmmm. 

6.6.14-desktop-2.mga9
 $ ps ax | grep kworker | wc -l
67

This looks mad. CC'ing KDE. I am not even running Plasma!
kworker shows frequently with 'top' (also, as always, akonadi).

CC: ghibomgx => kde, lewyssmith
Assignee: bugsquad => kernel

Comment 7 aguador 2024-03-06 09:52:44 CET
The "k" in this case, as I have learned, refers to kernel.

Poking around a bit I see that the 6.6 kernel introduced a new scheduler:

https://lwn.net/Articles/925371/

Given that the system seems to settle down later, I am wondering if that change in the 6.6 kernel is the root of the problem. That change might explain why gst-plugins-scanner can grab the CPU in this kernel but not the 6.5 kernel. There is also this from Arch (although I am not running NetworkManager):

https://bbs.archlinux.org/viewtopic.php?id=290976&p=2

I am sure Giuseppe knows how to monitor this, but see the answer here and the reference to the RH docs:

https://superuser.com/questions/1684585/how-to-find-out-what-a-kworker-does

I don't know if I will get to try the 6.6.20 kernel in copr, but given that my system does seem to settle down satisfactorily, perhaps time is better spent working on later kernels (including 6.7) as they may have addressed the problem -- especially since no one else has reported the problem.
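As an illustration of the kworker monitoring discussed above, here is a minimal sketch (not taken from the thread) that filters a process listing for busy kworker threads. The function name busy_kworkers and the 50% default threshold are hypothetical choices; the ps options shown in the comment are the standard procps ones.

```shell
# List kworker threads whose CPU usage is at or above a threshold (default 50%).
# Reads "PID COMM %CPU" lines on stdin, so it can be fed from ps or from a saved log.
busy_kworkers() {
    awk -v min="${1:-50}" \
        '$2 ~ /^kworker/ && $3 + 0 >= min + 0 { print $1, $2, $3 }'
}

# Typical use on a live system (procps ps):
#   ps -eo pid,comm,pcpu --no-headers | busy_kworkers 50
```

Because the filter only reads stdin, the same function can be run against output captured while the system is "frozen" and inspected afterwards.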
Comment 8 Giuseppe Ghibò 2024-03-06 22:11:27 CET
Actually there is no plan for 6.7.x.

Is your BIOS version already up to date there?

kworker does a lot of stuff, and any distro shows 50-70 kworker processes under ps. Looking around, 100% CPU on kworker has been pretty common, popping up from time to time ever since kernel 2.6.32... It apparently seems to be related to interrupts or some I/O driver.

Can you also try disabling all but one core by booting with the parameter maxcpus=1 (or maxcpus=2...N)?

I wonder if it might depend on a kernel-firmware(free|nonfree) update that is still missing.

Does anything change if you change the frequency governor? What does

 cpupower -c all frequency-info

show? Does it show intel_pstate?
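For readers without cpupower installed, the driver and governor asked about above can also be read from sysfs. The following is a minimal sketch, assuming the usual Linux cpufreq layout under /sys/devices/system/cpu; the function name show_cpufreq is hypothetical.

```shell
# Print the cpufreq driver and governor for each CPU found under a sysfs-style root.
# The root defaults to /sys/devices/system/cpu; pass another directory for testing.
show_cpufreq() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without cpufreq support (or an unmatched glob).
        [ -r "$dir/scaling_driver" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        printf '%s driver=%s governor=%s\n' "$cpu" \
            "$(cat "$dir/scaling_driver")" "$(cat "$dir/scaling_governor")"
    done
}
```

Calling show_cpufreq with no argument reads the live system. A driver of intel_pstate versus acpi-cpufreq is the distinction the question is after.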

Lastly, try one of these kernels (just install -desktop, plus -desktop-devel if you have dkms modules; they use oldnamingscheme, so they are easy to uninstall).

- 6.6.21-1.mga9: this is a pre-candidate for 6.6.21: https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/07113225-kernel/

- an alternative 6.1.81 (just to see whether it behaves differently)
https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/mageia-9-x86_64/07113288-kernel/

CC: (none) => ghibomgx

Comment 9 aguador 2024-03-07 22:26:55 CET
(In reply to Giuseppe Ghibò from comment #8)
> Actually there are no plan for 6.7.x.
> 
> Is your bios version already up-to-date there?
Just saw that there is an update available. I will flash it tomorrow.
> 
> kworker does a lot of stuff and any distro has 50-70 process of it under ps.
> Looking around, 100% CPU on kworker was pretty common and popping from time
> to time, since even kernel 2.6.32... Apparently seems related to interrupts
> or some I/O driver.
It runs at up to 100% for so long that it is a problem. This evening in tests, the 6.6.x kernels took about 2'25" to settle down. The mouse/trackpad is basically unusable during that time, but navigation with the keyboard is possible.
> 
> Can you try also disabling all cores, booting with parameter maxcpus=1 (or
> maxcpus=2...N).
I will attach a file to this report, but I tried n=1 and n=2. The only real change was that the trackpad was usable sooner, but the system still took over the time indicated above to settle down.
> 
> I wonder if it might depends on a kernel-firmware(free|nonfree) update that
> it's still missed.
BIOS apart, the system has been kept fully up to date.
> 
> Does also change something if you change the frequency governor? What shows
> 
>  cpupower -c all frequency-info
> 
> ? Does it shows intel_pstate?
See second file being attached.
> 
> On the last, try one of these kernels (just install -desktop and
> -desktop-devel if you have dkms modules, they are using oldnamingscheme so
> easy to uninstall).
> 
> - 6.6.21-1.mga9: this is a pre-candidate for 6.6.21:
> https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/
> mageia-9-x86_64/07113225-kernel/
> 
> - an alternative 6.1.81 (just to see whether has a different behavior)
> https://download.copr.fedorainfracloud.org/results/ghibo/mageia9-bonus/
> mageia-9-x86_64/07113288-kernel/
Did not install the devel kernels as they wanted to bring in more dependencies, including devels for kernels already installed. The 6.6.21 kernel behaved just like the 6.6.18 kernel. I skipped the 6.1.81 kernel as the 6.5.13 and 6.4.16 kernels already installed both work without problems.
Comment 10 aguador 2024-03-07 22:39:58 CET
Created attachment 14439 [details]
Results of kernel tests

Full desktop shown on the last screenshot. The splashscreen for tomboy-ng normally appears with if not slightly before the wallpaper. With the 6.6.x kernels it appears 40s after login.
Comment 11 aguador 2024-03-07 22:41:07 CET
Created attachment 14440 [details]
CPU_frequency_tests
Comment 12 Giuseppe Ghibò 2024-03-07 22:45:08 CET
(In reply to aguador from comment #9)

> Did not install the devel kernels as they wanted to bring in more
> dependencies, including devels for kernels already installed. The 6.6.21
> kernel behaved just like the 6.6.18 kernel. I skipped the 6.1.81 kernel as
> the 6.5.13 and 6.4.16 kernel already installed both work without problems.

Devel kernel are required only.

The point is that 6.1.81 is newer and belongs to another LTS branch that is kept up to date, while 6.5.x and 6.4.x stopped at those versions. I mentioned it only to see whether this is a problem introduced by a recent backport. A lot of stuff is ported from the main series to the LTS series.
Comment 13 Giuseppe Ghibò 2024-03-07 22:45:47 CET
I meant that -devel kernels are required only if you need to build dkms stuff; otherwise they are not required.
Comment 14 Giuseppe Ghibò 2024-03-07 23:02:58 CET
> > kworker does a lot of stuff and any distro has 50-70 process of it under ps.
> > Looking around, 100% CPU on kworker was pretty common and popping from time
> > to time, since even kernel 2.6.32... Apparently seems related to interrupts
> > or some I/O driver.
> It is running up to 100% for so long that is a problem. This evening in
> tests, the 6.6.x kernels take about 2'25" to settle down. The mouse/trackpad
> is basically unusable during that time, but navigation with the keyboard is
> possible.

What I mean is not that a 100% kworker load is normal, but rather that this kind of problem with kworker at 100% is not new: we found it already described for kernel 2.6.32 (and of course fixed in later kernel releases), but it pops up again from time to time (and the root cause changes)...

> >  cpupower -c all frequency-info
> > 
> > ? Does it shows intel_pstate?
> See second file being attached.


Try booting with

intel_pstate=disable

It should then use acpi-cpufreq; see how it behaves.
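When testing boot parameters such as nosmt, maxcpus=1 or intel_pstate=disable, it helps to confirm that the parameter really reached the running kernel. The following is a minimal sketch under that assumption; the function name has_boot_param is hypothetical, and /proc/cmdline is the standard place where Linux exposes the boot command line.

```shell
# Succeed if the given parameter appears as a whole word on the kernel command line.
# The command line is read from /proc/cmdline unless a second argument supplies it.
has_boot_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    # Pad with spaces so that only complete, space-delimited words match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use after rebooting with the new parameter:
#   has_boot_param intel_pstate=disable && echo "parameter is active"
```

The whole-word match means that maxcpus=1 does not accidentally match maxcpus=12.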
Comment 15 aguador 2024-03-07 23:45:23 CET
Giuseppe, Thanks for your patience and the clarification on kworker.

The only notable (and welcome) difference with intel_pstate=disable is a brighter login screen!

The behaviour is the same whether booting normally, with intel_pstate disabled, or cpus limited: 40s to bring up the tomboy-ng splash screen, touchpad is responsive for about the first minute then non-responsive for about the next 1m 25s.

It's getting late, so I will deal with the BIOS tomorrow and see if that solves the issue...
Comment 16 aguador 2024-03-08 12:41:42 CET
OK, BIOS updated -- and, sorry to say, the problem persists. :-(
Comment 17 Giuseppe Ghibò 2024-03-18 23:55:12 CET
This still needs to be completed with other packages (and there isn't yet a corresponding bug number), but in the meantime you might try kernel-desktop-6.6.22-1.mga9 in updates_testing.

Also try a CLI monitoring utility that is less CPU-hungry than conky, e.g. top, htop or btop. And see whether the same problem occurs when you don't start in graphics mode (just append "3" to the kernel command line at boot so that systemd does not start graphical.target).

Also try to see whether some process is doing weird I/O (you can check with iotop) when the problem occurs.
Comment 18 Morgan Leijström 2024-03-19 08:46:06 CET
Sidenote: running kernel-desktop-6.6.22-1.mga9 here now. Ready for QA?
Comment 19 aguador 2024-03-19 10:14:58 CET
(In reply to Giuseppe Ghibò from comment #17)

> Try also some cli monitoring util which is less CPU hungry than conky. E.g.
> top, htop, btop. And also try to see whether the same problems occurs when
> you don't start in graphics mode, (just append "3" to booting kernel cmdline
> so to avoid starting systemd in graphical.target).
> 
Sorry I missed this response earlier. Starting at run level 3, startx brings the GUI up normally/without delay.

Having seen Morgan's comment earlier, this was tested on the 6.6.22 kernel from updates-testing -- which behaves as badly as the prior 6.6.x kernels when starting at run level 5.

Hope this helps/gives a clue as to what is going on.
Comment 20 Giuseppe Ghibò 2024-03-19 10:26:43 CET
(In reply to aguador from comment #19)
> (In reply to Giuseppe Ghibò from comment #17)
> 
> > Try also some cli monitoring util which is less CPU hungry than conky. E.g.
> > top, htop, btop. And also try to see whether the same problems occurs when
> > you don't start in graphics mode, (just append "3" to booting kernel cmdline
> > so to avoid starting systemd in graphical.target).
> > 
> Sorry I missed this response earlier. Starting at run level 3, startx brings
> the GUI up normally/without delay.

So, to summarize, if we understand correctly:

1) In non-graphics mode it shows no anomalies during any activity, even after a certain amount of time.

2) Running X11 via startx from the CLI works fine too.

3) Starting the desktop from the display manager brings a delay and, later, a high load that persists.

Which graphical server is it running: Xorg or Xwayland? Or maybe both are started? Are you running with an xorg.conf file or without? Does it use the intel or the modesetting driver?
Comment 21 aguador 2024-03-19 13:05:50 CET
Created attachment 14471 [details]
System xorg.conf

Initially installed from a beta of Mga9 XFCE; set to plug&play
Comment 22 aguador 2024-03-19 13:07:54 CET
Created attachment 14472 [details]
System information

nVidia card installed, but only using Intel.
Comment 23 aguador 2024-03-19 13:18:28 CET
> 1) in non-graphics mode, it won't show anomalies, at any activities, even
> later after a certain amount of time.
Correct, no delays in starting and, for example, can execute inxi.
> 
> 2) running X11 from startx from cli works fine too.
Yes, E starts normally and tomboy-ng splash window arrives at about the same time as the wallpaper.
> 
> 3) starting desktop from the displaymanager brings delay and high load later
> that persists.
No, the tomboy splash does not load when the DE comes up except for a tiny window showing that the machine is working to load it. The conky comes up normally. For about the first minute the cursor is functional and the system somewhat responsive. It then "freezes" until about 2 min 25 secs, after which the system "settles" and is fully functional.
> 
> What kind of graphical server is it running? Xorg or Xwayland? Or maybe both
> are started? Are you running with an xorg.conf file or without? Does it uses
> intel or modesetting driver?
See the system and xorg information attached.

I have also tested without tomboy-ng and the conky, and there is essentially no difference in behavior. When booting directly to IceWM there is about a 30 sec delay in bringing up the WM, after which everything seems to function normally.
Comment 24 Giuseppe Ghibò 2024-03-19 13:24:23 CET
(In reply to Morgan Leijström from comment #18)

> Sidenote: running kernel-desktop-6.6.22-1.mga9 here now. Ready for QA?

I opened #32985 for that. The kernel-linus stuff (package, etc.) still has to be done.
Comment 25 Giuseppe Ghibò 2024-03-19 13:39:04 CET
(In reply to aguador from comment #23)
> > 1) in non-graphics mode, it won't show anomalies, at any activities, even
> > later after a certain amount of time.
> Correct, no delays in starting and, for example, can execute inxi.
> > 
> > 2) running X11 from startx from cli works fine too.
> Yes, E starts normally and tomboy-ng splash window arrives at about the same
> time as the wallpaper.
> > 
> > 3) starting desktop from the displaymanager brings delay and high load later
> > that persists.
> No, the tomboy splash does not load when the DE comes up except for a tiny
> window showing that the machine is working to load it. The conky comes up
> normally. For about the first minute the cursor is functional and the system
> somewhat responsive. It then "freezes" until about 2 min 25 secs, after
> which the system "settles" and is fully functional.
> > 
> > What kind of graphical server is it running? Xorg or Xwayland? Or maybe both
> > are started? Are you running with an xorg.conf file or without? Does it uses
> > intel or modesetting driver?
> See the system and xorg information attached.
> 
> I have also tested without tomboy-ng and the conky, and there is essentially
> no difference in behavior. When booting directly to IceWM there is about a
> 30 sec delay in bringing up the WM, after which everything seems to function
> normally.

The weird IceWM delay is probably something else (broken?). E.g. I have had weird behaviour with IceWM since mga8: when I click on the terminal button in the toolbar, the terminal opens (at least after a cold boot) 30 seconds after I push the button. This apparently seems to be related to some reverse IP lookup timeout or something like that. What about Plasma?

What about switching from "intel" to "modesetting" in xorg.conf in the Device Section?

Also check /etc/nsswitch.conf and, on the hosts: line, move the word "files" to the first position after the ":".

Also check that there is only one Xorg or Xwayland process. Sometimes both are started (they can work at the same time), but in the end one doesn't know exactly which one is in use.
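Before and after switching the driver from "intel" to "modesetting", one way to see what the Device section currently requests is to extract its Driver line. This is a minimal sketch, assuming a conventional xorg.conf with Section "Device" ... EndSection blocks; the function name xorg_device_drivers is hypothetical.

```shell
# Print the Driver value from each Device section of an xorg.conf-style file.
# Defaults to /etc/X11/xorg.conf; pass another path to inspect a copy.
xorg_device_drivers() {
    awk '
        tolower($1) == "section" && tolower($2) == "\"device\"" { in_dev = 1; next }
        tolower($1) == "endsection" { in_dev = 0 }
        in_dev && tolower($1) == "driver" { gsub(/"/, "", $2); print $2 }
    ' "${1:-/etc/X11/xorg.conf}"
}
```

Driver lines in other sections, such as InputDevice, are deliberately ignored, so the output reflects only the graphics driver setting.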
Comment 26 aguador 2024-03-19 18:12:03 CET
> What about Plasma?
Will pass on that -- too many files to install and clean up later. There are reasons I use E...

> What about switching from "intel" to "modesetting" in xorg.conf in the
> Device Section?
THAT seems to have done the trick (tested with both 6.6.18 and 6.6.22)! A tiny, unfocused piece of the tomboy-ng notes splash still appears in the center of the screen for 2-3 secs once the GUI appears, then the full splash appears in its customary position -- but that is no big deal!

> Check also /etc/nsswitch.conf and move where there is hosts: ... the word
> "files" to the first position after ":".
Changing:

hosts:		mdns4_minimal files nis dns mdns4 myhostname

to:

hosts:		files mdns4_minimal nis dns mdns4 myhostname

had no effect.

> Also check there is only one process Xorg or Xwayland. Sometimes both are
> started (they can work at the same time) but in the end one doesn't know
> which one exactly is running.
Only Xorg is running.

Thank you, Giuseppe, for all your patience and perseverance. At least we know the machine is not completely "allergic" to the 6.6 kernel. :-)
Comment 27 Giuseppe Ghibò 2024-03-19 18:56:13 CET
(In reply to aguador from comment #26)
> > What about Plasma?
> Will pass on that -- too many files to install and clean up later. There are
> reasons I use E...

An alternative is to try 'openbox' instead of 'icewm'.

> > What about switching from "intel" to "modesetting" in xorg.conf in the
> > Device Section?
> THAT seems to have done the trick (tested with both 6.6.18 and 6.6.22)! A
> tiny, unfocused piece of the tomboy-ng notes splash still appears in the
> center of the screen for 2-3 secs once the GUI appears, then the full splash
> appears in its customary position -- but that is no big deal!
> 

These artifacts might be related to 3D hardware acceleration.

Is the installation you are running "clean", or does it come with a lot of "debris"?

For testing, try disabling hardware acceleration (just go into drakx11 and select Option, then under it select Disable Hardware Acceleration). It will then rely on the software llvmpipe driver for 3D, which is slower.

Another possibility is that tomboy (or tomboy-ng) is broken. Where did you get it from? On mga9 there isn't a tomboy bin package anymore. And the current package hasn't been rebuilt since mga8:

http://svnweb.mageia.org/packages/cauldron/tomboy/current/SPECS/tomboy.spec?view=log

And it doesn't even rebuild, due to missing deps.
Comment 28 aguador 2024-03-19 20:13:28 CET
> Is the installation you are running "clean" or comes with many "debris"?
Since I do not have a high-speed connection at the moment, this installation was done on a reformatted root directory from an XFCE 9 beta iso and DE-specific things eliminated later. If there is debris it is related to misc XFCE components I missed or config files in home.

> For testing try to disable hardware acceleration (just go in drakx11 and
> select Option then under select Disable Hardware Acceleration). It will rely
> on software llvmpipe driver for 3D, though slower.
Disabling acceleration with drakx11 did not change the behavior other than to prompt a warning from E that it was falling back to software rendering. There are finer settings available in E (Settings > Advanced > ConfigureElementary > Rendering), none of which affect that startup behaviour.

> Another possibility is that tomboy (or tomboy-ng) is broken. Where did you
> get it from? On mga9 there isn't a tomboy bin package anymore. And the
> current package hasn't been rebuild since mga8:
> And doesn't even rebuild due to missed deps.
Yep, I think I have not had Tomboy since Mga7 or maybe even 6, when a mono update basically killed it.

tomboy-ng is backwardly compatible and mono free. I have done some testing of recently added features, as well as the Spanish translations of the interface and help notes:

https://github.com/tomboy-notes/tomboy-ng

I have run this, the latest Gtk2 version, on installed versions of Mageia 9 & 10, MX Linux XFCE and Artix, plus the Qt5 version on a live USB of MX Linux Plasma, and all without problems, not to mention prior releases on earlier OS releases, as well as tests of the Qt5 and Qt6 versions in Mageia under E. Plus a) I don't have to load the splash (but do so for testing purposes) and b) even without loading -ng at startup as noted above, there is still a delay.

Offtopic: I have never requested that -ng be added to Mga given staffing and the presumably increased work of creating packages from Lazarus/FPC source code. If I were to request something using Lazarus it would be to return to packaging Double Commander (which fails on the Slimbook for reasons unknown). Neither Krusader, Gnome-Commander nor the other dual pane GUI FM in the repositories can hold a candle to DC!
Comment 29 aguador 2024-06-08 17:00:24 CEST
Closing this as solved given that switching from "intel" to "modesetting" in xorg.conf eliminates virtually all the delay.

Resolution: (none) => INVALID
Status: NEW => RESOLVED