Bug 31080 - taskbar (panel) freezes
Summary: taskbar (panel) freezes
Status: RESOLVED INVALID
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-11-05 23:15 CET by Pierre Fortin
Modified: 2022-11-15 00:38 CET
CC List: 2 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments
journal at the time of taskbar freeze (4.56 KB, text/plain)
2022-11-06 00:11 CET, Pierre Fortin
screenshot of frozen taskbar (118.41 KB, image/png)
2022-11-06 09:44 CET, Pierre Fortin

Description Pierre Fortin 2022-11-05 23:15:40 CET
Description of problem: While doing some heavy processing -- at the moment, importing about 15 million records into a PostgreSQL database that already contains about 2.3 terabytes of data -- the taskbar freezes. Even the digital clock stops updating. This is the 4th time in the past 10 days. The first two times this occurred, I rebooted to recover. The third time, a couple of days ago, I could continue to work, other than being stuck on one of my 10 desktops. I was on a zoom call at the time, so I stayed on. After a while, the taskbar unfroze and all was good again. It worked fine after the thaw...

The freeze occurs as I move the mouse over the taskbar to access something. This time, I moused over the emacs icon, and the large emacs window previews popped up and froze. Now the bottom third of my primary screen is unusable. The database import job won't finish for hours...

I was going to get the system info from systemsettings/systemsettings5, but their windows open blank and freeze too. Both programs output this when invoked:
$ systemsettings5
kf.coreaddons: "Could not load plugin from kcm_kaccounts: The shared library was not found."
file:///usr/lib64/qt5/qml/org/kde/kirigami.2/ScrollablePage.qml:200:9: QML MouseArea: Binding loop detected for property "width"
file:///usr/lib64/qt5/qml/org/kde/kirigami.2/ScrollablePage.qml:200:9: QML MouseArea: Binding loop detected for property "width"
QQmlEngine::setContextForObject(): Object already has a QQmlContext
^C

The library-not-found message is probably not related to the freeze; the taskbar and systemsettings work fine otherwise.


Version-Release number of selected component (if applicable):
$ uname -a
Linux pf.pfortin.com 6.0.6-server-1.mga9 #1 SMP PREEMPT_DYNAMIC Sat Oct 29 09:37:13 UTC 2022 x86_64 GNU/Linux
$ uptime
 17:37:08 up 4 days,  3:12, 38 users,  load average: 2.74, 3.14, 3.18

Will have to update this report with other system info when the taskbar thaws.

How reproducible:
No repeatable way that I've found, other than heavy processing with lots of disk I/O.

On a whim, I decided to pause the database job; it took several minutes for the HD to settle down (apparently lots of queued I/O). I resumed the job after about 10 minutes, but the taskbar is still frozen... So is this the taskbar waiting for memory...?

$ top -bn1
top - 17:58:30 up 4 days,  3:34, 38 users,  load average: 3.36, 3.38, 2.97
Tasks: 749 total,   2 running, 746 sleeping,   0 stopped,   1 zombie
%Cpu(s):  4.3 us,  0.6 sy,  0.0 ni, 84.1 id, 11.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 128546.4 total,   1012.2 free,  42626.8 used,  84907.4 buff/cache
MiB Swap:  36095.0 total,  33762.3 free,   2332.8 used.  83231.1 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  52923 pfortin   20   0  931404  99612  10560 S  35.3   0.1   7:09.19 python


The taskbar has been frozen for an hour now (since 17:07:13)...  will update if it thaws or requires a reboot...

$ top -bn1
top - 18:13:08 up 4 days,  3:48, 38 users,  load average: 3.95, 3.75, 3.45
Tasks: 748 total,   2 running, 745 sleeping,   0 stopped,   1 zombie
%Cpu(s):  4.2 us,  2.8 sy,  0.0 ni, 77.1 id, 15.6 wa,  0.0 hi,  0.3 si,  0.0 st
MiB Mem : 128546.4 total,    784.5 free,  42358.6 used,  85403.2 buff/cache
MiB Swap:  36095.0 total,  33504.0 free,   2591.0 used.  83500.8 avail Mem 


Steps to Reproduce:
1. heavy processing
2. move mouse over taskbar
3.
Comment 1 Pierre Fortin 2022-11-06 00:11:05 CET
Created attachment 13485
journal at the time of taskbar freeze

Looking into the journal, it appears other issues occurred just before the taskbar freeze... The clock froze at 17:07:53 while I was mousing over the taskbar, causing the emacs window previews (tooltips?) to pop up; they are still there... This got logged at that time:

Nov 05 17:07:53 pf.pfortin.com plasmashell[62317]: file:///usr/share/plasma/plasmoids/org.kde.plasma.taskmanager/contents/ui/ToolTipDelegate.qml:86:9: QML ScrollView: Binding loop detected for property "bot>
Comment 2 Pierre Fortin 2022-11-06 09:44:07 CET
Created attachment 13486
screenshot of frozen taskbar

Taskbar still frozen; am able to limp along until job finishes.

Database import is almost done -- running 23 hours now. The next import is much larger; looks like I need to seriously consider installing a new SSD before starting that one. :)
Comment 3 Pierre Fortin 2022-11-06 09:47:08 CET
$ top -bn1
top - 03:44:39 up 4 days, 14:20, 40 users,  load average: 1.93, 2.26, 2.41
Tasks: 675 total,   1 running, 673 sleeping,   0 stopped,   1 zombie
%Cpu(s):  1.1 us,  1.3 sy,  0.0 ni, 91.7 id,  5.6 wa,  0.0 hi,  0.3 si,  0.0 st
MiB Mem : 128546.4 total,    748.8 free,  35510.0 used,  92287.6 buff/cache
MiB Swap:  36095.0 total,  27555.5 free,   8539.5 used.  90731.9 avail Mem 

Other than this issue; Cauldron is now looking solid. Nice!
Comment 4 Dave Hodgins 2022-11-06 15:46:43 CET
If you install htop and manage to run it in konsole or gnome terminal,
press F6 to select SortBy, and use cursor down/up to highlight STATE.

Then, if you see a process with the S column (state) showing D, then it's
stuck in a kernel device wait. That will cause the gui to freeze until the
device wait is finished.
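For a quick check without htop, the same D-state test can be done by reading `/proc` directly. This is an illustrative sketch, not something used in this report: it assumes the Linux `/proc/<pid>/stat` layout (PID, command name in parentheses, then a one-letter state), and the helper names `parse_stat` and `uninterruptible_processes` are hypothetical.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the *last* closing parenthesis.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar].strip())
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]  # first field after the name
    return pid, comm, state


def uninterruptible_processes(proc="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited between listdir() and open()
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `uninterruptible_processes()` repeatedly during a heavy import would list which processes are stuck in a device wait at that moment; like htop, each call is only a snapshot.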

CC: (none) => davidwhodgins

Comment 5 Pierre Fortin 2022-11-06 17:08:13 CET
That doesn't help in this situation...  A postgresql job has been running since 17:40 (now 12:05); the 101st/last file is at this stage:
/mnt/ftp/ftp/dl.ncsbe.gov/data/ncvhis_Statewide-20221105-110106.zip
       Dedup          NCID         Data         New        MtoM         Time mins:secs
      Dict Size     Dict Size     Records     Records     Records  Dedup   Data   MtoM  Commit Total  Activity
  2,684,354,656             0  33,861,071           0  19,740,000   0:00   5:52 784:37   0:00 790:29 Loading mtom 
   
I'm seeing 1 to 4 processes at a time randomly going into the D state, each waiting its turn, and the HD has been noisy the whole time.
Maybe there'll be a clue when postgresql finishes.  I have another even bigger job to run; but I definitely intend to install the new 4TB SSD first.  This DB had to be migrated to HD since the 2TB SSD was too small; that has caused this issue to surface.
Comment 6 Dave Hodgins 2022-11-06 17:45:31 CET
The kernel gets stuck in a device wait while it's waiting for the sata
controller's queue to have room for the kernel to write to.

While the kernel is stuck, nothing else can happen. There's nothing the
software can do about it.
Comment 7 Dave Hodgins 2022-11-06 18:09:06 CET
I suspect postgresql may have some tuning available in it, to improve
the situation. That would involve using smaller buffers.

While it would slow down the writing of the database, it would give the kernel
more opportunity to process other tasks.

I haven't looked to see what tuning postgresql has available, but I have had
to do tuning in other situations back when I was working with IMS and DB2
database systems on IBM mainframes.
Comment 8 Pierre Fortin 2022-11-06 19:04:37 CET
I've done some tuning, but it looks like a deeper dive is needed. I was hoping pausing the job would clear the queue, but that didn't help. Maybe the taskbar deadlocked itself...

Not having built my own kernel for several years now, gotta wonder if there's now a way to limit applications to a portion of disk queues...?
Comment 9 Dave Hodgins 2022-11-06 23:51:43 CET
Not that I've ever heard of.

For the kind of volume you're dealing with, it's normal to use a dedicated
server system for storing the db, not a desktop system with a user sitting
at the screen waiting for responses.
Comment 10 Pierre Fortin 2022-11-07 06:22:31 CET
:)  It's a pretty beefy system. Besides, I can't afford a dedicated server, being retired on a fixed income and doing volunteer work.

Anyway, taskbar (guessing plasma) must've deadlocked itself. It only self-healed the 3rd time it happened (4 now). Besides the screenshot, desktop didn't respond to right-clicks, double-clicking icons did nothing. I was able to start apps from the CLI, even a zoom meeting while waiting for the DB job to finish. Just swapped 2TB nvme with 4TB so I can move the growing DB back to an SSD; was working well on SSD until hitting the inevitable space limit... 

Thanks for your responses; let's see how this goes.
Comment 11 Lewis Smith 2022-11-09 22:01:18 CET
Just keeping an eye on this, but with no idea what to suggest! Thanks Dave for your input.

CC: (none) => lewyssmith

Comment 12 Dave Hodgins 2022-11-09 22:39:31 CET
I'm closing this as invalid, as this is due to the very high volume of data
being processed on a desktop system that isn't designed to handle very high
volumes and handle user interaction at the same time.

To do both at the same time, the database would have to be on a separate
hard drive controller or on another computer, not the same hard drive
controller used by programs interacting with the user.

Feel free to add more comments or questions, or to reopen if I'm mis-reading
the situation.
Comment 13 Dave Hodgins 2022-11-09 22:40:02 CET
Oops. Forgot to actually close.

Status: NEW => RESOLVED
Resolution: (none) => INVALID

Comment 14 Pierre Fortin 2022-11-09 23:07:04 CET
No issue with closing it for now...  Maybe someone will hit it with some other software; maybe processing video or somesuch...   For the record, now that historical data is loaded, future jobs will be run overnight while I sleep; so hopefully the taskbar won't be impacted this way...  
Thanks for your feedback, David!
Comment 15 Lewis Smith 2022-11-14 21:00:05 CET
Pierre
I wondered whether your huge database jobs can be run from the command line. If yes, then they might be better off running without X at all (run level 3); overnight while you sleep - as you note.
Comment 16 Pierre Fortin 2022-11-14 22:27:19 CET
Lewis...  there's never a clean-cut way to determine what actually triggers an issue. The more times an issue occurs, the more opportunities there are to gather data. I'm now getting this freeze at times when little is happening. Last night, all was quiet; I was writing a short document and taking screenshots for illustrations -- just a few pages showing how to set up an application: what to put in various fields, what the next dialog should look like... simple stuff that would have barely moved the needle on an original 4.77MHz IBM PC. The system was essentially idle.

My phone is always connected via KDEconnect and I get notifications popping up occasionally; mostly when my dogs go out or come in (security system motion detection).  When it froze, a screen "patch" appeared, but never filled in -- right where the notifications normally appear (upper-right). To me, it appears to be some critical bit of code that should not be interrupted. That's conjecture; but the "patch" had all the earmarks of a KDEconnect notification: same location/size, on top, etc.  Anyway, I've opened a report with KDE at https://bugs.kde.org/show_bug.cgi?id=461688.  This KDEconnect notification is the first time I noticed a locked screen "patch"; but maybe KDEconnect was involved the other times...(?)  I could disable KDEconnect; but I'd rather help nail it, than have it bite me later at a worse time.  

BTW, all the flickering I had been suffering from for months cleared up at about the time kernel 6.0 got loaded. Other than this freeze issue, Cauldron is rock solid for me, and I think you know I drive it hard. My database is now 2TB; I had to install a 4TB NVMe M.2 SSD to speed it up -- on the 18TB HD, the seeking during table loads was rattling the system...  :) :)

With the digital clock set to show seconds, it's easy to get short journal logs right when the clock freezes... The really weird thing was that when saving the screenshots (using "screengrab") while the taskbar was frozen, each save completed, but the save dialog then took 50 seconds to close...  :?
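Pulling a short journal excerpt around the frozen-clock timestamp can be scripted. This is a sketch under assumptions, not what was used here: it only wraps `journalctl`'s standard `--since`/`--until`/`--no-pager` options, and the helper names and the default ±30-second window are hypothetical choices.

```python
import subprocess
from datetime import datetime, timedelta


def journal_window_cmd(freeze_at, before=30, after=30):
    """Build a journalctl command covering freeze_at - before .. freeze_at + after seconds."""
    fmt = "%Y-%m-%d %H:%M:%S"  # a timestamp format journalctl accepts
    since = (freeze_at - timedelta(seconds=before)).strftime(fmt)
    until = (freeze_at + timedelta(seconds=after)).strftime(fmt)
    return ["journalctl", "--no-pager", "--since", since, "--until", until]


def journal_window(freeze_at, before=30, after=30):
    """Run journalctl for that window and return its text output."""
    cmd = journal_window_cmd(freeze_at, before, after)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For the freeze in Comment 1, `journal_window(datetime(2022, 11, 5, 17, 7, 53))` would query 17:07:23 to 17:08:23, a window that brackets the `plasmashell` binding-loop message quoted there.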

Oh... almost forgot to respond to your CLI suggestion. The DB loads are done via python scripts; it was unfortunate that the first freeze occurred while one was running. All the others occurred when there was very little going on; last night's was the best example. Also, the only way I could use run level 3 would be to buy another system so this one could be a headless server -- at my age and income, that's highly unlikely. About 75% of my team (currently 15) are in my situation; the rest are mid-life with jobs. Since I have the most capable system and run Linux, there's little chance of doing what we do on the Windows or macOS systems the others use... We're all volunteers with no financial support; just like y'all, I'd guess. :)

HTH...
Pierre Fortin 2022-11-14 22:27:45 CET

Summary: taskbar freezes during heavy processing => taskbar freezes

Pierre Fortin 2022-11-15 00:38:03 CET

Summary: taskbar freezes => taskbar (panel) freezes

Comment 17 Pierre Fortin 2022-11-15 00:38:43 CET
see also https://bugs.kde.org/show_bug.cgi?id=429211
