Description of problem:
While doing some heavy processing (at the moment, importing about 15 million records into a PostgreSQL database which already contains about 2.3 terabytes of data), the taskbar freezes. Even the digital clock stops updating. This is the 4th time in the past 10 days. The first two times this occurred, I rebooted to recover. The third time, a couple of days ago, I could continue to work, other than being stuck on one of my 10 desktops. I was on a Zoom call at the time, so I stayed on. After a while, the taskbar unfroze and all was good again. Worked fine after the thaw...

The freeze occurs as I move the mouse over the taskbar to access something. This time, I moused over the emacs icon; the large emacs window representations popped up and froze. Now the bottom third of my primary screen is unusable. The database import job won't finish for hours...

I was going to get the system info from systemsettings/systemsettings5, but their windows open blank and freeze too. Both programs output this when invoked:

$ systemsettings5
kf.coreaddons: "Could not load plugin from kcm_kaccounts: The shared library was not found."
file:///usr/lib64/qt5/qml/org/kde/kirigami.2/ScrollablePage.qml:200:9: QML MouseArea: Binding loop detected for property "width"
file:///usr/lib64/qt5/qml/org/kde/kirigami.2/ScrollablePage.qml:200:9: QML MouseArea: Binding loop detected for property "width"
QQmlEngine::setContextForObject(): Object already has a QQmlContext
^C

The missing library is not related to the freeze; the taskbar and systemsettings work fine otherwise.

Version-Release number of selected component (if applicable):
$ uname -a
Linux pf.pfortin.com 6.0.6-server-1.mga9 #1 SMP PREEMPT_DYNAMIC Sat Oct 29 09:37:13 UTC 2022 x86_64 GNU/Linux
$ uptime
17:37:08 up 4 days, 3:12, 38 users, load average: 2.74, 3.14, 3.18

Will have to update this report with other system info when the taskbar thaws.
How reproducible:
No repeatable way that I've found, other than heavy processing with lots of disk I/O.

On a whim, I decided to pause the database job; it took several minutes for the HD to settle down (apparently lots of queued I/O). Resumed the job after about 10 minutes; but the taskbar is still frozen... So this must be the taskbar waiting for memory...?

$ top -bn1
top - 17:58:30 up 4 days, 3:34, 38 users, load average: 3.36, 3.38, 2.97
Tasks: 749 total, 2 running, 746 sleeping, 0 stopped, 1 zombie
%Cpu(s): 4.3 us, 0.6 sy, 0.0 ni, 84.1 id, 11.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 128546.4 total, 1012.2 free, 42626.8 used, 84907.4 buff/cache
MiB Swap: 36095.0 total, 33762.3 free, 2332.8 used. 83231.1 avail Mem

  PID USER    PR NI   VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
52923 pfortin 20  0 931404 99612 10560 S 35.3  0.1 7:09.19 python

The taskbar has been frozen for an hour now (since 17:07:13)... will update if it thaws or requires a reboot...

$ top -bn1
top - 18:13:08 up 4 days, 3:48, 38 users, load average: 3.95, 3.75, 3.45
Tasks: 748 total, 2 running, 745 sleeping, 0 stopped, 1 zombie
%Cpu(s): 4.2 us, 2.8 sy, 0.0 ni, 77.1 id, 15.6 wa, 0.0 hi, 0.3 si, 0.0 st
MiB Mem : 128546.4 total, 784.5 free, 42358.6 used, 85403.2 buff/cache
MiB Swap: 36095.0 total, 33504.0 free, 2591.0 used. 83500.8 avail Mem

Steps to Reproduce:
1. heavy processing
2. move mouse over taskbar
Created attachment 13485 [details]
journal at the time of taskbar freeze

Looking into the journal, it appears other issues occurred just before the taskbar freeze... The clock froze at 17:07:53 while I was mousing over the taskbar, causing the emacs representations (tooltips?) to pop up -- still there... and this got logged at that time:

Nov 05 17:07:53 pf.pfortin.com plasmashell[62317]: file:///usr/share/plasma/plasmoids/org.kde.plasma.taskmanager/contents/ui/ToolTipDelegate.qml:86:9: QML ScrollView: Binding loop detected for property "bot>
Created attachment 13486 [details]
screenshot of frozen taskbar

Taskbar still frozen; am able to limp along until the job finishes. Database import is almost done -- running 23 hours now. The next import is much larger; looks like I need to seriously consider installing the new SSD before starting that one. :)
$ top -bn1
top - 03:44:39 up 4 days, 14:20, 40 users, load average: 1.93, 2.26, 2.41
Tasks: 675 total, 1 running, 673 sleeping, 0 stopped, 1 zombie
%Cpu(s): 1.1 us, 1.3 sy, 0.0 ni, 91.7 id, 5.6 wa, 0.0 hi, 0.3 si, 0.0 st
MiB Mem : 128546.4 total, 748.8 free, 35510.0 used, 92287.6 buff/cache
MiB Swap: 36095.0 total, 27555.5 free, 8539.5 used. 90731.9 avail Mem

Other than this issue, Cauldron is now looking solid. Nice!
If you install htop and manage to run it in konsole or gnome terminal, press F6 to select SortBy, then cursor down/up to highlight STATE. If you then see a process with D in the S (state) column, it's stuck in a kernel device wait (uninterruptible sleep). That will cause the GUI to freeze until the device wait is finished.
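If htop is not installed, the same check can be approximated from a terminal by reading /proc directly. The following is only a minimal sketch, assuming a Linux system with /proc mounted; the function names `parse_stat_line` and `processes_in_device_wait` are made up for illustration and are not part of any tool mentioned in this report.

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]

def processes_in_device_wait(proc_root="/proc"):
    """List (pid, comm) for processes whose state is D (uninterruptible sleep)."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling `processes_in_device_wait()` a few times in a row gives a rough picture of which processes sit in state D; a single snapshot can easily miss short waits.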
CC: (none) => davidwhodgins
That doesn't help in this situation... A postgresql job has been running since 17:40 (now 12:05); the 101st/last file is at this stage:

/mnt/ftp/ftp/dl.ncsbe.gov/data/ncvhis_Statewide-20221105-110106.zip
Dedup          NCID       Data        New      MtoM        Time mins:secs
Dict Size      Dict Size  Records     Records  Records     Dedup  Data  MtoM    Commit  Total   Activity
2,684,354,656  0          33,861,071  0        19,740,000  0:00   5:52  784:37  0:00    790:29  Loading mtom

I'm seeing 1 to 4 processes randomly bouncing in and out of state D, waiting their turn, and the HD has been noisy the whole time. Maybe there'll be a clue when postgresql finishes. I have another, even bigger job to run; but I definitely intend to install the new 4TB SSD first. This DB had to be migrated to the HD since the 2TB SSD was too small; that has caused this issue to surface.
The kernel gets stuck in a device wait while it waits for the SATA controller's queue to have room for more writes. While the kernel is stuck, nothing else can happen. There's nothing the software can do about it.
I suspect postgresql may have some tuning available to improve the situation. That would involve using smaller buffers. While it would slow down the writing of the database, it would give the kernel more opportunity to process other tasks. I haven't looked to see what tuning postgresql has available, but I have had to do this kind of tuning in other situations, back when I was working with IMS and DB2 database systems on IBM mainframes.
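Independent of any server-side settings (which are not examined here), the same idea can be tried on the application side by having the load script write in smaller batches with a short pause between them. This is only a sketch under assumptions: `chunked`, `load_in_batches` and the `write_batch` callback are hypothetical names, the callback stands in for whatever INSERT/COPY-and-commit step the real script performs, and whether this actually reduces the freezes is untested.

```python
import time
from itertools import islice

def chunked(records, size):
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def load_in_batches(records, write_batch, size=10_000, pause=0.0):
    """Call write_batch(batch) per chunk, sleeping `pause` seconds between chunks."""
    written = 0
    for batch in chunked(records, size):
        write_batch(batch)  # hypothetical: e.g. insert the batch, then commit
        written += len(batch)
        if pause:
            time.sleep(pause)  # give other I/O a chance to get queued
    return written
```

Smaller `size` and larger `pause` values trade load time for responsiveness, which matches the trade-off described above.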
I've done some tuning; but looks like a deeper dive is needed. Was hoping pausing the job would have cleared the queue; but that didn't help. Maybe the taskbar deadlocked itself... Not having built my own kernel for several years now, gotta wonder if there's now a way to limit applications to a portion of disk queues...?
Not that I've ever heard of. For the kind of volume you're dealing with, it's normal to use a dedicated server system for storing the db, not a desktop system with a user sitting at the screen waiting for responses.
:) It's a pretty beefy system. Besides; can't afford a dedicated server, being retired on fixed income doing volunteer work. Anyway, taskbar (guessing plasma) must've deadlocked itself. It only self-healed the 3rd time it happened (4 now). Besides the screenshot, desktop didn't respond to right-clicks, double-clicking icons did nothing. I was able to start apps from the CLI, even a zoom meeting while waiting for the DB job to finish. Just swapped 2TB nvme with 4TB so I can move the growing DB back to an SSD; was working well on SSD until hitting the inevitable space limit... Thanks for your responses; let's see how this goes.
Just keeping an eye on this, but with no idea what to suggest! Thanks Dave for your input.
CC: (none) => lewyssmith
I'm closing this as invalid, as this is due to the very high volume of data being processed on a desktop system that isn't designed to handle very high volumes and user interaction at the same time. To do both at the same time, the database would have to be on a separate hard drive controller or on another computer, not the same hard drive controller used by programs interacting with the user. Feel free to add more comments or questions, or to reopen if I'm misreading the situation.
Oops. Forgot to actually close.
Status: NEW => RESOLVED
Resolution: (none) => INVALID
No issue with closing it for now... Maybe someone will hit it with some other software; maybe processing video or somesuch... For the record, now that historical data is loaded, future jobs will be run overnight while I sleep; so hopefully the taskbar won't be impacted this way... Thanks for your feedback David!
Pierre, I wondered whether your huge database jobs can be run from the command line. If yes, then they might be better off running without X at all (runlevel 3), overnight while you sleep, as you note.
Lewis... there's never a clean-cut way to determine what actually triggers an issue. The more times an issue occurs, the more opportunities there are to gather data. I'm now getting this freeze at times when little is happening.

Last night, all was quiet; I was writing a short document and taking screenshots for illustrations -- just a few pages showing how to set up an application: what to put in various fields, what the next dialog should look like... simple stuff that would have barely moved the needle on an original 4.77MHz IBM PC. The system was essentially idle.

My phone is always connected via KDEconnect and I get notifications popping up occasionally, mostly when my dogs go out or come in (security system motion detection). When it froze, a screen "patch" appeared, but never filled in -- right where the notifications normally appear (upper-right). To me, it appears to be some critical bit of code that should not be interrupted. That's conjecture; but the "patch" had all the earmarks of a KDEconnect notification: same location/size, on top, etc.

Anyway, I've opened a report with KDE at https://bugs.kde.org/show_bug.cgi?id=461688. This KDEconnect notification is the first time I noticed a locked screen "patch"; but maybe KDEconnect was involved the other times...(?) I could disable KDEconnect; but I'd rather help nail it than have it bite me later at a worse time.

BTW, all the flickering I was suffering with for months cleared up about the time kernel 6.0 got loaded. Other than this freeze issue, Cauldron is rock solid for me, and I think you know I drive it hard. My database is now 2TB; I had to install a 4TB NVMe M.2 SSD to speed it up -- on the 18TB HD, the seeking during table loads was rattling the system... :) :)

With the digital clock set to show seconds, it's easy to get short journal logs right when the clock freezes...
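For anyone wanting to collect the same kind of short journal excerpt, here is a minimal sketch that builds a journalctl command for a window around the time shown on the frozen clock. The helper name `journal_window_command` and the 30-second margins are assumptions for illustration; `--since`, `--until` and `--no-pager` are standard journalctl options, and the date in the usage example is only an example value.

```python
from datetime import datetime, timedelta

def journal_window_command(frozen_at, before=30, after=30):
    """Build a journalctl command covering a few seconds around `frozen_at`."""
    fmt = "%Y-%m-%d %H:%M:%S"  # timestamp format accepted by --since/--until
    since = frozen_at - timedelta(seconds=before)
    until = frozen_at + timedelta(seconds=after)
    return [
        "journalctl", "--no-pager",
        "--since", since.strftime(fmt),
        "--until", until.strftime(fmt),
    ]
```

For example, `journal_window_command(datetime(2022, 11, 5, 17, 7, 53))` returns an argument list that can be passed to `subprocess.run` or joined and pasted into a terminal (quoting the two timestamps).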
The really weird thing was that when saving the screenshots (using "screengrab") while the taskbar was frozen, each save dialog took 50 seconds to close after the save completed... :?

Oh... almost forgot to respond to your CLI suggestion. The DB loads are done via python scripts; it was unfortunate that the first freeze occurred while one was running. All the others occurred when there was very little going on. Last night's was the best example.

Also, the only way I could do level 3 would be to buy another system so this one could be a headless server -- at my age and income, that's highly unlikely. About 75% of my team (currently 15) are in my situation; the rest are mid-life with jobs. I have the most capable system, and it runs Linux; there's little chance of doing what we do on the Windows or MacOS systems the others use... We're all volunteers with no financial support; just like y'all, I'd guess. :)

HTH...
Summary: taskbar freezes during heavy processing => taskbar freezes
Summary: taskbar freezes => taskbar (panel) freezes
see also https://bugs.kde.org/show_bug.cgi?id=429211