| Summary: | xfce4 session occasionally freezes | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Juergen Harms <juergen.harms> |
| Component: | RPM Packages | Assignee: | Jani Välimaa <jani.valimaa> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | normal | ||
| Priority: | Normal | CC: | davidwhodgins, fri, ouaurelien, shybluenight |
| Version: | Cauldron | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Source RPM: | xfce4-session-4.16.0-2.mga8.src.rpm | CVE: | |
| Status comment: | |||
| Attachments: |
Contents of .xsession errors (obtained via ssh from another system)
.xsession_errors after another freeze
inxi -F as requested in comment 4
/var/log/Xorg.0.log immediately after freeze, before reboot.
.xsession_errors immediately after freeze, before reboot |
||
|
Description
Juergen Harms
2021-01-06 12:51:26 CET
Created attachment 12186 [details]
Contents of .xsession errors (obtained via ssh from another system)
Morgan Leijström
2021-01-06 13:12:22 CET
CC:
(none) =>
fri
Created attachment 12187 [details]
.xsession_errors after another freeze
I just had my 2nd freeze of today, adding a dump of .xsession-errors. Both freezes of today happened while I was reading the online web page of my journal (NZZ).

Hi, thanks for reporting this.
I don't know how you installed Cauldron. (From Beta2 classic install ISO ? from updating M7? from the xfce live iso ?)
Also, .xsession_errors file does not matter when there is a total system freeze.
Note that XFCE has been updated to its latest upstream version since the M8 beta 2 ISOs were released, but packages were pushed as long as they are built. It is possible that a necessary one is missing on your system and was uninstalled while upgrading another one.
I suggest you:
1) inxi -F to see your hardware specifications, adding here as attachment.
2) Even wait for real Mageia 8 RC1 iso to wipe your / system partition and reinstall a good set of packages.
CC:
(none) =>
ouaurelien
Created attachment 12196 [details]
inxi -F as requested in comment 4
> I don't know how you installed Cauldron. (From Beta2 classic install ISO ? from updating M7? from the xfce live iso ?)
Beta 1 classic install ISO (cdrom), plus update packages installed as they arrive. I verified that the list of my xfce packages corresponds to the list that figured on the mirror when I submitted the bug report.
> Also, .xsession_errors file does not matter when there is a total system freeze.
It is not a total system freeze, but a freeze of all session I/O (keyboard and mouse I/O dead). If the freeze were total, access to the log files via ssh would not be possible.
> ... but packages were pushed as long as they are built. ...
Using Cauldron only makes sense if new update packages are installed as soon as they become available - which I do.
> I suggest you: 1) inxi -F
Done. But I have these freezes both on my PC and on a laptop, with totally different hardware specifications; to avoid confusion, I limited the data included in this bugzilla report to data from the PC (retrieved via ssh from the laptop).
> 2) Even wait for real Mageia 8 RC1 iso to wipe your / system partition and reinstall a good set of packages.
Will do, waiting for RC1. With clean installs, my system partition is always totally wiped.

OK, so the freeze is in the X session rather than a total system one. Good.
So, when such a freeze appears, can you give us the system log from:
# journalctl -f
if you can catch it over an ssh connection. If there is a video driver issue, we may even see it there.
The .xsession_errors file is for graphical applications. X doesn't log there.
Also, please attach the /var/log/X.0.org.log file of the frozen X session here. If rebooted, it will be renamed with a 1 instead of 0 in the file name.

I decided to stop maintaining a Mageia OS partition, so there is not much sense in keeping this bug open.

This is sad. It would be nicer to pin it down.
There are plenty of people who use Xfce routinely, so the problem is not general. And you could try LXDE, which is very similar in appearance & use to Xfce (I use both, and often wonder which one I am actually using).
Thank you for your own investigations, and the system info.
> While the session is frozen, the system can be accessed via ssh from
> another system
Did you ever try accessing a virtual console, Ctl/Alt/F2-6 ? And if that worked, whether going back to the GUI Ctl/Alt/F1 had any effect?
> The only way back to normal is to reboot (power-cycle).
Re-starting X usually works in such cases, & is easier & faster: Ctrl/Alt/Bksp/Bksp
CC:
(none) =>
lewyssmith

Reporter, could you please reply to the previous question? If you don't reply within two weeks from now, I will have to close this bug as OLD. Thank you.
Keywords:
(none) =>
NEEDINFO

> Reporter, could you please reply
Sorry, working with Cauldron has become quite painful - I am not much on Mageia any more.
Previous question:
> Did you ever try accessing a virtual console, Ctl/Alt/F2-6 ?
> And if that worked, whether going back to the GUI Ctl/Alt/F1 had any effect?
Would be great - but: how do I type Ctrl/Alt/something on a system with a frozen keyboard (and mouse)?

I realize that I posted this as a bug on XFCE. I now have doubts whether the bug is primarily an XFCE issue, or whether a lower-level problem just makes itself seen via XFCE. Reasons for this doubt:
1. Significant log entries only appear in .xsession_errors and are generated by applications, not directly by XFCE.
2. Production usage of fully customized XFCE environments on Fedora and Debian never produces a freeze.
Why drop XFCE under these conditions? It's the smoothness of the synapses in my fingers that makes me reluctant.

Re /var/log/X.0.org.log: I am confused. Do you mean /var/log/Xorg.0.log (there is no /var/log/X.0.org.log on my Mageia filesystems)? Also, old log files get an "old" suffix; the 0, 1 etc. specify a particular display. Is it really Xorg.0.log that you want?
I will switch back to my Mageia partition and create a copy of /var/log/X.0.org.log as soon as the next freeze happens (provided the related damage is not so serious that immediate repair is more urgent than dumping Xorg logs). Waiting for a suitable occasion will take some time.

I don't mind if you simply close the bug. But, as said in comment 9, there is an objective interest in clarifying the topic - that is also the reason why I posted this bug in the absence of confirmation from other users.

Thank you for returning to this.
> how do I type Ctrl/Alt/something on a system with a frozen keyboard
I did say 'try'. If it does nothing, it is easy to say so.
Similarly with Ctrl/Alt/Bksp/Bksp .
Re the Xorg log file, doubtless /var/log/Xorg.0.log was the one.
If it is very big, better to make a copy and compress *that* with (say) xz.
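The copy-then-compress advice above can be sketched as a small shell helper. This is only an illustration: the function name `snapshot_log`, the destination paths, and the host name `frozenhost` are hypothetical and do not come from this report.

```shell
#!/bin/sh
# Hypothetical helper: copy a log file and compress the *copy* with xz,
# leaving the original log untouched.
snapshot_log() {
    src=$1    # e.g. /var/log/Xorg.0.log
    dest=$2   # e.g. $HOME/Xorg.0.log.freeze
    # xz replaces "$dest" with "$dest.xz" once compression succeeds.
    cp -- "$src" "$dest" && xz -f -- "$dest"
}

# Possible usage in an ssh session on the frozen machine:
#   snapshot_log /var/log/Xorg.0.log "$HOME/Xorg.0.log.freeze"
# Or pull a compressed copy from another system ("frozenhost" is a placeholder):
#   ssh frozenhost 'xz -c /var/log/Xorg.0.log' > Xorg.0.log.xz
```

Compressing a copy rather than the live file avoids touching a log that the X server may still be writing to.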
Created attachment 12263 [details]
/var/log/Xorg.0.log immediately after freeze, before reboot.
Created attachment 12264 [details]
.xsession_errors immediately after freeze, before reboot
Waiting for a new freeze was much shorter than I had anticipated. I added attachments with xz-compressed copies of /var/log/Xorg.0.log, and also of .xsession_errors to provide a consistent view. The freeze happened on the machine with the properties documented in the 3rd attachment (inxi, 2021-01-08). The copies were drawn via ssh from another machine while I/O of the target machine was still in its frozen state (hence Xorg.0.log and not ... .old). The freeze happened while I was using emacs for editing a perl file.
Sorry, I missed copying the output of journalctl - will do this at the next freeze.

Thank you for the log information. That should do.
"(EE) event21 - Logitech USB Keyboard: client bug: event processing lagging behind by 12ms, your system is too slow" x n
"(EE) event21 - Logitech USB Keyboard: WARNING: log rate limit exceeded (5 msgs per 60min). Discarding future messages."
Looks relevant.
Assigning to wally for Xfce4; but it may be more general - Xorg; pass it on where you see fit.
Keywords:
NEEDINFO =>
(none)

Thank you, nice progress. Just now I had my next freeze - this time after a mouse event (hitting a GtkButton) (Logitech M325 Mouse). I pulled corresponding dumps and will keep them available - it is up to the bug team to say whether you want me to provide them as attachments.

Jürgen, to me this looks like your mouse and/or keyboard are losing power. Autosuspend mode for the USB ports? Power manager settings? TLP? Low batteries in the mouse? A USB hub maybe? Or a USB-C to USB adapter?
Not sure if this is an xfce bug. But you can try disabling it in the xfce settings: Settings - Settings and Startup - Power Manager. Or is there a setting in the BIOS/EFI firmware relating to the power of the USB ports?
CC:
(none) =>
shybluenight

Yes, that is a possible explanation. I have already planned to explore this (e.g. connect the keyboard directly to the machine rather than via a hub, use an alternate keyboard ...). However, peripherals as the cause are unlikely, because
- freezes occur at random both on my PC and my laptop
- freezes don't occur when I use an OS partition with Debian or Fedora (with identical, script-generated XFCE customization) (but don't take "don't" too literally where incidents arrive only every couple of days).
I will post if I find something significant. Thanks for your help.

I have finally accomplished all the tests for/against local problems I could imagine (one change at a time, I did not try permutations or combinations):
- changed the battery of my mouse
- connected the keyboard and the mouse dongle directly to USB plugs on the computer rather than via a hub
- used another USB keyboard
All tests negative: the freezes still happen, at random, at irregular but longish intervals (several days on average). Writing this, yet another alternative comes to mind: use another mouse, not the M325 wireless one.

Being somewhat repetitive: the following arguments speak for / against the likelihood that this problem is / is not local:
- I have not observed a single freeze when I was running these machines under Debian and Fedora
- no other Cauldron user has reported hitting this issue, and I am not the only one using Cauldron in production (do others have the same kind of non-ending sessions?)
- freezes happen on machines with radically different architecture (a powerful PC and a laptop)
- but these 2 machines have one thing in common: the way XFCE is set up, using a script
- but precisely the same script also sets up XFCE on my Fedora and Debian OS partitions
Probably the best approach is now to wait until Mageia-8 has been released and usage goes to the general public.

Seems that it is Mageia specific. Is it a Logitech unifying receiver?
Is solaar installed, special for these kind of receivers?
It could be an issue with the way Mageia is detecting and configuring hardware (drakx, udev ...). That would be the main difference to Debian and Fedora.
(I had a minor hardware problem with Mageia 7 (and only with Mageia) on a laptop and the initializing of its touchpad: randomly it wouldn't detect the touchpad at boot time, only after suspend and resume would it detect it. Magically that is solved with Mageia 8 ;-)

>Is it a Logitech unifying receiver? Is solaar installed, special for these kind
>of receivers?
It is a Logitech M325 with a unifying receiver (and yes, solaar is installed) - your comment is motivation to make yet another test with a wired mouse ... ughhh
Now I have also made the test of replacing the wireless mouse with a wired mouse. It worked for so long that I already thought it might really be the mouse. But I just had a typical freeze - so it is not the mouse.

Install htop, open a terminal, use "su -" to become root, run htop, press f6 (sort by), select STATE, and leave the terminal such that enough of it is visible to see the columns S (state) and the command.
I expect that when the mouse freezes, there will be one or more commands shown in the D (device wait) state. We need to id which commands are causing that, and then see what can be done to minimize the impact. Hopefully it will be commands that can be disabled.
CC:
(none) =>
davidwhodgins

Thanks, that makes good sense. And since top and htop can be run via ssh, it is practicable even with a frozen mouse - will do, but it will take time waiting for events.

The freezes happen both on my laptop and the desktop - to keep things simple, so far I did not report on laptop freezes. But just now a freeze occurred on my laptop, an occasion to rapidly apply the suggestion made by Dave Hodgins.
The frozen process (state D) had PID 1715 (PPID is 1701 = /usr/sbin/lightdm). I used ps -l for the benefit of being able to copy/paste the command from my console window:
/usr/libexec/Xorg :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt1 -novtswitch
The other remarkable fact is that the process appears to loop: cpu percentage varies around 95%.

Not being an expert on this, but it looks like it would be interesting to try another login manager.

Yes, that started as a dull exercise to distinguish between noise and not noise - it starts to become interesting.
Different display manager: there I need expert advice. lxdm would appear a simple alternative, but I would like to avoid losing time finding out how to make lxdm get used (but I found several google hits, enough to make an initial try).
For the moment, I wait for the next incident on the desktop machine to verify whether the scenario is similar to that on the laptop.

In MCC tab "Start" there is an icon for Display Manager, where you can select amongst installed ones. So first install the one you want, then select it there, reboot :)

Just now my desktop got stuck - precisely the same situation as on the laptop: a process with status D, identical contents of the triggering command line. Having a closer look at ps -lA, I also found a kworker process (not shown by htop) with the D status; its command line is
[kworker/7:1H+events_highpri]
I am perfectly willing now to try using lxdm - but shouldn't priority go into exploring why lightdm is upset?

No need to install a different display manager.
You can start xfce4 without any DM. At the grub kernel command line (type 'e' to get it when starting your machine), add '3' (without the quotes), press ctrl x to start, log in as user, then type:
startxfce4
Make sure lightdm is not running, not started. Htop or top. See what happens.
Several QA Mageia users use xfce and lightdm; nobody so far has reported this problem. The 'special' script you are running to configure xfce might be interesting as well. Also, a fresh installation, without any personal scripts for xfce, could be an option. The xfce version in M8 btw is xfce 4.16.x, fully gtk3, no more gtk2.

The kworker thread, visible in htop if you press "K" (note: uppercase) to toggle showing threads on/off, is a kernel thread that handles i/o operations for a device. It's stuck in the device wait state as the kernel is waiting for some i/o operation to complete. When the kernel is stuck, it cannot respond to user space applications.
It may be due to swappiness. See https://rudd-o.com/linux-and-free-software/tales-from-responsivenessland-why-linux-feels-slow-and-how-to-fix-that
It may be due to partition alignment. The kernel reads/writes 4KB per logical i/o. Older disk drives used 512 byte logical and physical sectors, so 8 sectors per 4KB block. Newer hard drives either use 512 byte logical sectors with 4KB physical sectors or 4KB logical sectors with 4KB physical sectors. Some really bad early drives of the new kind lie to the kernel and claim to be using 512 byte physical sectors even though they really use 4KB physical sectors. This was done for compatibility with windows software that wasn't ready to handle the 4KB sector sizes.
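The sector arithmetic in this comment (8 sectors of 512 bytes per 4KB block) can be sketched in shell. This is a minimal illustration under the assumption that start sectors are counted in 512-byte logical sectors; the function name `is_4k_aligned` and the sample sector numbers are made up for the example and are not taken from the reporter's disks.

```shell
#!/bin/sh
# Assumption: start sectors are given in 512-byte logical sectors,
# so one 4KB block spans 8 sectors.
SECTOR_BYTES=512
BLOCK_BYTES=4096

# Succeeds (exit status 0) when the byte offset of the given start sector
# falls exactly on a 4KB boundary.
is_4k_aligned() {
    [ $(( ($1 * SECTOR_BYTES) % BLOCK_BYTES )) -eq 0 ]
}

# Illustrative start sectors only.
for start in 63 2048; do
    if is_4k_aligned "$start"; then
        echo "start sector $start: aligned"
    else
        echo "start sector $start: NOT aligned"
    fi
done
```

The same check applies to the start sector of each partition on a drive; a partition that fails it straddles two physical 4KB blocks on every 4KB write.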
The problem with those drives is that when the kernel writes a 4KB block in an i/o request, if those 8 512 byte sectors overlap two 4KB blocks in the hard drive, the firmware in the hard drive (much slower than a cpu) has to translate the one write request into two reads, a merge of the updated 4KB write with the two 4KB sectors, and two writes. This drastically slows down writes that are not aligned on 4KB boundaries.
Diskdrake uses 1MB boundaries for partitions (a multiple of 4KB), but not all partitioning software does that. Depending on what software was used, the partitions may not be aligned on 4KB boundaries. The command "sfdisk -luS /dev/sda" will show the start sector of each partition for that drive. If any partition starts at a sector number that is not evenly divisible by 8, that's a problem (with the exception of the Extended partition, which technically need not be aligned, since it's just a sector with a partition table).
Another cause of device waits is bad hardware, such as a sata cable that has a dirty connector, forcing the kernel to retry operations several times to get a successful read/write. Though it does work, it's slow. That should be visible in dmesg output. There are also some sata controllers that are known to be very poor, though I'm not clear why.
Whether it's due to swappiness, partition alignment, or bad hardware, this is not a software problem. It's system tuning and/or hardware fixing/replacing.

The fact that Juergen sees this on two systems speaks against a hardware fault (or is a low probability coincidence).

I think that a wise decision would be to close this bug now - to be reopened, or re-filed, in case other users hit this kind of problem, and preferably if means are found to reproduce the events that trigger the freeze by explicit action.
I had originally opened the bug because such unexplained problems should not be left simply hanging around. There is now a clear idea of the mechanism.

If both systems have similar spinning rust drives, ram/swap usage, and application usage, I'm not surprised by both systems showing the same problem.
The swappiness default settings could be changed. They are a tradeoff between the best settings for servers and the best settings for desktop systems. Currently it's set roughly in the middle, slightly in favour of desktops. Mageia is intended to be useful for both servers and desktops, so I'm not in favour of changing the defaults, but that would be up to our kernel admins.
That may be a factor that exacerbates the problem, but is not the cause. The cause of device waits is hardware, and hardware only. There is nothing that can be done in software to solve the waits. They can be reduced by system tuning (swappiness settings, removing any services that are not deemed essential, etc.), but that is up to the system admin, as what's suitable for one user is not going to be the same as what's suitable for another.
My recommendation is to get an ssd drive, and only use the spinning rust drives for bulk storage. The difference in speed with an ssd drive is impressive.
I have the same problem with the device waits with an AMD/ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode], and a WDC WD10EZEX-00RKKA0 hard drive. I now use it for bulk storage (iso images, video files, etc.). Everything else I keep on ssd drives, which do not have the problem to anywhere near the same extent. It still happens, but much less often and for much shorter periods of time, and is only noticeable if I have htop running.

As per comment 36, closing as invalid; however, if future reports do come in, they should be closed as a duplicate of this bug. This bug should not be reopened.
Status:
NEW =>
RESOLVED

Both laptop and desktop have their root partition on SSD devices, but some shared data resides on a hard drive and might have been accessed - difficult to verify post festum. |
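For anyone hitting a similar freeze, the D-state check used in this thread (htop sorted by state, or ps over ssh) can be approximated non-interactively. The sketch below is an assumption-laden illustration, not a tool used by anyone in this report: the function names `filter_d_state` and `list_d_state` are hypothetical.

```shell
#!/bin/sh
# Keep only lines whose first field (the process state) starts with "D",
# i.e. tasks in uninterruptible device wait.
# Expects "STATE PID COMMAND..." lines on stdin.
filter_d_state() {
    awk '$1 ~ /^D/'
}

# List D-state tasks on the local machine; each "-o name=" suppresses
# that column's header so only data lines reach the filter.
list_d_state() {
    ps -e -o stat= -o pid= -o args= | filter_d_state
}

# Possible usage in an ssh session on the affected machine, repeated
# every few seconds while the session is frozen:
#   while :; do date; list_d_state; sleep 5; done
```

Because it needs no terminal UI, the output can be redirected to a file and attached to a report together with the Xorg log.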