Description of problem:
I previously ran MGA5 without this problem. Skipping MGA6, I installed MGA7/Plasma from the DVD, formatting only the / (system) partition. I ran into many problems with Plasma and the Nvidia driver, resulting in repeated freezes. Suspecting old configuration files from MGA4.1/MGA5, I reinstalled the same way a second time, this time also formatting the /home partition. Unfortunately the freezes continued, so I switched to the Nouveau driver. This time there are no freezes, but there is a recurring loss of the GUI, which is the subject of this report! Without much hope, I also replaced the graphics card with an identical one, without any improvement.
Nevertheless, the PC remains accessible via ssh. Everything seems normal and no service has failed. But every time the GUI is lost, journalctl -b always shows the same error:
kernel: nouveau 0000:18:00.0: DRM: core notifier timeout

Version-Release number of selected component (if applicable):
5.7.19-desktop-1.mga7
x11-driver-video-nouveau-1.0.16-3.mga7
lib64drm_nouveau2-2.4.101-2.mga7

Attached files about the last loss:
journalctl-b2020-10-03XfceNouveauNotComposite.log
Xorg.0.log2020-10-03XfceNouveauNotComposite.log
xorg.conf2020-10-03XfceNouveauNotComposite.log
dmesg2020-10-03XfceNouveauNotComposite.log
systemctlStatus2020-10-03XfceNouveauNotComposite.log
systemctlListUnitFiles2020-10-03XfceNouveauNotComposite.log

Hardware used and installed RPMs:
inxi-Fm2020-10-03XfxeNouveauNotComposite.log
rpmInstalled2020-10-03XfceNouveauNotComposite.log

For information, journalctlRecurringCoreNotifierTimeoutSince2020-10-22.log is a summary of the core notifier timeouts since September 22, 2020, obtained with the command:
journalctl |grep -E "Reboot --|nouveau 0000:18:00.0: DRM|nouveaudrm"
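The grep above interleaves journalctl's "-- Reboot --" separators with the nouveau messages. A minimal sketch of a per-boot tally, assuming the default journalctl text output; the function name count_timeouts_per_boot is hypothetical, and its boot 0 is the oldest boot in the input, not journalctl's "-b 0":

```bash
#!/usr/bin/env bash
# Sketch: tally nouveau "core notifier timeout" messages per boot.
# Assumes default journalctl text output, where boots are separated by
# "-- Reboot --" lines. Boot 0 here is the OLDEST boot in the input.
count_timeouts_per_boot() {
  awk '
    BEGIN                          { boot = 0 }      # numeric, so n[0] is keyed "0"
    /-- Reboot --/                 { boot++; next }  # the next boot starts here
    /DRM: core notifier timeout/   { n[boot]++ }     # one more timeout in this boot
    END {
      for (b = 0; b <= boot; b++)
        printf "boot %d: %d\n", b, n[b] + 0          # +0 prints 0 for quiet boots
    }'
}

# Usage on the affected machine:
#   journalctl | count_timeouts_per_boot
```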
Created attachment 11900
journalctl -b
Created attachment 11901
Xorg.0.log
Created attachment 11902
xorg.conf
Created attachment 11903
dmesg
Created attachment 11904
systemctl status
Created attachment 11905
systemctl list-unit-files
Created attachment 11906
inxi -Fm
My hardware: HP xw9400 workstation
Created attachment 11907
rpm -qa --last
RPMs installed since the second MGA7 installation
Created attachment 11908
journalctl |grep -E "Reboot --|nouveau 0000:18:00.0: DRM|nouveaudrm"
Summary of core notifier timeouts since September 22, 2020
Exchanges on the MLO forum about my freeze and GUI-loss problems with MGA7:
https://www.mageialinux-online.org/forum/topic-27878-5+freeze-mga7.php#m274291
No problem previously with the same hardware on MGA5!
And about the specific loss of GUI, already seen with Nouveau:
https://bugzilla.redhat.com/show_bug.cgi?id=1618906
https://bugzilla.redhat.com/show_bug.cgi?id=1655788
URL: (none) => https://www.mageialinux-online.org/forum/topic-27878-5+freeze-mga7.php#m274291
Hi, thanks for reporting this bug.
Your graphics card is a Quadro FX 5600, which is seen as an NV50 (G80) card from Nvidia's Tesla family. Support for this card in the current nouveau driver from freedesktop.org seems at least OK. Moreover, the nonfree driver version recommended upstream by Nvidia is 340.108, and Mageia 7 provides it: nvidia340-340.108-7.mga7.nonfree
Is this machine UEFI or BIOS? Do you want to use the nonfree drivers or nouveau?
Suggestion: remove the "vga" options from the kernel command line and try again. Try passing GRUB these values in /etc/default/grub:
GRUB_GFXPAYLOAD_LINUX=keep
GRUB_GFXMODE={your screen resolution here}
This is meant to give the framebuffer/console a good mode to start with. Also try adding "nouveau.modeset=1" to the kernel command line.
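The suggestions above could look like this in /etc/default/grub. This is only a sketch: it assumes Mageia's usual GRUB2 layout (a GRUB_CMDLINE_LINUX_DEFAULT line, menu at /boot/grub2/grub.cfg), and the resolution is a placeholder to replace with the screen's real one.

```bash
# /etc/default/grub (fragment): sketch of the suggestions above.
# Keep the video mode set by GRUB when the kernel takes over the console.
GRUB_GFXPAYLOAD_LINUX=keep
# Placeholder resolution: replace with the monitor's native mode.
GRUB_GFXMODE=1920x1200
# On the existing GRUB_CMDLINE_LINUX_DEFAULT line, delete any "vga=..."
# option and append nouveau.modeset=1, keeping the other options as they are.
# Then regenerate the GRUB menu as root, for example:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```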
CC: (none) => ouaurelien
Component: Release (media or process) => RPM Packages
Status comment: (none) => x11-driver-video-nouveau-1.0.16-3.mga7
Keywords: (none) => NEEDINFO
Hello Aurelien, yes, this is a Quadro FX 5600 and the machine is BIOS.
Previously I used the nvidia340-340.108-7.mga7.nonfree driver under Plasma, but it was worse: the PC froze with "NVRM: Xid ... Graphics Exception" in journalctl. So I switched to XFCE, still with the nvidia driver, but had problems there too! I reported this on the MLO forum: https://www.mageialinux-online.org/forum/topic-27878-1+mga7-freeze-ou-perte-gui.php
It was from there that I moved to the Nouveau driver. Alas, under Plasma it was full of artifacts, making it almost unusable. I posted 2 screenshots showing this: https://www.mageialinux-online.org/forum/topic-27103-2+plantages-recurrents.php#m274425
So once again I switched to XFCE... only to fall into the problem that is the subject of this bug!!!
For information, I have the impression that the latest problem occurs when a VirtualBox machine is launched. Indeed, the 2 previous boots went well... chance????
Thanks for your suggestions, I will try them!
Thank you for all the documentation you provided. Just to support Aurelien's notes: I have tried to establish the relationship between 'Quadro FX 5600', as reported correctly by inxi, and 'G80GL', as seen partially (G80) in dmesg and the X log (GeForce 8 (G8x)), to ensure that they are talking about the same thing.
Wikipedia: "The 8800 series, codenamed G80". The 'G80' seems to be the GPU, for which the Nvidia driver 340.108 says (2007) "Added support for Quadro FX 4600 and Quadro FX 5600", clear enough. As for Nouveau:
NOUVEAU driver for NVIDIA chipset families : GeForce 8 (G8x)
which, academic though it may be, seems to point the same way. It all goes back a long time.
> I have the impression that the last pb occurs when a virtualBox machine
> is launched.
> Indeed the 2 previous boots went well...
Does this imply that you had *not* launched VB? Can you please test this specifically? It should be easy.
Perusing the long MLO forum thread, I am dubious about a hardware cause:
- It all worked under Mageia 5.
- "I still changed the graphics card by exactly the same one without any improvement" (which looks like "I changed the card for a similar one").
There is one thing I have not seen mentioned (even if it was), and it is very easy, except that you lose all running work (which you seemed willing to accept, since you said in the forum that you remotely killed various applications): try killing the X server with
Ctrl/Alt/Bksp/Bksp
which should return you to the display manager login screen, and see whether that restores the GUI. It is much quicker than rebooting.
CC: (none) => lewyssmith
(In reply to Lewis Smith from comment #13)
> I have tried to establish the relationship between 'Quadro FX 5600' as
> reported correctly by inxi, and 'G80GL' as seen partially (G80) in dmesg and
> the X log (GeForce 8 (G8x)). To ensure that they are talking about the same
> thing.
> Wikipedia: "The 8800 series, codenamed G80". The 'G80' seems to be the GPU,
> for which the Nvidia driver 340.108 says (2007) "Added support for Quadro FX
> 4600 and Quadro FX 5600", clear enough. As for Nouveau:
> NOUVEAU driver for NVIDIA chipset families : GeForce 8 (G8x)
> which, academic though it may be, seems to point the same way. It all goes
> back a long time.
I agree with your reasoning; to avoid mistakes, I had done the same thing for the correspondence...
> > I have the impression that the last pb occurs when a virtualBox machine
> > is launched.
> > Indeed the 2 previous boots went well...
> Does this imply that you had *not* launched VB? Can you please test this
> specifically? It should be easy.
Yes, for these 2 previous boots, VB was not launched. You are right about testing: (still with lightdm/XFCE/Nouveau) each time I launch VB... and also Firefox, Thunderbird and a terminal! These are my test conditions. For the boot in progress under these conditions... I am waiting to see whether the GUI is lost or not (core notifier timeout in journalctl).
> There is one thing I have not seen mentioned (even if it was), but it is
> very easy - except you loose all work running; which you seemed willing to
> do when you say in the forum you remotely killed various applications: try
> killing the X-server with:
> Ctl/Alt/Bksp/Bksp
> which should return you to the display manager login screen. And see whether
> that restores the GUI. It is much quicker than re-booting.
On the PC without GUI, neither Ctrl+Alt+F2 nor Ctrl+Alt+Bksp (once or twice) worked. So from another PC, via ssh, I tried to restart the X server, without result except once, when restarting the lightdm service brought back the login screen!
In the end, I shut down with shutdown -h now.
Hello,
> You are true for test (always with lightdm/XFCE/Nouveau) at each time I
> launch VB...and also Firefox, Thunderbird and terminal!
> These are my test conditions.
> For boot in progress in these conditions...I am waiting for GUI loss or not?
> (core notifier timeout in journalctl)
Yesterday, this boot under these conditions ended without any problem: no error from Nouveau. But today, under the same conditions, the "core notifier timeout" occurs, but without loss of GUI!!! So I am writing from the PC, not via ssh!!! This is a new behavior.
journalctl shows several errors of this type, then an "ifplugd invoked oom-killer" call followed by a dump, then several other "core notifier timeout" errors! This call/dump seems to be linked to a memory problem... I can't say whether there is a relationship between it and the "core notifier timeout" from Nouveau.
Created attachment 11916
journalctl -b with "ifplugd invoked oom-killer"
Last line of the dump:
Out of memory: Killed process 3477 (montage) total-vm:53805684kB, anon-rss:17697676kB, file-rss:0kB, shmem-rss:11716372kB, UID:1000 pgtables:71336kB oom_score_adj:0
Created attachment 11917
dmesg with "ifplugd invoked oom-killer"
The memory problem above appears to be related to the swap having filled up!
free -h
       total   used    free    shared  buff/cache  available
Mem:   31Gi    4.0Gi   14Gi    11Gi    12Gi        15Gi
Swap:  8.0Gi   935Mi   7.1Gi
This free output shows that swap is still in use while a lot of RAM is free. Could something be failing to release memory?
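Swap that stays in use while plenty of RAM is free is not necessarily a fault: pages that were moved out to swap are normally read back only when a process touches them again. To see which processes still hold swap, here is a sketch that reads the VmSwap field (in kB) of /proc/<pid>/status; the function name and its optional root-directory argument are hypothetical, the latter only there to make it testable:

```bash
#!/usr/bin/env bash
# Sketch: list processes that currently hold swap, largest first.
# Prints "<VmSwap in kB> <process name>" for each process with swap > 0.
# $1 (hypothetical) is the proc root; it defaults to /proc.
swap_by_process() {
  local root=${1:-/proc} f
  for f in "$root"/[0-9]*/status; do
    [ -r "$f" ] || continue              # process may have exited meanwhile
    awk '/^Name:/   { sub(/^Name:[ \t]+/, ""); name = $0 }   # full name, spaces kept
         /^VmSwap:/ { if ($2 > 0) print $2, name }' "$f" 2>/dev/null
  done | sort -rn
}

# Usage:
#   swap_by_process | head
```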
Hi, I have understood that the oom-killer is the system's way out when memory is saturated: it kills the process that is causing the saturation. I could see that the killed process (montage) was linked to a script that I had launched in parallel with my test conditions.
> journalctl -b with "ifplugd invoked oom-killer"
>
> Line at the end of dump:
> Out of memory: Killed process 3477 (montage) total-vm:53805684kB,
> anon-rss:17697676kB, file-rss:0kB, shmem-rss:11716372kB, UID:1000
> pgtables:71336kB oom_score_adj:0
I was even able to reproduce it live by relaunching the same script. From journalctl:
...
Oct 07 20:04:40 HPxw9400 kernel: [ 24193] 1000 24193 6214209 5978556 48037888 0 0 montage
Oct 07 20:04:40 HPxw9400 kernel: [ 25468] 1000 25468 67167682 600 208896 0 0 baloo_file_extr
Oct 07 20:04:40 HPxw9400 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-1000.slice/session-2.scope,task=montage,pid=24193,uid=1000
Oct 07 20:04:40 HPxw9400 kernel: Out of memory: Killed process 24193 (montage) total-vm:24856836kB, anon-rss:23914224kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:46912kB oom_score_adj:0
Oct 07 20:04:40 HPxw9400 kernel: oom_reaper: reaped process 24193 (montage), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Oct 07 20:05:28 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
Oct 07 20:05:47 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
Oct 07 20:06:10 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
From my script's output, the same PID 24193 being killed:
2020-10-07 16:46:25 != 2020-10-07 16:46:25 => CREATING contact sheet for "/mnt/E/patrick/Documents/Thunderbird/Profiles/patrick.default/cache2/entries" NOT OF THE FORM: yyyy-mm-dd-t_ processing 1839 photos
./creeIndexPhotos.bash: line 110: 24193 Killed montage -size 500x500 *_thumb.png -geometry 200x+0+0 -title "$1" -tile 6x -quality 100 "$1"_000_index.jpg 2>> "$1"_000_montage.error
Note that this script, even though it triggers core notifier timeouts, has not caused any loss of GUI so far!
By the way, we can see that the baloo process has a larger total_vm than the montage process:
journalctl -b |grep -E "baloo_file|montage|uid tgid total_vm"
Oct 07 17:06:09 HPxw9400 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Oct 07 17:06:11 HPxw9400 kernel: [ 5926] 1000 5926 67167403 588 442368 1125 0 baloo_file
Oct 07 17:06:11 HPxw9400 kernel: [ 3477] 1000 3477 13451421 7353512 73048064 1744714 0 montage
Oct 07 17:06:11 HPxw9400 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-1000.slice/session-2.scope,task=montage,pid=3477,uid=1000
Oct 07 17:06:11 HPxw9400 kernel: Out of memory: Killed process 3477 (montage) total-vm:53805684kB, anon-rss:17697676kB, file-rss:0kB, shmem-rss:11716372kB, UID:1000 pgtables:71336kB oom_score_adj:0
Oct 07 17:06:11 HPxw9400 kernel: oom_reaper: reaped process 3477 (montage), now anon-rss:0kB, file-rss:0kB, shmem-rss:11716372kB
...
Oct 07 20:04:37 HPxw9400 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Oct 07 20:04:39 HPxw9400 kernel: [ 5926] 1000 5926 67167403 671 442368 1074 0 baloo_file
Oct 07 20:04:40 HPxw9400 kernel: [ 24193] 1000 24193 6214209 5978556 48037888 0 0 montage
Oct 07 20:04:40 HPxw9400 kernel: [ 25468] 1000 25468 67167682 600 208896 0 0 baloo_file_extr
Oct 07 20:04:40 HPxw9400 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-1000.slice/session-2.scope,task=montage,pid=24193,uid=1000
Oct 07 20:04:40 HPxw9400 kernel: Out of memory: Killed process 24193 (montage) total-vm:24856836kB, anon-rss:23914224kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:46912kB oom_score_adj:0
Oct 07 20:04:40 HPxw9400 kernel: oom_reaper: reaped process 24193 (montage), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
So I think these oom-killer cases are not the root cause of the GUI loss!!!
PS: montage is an ImageMagick tool; I use it in the script to make a "contact sheet" ("planche contact" in French) of a photo directory.
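In the OOM task dump, the total_vm and rss columns are counted in pages (4 KiB on x86_64), whereas the "Killed process" line reports kB; total_vm is the virtual address space, not memory actually in use. The figures above are consistent with this: montage's rss of 7353512 pages × 4 = 29414048 kB, exactly anon-rss 17697676 kB + shmem-rss 11716372 kB, whereas baloo_file, despite its large total_vm, has an rss of only 588 pages. A sketch converting such dump lines to MiB, assuming 4 KiB pages and the column order shown above (the function name is hypothetical):

```bash
#!/usr/bin/env bash
# Sketch: convert kernel OOM task-dump lines to MiB.
# Assumes 4 KiB pages and the column order
#   [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
# Fields are taken from the END of the line, so the "[ 5926]" vs "[24193]"
# spacing does not matter.
oom_dump_to_mib() {
  awk '/kernel: \[ *[0-9]+]/ {            # task lines only, not the header
         name = $NF; total_vm = $(NF-5); rss = $(NF-4)
         printf "%s total_vm=%d MiB rss=%d MiB\n", name,
                int(total_vm * 4 / 1024), int(rss * 4 / 1024)
       }'
}

# Usage:
#   journalctl -b | oom_dump_to_mib
```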
Created attachment 11921
journalctl -p4 -b -1 (222 "core notifier timeout")
Hi,
End of the boot story (comment 15 to comment 19) with the 2 oom-killer calls: it ended up freezing in the evening. Via ssh from the other PC, top showed 100% CPU usage by the VirtualBoxVM process and high memory usage:
Tasks: 233 total, 2 running, 231 sleeping, 0 stopped, 0 zombie
%Cpu0 : 2.3 us, 97.3 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 3.0 us, 2.4 sy, 0.0 ni, 85.1 id, 9.5 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 2.7 us, 1.0 sy, 0.0 ni, 92.7 id, 3.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 2.4 us, 2.4 sy, 0.0 ni, 89.6 id, 5.7 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 29.4/32165.1
MiB Swap: 93.3/8206.0
  PID USER     PR NI    VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
27038 patrick  20  0 5059356 2.2g 2.2g R 103.6  7.1 196:36.09 VirtualBoxVM
free -h
       total   used    free    shared  buff/cache  available
Mem:   31Gi    4.4Gi   17Gi    4.5Gi   9.9Gi       22Gi
Swap:  8.0Gi   7.5Gi   551Mi
This load ended when I managed to power off (in fact, save the state of) the virtual machine (W10) remotely. I then immediately regained access to the PC.
The series of "core notifier timeout" errors started a little before the 1st oom-killer call and continued until the 2nd one (this one provoked by me). The oom-killer kills did not make them disappear. At that point the PC was not frozen; there was only some slowness when switching windows, until it turned into a real freeze. After the PC recovered, these errors were still emitted until shutdown (222 in total, cf. the journalctl -p4 -b -1 in attachment 11921 above).
Thank you for all your dogged research.
Going back to comment 14:
> On the PC without GUI, neither Ctrl+Alt+F2 or Ctrl+Alt+Bksp (once or twice)
> were operating.
Not being able to get to a virtual console (Ctrl+Alt+F2-7) or restart X (Ctrl+Alt+Bksp ... twice) implies to me a problem deeper than the graphics.
* Can you say, from comment 0:
> Nevertheless the PC remains accessible via ssh. Everything seems normal
whether, if you restart X via SSH, the GUI + login screen reappears?
From comment 15:
> But today, same conditions, the "core notifier timeout" occurs but without
> loss of GUI!!! So, I write from the PC not with ssh!!!
Again this points away from the video driver messages (or rather, their cause) being the source of the problem. Perhaps the driver too is suffering from lack of memory, and these messages are another symptom of the memory problems you noted subsequently.
* Can you say with confidence that the loss of the GUI definitely follows the use of (a) particular application(s), memory-hungry ones, notably VBox?
[In fact you have a huge amount of memory, 32GB, so swap seems superfluous. It is unusual to make it (8GB) smaller than real memory, though.]
In the last attachment from comment 20 (I think we have enough journals now, thank you), there is:
Oct 07 17:06:09 HPxw9400 kernel: Free swap = 0kB
Oct 07 17:06:09 HPxw9400 kernel: Total swap = 8402908kB
which is crazy. Immediately followed by:
Oct 07 17:06:11 HPxw9400 kernel: Out of memory: Killed process 3477 (montage) total-vm:53805684kB, anon-rss:17697676kB, file-rss:0kB, shmem-rss:11716372kB, UID:1000 pgtables:71336kB oom_score_adj:0
Oct 07 17:08:23 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
the last repeated many times. This supports the idea that nouveau itself is suffering from lack of memory.
Your investigations point to a memory usage problem, which you are pursuing. Awaiting your further conclusions.
Hi,
> Thank you for all your dogged research.
I like to understand, in case it can help the team... and I'm learning!
> Going back to comment 14:
> > On the PC without GUI, neither Ctrl+Alt+F2 or Ctrl+Alt+Bksp (once or twice)
> > were operating.
> Not being able to get to a virtual console (Ctrl+Alt+F2-7) or re-start X
> (Ctrl+Alt+Bksp ... twice) implies to me a problem deeper than the graphics.
Maybe, but then why didn't I encounter these problems under MGA5? (described in the Description and Comment 12)
> * Can youy say, from comment 0:
> > Nevertheless the PC remains accessible via ssh. Everything seems normal
> whether if you re-start X via SSH, the GUI + login screen re-appears?
Via ssh, I ran startx both as user and as root: failure!
> From comment 15:
> > But today, same conditions, the "core notifier timeout" occurs but without
> > loss of GUI!!! So, I write from the PC not with ssh!!!
> Again this points away from the video driver messages - rather, their cause
> - being the source of the problem. Perhaps it too is suffering from lack of
> memory, and these messages are another symptom of the memory problems you
> noted subsequently.
Same remark as above: under MGA5 with KDE and the Nvidia driver, I had none of the problems encountered under MGA7 with Plasma and the Nvidia driver, to the point that I switched to XFCE and the Nouveau driver! Only to fall into this GUI-loss problem as well.
> * Can you say with confidence that the loss of the GUI definitely follows
> the use of (a) particular application(s), memory-hungry, notably VBox?
My test conditions are lightdm/XFCE/Nouveau with Thunderbird, Firefox, 1 or 2 terminals and VirtualBox launched. I am not able to confirm my impression that this is the cause rather than something else.
> In the last attachment (I think we have enough journals now, thank you) from
> comment 20, is:
> Oct 07 17:06:09 HPxw9400 kernel: Free swap = 0kB
> Oct 07 17:06:09 HPxw9400 kernel: Total swap = 8402908kB
> which is crazy. Immediately followed by:
> Oct 07 17:06:11 HPxw9400 kernel: Out of memory: Killed process 3477
> (montage) total-vm:53805684kB, anon-rss:17697676kB, file-rss:0kB,
> shmem-rss:11716372kB, UID:1000 pgtables:71336kB oom_score_adj:0
> Oct 07 17:08:23 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier
> timeout
> the last repeated many times. Which supports the idea that nouveau itself
> is suffering from lack of memory.
>
> Your investigations point to a memory usage problem, which you are pursuing.
> Await your further conclusions.
The novelty is that, with my script involving montage (ImageMagick), I am able to reproduce an oom-killer call despite 32+8GB. Under MGA5 and MGA7 I also used this same script, but not in the conditions that generated the Out of memory. A grep on journalctl (since the MGA7 installation) shows only these 2 oom-killer calls:
journalctl |grep oom-killer
Oct 07 17:06:02 HPxw9400 kernel: ifplugd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Oct 07 20:04:26 HPxw9400 kernel: gpm invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Thus, the GUI losses encountered so far did not involve this particular case.
Nevertheless, I now run top to monitor memory consumption in real time. Yesterday (with no script running) there were no problems, even when the VirtualBoxVM process was at 100% CPU (the second top, at 22:10):
top - 15:27:19 up 6:03, 1 user, load average: 1.35, 1.06, 1.02
Tasks: 221 total, 1 running, 220 sleeping, 0 stopped, 0 zombie
%Cpu0 : 10.6 us, 11.6 sy, 0.0 ni, 76.8 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 8.8 us, 5.4 sy, 0.0 ni, 84.1 id, 1.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 7.9 us, 19.1 sy, 0.0 ni, 72.3 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 7.4 us, 4.7 sy, 0.0 ni, 85.6 id, 2.3 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 17.7/32165.1
MiB Swap: 0.0/8206.0
  PID USER     PR NI    VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
26260 patrick  20  0 5041728 2.3g 2.2g S  24.2  7.4 108:54.64 VirtualBoxVM
free -h
       total   used    free    shared  buff/cache  available
Mem:   31Gi    4.9Gi   16Gi    240Mi   10Gi        25Gi
Swap:  8.0Gi   0B      8.0Gi

top - 22:10:28 up 12:46, 1 user, load average: 0.74, 0.59, 0.57
Tasks: 225 total, 2 running, 223 sleeping, 0 stopped, 0 zombie
%Cpu0 : 4.7 us, 2.7 sy, 0.0 ni, 91.7 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 4.4 us, 3.1 sy, 0.0 ni, 89.5 id, 3.1 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 2.3 us, 97.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 10.0 us, 3.0 sy, 0.0 ni, 83.6 id, 3.3 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 18.5/32165.1
MiB Swap: 0.0/8206.0
  PID USER     PR NI    VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
26260 patrick  20  0 5053840 2.4g 2.2g S 104.0  7.5 225:48.23 VirtualBoxVM
free -h
       total   used    free    shared  buff/cache  available
Mem:   31Gi    5.1Gi   14Gi    217Mi   11Gi        25Gi
Swap:  8.0Gi   0B      8.0Gi
Thank you for this further information.
> The novelty is that with my script involving montage (Imagemagick)
> I am able to reproduce an oom-killer call despite 32+8GB.
> PS: montage is a tool from Imagemagick, I use it in the script to
> make a "contact sheet" (planche contact in french) of a photo directory.
> Yesterday (without script in use) no problems
> even when the VirtualBoxVM PID is 100% CPU
I am now wondering about the 'montage' script; it looks as if VBox is in the clear. ImageMagick has certainly changed between MGA5 & 7.
* Can you please attach the script?
* Is it operating on the same image collection as under MGA5?
* Has the number of images handled by the script grown significantly?
Your use of 'top' was a good idea. Could you try it while running the script, to see how 'MiB Mem' (free, used) evolves: whether it stays more or less the same, or grows? Also keep an eye on the %MEM of the script's process line(s) (ImageMagick?).
---
Did you ever try the kernel suggestions from comment 11?
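Rather than watching top for a whole multi-hour run, the memory figures could be logged to a file while the script runs. A sketch, assuming a Linux /proc/meminfo with the MemAvailable, SwapTotal and SwapFree fields (values in kB); log_mem_while and its argument order are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: run a command in the background and append one memory sample
# every INTERVAL seconds to LOGFILE until the command exits.
# Usage: log_mem_while LOGFILE INTERVAL command [args...]
log_mem_while() {
  local log=$1 interval=$2 pid
  shift 2
  "$@" &                                   # start the monitored command
  pid=$!
  while kill -0 "$pid" 2>/dev/null; do     # loop while it is still running
    awk -v ts="$(date '+%F %T')" '
      /^MemAvailable:/ { avail = $2 }
      /^SwapTotal:/    { stot  = $2 }
      /^SwapFree:/     { sfree = $2 }
      END { printf "%s avail=%d MiB swap_used=%d MiB\n",
                   ts, avail / 1024, (stot - sfree) / 1024 }
    ' /proc/meminfo >> "$log"
    sleep "$interval"
  done
  wait "$pid"                              # return the command's exit status
}
```

For example, log_mem_while /tmp/mem.log 60 ./creeIndexPhotos.bash would take one sample per minute; a steadily falling "avail" value during the run would point at the script.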
Hello,
> I am wondering now about the 'montage' script; it looks as if VBox is in the
> clear. Imagemagick has certainly changed between MGA5 & 7.
imagemagick-desktop-6.9.5.2-1.mga5
imagemagick-7.0.8.62-1.mga7.tainted
> * Can you please attach the script?
Yes, along with 2 include files.
> * Is it operating on the same image collection as under MGA5?
Yes.
> * Has the number of images dealt with by the script grown significantly?
No.
By default, the script recursively scans a "Photos" directory. If there is a new directory, or a modification in one of the subdirectories, it creates an index file (using convert and montage, both from ImageMagick). Also by default, directories that have nothing to do with pictures are excluded. Nevertheless, I can re-include some of these excluded directories, and this is what happened when oom-killer was called. The re-included directory (Thunderbird's cache) contained 1839 images (cf. Comment 19), whereas the biggest of the real "Photos" directories contains 402!
>
> Your use of 'top' was a good idea. Could you try it while running the script
> to see how the 'MiB Mem' (free, used) evolves - whether it stays
> more-or-less the same, or grows?
> Also keep an eye on the %MEM for the script process(es) line(s) (Imagemagick
> ?)
To do this, I deleted all the index files in the "Photos" directory to force a full re-creation, which took 258 min (4 h 18 min). Result: even with occasional 100% CPU usage by VirtualBoxVM, the top bar graph never exceeded 23.9% (9 bars) for RAM, while swap always stayed at 0!!!! The PC never froze or lost its GUI, and journalctl does not show any core notifier timeout alarm.
> Did you ever try the kernel suggestions from comment 11?
Yes, I forgot to mention it because I don't really see the connection with alarms arriving hours after startup... I removed vga and set GRUB_GFXMODE=1900x1200 and GRUB_GFXPAYLOAD_LINUX=keep; since Oct 06 09:12:37:
kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.7.19-desktop-1.mga7 root=UUID=8ebeda33-9710-49a7-93a4-809d13e2809f ro splash quiet noiswmd resume=UUID=2fdb0460-b082-496e-b748-e904514ac886 audit=0
URL: https://www.mageialinux-online.org/forum/topic-27878-5+freeze-mga7.php#m274291 => https://www.dailymotion.com/video/x7wrcmv
Created attachment 11924
creeIndexPhotos.bash
My script to create a "contact sheet" for each photo directory
Created attachment 11925
communFunctions.include
File containing functions shared by several scripts
Created attachment 11926
communDefine.include
File containing definitions shared by several scripts
Created attachment 11927
Example of a contact sheet produced by the script creeIndexPhotos
Hi, in Comment 25 no problem was detected... But the next day (Saturday), under the same conditions plus digiKam, with several runs of my script involving convert and montage, at the end of the evening, with no problem so far, I was alerted by a slowdown when switching from one window to another (the script was not running). I quickly took a look with journalctl... to find the now well-known "core notifier timeout" errors...
Then, just as I had monitored memory with top, I had the idea of monitoring the log with journalctl -f. And to my surprise, I was able to generate "core notifier timeout" errors simply by clicking on any task in the taskbar!!!! So much so that I made a video of it: https://www.dailymotion.com/video/x7wrcmv (also in the URL field at the top).
Note the coincidence between the cursor clicks at the bottom and the errors appearing in the left terminal. On the right you can also see the top terminal, which does not show anything special...
I didn't try to flood the system with "core notifier timeout" errors to see whether I would lose the GUI... disappointed, I went to bed...
Sooner than me! Once again, thank you for your painstaking research, and for this intelligent testing & report, which does point the finger at X. In light of this, I change tack and ask:
The next time you lose the GUI but still have remote access, please do the following as quickly as possible (and post or attach the ps & top outputs):
$ ps -e
$ top
# journalctl -b --no-hostname > a_file
Confirm that Ctrl/Alt/F2-6, or Ctrl/Alt/Bksp/Bksp, has no effect on the crippled box.
Edit the saved journal file to note the exact point (as best you can judge) when you lost the GUI. Then compress it with xz and attach it, saying what you were doing at the time (applications, scripts). It is often difficult to pin down events in journals extracted retrospectively.
Also please compress & attach /var/log/Xorg.0.log.
I regret asking similar things again; I am trying to document it *when it happens*. Aurelien is following this. Do you have any other ideas?
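The requested commands could be run in one go over ssh, so nothing is forgotten in the moment. A sketch only: the function name and output layout are hypothetical; it uses top in batch mode (-b -n 1) so the output can be saved to a file, should be run as root so journalctl covers the whole boot, and skips any tool that is missing.

```bash
#!/usr/bin/env bash
# Sketch: snapshot the state of a box that has lost its GUI, then compress it.
# Usage (as root, over ssh): collect_snapshot OUTPUT_DIR
collect_snapshot() {
  local out=${1:?usage: collect_snapshot OUTPUT_DIR}
  mkdir -p "$out" || return 1
  date '+%F %T' > "$out/timestamp.txt"                 # when the snapshot was taken
  if command -v ps >/dev/null; then ps -e > "$out/ps.txt" 2>&1; fi
  if command -v top >/dev/null; then top -b -n 1 > "$out/top.txt" 2>&1; fi
  if command -v journalctl >/dev/null; then
    journalctl -b --no-hostname > "$out/journal.txt" 2>&1
  fi
  if [ -r /var/log/Xorg.0.log ]; then cp /var/log/Xorg.0.log "$out/"; fi
  if command -v xz >/dev/null; then xz -f "$out"/*; fi  # compress every file
  return 0
}
```

For example: collect_snapshot /root/gui-loss-$(date +%F-%H%M), then mark the GUI-loss point in the decompressed journal before attaching it.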
I do read all of this, and it makes me feel there is a memory leak somewhere, perhaps in the Nouveau driver, perhaps in the script. It seems the OP lost GUI control by running out of memory.
Hi, sorry, I am writing live... to report a segfault in libgobject-2.
All day long I "played" with my script, including reproducing the exceptional case that involved oom-killer, which did happen. Unlike the 2 "exceptional" cases already mentioned, this time there was no "core notifier timeout" error and no freeze.
I ran the script one last time, still in the exceptional case but with a memory limit on the montage command (montage -limit memory 16GiB ...), so oom-killer was not called:
At 19:37:
1054 2020-10-12 19:37:26 : time /home/patrick/Photos/scriptPhoto/creeIndexPhotos.bash
At 20:01, turned off the screen and went to eat:
2020-10-12 20:01:29 : h
At 20:39, turned the screen back on, logged in and looked at the result of the script:
2020-10-12 20:39:58 : ls -rtl
2020-10-12 20:40:10 : more entries_000_montage.error
which contains these errors:
montage: unable to write pixel cache '/tmp/magick-12060i3U61rLnnOpK': No space left on device @ error/cache.c/WritePixelCachePixels/5830.
montage: unable to extend cache 'entries_000_INDEX_2070.jpg': No space left on device @ error/cache.c/OpenPixelCache/3888.
montage: unable to extend cache 'entries_000_INDEX_2070.jpg': No space left on device @ error/cache.c/OpenPixelCache/3888.
montage: Maximum supported image dimension is 65500 pixels `entries_000_INDEX_2070.jpg' @ error/jpeg.c/JPEGErrorHandler/343.
Then I discovered a first "core notifier timeout":
Oct 12 20:12:01 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
The script ended at 20:19:
ll entries_000_montage.error
-rwxr-xr-x 1 patrick patrick 575 Oct 12 20:19 entries_000_montage.error*
This means that this first "core notifier timeout" occurred while the script was running. A cause-and-effect relationship????
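The last error is a JPEG format limit: each dimension of a JPEG image is capped at 65500 pixels, and with -tile 6x the number of rows, and hence the sheet height, grows with the number of images. A sketch of the arithmetic, assuming every row has the same height in pixels (thumbnail plus any label or spacing, ignoring the -title band); the 200 and 300 pixel values used below are illustrative, not measured from the script, and the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: how many tiles fit on one JPEG contact sheet, and how many sheets
# N images need, assuming a fixed row height TILE_H in pixels.
JPEG_MAX_DIM=65500            # libjpeg's per-dimension limit

max_tiles_per_sheet() {       # $1 = TILE_H, $2 = columns
  echo $(( JPEG_MAX_DIM / $1 * $2 ))     # whole rows that fit, times columns
}

sheets_needed() {             # $1 = image count, $2 = TILE_H, $3 = columns
  local per
  per=$(max_tiles_per_sheet "$2" "$3")
  echo $(( ($1 + per - 1) / per ))       # ceiling division
}
```

With 200-pixel rows and 6 columns, max_tiles_per_sheet 200 6 gives 1962, so 1839 images would fit on one sheet; with 300-pixel rows, sheets_needed 1839 300 6 gives 2.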
From there, I went back to triggering this type of alarm by clicking on the taskbar icons (see the previous video), but not only that: quitting emacs, or using Thunderbird, Firefox, Thunar and the MCC (Mageia Control Center), also produces it... with, for the latter, a segfault...
Oct 12 21:20:05 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
Oct 12 21:20:14 HPxw9400 drakrpm[8564]: ### Program is exiting ###
Oct 12 21:20:18 HPxw9400 drakrpm-update[32201]: ### Program is starting ###
Oct 12 21:20:20 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
Oct 12 21:20:23 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
Oct 12 21:20:25 HPxw9400 drakrpm-update[32201]: opening the RPM database
Oct 12 21:20:25 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
Oct 12 21:20:34 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
Oct 12 21:20:34 HPxw9400 drakrpm-update[32201]: opening the RPM database
Oct 12 21:20:34 HPxw9400 drakrpm-update[32201]: opening the RPM database
Oct 12 21:20:34 HPxw9400 drakrpm-update[32201]: opening the RPM database
Oct 12 21:20:37 HPxw9400 sensord[1348]: Chip: k8temp-pci-00cb
Oct 12 21:20:37 HPxw9400 sensord[1348]: Adapter: PCI adapter
Oct 12 21:20:37 HPxw9400 sensord[1348]: Core0 Temp: 40.0 C
Oct 12 21:20:37 HPxw9400 sensord[1348]: Core1 Temp: 41.0 C
Oct 12 21:20:37 HPxw9400 sensord[1348]: Chip: nouveau-pci-1800
Oct 12 21:20:37 HPxw9400 sensord[1348]: Adapter: PCI adapter
Oct 12 21:20:37 HPxw9400 sensord[1348]: temp1: 42.0 C (limit = 95.0 C, hysteresis = 3.0 C)
Oct 12 21:20:37 HPxw9400 sensord[1348]: Chip: k8temp-pci-00c3
Oct 12 21:20:37 HPxw9400 sensord[1348]: Adapter: PCI adapter
Oct 12 21:20:37 HPxw9400 sensord[1348]: Core0 Temp: 40.0 C
Oct 12 21:20:37 HPxw9400 sensord[1348]: Core1 Temp: 40.0 C
Oct 12 21:20:56 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
Oct 12 21:21:01 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
Oct 12 21:21:01 HPxw9400 drakrpm-update[32201]: ### Program is exiting ###
Oct 12 21:21:39 HPxw9400 drakconf[6353]: modified file /etc/mcc.conf
Oct 12 21:21:39 HPxw9400 drakconf[6353]: ### Program is exiting ###
Oct 12 21:21:39 HPxw9400 kernel: drakconf[6353]: segfault at 51 ip 00007ff97a770130 sp 00007ffee7c6d300 error 4 in libgobject-2.0.so.0.6000.2[7ff97a758000+31000]
Oct 12 21:21:39 HPxw9400 kernel: Code: 89 44 24 5c 41 f6 44 24 18 10 74 56 89 c1 48 8b 05 3d 45 03 00 48 85 c0 74 48 48 89 da eb 0b 0f 1f 00 48 8b 00 48 85 c0 74 38 <48> 3b 50 08 75 f2 3b 48 10 75 ed 8b 74 24 58 3b 70 14 75 e4 c7 40
Oct 12 21:21:41 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
Switching between windows or workspaces is slow, but there is no loss of GUI. top does not show anything abnormal in RAM or swap, and neither does Xorg.0.log.
Created attachment 11931 [details] journalctl --no-hostname -b
Created attachment 11932 [details] Xorg.0.log 2020-10-12
Hello, after writing Comment 33 I shut down the PC, not as usual with the "Eteindre" (Shut down) button in the top right corner, but by logging out with the "Mageia -> Déconnexion" (Log out) button on the left... and then I lost the GUI!
I tried:
> Confirm that Ctrl/Alt/F2-6, or Ctrl/Alt/Bksp/Bksp, has no effect on the crippled box.
I confirm: no result!
Via ssh I didn't see anything abnormal, except that there was another Xorg log... is that normal? I restarted lightdm.service without getting the GUI back, so as root I ran shutdown -h now on the PC...
This morning, looking at the journal, I discovered some broken X11 connections and Nouveau errors mixed with Xorg ones.
> Edit the saved journal file to note the exact point (as best as you can judge)
I marked this with POINT I JUDGE TO LOSS GUI in the newly attached journal; just after it you can see:
oct. 12 23:53:09 kdeinit5[29214]: kdeinit5: Fatal IO error: client killed
oct. 12 23:53:10 at-spi-bus-launcher[9974]: X connection to :0 broken (explicit kill or server shutdown).
oct. 12 23:53:09 klauncher[29215]: The X11 connection broke (error 1). Did the X11 server die?
oct. 12 23:53:09 kactivitymanagerd[29186]: The X11 connection broke (error 1). Did the X11 server die?
oct. 12 23:53:10 kglobalaccel5[29195]: The X11 connection broke (error 1). Did the X11 server die?
oct. 12 23:53:10 unknown[9991]: mate-screensaver: Fatal IO error 11 (Ressource temporairement non disponible) on X server :0.
Then the mixed errors:
oct. 13 00:26:10 kernel: nouveau 0000:18:00.0: Xorg[3305]: failed to idle channel 3 [Xorg[3305]]
which occurs after the remote ssh login.
Also, below are the commands typed over ssh (ending with su for the shutdown):
968 2020-10-13 00:04:47 : top
969 2020-10-13 00:05:10 : more /var/log/Xorg.0.log
970 2020-10-13 00:06:31 : ll /var/log/Xorg.0.log
971 2020-10-13 00:06:49 : more /var/log/Xorg.0.log.old
972 2020-10-13 00:07:20 : cd memos
973 2020-10-13 00:07:26 : cd pbInstallMageia7/
974 2020-10-13 00:07:54 : ls -rtl
975 2020-10-13 00:09:15 : more /var/log/Xorg.0.log > Xorg.0.log.deconnect2020-10-12.log
976 2020-10-13 00:12:50 : Ctrl/Alt/Bksp/Bksp
977 2020-10-13 00:13:04 : systemctl status
978 2020-10-13 00:15:38 : systemctl is-system-running
979 2020-10-13 00:16:04 : systemctl |grep -i failed
980 2020-10-13 00:16:41 : systemctl status network.service
981 2020-10-13 00:17:30 : journalctl -no-hostname -b
982 2020-10-13 00:17:38 : journalctl --no-hostname -b
983 2020-10-13 00:19:30 : ll
984 2020-10-13 00:20:00 : ls -rtl
985 2020-10-13 00:20:19 : journalctl --no-hostname -xe
986 2020-10-13 00:21:30 : ls -rtl
987 2020-10-13 00:22:08 : journalctl --no-hostname -b > journalctl-b.deconnect2020-10-12.log
988 2020-10-13 00:22:27 : journalctl --no-hostname -b
989 2020-10-13 00:23:19 : systemctl status
990 2020-10-13 00:23:50 : systemctl status |grep -i xorg
991 2020-10-13 00:25:12 : systemctl status lightdm.service
992 2020-10-13 00:25:42 : systemctl restart lightdm.service
993 2020-10-13 00:29:08 : systemctl status lightdm.service
994 2020-10-13 00:29:46 : systemctl status network.service
995 2020-10-13 00:30:53 : systemctl restart network.service
996 2020-10-13 00:31:16 : journalctl -xe
997 2020-10-13 00:32:01 : systemctl status network.service
998 2020-10-13 00:32:28 : top
999 2020-10-13 00:34:06 : ps -e
1000 2020-10-13 00:35:02 : su -
Commands 992, 995 and 1000 are marked in the journal with COMMAND FROM SSH.
Created attachment 11933 [details] journalctl --no-hostname -b -1 with GUI loss
Modified with markers:
more journalctl-b1lossGUI2020-10-12.log | grep -E "POINT|COMMAND "
POINT I JUDGE TO LOSS GUI
COMMAND FROM SSH REMOTE LOGIN
COMMAND FROM SSH with sudo= 992 2020-10-13 00:25:42 : systemctl restart lightdm.service
COMMAND FROM SSH with sudo= 995 2020-10-13 00:30:53 : systemctl restart network.service
COMMAND FROM SSH with su 1000 2020-10-13 00:35:02 : su -
This was written *before* c36
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Thank you for the latest report and the journal/log attachments. The journal in particular contains things of interest.
[The video was marked private, so I could not view it.]
[Quickly: a "drakconf[6353]: segfault" on exit is regrettably common...]
CC'ing Dave Hodgins in case he has any ideas, and likewise the kernel/drivers team.
In the last journal attached, there are 2 out-of-memory (oom) events at 15:54. For the rest, start at 20:00.
I would be tempted to monitor memory usage continuously while the programs of interest are running. The script caters for that, so you can leave it running all the time: it only produces output when one of those programs is running. Something along these lines:
----------------------------------------------------------
#!/bin/bash
# Script to monitor defined processes at a defined interval
# To save the output, use $ ./scriptname | tee logfile

# Define here the programs of interest as shown by 'ps', each with a leading space.
# This eliminates spurious hits elsewhere in the 'ps' output.
PROGS=' <prog1>| <prog2>| <etc>'

clear
while true
do
    # Only log when something we are interested in is running.
    # grep's own entry (with its argument) always shows, so cut it out.
    if ps ax | grep -v grep | grep -E "$PROGS" > /dev/null
    then
        echo
        date
        ps ax | grep -v grep | grep -E "$PROGS"
        # Limit top output (via head), otherwise it lists all processes.
        # Adjust the 'head' parameter to fit exactly one screenful.
        echo
        top -b -n1 | head -n19
    fi
    # Set the interval here, in seconds
    sleep 60
done
-------------------------------------------------------------
I have tried it; it works. Run it on a spare terminal or virtual console as:
$ ./scriptname | tee logfile
to see and preserve the output.
One can wonder whether this obsession with memory is relevant. Could it be *video memory*, which was mentioned in other threads on Nouveau + nVidia?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
P.S.
Bug 27319 is about Xserver crashing, not the same thing I know, but it produced a patched x11-server-xorg currently in core/updates_testing pending release: x11-server-xorg-1.20.9-1.1.mga7 See https://bugs.mageia.org/show_bug.cgi?id=27319#c11 onwards. This includes several patches, so is worth trying even if it makes no difference. It should be out soon.
CC: (none) => davidwhodgins, kernel
Created attachment 11934 [details] Screenshot of the XFCE logout session involving: "The X11 connection broke (error 1). Did the X11 server die?"
Screenshot of XFCE showing the PC being shut down not directly with the "Shutdown" button, but by first logging the user out and then choosing "Shutdown". I do it this way because with the "Shutdown" button directly, the current session is not saved for the next start; with the second method it is!
Hi, for information: out of curiosity I turned off the PC as in the screenshot in Comment 39. This time there was no GUI loss, but the same X11 error messages:
oct. 13 13:17:02 lightdm[4335]: Error opening audit socket: Protocol not supported
oct. 13 13:17:02 lightdm[6376]: pam_unix(lightdm:session): session closed for user patrick
oct. 13 13:17:02 systemd-logind[1093]: Session 2 logged out. Waiting for processes to exit.
oct. 13 13:17:02 lightdm[6376]: pam_kwallet5(lightdm:session): pam_kwallet5: pam_sm_close_session
oct. 13 13:17:02 lightdm[6376]: pam_kwallet(lightdm:session): pam_kwallet: pam_sm_close_session
oct. 13 13:17:02 lightdm[6376]: pam_kwallet5(lightdm:setcred): pam_kwallet5: pam_sm_setcred
oct. 13 13:17:02 lightdm[6376]: pam_kwallet(lightdm:setcred): pam_kwallet: pam_sm_setcred
oct. 13 13:17:02 at-spi-bus-launcher[27178]: X connection to :0 broken (explicit kill or server shutdown).
oct. 13 13:17:02 unknown[27195]: mate-screensaver: Fatal IO error 11 (Ressource temporairement non disponible) on X server :0.
oct. 13 13:17:02 polkitd[1434]: Unregistered Authentication Agent for unix-session:2 (system bus name :1.74, object path /org/mate/Pol>
oct. 13 13:17:02 klauncher[7607]: The X11 connection broke (error 1). Did the X11 server die?
oct. 13 13:17:02 kglobalaccel5[7602]: The X11 connection broke (error 1). Did the X11 server die?
oct. 13 13:17:02 kactivitymanagerd[7594]: The X11 connection broke (error 1). Did the X11 server die?
oct. 13 13:17:02 kdeinit5[7606]: kdeinit5: Fatal IO error: client killed
oct. 13 13:17:02 acpid[1069]: client 4345[0:0] has disconnected
oct. 13 13:17:02 acpid[1069]: client connected from 19272[0:0]
Hi,
No, you are using XFCE but with some Plasma5 services. When XFCE starts its logout functions, it does not handle the Plasma5 services, and when the X11 server dies, they complain.
LightDM seems to handle the logout correctly and waits for the remaining processes to quit. See the second journal line in comment 40.
Hello Aurelien,
> No, you use XFCE but with some Plasma5 services. When XFCE starts his logout
> functions, it does not handle Plasma5 services and when X11 server dies,
> they complains.
>
> Lightdm seems to handle correctly the logout and waits for remaining process
> to quit. See second journal line in comment 40.
I understand. Until now I had never shut down the PC like this; it was always with the "Eteindre" (Shut down) button. So when I found the PC without a GUI, it was after a long period of inactivity, and the login box should normally have been presented to me, but it was not: instead a gray screen, which is what I call GUI loss. So, having changed the shutdown method, I wondered whether this had revealed a hidden problem with the "Shutdown" button method.
Hi,
> [The video said it was private, so could not view it]
Sorry, it is public now...
> For the last journal attached, there are 2 out-of-memory (oom) in 15.54. For
> the rest, start at 20.00.
Yes, but only one call to the oom-killer at 15:54, none at 20:00! I knew there was going to be an out-of-memory condition because I had set my script up for that: working on a directory of ~2400 photos, when normally it doesn't exceed 402.
Before this oom there was no core notifier timeout, and the first one happened about 4 h later:
oct. 12 20:12:01 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
Nevertheless, my script had been launched again (without the oom condition) and ended at that moment, as I described in Comment 33.
> I would be tempted to constantly monitor memory usage - when the programs of
> interest are running (which the script caters for, so you can leave it
> running all the time: it only outputs when an interesting program is
> running) - with a script along the lines:-
Thanks for your script! As it is about PIDs and top: I have a top.log file (top > top.log) covering the same trace, but it was refused because it's over 1000k. I can truncate the unneeded beginning to attach it.
> One can wonder whether this obsession with memory is relevant. Could it be
> *video memory* which was mentioned in other threads on Nouveau+nVidia ?
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I think like you. Before this problem with Nouveau/XFCE, I was on Nvidia/Plasma, but it was worse, hence my switch to the less bad Nouveau/XFCE! See Comment 12.
> P.S. Bug 27319 is about Xserver crashing, not the same thing I know, but it
> produced a patched x11-server-xorg currently in core/updates_testing pending
> release: x11-server-xorg-1.20.9-1.1.mga7
> See https://bugs.mageia.org/show_bug.cgi?id=27319#c11 onwards.
> This includes several patches, so is worth trying even if it makes no
> difference. It should be out soon.
I've read it; I am currently using x11-server-xorg-1.20.9-1.mga7.
An update to x11-server is on its way. Feel free to add remarks here.
Assigning this to the Kernel and Drivers Maintainers. I think we have done the job of trying to circumscribe the issue.
Trying to sum up:
This system with M5 was OK.
M6 was skipped.
M7 has issues:
- Plasma + nvidia nonfree => instabilities.
- Plasma + Nouveau driver => no freeze but sometimes, GUI is lost.
- XFCE + Nouveau => same.
The graphics card is an NVIDIA G80GL [Quadro FX 5600]. We know that Nvidia cards have several issues with Linux: no good documentation, closed-source drivers.
What I don't understand is: if M5 ran OK, which driver was used? If it was the Nvidia nonfree one, which version?
Keywords: NEEDINFO => (none)
Assignee: bugsquad => kernel
CC: kernel => (none)
Hello Aurelien, I take the liberty of correcting your summary:
> Trying to sum up:
> This system with M5 was OK -> with KDE + nvidia nonfree, no problem
> M6 was skipped.
> M7 has issues:
> - Plasma + nvidia nonfree => instabilities. -> also freezes
> - Plasma + Nouveau driver => not tested, because unusable due to too many artifacts; see Comment 12 for screenshots
> - XFCE + Nouveau => no freeze but sometimes, GUI is lost.
(In reply to kalagani kalagani from comment #43)
> I have a top.log file (top > top.log) related
> to the same trace, but refused because it's over 1000k, I can truncate the
> not useful beginning to attach it.
Not only truncate it, but if you do attach it:
- *annotate it* with interleaved comments about what is happening. It is very difficult to relate attached journals etc. to the events that matter at the time.
- then use 'xz' to compress it:
$ xz <filename>
and upload the compressed file, which ends in .xz
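Since top.log is too large to attach, one way to drop the uninteresting beginning is to keep only the frames from a chosen time onward. A minimal sketch, assuming the log was produced by top in batch mode (top -b > top.log, where each frame starts with a line like "top - 22:56:25 up ...") and covers a single day, so that HH:MM:SS strings compare correctly as text. A log captured from interactive top may contain terminal escape sequences that this does not handle. The function name trim_top_log is hypothetical.

```shell
# Keep only the frames of a `top -b` log (read on stdin) that start at or
# after a given time. Each batch-mode frame begins with a header line like
#   top - 22:56:25 up  3:12,  2 users,  load average: ...
# Assumption: the whole log covers one day, so HH:MM:SS compares as text.
trim_top_log() {
    local since="$1"   # HH:MM:SS, e.g. 20:00:00
    awk -v since="$since" '
        # On each frame header, decide whether this frame is kept.
        /^top - [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ { keep = ($3 >= since) }
        # Print every line of a kept frame (nothing before the first header).
        keep
    '
}
```

Usage, followed by the compression step described above: `trim_top_log 20:00:00 < top.log > top.trimmed.log && xz top.trimmed.log`, then attach top.trimmed.log.xz.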
Hello,
> Not only truncate it, but if you do attach it:
> - *annotate it* with interleaved comments about what is happening. It is
> very difficult relating attached journals etc to the events that matter at
> the time.
Sorry, but I stopped the top log too early (it ends at top - 22:56:25)! I only realized it by comparing its end time with the start time of my annotations in the journal (POINT I JUDGE TO LOSS GUI, oct. 12 23:53:06); see attachment 11933 [details] in Comment 37. So I think this top log is useless.
Hi, I upgraded from x11-server-xorg-1.20.9-1 to x11-server-xorg-1.20.9-1.1, and in /etc/default/grub, still without vga=791, I went back to
GRUB_GFXMODE=1024x768x32
GRUB_GFXPAYLOAD_LINUX=text
instead of
GRUB_GFXMODE=1900x1200
GRUB_GFXPAYLOAD_LINUX=keep
Still, a new core notifier timeout was noticed (~19:13) when switching between windows became slow, without loss of GUI.
I then launched top > top.log, but the first notifier timeout was already at 17:52 (see also the attached journalctl.log), with none between that one and the one I noticed...
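For reference, the GRUB settings described above as an /etc/default/grub excerpt. This is only a sketch of the values reported here; for a change to this file to take effect, the GRUB configuration has to be regenerated, for example with grub2-mkconfig -o /boot/grub2/grub.cfg (command and path assumed for Mageia's GRUB2 layout; check which tool your system uses).

```shell
# /etc/default/grub (excerpt) with the values reported in this comment
# Resolution and color depth of GRUB's own graphical menu
GRUB_GFXMODE=1024x768x32
# Start the kernel in plain text mode instead of keeping GRUB's graphics mode
GRUB_GFXPAYLOAD_LINUX=text
# Previously used instead:
# GRUB_GFXMODE=1900x1200
# GRUB_GFXPAYLOAD_LINUX=keep
```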
Created attachment 11959 [details] journalctl -b with new x11-server-xorg-1.20.9-1.1
Created attachment 11960 [details] top > top.log with new x11-server-xorg-1.20.9-1.1
top.log was launched on discovering the slowdown when changing windows, but the first core notifier timeout error happened earlier...
Hi, for information, today (April 30) there are still:
avril 30 19:27:27 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:30:20 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:37:44 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:38:14 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:38:47 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:38:58 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:39:02 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:39:29 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:39:31 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:39:33 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:39:37 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:40:04 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:40:09 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:40:20 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:40:24 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 30 19:40:28 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
but also at other times since I last wrote. It seems to happen when I come out of the screen saver, with or without VirtualBox running!
Current configuration:
uname -a
Linux HPxw9400 5.10.27-desktop-1.mga7 #1 SMP Wed Mar 31 00:16:43 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
rpm -qa |grep nouveau
x11-driver-video-nouveau-1.0.16-3.mga7
lib64drm_nouveau2-2.4.102-1.mga7
inxi -F
...
Graphics: Device-1: NVIDIA G80GL [Quadro FX 5600] driver: nouveau v: kernel
Display: x11 server: Mageia X.org 1.20.11 driver: nouveau,v4l unloaded: fbdev,modesetting,vesa resolution: 1920x1200~60Hz
OpenGL: renderer: NV50 v: 3.3 Mesa 20.2.3
...
CC: (none) => kalagani
Mageia 7 has been EOL since July 1st 2021. There will be no further bug fixes for this release. You are encouraged to upgrade to Mageia 8 as soon as possible.
@reporter, if this bug still applies to Mageia 8, please let us know.
@packager, if you are working on the Mageia 7 version of your package, please check whether the issue is also present in the Mageia 8 package. In that case, please fix the Mageia 8 version instead.
This bug report will be closed as OLD if there is no further notice by September 1st 2021.
Hello Aurelien, for information: still on Mageia 7, DRM core notifier errors from the Nouveau driver sometimes occur. It seems that exiting the screen saver triggers the first core notifier error; others then follow until the screen freezes. This phenomenon seems to be amplified, though not systematically, when VirtualBox is running.
My latest problem:
journalctl -b -6 --no-hostname|grep -E "Reboot --|core notifier|Kernel command line|virtual"
-- Reboot --
juil. 15 09:25:32 kernel: Booting paravirtualized kernel on bare hardware
juil. 15 09:25:32 kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.10.46-desktop-1.mga7 root=UUID=8ebeda33-9710-49a7-93a4-809d13e2809f ro splash quiet noiswmd resume=UUID=2fdb0460-b082-496e-b748-e904514ac886 audit=0
juil. 15 09:25:37 kernel: input: HP WMI hotkeys as /devices/virtual/input/input6
juil. 15 09:25:50 dkms-autorebuild.sh[806]: virtualbox (6.1.22-1.mga7): Already installed on this kernel.
juil. 15 11:33:11 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:16:53 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:17:11 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:17:59 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:18:25 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:18:31 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:19:57 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:20:09 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:22:25 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:22:29 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:24:25 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:24:34 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:24:38 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:32:41 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:32:48 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:32:56 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:33:07 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:33:12 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:33:15 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:33:59 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:34:09 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:34:14 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:34:21 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:34:24 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
juil. 15 12:34:28 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
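To see how quickly the timeouts cascade after the first one, a small helper can print the gap between consecutive "core notifier timeout" lines. A minimal sketch, assuming the classic journal short format shown above (month, day and HH:MM:SS as the first three fields, with or without a hostname after them) and that all lines fall on the same day; timeout_gaps is a hypothetical name.

```shell
# Print, for each "core notifier timeout" line on stdin after the first,
# its time and the number of seconds since the previous timeout.
# Assumption: field 3 is HH:MM:SS, as in
#   juil. 15 12:32:41 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
# and every line belongs to the same day.
timeout_gaps() {
    awk '
        /DRM: core notifier timeout/ {
            split($3, t, ":")
            s = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (seen) printf "%s  +%ds\n", $3, s - prev
            prev = s
            seen = 1
        }
    '
}
```

Usage: `journalctl -b -6 --no-hostname | timeout_gaps`. Gaps that shrink to a few seconds would match the "others follow until the screen freezes" pattern described above.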
Hi bug reporter and hi assignee and others involved, Please reopen this bug report if it is still valid for Mageia 8 or 9 (Cauldron), and change "Version:" in the upper left of this report accordingly. This report is being closed as OLD because it was filed against Mageia 7, for which support ended on June 30th 2021. Thanks, Marja
Resolution: (none) => OLD
Status: NEW => RESOLVED
Created attachment 13095 [details] nouveau core notifier timeout with following trap
Hello Marja, under Mageia 7 / XFCE with the Nouveau driver I kept getting the slowdowns, and as the freezes had resumed, I switched to Mageia 8 XFCE/Cinnamon with the Nouveau driver, and the freezes happen there too. Neither VirtualBox nor any personal script is running.
What is new is the multiple TRAP "[ cut here ]" traces following the core notifier timeout. CTRL+Alt+F2 is inoperative; only a remote ssh login allows a shutdown.
I attached a log showing some core notifier timeouts with TRAPs, from the command:
journalctl --since 13:46 --until "15:47:31" > 2022-01-20_journalctlTrapsNouveauSince13-46toSSHshutdown.log
Happy New Year despite everything...
Status comment: x11-driver-video-nouveau-1.0.16-3.mga7 => x11-driver-video-nouveau-1.0.16-3.mga7-> x11-driver-video-nouveau-1.0.17-1.mga8
Version: 7 => 8
Reopening for now. The driver maintainer can say whether it is better to start a new bug instead.
CC: (none) => fri
Resolution: OLD => (none)
Status: RESOLVED => REOPENED
Hello, posting here to say that the problem continues: for example, in April, out of 11 starts, 5 ended in freezes.
-- Reboot --
avril 01 18:25:09 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 01 18:25:13 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
-- Reboot --
avril 02 23:02:08 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 02 23:02:12 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
-- Reboot --
avril 03 23:09:29 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 03 23:09:33 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 03 23:09:36 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
-- Reboot --
-- Reboot --
-- Reboot --
-- Reboot --
-- Reboot --
-- Reboot --
avril 06 17:19:39 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 06 18:28:16 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 06 18:28:38 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 06 18:34:46 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 06 18:36:23 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
-- Reboot --
avril 06 20:36:28 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 06 23:02:54 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
avril 06 23:03:10 HPxw9400 kernel: nouveau 0000:18:00.0: DRM: core notifier timeout
-- Reboot --
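A per-boot tally like the one above can be produced automatically. A minimal sketch, assuming journalctl prints the "-- Reboot --" separator lines seen in this excerpt (newer systemd releases may print a different separator, in which case the first pattern needs adjusting); lines logged before the first separator are reported as "boot 0". It counts boots with at least one timeout, which is not necessarily the same as boots that ended in a freeze. count_timeouts_per_boot is a hypothetical name.

```shell
# Count "core notifier timeout" lines per boot in journal text read on stdin.
# Boots are delimited by journalctl's "-- Reboot --" separator lines.
count_timeouts_per_boot() {
    awk '
        # A separator line starts a new boot.
        /-- Reboot --/ { boot++; count[boot] = 0; next }
        # Remember whether anything was logged before the first separator.
        boot == 0 { pre = 1 }
        /DRM: core notifier timeout/ { count[boot]++ }
        END {
            first = pre ? 0 : 1
            for (b = first; b <= boot; b++) {
                printf "boot %d: %d timeout(s)\n", b, count[b]
                if (count[b] > 0) hit++
            }
            printf "%d of %d boot(s) logged at least one timeout\n", hit, boot - first + 1
        }
    '
}
```

Usage: `journalctl --since "2022-04-01" --until "2022-04-30" --no-hostname | count_timeouts_per_boot`.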
Created attachment 13231 [details] journalctl --since "2022-04-01" --until "2022-04-30" --no-hostname |grep -E "drivers/gpu/drm/nouveau|- Reboot|DRM: base|DRM: core"
Hello, attached is the log for April only, from the command:
journalctl --since "2022-04-01" --until "2022-04-30" --no-hostname |grep -E "drivers/gpu/drm/nouveau|- Reboot|DRM: base|DRM: core" > journalctl2022-04grepRebootnouveauDRM.log
In this log, sometimes, but not always, a DRM fault is accompanied by a WARNING in the files
drivers/gpu/drm/nouveau/dispnv50/disp.c:213 nv50_dmac_wait+0x1e1/0x230 [nouveau]
or
drivers/gpu/drm/nouveau/nvkm/engine/fifo/channv50.c:85 nv50_fifo_chan_engine_fini+0x224/0x270 [nouveau]
Configuration:
rpm -qa |grep nouveau
x11-driver-video-nouveau-1.0.17-1.mga8
lib64drm_nouveau2-2.4.110-1.mga8
Hello, just so you know: since I switched to IceWM instead of XFCE, no more freezes.
The difference between these two desktops is that no screensaver/locker is launched with IceWM. I already suspected this feature, since the freezes were often seen after a period of inactivity. Surprise: it is not an XFCE screensaver that is launched during XFCE sessions, but the Cinnamon one. Indeed, I had also installed Cinnamon alongside XFCE.
Under XFCE, journalctl often shows
cinnamon-screensaver: Fatal IO error 11
alone, or before or after
nouveau 0000:18:00.0: DRM: core notifier timeout
Under IceWM there is no trace of cinnamon-screensaver, since no screensaver runs, and no core notifier timeout error.
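To check this correlation over a longer period, one could print each timeout together with the most recent screensaver message that preceded it in the same boot. A minimal sketch, assuming journal text including the "-- Reboot --" separators is fed on stdin; it only looks backwards, so a screensaver message logged just after a timeout is not paired. pair_with_screensaver is a hypothetical name.

```shell
# For each "core notifier timeout" line on stdin, print it together with the
# latest earlier line mentioning "screensaver" in the same boot, if any.
pair_with_screensaver() {
    awk '
        # A new boot forgets the previous screensaver message.
        /-- Reboot --/ { last = ""; next }
        # Remember the latest screensaver line (for example cinnamon-screensaver).
        /screensaver/ { last = $0; next }
        /DRM: core notifier timeout/ {
            print "timeout:     " $0
            print "screensaver: " (last == "" ? "(none earlier in this boot)" : last)
        }
    '
}
```

Usage: `journalctl --since "2022-04-01" --no-hostname | grep -E -- "-- Reboot --|screensaver|DRM: core" | pair_with_screensaver`.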