Bug 23048

Summary: desktop almost frozen for nearly 1 hour
Product: Mageia
Reporter: Antonin Roussel <antonin.roussel>
Component: RPM Packages
Assignee: Kernel and Drivers maintainers <kernel>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: Normal
CC: marja11, shlomif, smelror
Version: 6
Target Milestone: ---
Hardware: All
OS: Linux
Whiteboard:
Source RPM:
CVE:
Status comment:
Attachments: journal excerpt - bug since 14:22 (desktop clock frozen)
journal excerpt starting earlier - display bug since 14:22 (desktop clock frozen)
journal excerpt - bug 2 - involving Web Content
journal excerpt starting earlier - display bug 1 since 14:22 (desktop clock frozen)

Description Antonin Roussel 2018-05-17 15:44:55 CEST
Description of problem:
Since 14:22 the desktop has been frozen: I could barely move the mouse (very, very slowly, and clicks had no effect).
The keyboard seemed inactive too: the Num Lock (VerNum) LED did not toggle.
In a similar situation, I was unable to use Alt+SysRq+{REISUB}.
I managed to switch to tty2, but could not log in because the login timed out after 60 seconds,
so I shut the machine down with the power button...

I don't know why this happens, but a journalctl excerpt is attached.
What I was running: a system update, BOINC grid computing, Firefox, Python, image processing, ...
Comment 1 Antonin Roussel 2018-05-17 15:45:59 CEST
Created attachment 10163
journal excerpt - bug since 14:22 (desktop clock frozen)
Comment 2 Antonin Roussel 2018-05-17 15:47:11 CEST
Comment on attachment 10163
journal excerpt - bug since 14:22 (desktop clock frozen)

Rebooted at 15:10.
Comment 3 Antonin Roussel 2018-05-17 15:50:58 CEST
Something strange happened at the beginning (or slightly before): the ImageMagick library could not recognize the JPEG files I was trying to downscale.
ImageMagick: No decode delegate for this image format `' @ error/constitute.c/ReadImage/504

Could that indicate a memory problem?
Comment 4 Marja Van Waes 2018-05-18 08:08:43 CEST
(In reply to Antonin Roussel from comment #3)
> A strange behaviour happened at the begining (or slightly before) :
> imagemagick library was not able to recognize jpeg files in order to reduce
> them.
> ImageMagick: No decode delegate for this image format `' @
> error/constitute.c/ReadImage/504
> 
> May that denotes a memory problem ?

Maybe. I'm wondering what this means:

mai 17 14:22:44 localhost.localdomain kernel: vboxdrv: ffffffffc0ba2020 VMMR0.r0
mai 17 14:22:44 localhost.localdomain kernel: vboxdrv: ffffffffc0cc2020 VBoxDDR0.r0

because later, while your system was still frozen, there is:

mai 17 14:54:13 localhost.localdomain kernel: vboxwrapper_261 invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
mai 17 14:54:13 localhost.localdomain kernel: vboxwrapper_261 cpuset=/ mems_allowed=0
mai 17 14:54:13 localhost.localdomain kernel: CPU: 2 PID: 31270 Comm: vboxwrapper_261 Tainted: G           O    4.14.20-desktop-1.mga6 #1
mai 17 14:54:13 localhost.localdomain kernel: Hardware name: System manufacturer System Product Name/H110T, BIOS 1802 05/26/2016
mai 17 14:54:13 localhost.localdomain kernel: Call Trace:
mai 17 14:54:13 localhost.localdomain kernel:  dump_stack+0x5c/0x85
mai 17 14:54:13 localhost.localdomain kernel:  dump_header.isra.29+0x91/0x20a
mai 17 14:54:13 localhost.localdomain kernel:  oom_kill_process+0x218/0x3d0
mai 17 14:54:13 localhost.localdomain kernel:  out_of_memory+0xee/0x540
etc.
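A quick way to list every OOM-killer invocation in a saved journal excerpt (such as the attachments to this bug) is to scan for the "invoked oom-killer" marker the kernel prints. This is a minimal Python sketch, assuming the excerpt was exported by journalctl in its default short format, as in the lines quoted above; the script and function names are hypothetical.

```python
import re
import sys

# Matches journal lines such as:
# "mai 17 14:54:13 localhost.localdomain kernel: vboxwrapper_261 invoked oom-killer: ..."
OOM_RE = re.compile(
    r"^(?P<stamp>\S+ \d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<comm>.+?) invoked oom-killer:"
)


def find_oom_events(lines):
    """Return (timestamp, process name) for each OOM-killer invocation."""
    events = []
    for line in lines:
        match = OOM_RE.match(line)
        if match:
            events.append((match.group("stamp"), match.group("comm")))
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_oom.py journal-excerpt.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for stamp, comm in find_oom_events(f):
            print(f"{stamp}  {comm}")
```

Each output line gives the timestamp and the process whose allocation triggered the OOM killer, which is not necessarily the process that was killed (the kernel reports that one separately).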

Assigning to the kernel & driver maintainers.

Assignee: bugsquad => kernel
CC: (none) => marja11

Comment 5 Shlomi Fish 2018-05-18 08:57:51 CEST
Is the problem reproducible? Is it possible that your machine ran out of RAM?

CC: (none) => shlomif

Comment 6 Antonin Roussel 2018-05-18 11:10:26 CEST
Created attachment 10167
journal excerpt starting earlier - display bug since 14:22 (desktop clock frozen)
Comment 7 Antonin Roussel 2018-05-18 11:11:44 CEST
Created attachment 10168
journal excerpt - bug 2 - involving Web Content
Comment 8 Antonin Roussel 2018-05-18 11:21:14 CEST
Created attachment 10169
journal excerpt starting earlier - display bug 1 since 14:22 (desktop clock frozen)

Attachment 10163 is obsolete: 0 => 1
Attachment 10167 is obsolete: 0 => 1

Comment 9 Antonin Roussel 2018-05-18 11:44:04 CEST
* I think the problem is reproducible, but I don't know how yet.
* A RAM shortage is probably the cause, but I don't know how to monitor it. I would like to be able to get out before falling into the "black hole".

- Is journalctl the best tool to get a history of RAM usage?
I will try logging the output of the _free_ command every minute.
But the head of _top_ output, sorted by memory usage, may be better...
(I just realized that journalctl does report memory shortage alerts.)

- My memory is 8 GiB of RAM + 1 GiB of swap
free
              total        used        free      shared  buff/cache   available
Mem:        8070580     3529700      224468     1134300     4316412     3101296
Swap:       1048572       32768     1015804


- Regarding the strange behaviour mentioned earlier, I would add that in the browser, some heavy images were no longer displayed by the JavaScript libraries that handle 360x180 immersive pictures.
I have already hit this bug when making panorama pictures with Hugin.
While frozen, the machine shows a steady orange light on the front (hard drive LED) next to the usual steady blue one (power LED), which indicates continuous disk activity.

- I hit the bug again later, after the one in attachment 10167:
attachment 10169:
 14:08 kernel: exact_client_0
 14:22 kernel: vboxdrv
 15:10 reboot
attachment 10168:
 17:25 kernel: Web Content
 17:51 reboot

- About the pictures I handle: some are larger than 10000x5000 pixels, which is far too large (more than 50 megapixels in memory for a 7 MiB file).
In fact, I use up to three or four graphics applications at the same time to handle these pictures, and in the background BOINC can use VirtualBox (vbox), sometimes intensively. Of course, this can make the workstation sweat a lot!
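The gap between file size and memory use can be estimated: a decoded image needs roughly width x height x channels x bytes per channel, whatever its compressed size on disk. A minimal Python sketch, assuming the three pixel layouts listed below; which layout each application actually uses internally is an assumption (ImageMagick's Q16 builds, for instance, use at least 16 bits per channel).

```python
import sys

MIB = 1024 ** 2


def decoded_bytes(width, height, channels, bytes_per_channel):
    """Size of an uncompressed pixel buffer, ignoring per-application overhead."""
    return width * height * channels * bytes_per_channel


# Assumed pixel layouts: (channels, bytes per channel).
LAYOUTS = {
    "8-bit RGB": (3, 1),
    "8-bit RGBA": (4, 1),
    "16-bit RGBA": (4, 2),
}

if __name__ == "__main__":
    width, height = 10000, 5000  # the picture size mentioned above
    for name, (channels, depth) in LAYOUTS.items():
        size = decoded_bytes(width, height, channels, depth)
        print(f"{name:12s} {size / MIB:8.1f} MiB")
```

For a 10000x5000 picture this gives about 143 MiB (8-bit RGB), 191 MiB (8-bit RGBA) and 381 MiB (16-bit RGBA), compared with a 7 MiB JPEG file on disk; three or four applications each holding their own decoded copy multiply that figure.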
Comment 10 Stig-Ørjan Smelror 2018-05-21 00:39:43 CEST
Hi.

You could try adding extra swap with a swap file. I think 1 GB may be too small for your usage.

This is _one_ HOWTO:
https://www.cyberciti.biz/faq/linux-add-a-swap-file-howto/
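The usual swap-file steps (create the file, restrict its permissions, format it, enable it) can be collected in a small script. A minimal Python sketch, assuming root privileges, a filesystem on which a plain dd-created swap file works (for example ext4), and a hypothetical /swapfile of 4 GiB; by default it only prints the commands, and runs them when given --apply.

```python
import subprocess
import sys


def swapfile_commands(path, size_mib):
    """Commands that create and enable a swap file of size_mib MiB."""
    return [
        ["dd", "if=/dev/zero", f"of={path}", "bs=1M", f"count={size_mib}"],
        ["chmod", "600", path],  # swap must not be readable by other users
        ["mkswap", path],        # write the swap signature
        ["swapon", path],        # enable it immediately
    ]


def fstab_line(path):
    """Line to append to /etc/fstab so the swap file is enabled at boot."""
    return f"{path} none swap defaults 0 0"


if __name__ == "__main__":
    path, size_mib = "/swapfile", 4096  # hypothetical path and size (4 GiB)
    apply = "--apply" in sys.argv[1:]
    for cmd in swapfile_commands(path, size_mib):
        print(" ".join(cmd))
        if apply:
            subprocess.run(cmd, check=True)  # needs root; stops on the first failure
    print(f"# then add to /etc/fstab: {fstab_line(path)}")
```

Keeping the default dry run makes it easy to review the commands before anything touches the disk; the printed /etc/fstab line is what makes the swap file survive a reboot.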

Cheers,
Stig

CC: (none) => smelror

Comment 11 Antonin Roussel 2019-01-28 14:19:58 CET
Thank you for the nudge.
Several months ago, I added swap directly on unused disk space, giving more than 10 GiB of swap in total, and things got better.

# swapon -s
Filename               Type            Size      Used      Priority
/dev/sda14             partition       10533696  6818312   -2
/dev/sda3              partition       1048572   0         -3

# top
top - 13:37:54 up 36 days, 23:10,  2 users,  load average: 4.62, 4.72, 4.64
Tasks: 261 total,   5 running, 198 sleeping,   0 stopped,   0 zombie
%Cpu0  :  67.3/1.0    68[|||||||||||||||||||||||||||||||||||||                ]
%Cpu1  :  71.0/3.3    74[||||||||||||||||||||||||||||||||||||||||             ]
%Cpu2  :  70.1/1.7    72[||||||||||||||||||||||||||||||||||||||               ]
%Cpu3  :  64.9/2.3    67[|||||||||||||||||||||||||||||||||||                  ]
KiB Mem : 57.0/8070376  [||||||||||||||||||||||||||||||                       ]
KiB Swap: 58.9/11582268 [|||||||||||||||||||||||||||||||                      ]
KiB Mem :  8070376 total,   157156 free,  2759228 used,  5153992 buff/cache
KiB Swap: 11582268 total,  4763956 free,  6818312 used.  3479204 avail Mem

Things are still better today, but I still see swap usage increasing over the days, and roughly once a month the computer falls back into the frozen state for an hour or more.
I found no way to free the swap from a shell, because my free memory is too low:
# swapoff -a
swapoff: /dev/sda14: swapoff failed: Cannot allocate memory
# swapon -a

The only way I found to free a big chunk of the swap (almost all of it) is to close the graphical session and start a new one. Do you think this is expected behaviour? (It makes me think something is wrong with the graphical session.)

Maybe I could play with swappiness, or try to clear the cache from memory.
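swapoff has to move every swapped-out page back into RAM, which is why it fails with "Cannot allocate memory" when too little memory is available. A rough pre-check compares the swap in use with MemAvailable from /proc/meminfo. A minimal Python sketch, assuming a 512 MiB safety margin (an arbitrary, hypothetical value); --check prints the verdict and the current vm.swappiness, and --watch logs available memory and used swap every minute.

```python
import sys
import time


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integer values (mostly KiB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kib(info):
    return info["SwapTotal"] - info["SwapFree"]


def swapoff_looks_safe(info, margin_kib=512 * 1024):
    """Rough heuristic: can available RAM absorb everything now in swap?"""
    return info["MemAvailable"] - swap_used_kib(info) > margin_kib


def read_text(path):
    with open(path) as f:
        return f.read()


if __name__ == "__main__" and "--check" in sys.argv[1:]:
    info = parse_meminfo(read_text("/proc/meminfo"))
    print("swappiness:", read_text("/proc/sys/vm/swappiness").strip())
    print("swap used:", swap_used_kib(info), "KiB")
    print("swapoff looks safe:", swapoff_looks_safe(info))

if __name__ == "__main__" and "--watch" in sys.argv[1:]:
    while True:  # one line per minute; stop with Ctrl+C
        info = parse_meminfo(read_text("/proc/meminfo"))
        print(time.strftime("%H:%M:%S"),
              "MemAvailable:", info["MemAvailable"], "KiB",
              "SwapUsed:", swap_used_kib(info), "KiB", flush=True)
        time.sleep(60)
```

With the figures from the top output above (3479204 KiB available, 6818312 KiB of swap used) the check returns False, consistent with the failed swapoff. Swappiness itself can be changed at runtime with `sysctl -w vm.swappiness=<value>`.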
Comment 12 Antonin Roussel 2019-02-04 01:45:36 CET
Addendum: swap was 100% full and memory 60% full, yet there was still no freeze. I was then able to close many of the open applications and successfully run # swapoff -a && swapon -a
The swap was emptied. This makes me happy.
Comment 13 Antonin Roussel 2019-03-05 09:02:38 CET
plasmashell can be greedy in memory and swap usage when the desktop background image is set to change every hour, picked from a large directory of pictures.
I got great relief by killing plasmashell and then relaunching it:
1) Alt+F2 killall plasmashell 
2) Alt+F2 plasmashell
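To check which processes hold the most swap before deciding what to restart, the kernel exposes per-process VmRSS and VmSwap fields in /proc/<pid>/status. A minimal Python sketch, assuming a Linux /proc filesystem; the function names are hypothetical.

```python
import os
import sys


def parse_status(text):
    """Extract the process name and VmRSS/VmSwap (KiB) from /proc/<pid>/status."""
    name, rss, swap = "?", 0, 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss = int(value.split()[0])
        elif key == "VmSwap":
            swap = int(value.split()[0])
    return name, rss, swap  # kernel threads have no VmRSS/VmSwap lines: 0, 0


def top_swap_users(limit=10):
    """Return (pid, name, rss_kib, swap_kib) rows, largest swap users first."""
    rows = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/status") as f:
                rows.append((int(pid), *parse_status(f.read())))
        except OSError:  # the process exited or is not readable
            continue
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:limit]


if __name__ == "__main__" and sys.platform.startswith("linux"):
    for pid, name, rss, swap in top_swap_users():
        print(f"{pid:>7} {name:20s} RSS={rss:>9} KiB  Swap={swap:>9} KiB")
```

If plasmashell sits near the top of this list, restarting it as described above should release most of the swap it holds.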

Thank you for your advice.

Resolution: (none) => FIXED
Status: NEW => RESOLVED