Bug 10186

Summary: gkrellm stops with: "xmessage: gkrellm sensors aborted"
Product: Mageia Reporter: Chris Denice <eatdirt>
Component: RPM Packages Assignee: Bruno Cornec <bruno>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: Normal CC: anssi.hannula, gmontalbine, hc, jim, jiml, mageia, pasnak, terraagua
Version: 3 Keywords: Triaged
Target Milestone: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Source RPM: gkrellm-2.3.5-8.mga3.src.rpm CVE:
Status comment:

Description Chris Denice 2013-05-20 21:57:09 CEST
Starting gkrellm with sensors on, for fan, cpu temp etc... in dock mode:

gkrellm -w

crashes after a few hours with the above message. There is nothing in the log, but this only started in the last few days (on Cauldron); it may be related to the new kernel and/or libsensors.

I tried:

gkrellm -w --without-libsensors


which still crashes after a while with the same message. This now happens on two different computers running mga3.

Cheers,
chris.


Manuel Hiebel 2013-05-20 23:20:40 CEST

Keywords: (none) => Triaged
Assignee: bugsquad => bruno

Comment 1 Bruno Cornec 2013-05-24 01:11:20 CEST
What is your window manager? It could also be related to that.

Upstream, at http://members.dslextreme.com/users/billw/gkrellm/Bugs.html, also mentions some potential issues with the use of -w.
Comment 2 Chris Denice 2013-05-24 08:14:38 CEST
That's fvwm2, I can try with others and without the -w.
Comment 3 Henrik Christiansen 2013-05-25 10:19:25 CEST
The same happens for me on a Mageia 3 i586 KDE system upgraded from Mageia 2.

CC: (none) => hc

Comment 4 Chris Denice 2013-09-05 09:15:49 CEST
I ran it under gdb and got the following backtrace. It seems to be linked to the nvidia drivers. Do you also use the nvidia drivers, Henrik?

--------------

Program received signal SIGABRT, Aborted.
0x00007f9245f54ab9 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f9245f54ab9 in raise () from /lib64/libc.so.6
#1  0x00007f9245f560e8 in abort () from /lib64/libc.so.6
#2  0x00007f9245f92ae7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f9245f99ca7 in _int_free () from /lib64/libc.so.6
#4  0x00007f92415570ac in ?? () from /usr/lib64/nvidia-current/libGL.so.1
#5  0x00007f923f26917a in ?? () from /usr/lib64/nvidia-current/tls/libnvidia-tls.so.319.49
#6  0x00007f9247a659a2 in g_system_thread_new () from /lib64/libglib-2.0.so.0
#7  0x00007f9247a4a50f in g_thread_new_internal () from /lib64/libglib-2.0.so.0
#8  0x00007f92479f9e13 in g_thread_create_full () from /lib64/libglib-2.0.so.0
#9  0x00007f92479f9e67 in g_thread_create () from /lib64/libglib-2.0.so.0
#10 0x000000000044e980 in update_sensors ()
#11 0x0000000000419af7 in update_monitors ()
#12 0x00007f9247a25e13 in g_timeout_dispatch () from /lib64/libglib-2.0.so.0
#13 0x00007f9247a252b6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#14 0x00007f9247a25608 in g_main_context_iterate.isra.23 () from /lib64/libglib-2.0.so.0
#15 0x00007f9247a25a0a in g_main_loop_run () from /lib64/libglib-2.0.so.0
#16 0x00007f9248b84997 in gtk_main () from /lib64/libgtk-x11-2.0.so.0
#17 0x0000000000415e2d in main ()
Comment 5 Chris Denice 2013-09-06 21:37:25 CEST
I can confirm that it comes from the nvidia drivers, so I have added anssi to the CC list. I get no crash if I use the nouveau driver. I'll try to get more info in the backtrace by installing the nvidia debug packages.
Master anssi, if you have any suggestions, I'll take them.

Cheers,
chris.

CC: (none) => anssi.hannula

Comment 6 Chris Denice 2013-09-07 22:48:26 CEST
*** Bug 10594 has been marked as a duplicate of this bug. ***

CC: (none) => gmontalbine

Comment 7 Chris Denice 2013-09-07 22:50:51 CEST
Here is the backtrace with all debuginfo packages installed and the nvidia drivers loaded. It seems I am still missing some debug info for nvidia-current; is that because it is tainted?

----

#0  0x00007f293bc5aab9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f293bc5c0e8 in __GI_abort () at abort.c:90
#2  0x00007f293bc98ae7 in __libc_message (do_abort=do_abort@entry=2, 
    fmt=fmt@entry=0x7f293bd9d3c8 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f293bc9fca7 in malloc_printerr (ptr=0x212af60, 
    str=0x7f293bd9d410 "double free or corruption (!prev)", action=3) at malloc.c:4902
#4  _int_free (av=<optimized out>, p=0x212af50, have_lock=<optimized out>) at malloc.c:3758
#5  0x00007f293725d0ac in ?? () from /usr/lib64/nvidia-current/libGL.so.1
#6  0x00007f2934f6f17a in ?? () from /usr/lib64/nvidia-current/tls/libnvidia-tls.so.319.49
#7  0x00007f293d76b9a2 in g_system_thread_new (
    thread_func=thread_func@entry=0x7f293d6ffa90 <g_deprecated_thread_proxy>, 
    stack_size=stack_size@entry=0, error=error@entry=0x0) at gthread-posix.c:1132
#8  0x00007f293d75050f in g_thread_new_internal (name=name@entry=0x0, 
    proxy=proxy@entry=0x7f293d6ffa90 <g_deprecated_thread_proxy>, 
    func=func@entry=0x448e60 <read_sensors_thread>, data=data@entry=0x0, 
    stack_size=stack_size@entry=0, error=error@entry=0x0) at gthread.c:884
#9  0x00007f293d6ffe13 in g_thread_create_full (
    func=func@entry=0x448e60 <read_sensors_thread>, data=data@entry=0x0, 
    stack_size=stack_size@entry=0, joinable=joinable@entry=0, bound=bound@entry=0, 
    priority=priority@entry=G_THREAD_PRIORITY_LOW, error=error@entry=0x0)
    at deprecated/gthread-deprecated.c:374
#10 0x00007f293d6ffe67 in g_thread_create (func=func@entry=0x448e60 <read_sensors_thread>, 
    data=data@entry=0x0, joinable=joinable@entry=0, error=error@entry=0x0)
    at deprecated/gthread-deprecated.c:343
#11 0x000000000044e980 in run_sensors_thread () at sensors.c:230
#12 update_sensors () at sensors.c:1273
#13 0x0000000000419af7 in update_monitors () at main.c:362
#14 0x00007f293d72be13 in g_timeout_dispatch (source=source@entry=0x223b670, 
    callback=<optimized out>, user_data=<optimized out>) at gmain.c:4450
#15 0x00007f293d72b2b6 in g_main_dispatch (context=0x2101820) at gmain.c:3065
#16 g_main_context_dispatch (context=context@entry=0x2101820) at gmain.c:3641
#17 0x00007f293d72b608 in g_main_context_iterate (context=0x2101820, block=block@entry=1, 
    dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3712
#18 0x00007f293d72ba0a in g_main_loop_run (loop=0x227b3a0) at gmain.c:3906
#19 0x00007f293e88a997 in IA__gtk_main () at gtkmain.c:1257
#20 0x0000000000415e2d in main (argc=1, argv=0x7fff69f641f8) at main.c:2345
Comment 8 Chris Denice 2013-09-07 22:57:48 CEST
More information with list:

(gdb) list
51	       fork/vfork function temporarily invalidated the PID field.  Adjust for
52	       that.  */
53	    if (__builtin_expect (pid <= 0, 0))
54	      pid = (pid & INT_MAX) == 0 ? selftid : -pid;
55	
56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
57	}
58	libc_hidden_def (raise)
59	weak_alias (raise, gsignal)
(gdb) up
#1  0x00007f293bc5c0e8 in __GI_abort () at abort.c:90
90	      raise (SIGABRT);
(gdb) up
#2  0x00007f293bc98ae7 in __libc_message (do_abort=do_abort@entry=2, 
    fmt=fmt@entry=0x7f293bd9d3c8 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
196	      abort ();
(gdb) up
#3  0x00007f293bc9fca7 in malloc_printerr (ptr=0x212af60, 
    str=0x7f293bd9d410 "double free or corruption (!prev)", action=3) at malloc.c:4902
4902	      __libc_message (action & 2, "*** Error in `%s': %s: 0x%s ***\n",
(gdb) up
#4  _int_free (av=<optimized out>, p=0x212af50, have_lock=<optimized out>) at malloc.c:3758
3758	      malloc_printerr (check_action, errstr, chunk2mem(p));
(gdb) up
#5  0x00007f293725d0ac in ?? () from /usr/lib64/nvidia-current/libGL.so.1
(gdb) list
3753	    {
3754	      errstr = "free(): invalid pointer";
3755	    errout:
3756	      if (! have_lock && locked)
3757		(void)mutex_unlock(&av->mutex);
3758	      malloc_printerr (check_action, errstr, chunk2mem(p));
3759	      return;
3760	    }
3761	  /* We know that each chunk is at least MINSIZE bytes in size or a
3762	     multiple of MALLOC_ALIGNMENT.  */
james Whitby 2013-09-07 23:49:04 CEST

CC: (none) => jim

Comment 9 James Kerr 2013-09-10 19:36:38 CEST
*** Bug 11210 has been marked as a duplicate of this bug. ***

CC: (none) => terraagua

Comment 10 jim l 2013-11-02 05:03:10 CET
The problem appears to be a double free, which really isn't the end of the world and should (I think) raise a SIGABRT.

I patched the function run_sensors_thread in sensors.c like this:

static void
run_sensors_thread(void)
	{
	GError *error = NULL;
	if (thread_busy)
		return;
	thread_busy = TRUE;
	signal(SIGABRT, SIG_IGN);
	g_thread_create(read_sensors_thread, NULL, FALSE, &error);
	if(error != NULL)
	  printf(" %d %s\n",(uint)error->code, error->message);
	signal(SIGABRT, SIG_DFL);
	}

Note that you do have to #include <signal.h> at the top of the file for this to work.

This is consistent with the signal handling that is initialized in main() and, so long as that double free is the only error that pops up in the Nvidia driver, this ought to let things keep running.  I hope.  We'll see, anyway.

CC: (none) => jiml

Comment 13 jim l 2013-11-02 05:07:20 CET
Sorry about the multiple posts. I did not realize that they were actually getting posted, because I was getting a login message after submitting (cookies were blocked).
Comment 14 jim l 2013-11-02 05:30:07 CET
Sorry...the mod I made can't work.  Move the signal commands to enclose the for loop in read_sensors_thread() and that should work.
JP Pasnak 2013-12-11 01:17:41 CET

CC: (none) => pasnak

Comment 15 Bruno Cornec 2014-01-26 12:15:47 CET
I use it without -w on KDE with Driver "nvidia" without any issue.
There is no new upstream version.
Should we include a patch based on the code above? Could someone provide a clean one that applies to the latest SVN version?
Comment 16 Chris Denice 2014-01-26 12:34:19 CET
I don't have this bug anymore with the latest nvidia kernel drivers. If that is the case for others too, we should simply close the bug as resolved!
Comment 17 Gary Montalbine 2014-01-26 14:13:44 CET
I had reported this as a bug, I think last summer. It is no longer a problem. Thanks for the fix.
Comment 18 Sander Lepik 2014-01-26 14:45:34 CET
Closing then..

Status: NEW => RESOLVED
CC: (none) => mageia
Resolution: (none) => FIXED