Bug 538 - Fast scrolling crashes libreoffice
Summary: Fast scrolling crashes libreoffice
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: x86_64 Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard:
Keywords: NEEDINFO
Depends on:
Blocks:
 
Reported: 2011-03-24 23:27 CET by Frank Griffin
Modified: 2011-08-22 23:50 CEST

See Also:
Source RPM: libreoffice
CVE:
Status comment:


Attachments
Document with math equations that causes the crash (93.26 KB, application/octet-stream)
2011-03-25 02:18 CET, Frank Griffin

Description Frank Griffin 2011-03-24 23:27:26 CET
Open pretty much any reasonably large .odt file in LibreOffice Writer and drag the vertical scrollbar up and down quickly and repeatedly, and LibreOffice will crash.  Nothing is written to stderr when it is started from a console, either; the window just disappears.

If you install libreoffice-debug, you get a popup window that disappears immediately along with the application window.  This time, though, you do get output on stderr:

[ftg@localhost ~]$ ooffice
[ftg@localhost ~]$ **
GConf:ERROR:gconf-client.c:2159:gconf_client_lookup: assertion failed: (last_slash != NULL)


Comment 1 D Morgan 2011-03-25 00:07:26 CET
can you please start libreoffice in gdb to provide a backtrace ?

Keywords: (none) => NEEDINFO
CC: (none) => dmorganec

D Morgan 2011-03-25 00:07:38 CET

Source RPM: libreoffice-core-3.3.1.2-7.mga1.x86_64.rpm => libreoffice

Comment 2 Frank Griffin 2011-03-25 01:55:25 CET
(In reply to comment #1)
> can you please start libreoffice in gdb to provide a backtrace ?

Not really, due to bug#541.  Unfortunately, the SEGV taken by libreoffice occurs on the X dispatch thread, so when gdb (started in a terminal window) suspends the process because of the SEGV, the desktop freezes.  Bug#541 prevents running gdb from a VC.

However, if you attach gdb from a terminal window, trigger the SEGV, and then switch to a shrunken VC, do a "killall gdb", and switch back to the desktop, what you see in the terminal window is:

(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007f1f362ee7ee in X11SalGraphics::DrawCairoAAFontString (
    this=<value optimized out>, rLayout=...)
    at /usr/src/debug/libreoffice-3.3.1.2/vcl/unx/source/gdi/salgdi3.cxx:1047
1047	        void *pPattern = pOptions ? pOptions->GetPattern(pId) : NULL;
(gdb) Quitting: Can't detach Thread 0x7f1f1db1f710 (LWP 6110): No such process

Sorry, but that's the best I can do until bug#541 is fixed or until fglrx works again.  From the look of it, either pOptions is non-NULL but garbage, or else pId has a value that leads to a SEGV in GetPattern().
Comment 3 Frank Griffin 2011-03-25 02:07:35 CET
Actually, a crash in DrawCairoAAFontString() makes sense, since when scrolling fast, text being rendered in the document window is probably scrolling out of the visible area before it is completely rendered, or possibly before rendering even starts.  I could believe that objects or pointers are getting discarded because of the scrolling while they're still in use on other threads.
Comment 4 Frank Griffin 2011-03-25 02:10:20 CET
However, the problem must be in libreoffice, since I tried the same thing in kate with no problem.
Comment 5 Frank Griffin 2011-03-25 02:16:43 CET
(In reply to comment #4)
> However, the problem must be in libreoffice, since I tried the same thing in
> kate with no problem.

Well, not so fast.  I suppose it depends on what kind of text is being drawn.  Kate is a text editor, and the document I'm scrolling fast in libreoffice contains a fair number of equations with math symbols....

UPDATE: I just tried fast-scrolling a plain text document, and it works fine.  I'll attach the document that fails.
Comment 6 Frank Griffin 2011-03-25 02:18:28 CET
Created attachment 165 [details]
Document with math equations that causes the crash
Comment 7 D Morgan 2011-03-25 07:45:54 CET
I don't know if I will be able to fix it, but the good news is that I can reproduce the crash with your file.
Comment 8 Frank Griffin 2011-03-25 11:54:17 CET
Hmmm,  I'm wondering if running gdb through ssh from a terminal window on a different machine would shield gdb from X freezing on the other machine ?  IIRC, X through ssh starts its own display ?

If I can use gdb, I can try to hunt it down...
Comment 9 Frank Griffin 2011-03-25 13:45:53 CET
There's a suggestion here that you can get this sort of problem by not using the Cairo that comes with OO (but it's pretty old): http://www.mail-archive.com/allbugs@openoffice.org/msg490008.html
Comment 10 Frank Griffin 2011-03-25 17:41:37 CET
Found it, but the patch author will have to be consulted to know why it's wrong.

The code causing the problem is added by 0001-Resolves-rhbz-680460-honour-lcdfilter-subpixeling-et.patch

The top of the stack trace is:
#0  0x0000000000000005 in ?? ()
#1  0x00007f68f3a497f1 in X11SalGraphics::DrawCairoAAFontString (
    this=<value optimized out>, rLayout=...)
    at /usr/src/debug/libreoffice-3.3.1.2/vcl/unx/source/gdi/salgdi3.cxx:1047

Listing the code:
1045	    {
1046	        const ImplFontOptions *pOptions = rFont.GetFontOptions();
   0x00007f68f3a497d4 <+964>:	mov    0x0(%rbp),%rax
   0x00007f68f3a497d8 <+968>:	mov    %rbp,%rdi
   0x00007f68f3a497db <+971>:	callq  *0x30(%rax)

1047	        void *pPattern = pOptions ? pOptions->GetPattern(pId) : NULL;
   0x00007f68f3a497de <+974>:	test   %rax,%rax
---Type <return> to continue, or q <return> to quit---
   0x00007f68f3a497e1 <+977>:	je     0x7f68f3a4981f <X11SalGraphics::DrawCairoAAFontString(ServerFontLayout const&)+1039>
   0x00007f68f3a497e3 <+979>:	mov    (%rax),%rdx
   0x00007f68f3a497e6 <+982>:	mov    0x18(%rsp),%rsi
   0x00007f68f3a497eb <+987>:	mov    %rax,%rdi
   0x00007f68f3a497ee <+990>:	callq  *0x10(%rdx)        <=====  here's the culprit

1048	        if (pPattern)
=> 0x00007f68f3a497f1 <+993>:	test   %rax,%rax

It's the indicated call at 0x00007f68f3a497ee that fails because it is calling location 0x05.  This is because stmt 1047 is saying that if pOptions is not 0x0 (it's not), then assume it points to an ImplFontOptions object and call that object's GetPattern() method.

Unfortunately,

(gdb) print pOptions
$3 = <value optimized out>
(gdb) 

but we can see from the code starting at +974 that the address in the first 8 bytes of *pOptions ( (%rax) ) is being moved into %rdx, and that the address of the GetPattern() routine to be called is at 0x10 past %rdx.  C++ internals aren't my forte, but I'd bet that the first 8 bytes of the object point to a table of method/function entry points (the vtable), and GetPattern()'s is entry #3.

An info-all-registers shows that %rdx is 0x7f68d4090078, so displaying the memory there,
(gdb) x/6xa 0x7f68d4090078
0x7f68d4090078:	0x0	0x55
0x7f68d4090088:	0x5	0x5
0x7f68d4090098:	0x0	0x0
and there at 0x7f68d4090078 + 0x10 = 0x7f68d4090088 we find 0x5 --- calling this caused the crash.

pOptions came from rFont.GetFontOptions(), so let's print that:
(gdb) print rFont
$4 = (ServerFont &) @0x7f68bde4da98: {_vptr.ServerFont = 0x7f68ee829ff0, 
  maGlyphList = {_M_ht = {_M_node_allocator = 
    {<__gnu_cxx::new_allocator<__gnu_cxx::_Hashtable_node<std::pair<int const, GlyphData> > >> = {<No data fields>}, <No data fields>}, _M_hash = 
    {<No data fields>}, _M_equals = {<std::binary_function<int, int, bool>> = 
    {<No data fields>}, <No data fields>}, _M_get_key = 
    {<std::unary_function<std::pair<int const, GlyphData>, int const>> = 
    {<No data fields>}, <No data fields>}, 
      _M_buckets = std::vector of length 5, capacity 5 = {0x0, 0x0, 0x0, 
    0x7f68c40e4be8, 0x0}, _M_num_elements = 1}}, maFontSelData = 
    {<ImplFontAttributes> = {maName = {mpData = 0x7f68bc140ba0}, maStyleName = warning: can't find linker symbol for virtual table for `String' value
warning:   found `getNextSize(unsigned int)::nPrimes' instead

{mpData = 0x7f68f6a5be14}, meWeight = WEIGHT_BOLD, meItalic = ITALIC_NONE, 
      meFamily = FAMILY_DONTKNOW, mePitch = PITCH_DONTKNOW, meWidthType = 
    WIDTH_DONTKNOW, mbSymbolFlag = false}, maTargetName = {mpData = 
    0x7f68bc140ba0}, maSearchName = {mpData = 0x7f68d4c31828}, mnWidth = 15, 
    mnHeight = 17, mfExactHeight = 17, mnOrientation = 0, meLanguage = 1023, 
    mbVertical = false, mbNonAntialiased = false, mpFontData = 0x75, 
    mpFontEntry = 0x7f68bde1c940}, mnExtInfo = 0, mpExtData = 0x0, 
  mnRefCount = 1, mnBytesUsed = 304, mpPrevGCFont = 0x7f68bde4d918, 
  mpNextGCFont = 0x7f68bde4dc18, mnCos = 65536, mnSin = 0, mnZWJ = 0, mnZWNJ = 
    0, mbCollectedZW = false}

This doesn't look like it's been particularly well-initialized.  Several lines up, we had:
   ServerFont& rFont = rLayout.GetServerFont();
so
(gdb) print rLayout
$5 = (const ServerFontLayout &) @0x7f68bdd56ef0: {<GenericSalLayout> = 
    {<SalLayout> = {_vptr.SalLayout = 0x7f68ee829eb0, mnMinCharPos = 0, 
      mnEndCharPos = 1, mnLayoutFlags = 258, mnUnitsPerPixel = 1, 
      mnOrientation = 0, mnRefCount = 1, maDrawOffset = {<Pair> = {nA = 0, 
          nB = 0}, <No data fields>}, maDrawBase = {<Pair> = {nA = 716, nB = 
    444}, <No data fields>}}, mpGlyphItems = 0x7f68c586e5a0, mnGlyphCount = 1, 
    mnGlyphCapacity = 16, maBasePoint = {<Pair> = {nA = 0, nB = 
    0}, <No data fields>}}, mrServerFont = @0x7f68bde4da98}
The address for mrServerFont *looks* valid, but who knows.

Either the value returned by GetFontOptions() for pOptions is garbage because the options for the font were never initialized, or some or all of these objects have been freed and reused, or there's just been a memory overlay.  I can't tell which.

Maybe this will make sense to the patch author.  In any case, the trick of running libreoffice on one system, ssh'ing to it from another system, locating the process via "ps ax | grep swriter" and attaching gdb to it from the second system works fine.
Comment 11 Frank Griffin 2011-03-27 20:52:01 CEST
The problem no longer occurs in the 3.3.2.2-1 version just uploaded.  Did you drop the patch, or has it been fixed ?
Comment 12 D Morgan 2011-03-27 22:02:05 CEST
We will keep this bug report open, because I have only dropped the patches for now.
Comment 13 D Morgan 2011-08-22 23:50:15 CEST
Closing, as updates for 3.3.3.1 have been issued.

Status: NEW => RESOLVED
Resolution: (none) => FIXED

