Bug 11880

Summary: Searching the latest cauldron x86_64 swish-e indexes is prone to failure or segfault
Product: Mageia Reporter: William Murphy <warrendiogenese>
Component: RPM Packages Assignee: Thomas Spuhler <thomas>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: Normal CC: jani.valimaa
Version: Cauldron Keywords: PATCH, Triaged, UPSTREAM
Target Milestone: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Source RPM: swish-e-2.4.7-11.mga4.src.rpm CVE:
Status comment:
Attachments: Proposed patch

Description William Murphy 2013-12-05 17:21:43 CET
Description of problem:
Indexes generated using the latest cauldron x86_64 build of swish-e are likely to contain corrupt data. Searching these indexes is prone to segfaults or failures, whether using swish-e directly or through perl-SWISH-API.

Original upstream bug report:
http://swish-e.org/archive/2013-10/13144.html

More informative upstream bug report:
http://swish-e.org/archive/2013-11/13148.html


Version-Release number of selected component (if applicable): 2.4.7-11.mga4.x86_64


How reproducible: 
  Almost always, but results vary. A failing search either segfaults or returns an error.

Steps to Reproduce:
1. Create an index of a number of large documents. I chose random gtk-doc HTML document folders:
    swish-e -f ~/index.test.swish-e -i /usr/share/gtk-doc/html/gobject

2. Search for a common keyword for lots of hits:
    swish-e -f ~/index.test.swish-e -w "gobject"

I also tried cairo and glib, and the result was either an immediate segfault or a truncated list of matching files ending with an error similar to:
    err: Failed to seek to properties located at 2738810656504414208 for file number 230 : Invalid argument



Comment 1 William Murphy 2013-12-05 17:25:13 CET
Created attachment 4577 [details]
Proposed patch

There hasn't been a new release in 4 years, but the project is still in active development, however slight, and I make good use of it.

Rather than sulk, I went bug hunting. I found fixes to 2 other minor bugs and the solution to this one. 

At first sight, this looked like a fix: http://dev.swish-e.org/ticket/14 

This failure is caused by using memcpy on overlapping memory areas in remove_worddata_longs. The call has been there for years and has only now started to fail. Changing memcpy to memmove fixed it.
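
To illustrate why overlap matters, here is a minimal sketch in C. It is not the swish-e source: remove_span and its parameters are hypothetical names chosen for this example, and it only assumes that the fix amounts to shifting the tail of a buffer left over a removed region. In that situation source and destination overlap, so memcpy has undefined behavior while memmove is defined to copy correctly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (NOT taken from swish-e): remove `gap` bytes
 * starting at offset `pos` from a buffer holding `len` bytes by
 * shifting the tail left. Returns the new length.
 *
 * The source (buf + pos + gap) and destination (buf + pos) overlap
 * whenever the tail is longer than the gap, so memmove is required;
 * calling memcpy here would be undefined behavior. */
static size_t remove_span(unsigned char *buf, size_t len, size_t pos, size_t gap)
{
    /* Reject spans that do not fit inside the buffer. */
    if (pos > len || gap > len - pos)
        return len;

    memmove(buf + pos, buf + pos + gap, len - pos - gap);
    return len - gap;
}
```

For example, calling remove_span on the 12 bytes "AAxxBBBBBBBB" with pos 2 and gap 2 leaves "AABBBBBBBB" in the first 10 bytes. With memcpy the same call may appear to work on one libc or CPU and corrupt data on another, which would be consistent with a long-standing call failing only on a newer build.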

I've attached a patch that fixes both of these bugs and silences this annoying warning: 
err: Parsing of undecoded UTF-8 will give garbage when decoding entities at /usr/lib/swish-e/swishspider line 97

I've tested the patch here and all is good again :)
Manuel Hiebel 2013-12-05 17:26:21 CET

Keywords: (none) => PATCH, Triaged, UPSTREAM
CC: (none) => jani.valimaa
Assignee: bugsquad => thomas

Comment 2 Thomas Spuhler 2013-12-11 23:21:48 CET
Thanks a lot for the error report, and even more thanks for the patch.
Would you please test it?
(This patch makes it build again, passing the test)

Status: NEW => ASSIGNED

Comment 3 Thomas Spuhler 2013-12-16 23:39:10 CET
No more error reports and it builds now.
I consider it fixed.

Status: ASSIGNED => RESOLVED
Resolution: (none) => FIXED