Bug 10403 - tesseract unable to read tif files
Summary: tesseract unable to read tif files
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 3
Hardware: i586 Linux
Priority: Normal major
Target Milestone: ---
Assignee: QA Team
QA Contact:
URL:
Whiteboard: has_procedure MGA3-32-OK mga3-64-ok
Keywords: Triaged, validated_update
Duplicates: 10554
Depends on: 10411
Blocks: 10402
Reported: 2013-06-02 22:47 CEST by Pablo Saratxaga
Modified: 2014-05-08 18:07 CEST (History)
CC List: 6 users

See Also:
Source RPM: tesseract-3.02.02-3.mga3.src.rpm
CVE:
Status comment:


Description Pablo Saratxaga 2013-06-02 22:47:00 CEST
Description of problem:
ocr program tesseract fails due to inability to read tiff files

Version-Release number of selected component (if applicable):
3.02.02

How reproducible:
always

Steps to Reproduce:
1. try to run OCR on a TIFF file (the only format tesseract supports)
2. it fails with the following error:

Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Error in findTiffCompression: function not present
Error in pixReadStreamTiff: function not present
Error in pixReadStream: tiff: no pix returned
Error in pixRead: pix not read
Unsupported image type.

It used to work with a previous version (at least it worked on Mandriva 2011) on the same image file. The "function not present" messages are suspicious;
also, the package description talks about libtiff, but the package isn't linked with libtiff.
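
The linking observation can be checked mechanically by reading `ldd` output. A minimal sketch in Python, assuming `ldd` is available on the system; the helper names `links_against` and `check_library` are illustrative only and are not part of tesseract or leptonica:

```python
import subprocess


def links_against(ldd_output: str, needle: str = "libtiff") -> bool:
    """Return True if any dependency line in ldd output names `needle`."""
    for line in ldd_output.splitlines():
        # ldd lines look like "libfoo.so.N => /path/libfoo.so.N (0x...)"
        name = line.strip().split(" ", 1)[0]
        if name.startswith(needle):
            return True
    return False


def check_library(path: str, needle: str = "libtiff") -> bool:
    """Run ldd on `path` (assumed to exist) and look for `needle`."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return links_against(result.stdout, needle)
```

Call `check_library()` with the path of the installed tesseract binary or of the leptonica shared library; a `False` result would be consistent with the "function not present" errors quoted above.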

Pablo Saratxaga 2013-06-02 22:47:08 CEST

CC: (none) => pablo

Comment 1 Pablo Saratxaga 2013-06-03 14:40:36 CEST
another bug: if English files (tesseract-eng-*.rpm) aren't installed (because you only need to OCR in Spanish, for example), then tesseract is unable to work properly (it tries to load "/usr/share/tessdata/eng.traineddata").

solution: add "Requires: tesseract-eng = %{version}" to the main package, so that the needed tesseract-eng is installed even for non-English setups.
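
Until the package carries that dependency, the missing file can be detected before starting an OCR job. A minimal sketch in Python, assuming the data directory `/usr/share/tessdata` quoted above and one `<lang>.traineddata` file per language; `missing_traineddata` is a hypothetical helper, not a tesseract API:

```python
from pathlib import Path


def missing_traineddata(langs, tessdata_dir="/usr/share/tessdata"):
    """Return the language codes whose .traineddata file is absent.

    "eng" is always checked because, per this report, tesseract 3.02.02
    tries to load eng.traineddata even when only another language is used.
    """
    wanted = list(dict.fromkeys(["eng", *langs]))  # de-duplicate, keep order
    base = Path(tessdata_dir)
    return [
        lang for lang in wanted
        if not (base / f"{lang}.traineddata").is_file()
    ]
```

For a Spanish-only setup, `missing_traineddata(["spa"])` returning `["eng"]` would indicate the situation described in this comment.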
Comment 2 Pablo Saratxaga 2013-06-03 14:41:03 CEST
the TIFF related problem seems to be in leptonica library
Comment 3 Pablo Saratxaga 2013-06-03 15:02:37 CEST
I found the tiff problem! libtesseract was wrongly built, see: https://bugs.mageia.org/show_bug.cgi?id=10411

Depends on: (none) => 10411

Pablo Saratxaga 2013-06-03 15:47:38 CEST

Blocks: (none) => 10402

Manuel Hiebel 2013-06-08 16:54:58 CEST

Keywords: (none) => Triaged
Assignee: bugsquad => zen25000

Comment 4 Pablo Saratxaga 2013-06-18 23:17:50 CEST
*** Bug 10554 has been marked as a duplicate of this bug. ***

CC: (none) => gmontalbine

Comment 5 Gary Montalbine 2013-06-18 23:54:17 CEST
I uninstalled tesseract, libtesseract and libleptonica, reinstalled them, and still have the same problem. Where do I get the updated RPMs?
Gary
Barry Jackson 2013-06-19 01:36:53 CEST

Whiteboard: (none) => OK

Comment 6 Pablo Saratxaga 2013-06-19 10:18:54 CEST
you don't need to do anything with tesseract, just update libleptonica.
version leptonica-1.69-2 has the bug, a newer version will solve it.

there is an updated version in Cauldron: libleptonica3-1.69-3.mga4
a fix for Mageia 3 has to follow the update procedures and tests before being available (that being said, you can just install the Cauldron version, it worked ok for me)
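
To tell whether an installed leptonica build is newer than the broken 1.69-2, the version-release string can be compared numerically. A minimal sketch in Python; this is a simplified comparison, not the real RPM version algorithm, and `parse_evr` and `has_tiff_fix` are hypothetical names. It assumes that any release newer than 1.69-2 carries the fix, as suggested above:

```python
def parse_evr(version_release: str):
    """Split "1.69-2.1.mga3" into ((1, 69), (2, 1)).

    Only the leading numeric components are kept; distribution tags
    such as "mga3" are dropped.
    """
    version, _, release = version_release.partition("-")

    def leading_ints(text):
        nums = []
        for part in text.split("."):
            if not part.isdigit():
                break
            nums.append(int(part))
        return tuple(nums)

    return leading_ints(version), leading_ints(release)


def has_tiff_fix(version_release: str) -> bool:
    """True if the build is newer than 1.69-2, the release reported as broken."""
    return parse_evr(version_release) > parse_evr("1.69-2")
```

With this sketch, `has_tiff_fix("1.69-2.mga3")` is false and `has_tiff_fix("1.69-3.mga4")` is true. For anything beyond a quick check, rely on the package manager's own version comparison.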
Comment 7 Barry Jackson 2013-06-19 11:08:42 CEST
Pablo - thanks for your help on this.
Gary has tested my local rebuilds of leptonica (updated as per Cauldron) and tesseract (with the tesseract-eng Requires) for Mageia 3 and all seems well, so I will push both to 3/updates/testing soon. I have just been very short on time lately.
Comment 8 Barry Jackson 2013-06-19 18:54:47 CEST
Update Advisory

tesseract has been submitted to 3/core/updates_testing

The change is as recommended by Pablo in comment #1 of this bug report.

Source rpm:-
tesseract-3.02.02-3.1.mga3.src.rpm

Affected rpms
tesseract-3.02.02-3.1.mga3.i586.rpm
libtesseract3-3.02.02-3.1.mga3.i586.rpm
libtesseract-devel-3.02.02-3.1.mga3.i586.rpm
tesseract-osd-3.02.02-3.1.mga3.i586.rpm
tesseract-heb-com-3.02.02-3.1.mga3.noarch.rpm
tesseract-ara-3.02.02-3.1.mga3.noarch.rpm
tesseract-bul-3.02.02-3.1.mga3.noarch.rpm
tesseract-cat-3.02.02-3.1.mga3.noarch.rpm
tesseract-ces-3.02.02-3.1.mga3.noarch.rpm
tesseract-chi_sim-3.02.02-3.1.mga3.noarch.rpm
tesseract-chi_tra-3.02.02-3.1.mga3.noarch.rpm
tesseract-chr-3.02.02-3.1.mga3.noarch.rpm
tesseract-dan-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-dan-3.02.02-3.1.mga3.noarch.rpm
tesseract-deu-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-deu-3.02.02-3.1.mga3.noarch.rpm
tesseract-ell-3.02.02-3.1.mga3.noarch.rpm
tesseract-eng-3.02.02-3.1.mga3.noarch.rpm
tesseract-fin-3.02.02-3.1.mga3.noarch.rpm
tesseract-fra-3.02.02-3.1.mga3.noarch.rpm
tesseract-heb-3.02.02-3.1.mga3.noarch.rpm
tesseract-hin-3.02.02-3.1.mga3.noarch.rpm
tesseract-hun-3.02.02-3.1.mga3.noarch.rpm
tesseract-ind-3.02.02-3.1.mga3.noarch.rpm
tesseract-ita-3.02.02-3.1.mga3.noarch.rpm
tesseract-jpn-3.02.02-3.1.mga3.noarch.rpm
tesseract-kor-3.02.02-3.1.mga3.noarch.rpm
tesseract-lav-3.02.02-3.1.mga3.noarch.rpm
tesseract-lit-3.02.02-3.1.mga3.noarch.rpm
tesseract-nld-3.02.02-3.1.mga3.noarch.rpm
tesseract-nor-3.02.02-3.1.mga3.noarch.rpm
tesseract-pol-3.02.02-3.1.mga3.noarch.rpm
tesseract-por-3.02.02-3.1.mga3.noarch.rpm
tesseract-ron-3.02.02-3.1.mga3.noarch.rpm
tesseract-rus-3.02.02-3.1.mga3.noarch.rpm
tesseract-slk-3.02.02-3.1.mga3.noarch.rpm
tesseract-slk-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-slv-3.02.02-3.1.mga3.noarch.rpm
tesseract-spa-3.02.02-3.1.mga3.noarch.rpm
tesseract-srp-3.02.02-3.1.mga3.noarch.rpm
tesseract-swe-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-swe-3.02.02-3.1.mga3.noarch.rpm
tesseract-tgl-3.02.02-3.1.mga3.noarch.rpm
tesseract-tha-3.02.02-3.1.mga3.noarch.rpm
tesseract-tur-3.02.02-3.1.mga3.noarch.rpm
tesseract-ukr-3.02.02-3.1.mga3.noarch.rpm
tesseract-vie-3.02.02-3.1.mga3.noarch.rpm
tesseract-debuginfo-3.02.02-3.1.mga3.i586.rpm

The main bug (reading TIFF files) should now also be fixed by the leptonica update in bug 10411.

Assignee: zen25000 => qa-bugs

Comment 9 Rémi Verschelde 2013-06-22 11:58:02 CEST
Installed core/release version tesseract with tesseract-fra on mga3 i586, could reproduce the bug of comment 1.

The update candidate fixes the bug of comment 1, since it installs tesseract-eng too.

The main bug from comment 0 stays unfixed (which is normal since it is handled in bug 10411, I'll test it now).

CC: (none) => remi
Whiteboard: OK => OK MGA3-32-OK

Comment 10 Rémi Verschelde 2013-06-22 12:16:49 CEST
Procedure: install the package from core/release without tesseract-eng (choose another locale) to reproduce the bug from comment 1. The procedure for the main bug in comment 0 is in bug 10411.

Whiteboard: OK MGA3-32-OK => has_procedure MGA3-32-OK

Comment 11 Lewis Smith 2013-06-24 19:41:22 CEST
MGA3-32-OK

Installed 'release' tesseract, basic fault reproduced.
Updated
 tesseract-3.02.02-3.1.mga3.i586.rpm
 libtesseract3-3.02.02-3.1.mga3.i586.rpm
 tesseract-eng-3.02.02-3.1.mga3.noarch.rpm
 tesseract-fra-3.02.02-3.1.mga3.noarch.rpm
 libleptonica3-1.69-2.1.mga3.i586.rpm
Decoded (relatively) successfully an English TIFF with various font sizes, including italic, and a French JPEG landscape double page.

But: the TIFF with various font sizes was a single column of variable text sizes. So initially, trying to give it a helping hand:
 tesseract /mnt/common/docs/ElderChmpgn.tiff ~/elderflower -psm 4
(Assume a single column of text of variable sizes) yielded:
 Tesseract Open Source OCR Engine v3.02.02 with Leptonica
 set_count == gridheight():Error:Assert failed:in file colfind.cpp, line 648
 Segmentation fault
But since it worked OK with no -psm param, not worth pursuing this?
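
Since the run without -psm succeeded, one workaround is to retry without the option when the first attempt fails. A minimal sketch in Python, assuming the `tesseract` binary is on PATH and takes `-psm` as in the command above; `ocr_with_fallback` and its injectable `run` parameter are hypothetical:

```python
import subprocess


def ocr_with_fallback(image, output_base, psm=None, run=subprocess.call):
    """Run tesseract, retrying without -psm if the first attempt fails.

    Returns the argument list of the attempt that succeeded.
    Raises RuntimeError if every attempt exits with a non-zero status.
    """
    base_cmd = ["tesseract", str(image), str(output_base)]
    attempts = []
    if psm is not None:
        attempts.append(base_cmd + ["-psm", str(psm)])
    attempts.append(base_cmd)

    for cmd in attempts:
        # On POSIX a crash by signal gives a negative return code;
        # any non-zero status is treated as a failure here.
        if run(cmd) == 0:
            return cmd
    raise RuntimeError("tesseract failed for all attempted option sets")
```

The `run` parameter defaults to `subprocess.call` and exists only so the retry logic can be exercised without a tesseract installation.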

CC: (none) => lewyssmith

Comment 12 Barry Jackson 2013-06-24 23:38:57 CEST
(In reply to Lewis Smith from comment #11)

>  set_count == gridheight():Error:Assert failed:in file colfind.cpp, line 648
>  Segmentation fault
> But since it worked OK with no -psm param, not worth pursuing this?

Seems to be fixed upstream for next release:-

http://code.google.com/p/tesseract-ocr/issues/detail?id=653

Barry

CC: (none) => zen25000

Comment 13 claire robinson 2013-06-26 13:04:22 CEST
Testing complete mga3 64

Validating. Advisory uploaded.

Could sysadmin please push from 3 core/updates_testing to core/updates

Thanks!

Keywords: (none) => validated_update
Whiteboard: has_procedure MGA3-32-OK => has_procedure MGA3-32-OK mga3-64-ok
CC: (none) => sysadmin-bugs

Comment 14 Nicolas Vigier 2013-06-26 20:24:58 CEST
http://advisories.mageia.org/MGAA-2013-0039.html

Status: NEW => RESOLVED
CC: (none) => boklm
Resolution: (none) => FIXED

Nicolas Vigier 2014-05-08 18:07:04 CEST

CC: boklm => (none)

