Description of problem:
The OCR program tesseract fails because it cannot read TIFF files.
Version-Release number of selected component (if applicable):
3.02.02
How reproducible:
always
Steps to Reproduce:
1. Try to run OCR on a TIFF file (the only format tesseract supports).
2. It fails with the following error:
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Error in findTiffCompression: function not present
Error in pixReadStreamTiff: function not present
Error in pixReadStream: tiff: no pix returned
Error in pixRead: pix not read
Unsupported image type.
It used to work with a previous version (at least it worked on Mandriva 2011) on the same image file.
The "function not present" messages are suspicious; also, the package description mentions libtiff, but the package isn't linked with libtiff.
CC: (none) => pablo
Another bug: if the English files (tesseract-eng-*.rpm) aren't installed (because you only need OCR in Spanish, for example), tesseract is unable to work properly (it tries to load "/usr/share/tessdata/eng.traineddata"). Solution: add "Requires: tesseract-eng = %{version}" to the main package, so that the needed tesseract-eng is installed even on non-English setups.
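The missing-language failure described above can be detected before tesseract is invoked. A minimal sketch in Python, assuming the data directory /usr/share/tessdata taken from the path quoted in this comment; the helper name `missing_traineddata` is hypothetical and not part of tesseract:

```python
import os

# Directory taken from the path quoted in the comment above; other
# installations may keep their language data elsewhere.
TESSDATA_DIR = "/usr/share/tessdata"


def missing_traineddata(langs, tessdata_dir=TESSDATA_DIR):
    """Return the language codes whose <lang>.traineddata file is absent.

    Hypothetical helper for illustration only; not part of tesseract.
    """
    return [
        lang
        for lang in langs
        if not os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))
    ]
```

Checking both `eng` and the target language, for example `missing_traineddata(["eng", "spa"])`, covers the case in this report, where eng.traineddata is loaded even on a Spanish-only setup.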
the TIFF related problem seems to be in leptonica library
I found the tiff problem! libtesseract was wrongly built, see: https://bugs.mageia.org/show_bug.cgi?id=10411
Depends on: (none) => 10411
Blocks: (none) => 10402
Keywords: (none) => Triaged
Assignee: bugsquad => zen25000
*** Bug 10554 has been marked as a duplicate of this bug. ***
CC: (none) => gmontalbine
I uninstalled tesseract, libtesseract and libleptonica, reinstalled them, and still have the same problem. Where do I get the updated RPMs? Gary
Whiteboard: (none) => OK
You don't need to do anything with tesseract, just update libleptonica. Version leptonica-1.69-2 has the bug; a newer version will solve it. There is an updated version in Cauldron: libleptonica3-1.69-3.mga4. A fix for Mageia 3 has to go through the update procedures and tests before being available (that being said, you can just install the Cauldron version; it worked OK for me).
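To check whether an installed leptonica build is newer than the one named above as broken, the version-release strings can be compared numerically. This is a rough sketch only: it is not RPM's own version comparison algorithm, it ignores non-numeric parts such as the `mga4` distribution tag, and the function names are hypothetical.

```python
def _numeric_parts(text):
    """Turn '2.1.mga3' into (2, 1), skipping non-numeric components."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def parse_version_release(text):
    """Split 'version-release' (e.g. '1.69-3.mga4') into two int tuples."""
    version, _, release = text.partition("-")
    return _numeric_parts(version), _numeric_parts(release)


# From the comment above: leptonica 1.69-2 has the bug.
KNOWN_BAD = parse_version_release("1.69-2")


def is_newer_than_known_bad(version_release):
    """True if the given build sorts after the build reported as broken."""
    return parse_version_release(version_release) > KNOWN_BAD
```

With this simplification, the Cauldron build `1.69-3.mga4` mentioned above compares as newer than `1.69-2`. A result of False only means the build is not newer than the known-bad one; it says nothing about whether older builds are affected.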
Pablo - thanks for your help on this. Gary has tested my local rebuilds of leptonica (updated as per cauldron) and tesseract (with -eng require) for 3 and all seems well, so I will push both to 3/updates/testing soon - just been very short on time lately.
Update Advisory
tesseract has been submitted to 3/core/updates_testing.
The change is as recommended by Pablo in comment #1 of this bug report.
Source rpm:
tesseract-3.02.02-3.1.mga3.src.rpm
Affected rpms:
tesseract-3.02.02-3.1.mga3.i586.rpm
libtesseract3-3.02.02-3.1.mga3.i586.rpm
libtesseract-devel-3.02.02-3.1.mga3.i586.rpm
tesseract-osd-3.02.02-3.1.mga3.i586.rpm
tesseract-heb-com-3.02.02-3.1.mga3.noarch.rpm
tesseract-ara-3.02.02-3.1.mga3.noarch.rpm
tesseract-bul-3.02.02-3.1.mga3.noarch.rpm
tesseract-cat-3.02.02-3.1.mga3.noarch.rpm
tesseract-ces-3.02.02-3.1.mga3.noarch.rpm
tesseract-chi_sim-3.02.02-3.1.mga3.noarch.rpm
tesseract-chi_tra-3.02.02-3.1.mga3.noarch.rpm
tesseract-chr-3.02.02-3.1.mga3.noarch.rpm
tesseract-dan-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-dan-3.02.02-3.1.mga3.noarch.rpm
tesseract-deu-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-deu-3.02.02-3.1.mga3.noarch.rpm
tesseract-ell-3.02.02-3.1.mga3.noarch.rpm
tesseract-eng-3.02.02-3.1.mga3.noarch.rpm
tesseract-fin-3.02.02-3.1.mga3.noarch.rpm
tesseract-fra-3.02.02-3.1.mga3.noarch.rpm
tesseract-heb-3.02.02-3.1.mga3.noarch.rpm
tesseract-hin-3.02.02-3.1.mga3.noarch.rpm
tesseract-hun-3.02.02-3.1.mga3.noarch.rpm
tesseract-ind-3.02.02-3.1.mga3.noarch.rpm
tesseract-ita-3.02.02-3.1.mga3.noarch.rpm
tesseract-jpn-3.02.02-3.1.mga3.noarch.rpm
tesseract-kor-3.02.02-3.1.mga3.noarch.rpm
tesseract-lav-3.02.02-3.1.mga3.noarch.rpm
tesseract-lit-3.02.02-3.1.mga3.noarch.rpm
tesseract-nld-3.02.02-3.1.mga3.noarch.rpm
tesseract-nor-3.02.02-3.1.mga3.noarch.rpm
tesseract-pol-3.02.02-3.1.mga3.noarch.rpm
tesseract-por-3.02.02-3.1.mga3.noarch.rpm
tesseract-ron-3.02.02-3.1.mga3.noarch.rpm
tesseract-rus-3.02.02-3.1.mga3.noarch.rpm
tesseract-slk-3.02.02-3.1.mga3.noarch.rpm
tesseract-slk-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-slv-3.02.02-3.1.mga3.noarch.rpm
tesseract-spa-3.02.02-3.1.mga3.noarch.rpm
tesseract-srp-3.02.02-3.1.mga3.noarch.rpm
tesseract-swe-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-swe-3.02.02-3.1.mga3.noarch.rpm
tesseract-tgl-3.02.02-3.1.mga3.noarch.rpm
tesseract-tha-3.02.02-3.1.mga3.noarch.rpm
tesseract-tur-3.02.02-3.1.mga3.noarch.rpm
tesseract-ukr-3.02.02-3.1.mga3.noarch.rpm
tesseract-vie-3.02.02-3.1.mga3.noarch.rpm
tesseract-debuginfo-3.02.02-3.1.mga3.i586.rpm
This main bug should now also be fixed by the leptonica update in #10411.
Assignee: zen25000 => qa-bugs
Installed core/release version tesseract with tesseract-fra on mga3 i586, could reproduce the bug of comment 1. The update candidate fixes the bug of comment 1, since it installs tesseract-eng too. The main bug from comment 0 stays unfixed (which is normal since it is handled in bug 10411, I'll test it now).
CC: (none) => remi
Whiteboard: OK => OK MGA3-32-OK
Procedure: install the package from core/release without tesseract-eng (choose another locale) to reproduce the bug from comment 1. The procedure for the main bug in comment 0 is in bug 10411.
Whiteboard: OK MGA3-32-OK => has_procedure MGA3-32-OK
MGA3-32-OK
Installed 'release' tesseract; basic fault reproduced.
Updated:
tesseract-3.02.02-3.1.mga3.i586.rpm
libtesseract3-3.02.02-3.1.mga3.i586.rpm
tesseract-eng-3.02.02-3.1.mga3.noarch.rpm
tesseract-fra-3.02.02-3.1.mga3.noarch.rpm
libleptonica3-1.69-2.1.mga3.i586.rpm
Decoded successfully (relatively) an English TIF with various font sizes incl. italic, and a French JPEG landscape double-page.
But: the TIF with various font sizes was a single column of variable text sizes. So, trying initially to give it a helping hand:
tesseract /mnt/common/docs/ElderChmpgn.tiff ~/elderflower -psm 4
(Assume a single column of text of variable sizes)
yielded:
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
set_count == gridheight():Error:Assert failed:in file colfind.cpp, line 648
Segmentation fault
But since it worked OK with no -psm param, not worth pursuing this?
CC: (none) => lewyssmith
(In reply to Lewis Smith from comment #11)
> set_count == gridheight():Error:Assert failed:in file colfind.cpp, line 648
> Segmentation fault
> But since it worked OK with no -psm param, not worth pursuing this?
Seems to be fixed upstream for next release:
http://code.google.com/p/tesseract-ocr/issues/detail?id=653
Barry
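Until a release with the upstream fix is packaged, a script that calls tesseract could work around the `-psm` crash from comment #11 by retrying with the default page segmentation mode, which was reported to work. A minimal sketch in Python, assuming the 3.02 command-line form used in that comment (`tesseract image outputbase -psm N`); the function names and the injectable `run` parameter are hypothetical:

```python
import subprocess


def build_command(image, output_base, psm=None):
    """Build a tesseract 3.02-style command line; -psm is added only if given."""
    cmd = ["tesseract", image, output_base]
    if psm is not None:
        cmd += ["-psm", str(psm)]
    return cmd


def ocr_with_fallback(image, output_base, psm, run=subprocess.call):
    """Try the requested page segmentation mode, then retry with the default.

    `run` must return the process exit status. subprocess.call reports a
    process killed by a signal (such as a segmentation fault) as a
    negative value, so any non-zero status triggers the retry.
    """
    status = run(build_command(image, output_base, psm))
    if status != 0:
        status = run(build_command(image, output_base))
    return status
```

The `run` argument exists only so the retry logic can be exercised without tesseract installed; in normal use it can be left at its default.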
CC: (none) => zen25000
Testing complete mga3 64. Validating. Advisory uploaded. Could sysadmin please push from 3 core/updates_testing to core/updates? Thanks!
Keywords: (none) => validated_update
Whiteboard: has_procedure MGA3-32-OK => has_procedure MGA3-32-OK mga3-64-ok
CC: (none) => sysadmin-bugs
http://advisories.mageia.org/MGAA-2013-0039.html
Status: NEW => RESOLVED
CC: (none) => boklm
Resolution: (none) => FIXED
CC: boklm => (none)