| Summary: | tesseract unable to read tif files | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Pablo Saratxaga <pablo> |
| Component: | RPM Packages | Assignee: | QA Team <qa-bugs> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | major | ||
| Priority: | Normal | CC: | gmontalbine, lewyssmith, pablo, rverschelde, sysadmin-bugs, zen25000 |
| Version: | 3 | Keywords: | Triaged, validated_update |
| Target Milestone: | --- | ||
| Hardware: | i586 | ||
| OS: | Linux | ||
| Whiteboard: | has_procedure MGA3-32-OK mga3-64-ok | ||
| Source RPM: | tesseract-3.02.02-3.mga3.src.rpm | CVE: | |
| Status comment: | |||
| Bug Depends on: | 10411 | ||
| Bug Blocks: | 10402 | ||
|
Description
Pablo Saratxaga
2013-06-02 22:47:00 CEST
Pablo Saratxaga
2013-06-02 22:47:08 CEST
CC:
(none) =>
pablo
Another bug: if the English files (tesseract-eng-*.rpm) aren't installed (because you only need to OCR in Spanish, for example), then tesseract is unable to work properly (it tries to load "/usr/share/tessdata/eng.traineddata").
Solution: add "Requires: tesseract-eng = %{version}" to the main package, so that the needed tesseract-eng is installed even for non-English setups.
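To check whether a given install is affected by this, a minimal sketch along the following lines may help. It assumes the language data live as `LANG.traineddata` files in one directory, as the path quoted above suggests; the function name `missing_traineddata` is hypothetical and is not part of tesseract or of the Mageia packages.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): print every requested
# language whose data file is absent from a tessdata directory.
# Usage: missing_traineddata DIR LANG...
missing_traineddata() {
    dir=$1
    shift
    for lang in "$@"; do
        # Assumption: each language is stored as DIR/LANG.traineddata
        if [ ! -f "$dir/$lang.traineddata" ]; then
            printf '%s\n' "$lang"
        fi
    done
}
```

For example, `missing_traineddata /usr/share/tessdata eng spa` would print `eng` on a system where only the Spanish data are installed, which is the situation described in this comment.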
The TIFF-related problem seems to be in the leptonica library.
I found the TIFF problem! libtesseract was wrongly built, see: https://bugs.mageia.org/show_bug.cgi?id=10411
Depends on:
(none) =>
10411
Pablo Saratxaga
2013-06-03 15:47:38 CEST
Blocks:
(none) =>
10402
Manuel Hiebel
2013-06-08 16:54:58 CEST
Keywords:
(none) =>
Triaged
I uninstalled tesseract, libtesseract and libleptonica, reinstalled them, and have the same problem. Where do I get the updated RPMs?
Gary
Barry Jackson
2013-06-19 01:36:53 CEST
Whiteboard:
(none) =>
OK
You don't need to do anything with tesseract, just update libleptonica. Version leptonica-1.69-2 has the bug; a newer version will solve it. There is an updated version in Cauldron: libleptonica3-1.69-3.mga4. A fix for Mageia 3 has to follow the update procedures and tests before being available (that being said, you can just install the Cauldron version, it worked OK for me).
Pablo - thanks for your help on this. Gary has tested my local rebuilds of leptonica (updated as per Cauldron) and tesseract (with -eng require) for 3 and all seems well, so I will push both to 3/updates/testing soon - just been very short on time lately.
Update Advisory
tesseract has been submitted to 3/core/updates_testing.
The change is as recommended by Pablo in comment #1 of this bug report.
Source rpm:
tesseract-3.02.02-3.1.mga3.src.rpm
Affected rpms:
tesseract-3.02.02-3.1.mga3.i586.rpm
libtesseract3-3.02.02-3.1.mga3.i586.rpm
libtesseract-devel-3.02.02-3.1.mga3.i586.rpm
tesseract-osd-3.02.02-3.1.mga3.i586.rpm
tesseract-heb-com-3.02.02-3.1.mga3.noarch.rpm
tesseract-ara-3.02.02-3.1.mga3.noarch.rpm
tesseract-bul-3.02.02-3.1.mga3.noarch.rpm
tesseract-cat-3.02.02-3.1.mga3.noarch.rpm
tesseract-ces-3.02.02-3.1.mga3.noarch.rpm
tesseract-chi_sim-3.02.02-3.1.mga3.noarch.rpm
tesseract-chi_tra-3.02.02-3.1.mga3.noarch.rpm
tesseract-chr-3.02.02-3.1.mga3.noarch.rpm
tesseract-dan-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-dan-3.02.02-3.1.mga3.noarch.rpm
tesseract-deu-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-deu-3.02.02-3.1.mga3.noarch.rpm
tesseract-ell-3.02.02-3.1.mga3.noarch.rpm
tesseract-eng-3.02.02-3.1.mga3.noarch.rpm
tesseract-fin-3.02.02-3.1.mga3.noarch.rpm
tesseract-fra-3.02.02-3.1.mga3.noarch.rpm
tesseract-heb-3.02.02-3.1.mga3.noarch.rpm
tesseract-hin-3.02.02-3.1.mga3.noarch.rpm
tesseract-hun-3.02.02-3.1.mga3.noarch.rpm
tesseract-ind-3.02.02-3.1.mga3.noarch.rpm
tesseract-ita-3.02.02-3.1.mga3.noarch.rpm
tesseract-jpn-3.02.02-3.1.mga3.noarch.rpm
tesseract-kor-3.02.02-3.1.mga3.noarch.rpm
tesseract-lav-3.02.02-3.1.mga3.noarch.rpm
tesseract-lit-3.02.02-3.1.mga3.noarch.rpm
tesseract-nld-3.02.02-3.1.mga3.noarch.rpm
tesseract-nor-3.02.02-3.1.mga3.noarch.rpm
tesseract-pol-3.02.02-3.1.mga3.noarch.rpm
tesseract-por-3.02.02-3.1.mga3.noarch.rpm
tesseract-ron-3.02.02-3.1.mga3.noarch.rpm
tesseract-rus-3.02.02-3.1.mga3.noarch.rpm
tesseract-slk-3.02.02-3.1.mga3.noarch.rpm
tesseract-slk-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-slv-3.02.02-3.1.mga3.noarch.rpm
tesseract-spa-3.02.02-3.1.mga3.noarch.rpm
tesseract-srp-3.02.02-3.1.mga3.noarch.rpm
tesseract-swe-frak-3.02.02-3.1.mga3.noarch.rpm
tesseract-swe-3.02.02-3.1.mga3.noarch.rpm
tesseract-tgl-3.02.02-3.1.mga3.noarch.rpm
tesseract-tha-3.02.02-3.1.mga3.noarch.rpm
tesseract-tur-3.02.02-3.1.mga3.noarch.rpm
tesseract-ukr-3.02.02-3.1.mga3.noarch.rpm
tesseract-vie-3.02.02-3.1.mga3.noarch.rpm
tesseract-debuginfo-3.02.02-3.1.mga3.i586.rpm
This main bug should now also be fixed by the leptonica update in #10411.
Assignee:
zen25000 =>
qa-bugs
Installed the core/release version of tesseract with tesseract-fra on mga3 i586; could reproduce the bug of comment 1. The update candidate fixes the bug of comment 1, since it installs tesseract-eng too. The main bug from comment 0 stays unfixed (which is normal since it is handled in bug 10411, I'll test it now).
CC:
(none) =>
remi
Procedure: install the package from core/release without tesseract-eng (choose another locale) to reproduce the bug from comment 1. The procedure for the main bug in comment 0 is in bug 10411.
Whiteboard:
OK MGA3-32-OK =>
has_procedure MGA3-32-OK MGA3-32-OK
Installed 'release' tesseract, basic fault reproduced.
Updated:
tesseract-3.02.02-3.1.mga3.i586.rpm
libtesseract3-3.02.02-3.1.mga3.i586.rpm
tesseract-eng-3.02.02-3.1.mga3.noarch.rpm
tesseract-fra-3.02.02-3.1.mga3.noarch.rpm
libleptonica3-1.69-2.1.mga3.i586.rpm
Decoded successfully (relatively) an English TIF with various font sizes incl. italic, and a French JPEG landscape double-page.
But: the TIF with various font sizes was a single column of variable text sizes. So trying initially, to give it a helping hand:
tesseract /mnt/common/docs/ElderChmpgn.tiff ~/elderflower -psm 4
(Assume a single column of text of variable sizes)
yielded:
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
set_count == gridheight():Error:Assert failed:in file colfind.cpp, line 648
Segmentation fault
But since it worked OK with no -psm param, not worth pursuing this?
CC:
(none) =>
lewyssmith
(In reply to Lewis Smith from comment #11)
> set_count == gridheight():Error:Assert failed:in file colfind.cpp, line 648
> Segmentation fault
> But since it worked OK with no -psm param, not worth pursuing this?
Seems to be fixed upstream for next release:
http://code.google.com/p/tesseract-ocr/issues/detail?id=653
Barry
CC:
(none) =>
zen25000
Testing complete mga3 64.
Validating. Advisory uploaded.
Could sysadmin please push from 3 core/updates_testing to core/updates?
Thanks!
Keywords:
(none) =>
validated_update
http://advisories.mageia.org/MGAA-2013-0039.html
Status:
NEW =>
RESOLVED
Nicolas Vigier
2014-05-08 18:07:04 CEST
CC:
boklm =>
(none) |