Bug 19259

Summary: Gscan2pdf doesn't recover any text after Tesseract OCR
Product: Mageia
Reporter: papoteur <yvesbrungard>
Component: RPM Packages
Assignee: David GEIGER <geiger.david68210>
Status: RESOLVED OLD
QA Contact:
Severity: normal
Priority: Normal
CC: marja11
Version: 5
Target Milestone: ---
Hardware: All
OS: Linux
Whiteboard:
Source RPM: gscan2pdf-1.2.5-3.mga5
CVE:
Status comment:

Description papoteur 2016-08-28 12:33:50 CEST
Description of problem:
See https://sourceforge.net/p/gscan2pdf/bugs/229/
Open a PDF file.
Ask for OCR with Tesseract.
The page is processed, but the "OCR" tab remains empty.
In the log, I get the corresponding lines (I replaced the source file name with <myfile.pdf>):

INFO - 1 pages
INFO - pdfimages -f 1 -l 1 "<myfile.pdf>" x
INFO - New page filename x-000.ppm, format Portable pixmap format (color)
INFO - New page filename /tmp/gscan2pdf-Lq1D/g9C4aGQW6w.png, format Portable Network Graphics
INFO - Added /tmp/gscan2pdf-Lq1D/pErqBf8Gs0.png at page 1 with resolution 199.950168350168
DEBUG - Started setting page_number_start from 1 to 2
DEBUG - Finished setting page_number_start from 1 to 2
INFO - Found tesseract version 3.02.02.
INFO - echo tessedit_create_hocr 1 > hocr.config;tesseract /tmp/gscan2pdf-Lq1D/pErqBf8Gs0.png /tmp/2hXGRZlhDI -l fra +hocr.config;rm hocr.config
DEBUG - Warnings from Tesseract: Tesseract Open Source OCR Engine v3.02.02 with Leptonica

INFO - Replaced /tmp/gscan2pdf-Lq1D/pErqBf8Gs0.png at page 1 with /tmp/gscan2pdf-Lq1D/wFB4NSdVrc.png, resolution 199.950168350168

When I run the command by hand:
echo tessedit_create_hocr 1 > hocr.config;tesseract /tmp/gscan2pdf-Lq1D/wFB4NSdVrc.png /tmp/2hXGRZlhDI -l fra +hocr.config
I get a /tmp/2hXGRZlhDI.html file with the correct content.
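To check from the shell whether the hOCR file written by Tesseract really contains recognised words, something like the following sketch could be used. It assumes that Tesseract wraps each recognised word in an element with the class ocrx_word; the function name hocr_word_count is hypothetical and is not part of gscan2pdf or Tesseract.

```shell
# Hypothetical helper: count the words recognised in an hOCR file.
# Assumption: each word is wrapped in an element with class "ocrx_word",
# quoted with either ' or ".
hocr_word_count() {
    # grep -o prints one line per match, so wc -l gives the number of words;
    # tr strips the padding that some wc implementations add.
    grep -o "class=[\"']ocrx_word[\"']" "$1" | wc -l | tr -d ' '
}
```

For the file from this report, hocr_word_count /tmp/2hXGRZlhDI.html should print a non-zero count if Tesseract recognised text. A non-zero count together with an empty OCR tab would suggest that the text is lost when gscan2pdf reads the file back, not during OCR itself.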
Version-Release number of selected component (if applicable):
Comment 1 Marja Van Waes 2016-08-28 12:48:37 CEST
Assigning to maintainer

CC: (none) => marja11
Assignee: bugsquad => geiger.david68210

Comment 2 papoteur 2018-01-04 18:01:20 CET
Mageia 5 is EOL, so I am closing this bug.
It works in Mageia 6.

Resolution: (none) => OLD
Status: NEW => RESOLVED