Bug 24459 - OCR badly done with xsane and tesseract : to be OK it needs a little Perl script
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All Linux
Priority: Normal
Severity: normal
Target Milestone: ---
Assignee: All Packagers
QA Contact:
URL:
Whiteboard:
Keywords: PATCH
Depends on:
Blocks:
 
Reported: 2019-03-03 21:26 CET by Philippe Didier
Modified: 2021-03-07 21:01 CET
CC List: 5 users

See Also:
Source RPM: tesseract xsane
CVE:
Status comment:


Attachments
script to use tesseract with xsane (656 bytes, text/plain)
2019-03-03 21:32 CET, Philippe Didier

Description Philippe Didier 2019-03-03 21:26:45 CET
First of all, I am not a skilled package maintainer.
Secondly, I am not the maintainer of xsane or of tesseract.

When trying to run OCR (Optical Character Recognition) with tesseract on a page scanned with Xsane, we can't get a good result.

It can nevertheless be done with this little trick:
In the Xsane configuration window, on the OCR tab, set
[OCR command] :  "xsane2tess -l xxx"
where xxx is the 3-letter code of the language tesseract should use, depending on which language data is installed (fra for French, eng for English, grc for Ancient Greek, ara for Arabic ...)

The same settings can be written directly into a user's xsane.rc file, in its OCR section:

"ocr-command"
"xsane2tess -l fra"
"ocr-inputfile-option"
"-i"
"ocr-outputfile-options"
"-o"
"ocr-use-gui-pipe"
0
"ocr-gui-outfd-option"
"-x"
"ocr-progress-keyword"
""

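For illustration only, the same edit to xsane.rc could be scripted. The Python sketch below assumes the layout shown above, where each quoted key is followed by its value on the next line. The function name `set_rc_values` is hypothetical and is not part of xsane or tesseract.

```python
def set_rc_values(rc_text, updates):
    """Return rc_text with the value line after each key in `updates` replaced.

    `updates` maps an unquoted key name to the exact value line to write.
    Keys that are not present yet are appended at the end.
    Assumption: a key line is the key name in double quotes, and its value
    is the very next line (the layout shown in this report).
    """
    lines = rc_text.splitlines()
    out = []
    seen = set()
    i = 0
    while i < len(lines):
        stripped = lines[i].strip()
        key = stripped.strip('"')
        if stripped.startswith('"') and key in updates and i + 1 < len(lines):
            out.append(lines[i])       # keep the key line as it is
            out.append(updates[key])   # replace the value line
            seen.add(key)
            i += 2
        else:
            out.append(lines[i])
            i += 1
    for key, value in updates.items():
        if key not in seen:
            out.append('"%s"' % key)
            out.append(value)
    return "\n".join(out) + "\n"
```

It could be called as `set_rc_values(text, {"ocr-command": '"xsane2tess -l fra"'})` on the contents of a user's xsane.rc. Keeping a backup of the file before writing the result back would be prudent.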

xsane2tess.pl is a little script (I will attach it) that hugely improves the result.
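The attached script is not reproduced in this report, so the following is not its code. It is only a minimal Python sketch of what a wrapper of this kind could look like, under two assumptions: xsane calls the OCR command with `-i <image>` and `-o <text file>` as configured above, and the `tesseract` command line takes an output base name to which it appends `.txt`. The names `parse_args`, `tesseract_command` and `main` are made up for the example.

```python
import argparse
import os
import subprocess


def parse_args(argv=None):
    """Parse the options xsane is configured to pass (-i, -o) plus -l."""
    parser = argparse.ArgumentParser(description="hypothetical xsane-to-tesseract wrapper")
    parser.add_argument("-l", dest="lang", default="eng", help="tesseract language code")
    parser.add_argument("-i", dest="infile", required=True, help="image file written by xsane")
    parser.add_argument("-o", dest="outfile", required=True, help="text file xsane expects back")
    return parser.parse_args(argv)


def tesseract_command(infile, outfile, lang):
    """Build the tesseract call and the name of the file it will produce.

    tesseract is given an output *base* name and adds ".txt" itself, so a
    temporary base next to the requested output file is used here.
    """
    outbase = outfile + ".tess"
    return ["tesseract", infile, outbase, "-l", lang], outbase + ".txt"


def main(argv=None):
    args = parse_args(argv)
    cmd, produced = tesseract_command(args.infile, args.outfile, args.lang)
    subprocess.run(cmd, check=True)      # raises CalledProcessError if tesseract fails
    os.replace(produced, args.outfile)   # hand the text to xsane under the name it asked for
```

An executable script that calls `main()` could then be named in the [OCR command] field. Whether the real attachment does more than this (image conversion, logging) is not stated in the report.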

But...
How can this script be made available to users?

We could add it to the tesseract package ...
But where should it be installed?
Would /usr/share/tesseract/configs/ perhaps be a good place?

And once it is installed, how would users know what to do with it?

Maybe by adding a notice to tesseract explaining what to do?

Or could it be done magically by a conditional post-install script that modifies xsane.rc if it exists?

I hope the maintainers will find it useful.

Thanks
Comment 1 Philippe Didier 2019-03-03 21:32:06 CET
Created attachment 10834
script to use tesseract with xsane

Script to use tesseract correctly with xsane
(it needs to be called by xsane: the command must be added in the OCR config tab)
Philippe Didier 2019-03-03 21:34:32 CET

Summary: OCR badly done with xsane and tesseract it needs a little trick => OCR badly done with xsane and tesseract : to be OK it needs a little trick

Comment 2 Philippe Didier 2019-03-03 21:44:21 CET
> 
> We can add this script in the tesseract package ...
> But where to install it 
> /usr/share/tesseract/configs/  would be perhaps a good place ?


Sorry, my mistake! I installed it in /usr/bin after making it executable,
and it is found automatically when it is called from the xsane config file.
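A quick way to confirm that the helper really is found through PATH, as described above, is sketched here in Python. The function name `find_ocr_helper` is hypothetical, and the default command name `xsane2tess` is taken from the OCR command quoted in this report.

```python
import shutil


def find_ocr_helper(name="xsane2tess", path=None):
    """Return the full path of an executable called `name`, or None.

    With path=None the current PATH is searched, which is how a command
    given without a directory is normally resolved.
    """
    return shutil.which(name, path=path)
```

If `find_ocr_helper()` returns None, the script is either not on PATH or not executable.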


> But if it's installed,  how to know what to do with it ?
> 
> Maybe adding a notice to tesseract explaining what to do ?
> 
> Or can it be magically done by using a conditional post install script that
> modify the xsane.rc if it exists ?
> 
> Hoping that the maintainers will find it useful
> 
> Thanks
Comment 3 Marja Van Waes 2019-03-04 08:43:19 CET
Assigning to the xsane maintainer, CC'ing the tesseract maintainer.

Feel free to set the Severity back to enhancement, as the reporter had it; I just felt this issue shouldn't happen!

Severity: enhancement => normal
Assignee: bugsquad => lists.jjorge
Keywords: (none) => PATCH
Summary: OCR badly done with xsane and tesseract : to be OK it needs a little trick => OCR badly done with xsane and tesseract : to be OK it needs a little Perl script
CC: (none) => marja11, zen25000

Comment 4 José Jorge 2019-03-04 09:20:03 CET
I feel this should be upstreamed to the xsane project, as it is an enhancement to its OCR plugin.
Comment 5 Lewis Smith 2019-03-04 20:16:14 CET
I think this is just an Xsane affair, not tesseract at all.
I have played with OCR & tesseract (on existing scanned images); I did not know you could do it directly from Xsane!

CC: (none) => lewyssmith

Manuel Hiebel 2021-03-04 22:04:59 CET

Assignee: lists.jjorge => pkg-bugs

Florian Hubold 2021-03-07 18:08:23 CET

CC: (none) => doktor5000

Morgan Leijström 2021-03-07 21:01:57 CET

CC: (none) => fri

