Bug 24459

Summary: OCR badly done with xsane and tesseract : to be OK it needs a little Perl script
Product: Mageia Reporter: Philippe Didier <philippedidier>
Component: RPM Packages Assignee: All Packagers <pkg-bugs>
Status: NEW --- QA Contact:
Severity: normal    
Priority: Normal CC: doktor5000, fri, lewyssmith, marja11, zen25000
Version: Cauldron Keywords: PATCH
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Source RPM: tesseract xsane CVE:
Status comment:
Attachments: script to use tesseract with xsane

Description Philippe Didier 2019-03-03 21:26:45 CET
First of all, I am not a skilled package maintainer;
secondly, I am neither the xsane nor the tesseract maintainer.

When running OCR (Optical Character Recognition) with tesseract on a page scanned with Xsane, we can't get a good result.

This can nevertheless be done with a little trick:
in the Xsane configuration window, in the OCR tab, set
[OCR command] :  "xsane2tess -l xxx"
where xxx is the three-letter code of the language tesseract should use, depending on which languages are installed (fra for French, eng for English, grc for Greek, ara for Arabic ...)

This can also be written directly in a user's xsane.rc file, in its OCR section:

"ocr-command"
"xsane2tess -l fra"
"ocr-inputfile-option"
"-i"
"ocr-outputfile-options"
"-o"
"ocr-use-gui-pipe"
0
"ocr-gui-outfd-option"
"-x"
"ocr-progress-keyword"
""


xsane2tess.pl is a little script (I will attach it) that hugely improves the result.
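
To give an idea of what a wrapper in this role has to do, here is a minimal Python sketch. It is not the attached Perl script, whose contents are not reproduced in this report. It assumes xsane calls the command as `xsane2tess -l fra -i <input> -o <output>`, following the options listed above, and it relies on tesseract appending `.txt` to the output base name it is given. The function names, the `eng` default and the `.tess` temporary suffix are hypothetical.

```python
import argparse
import shutil
import subprocess


def parse_args(argv):
    """Parse the options xsane is assumed to pass: -l LANG -i INPUT -o OUTPUT."""
    parser = argparse.ArgumentParser(
        description="Hypothetical xsane-to-tesseract wrapper")
    parser.add_argument("-l", dest="lang", default="eng")  # default is an assumption
    parser.add_argument("-i", dest="infile", required=True)
    parser.add_argument("-o", dest="outfile", required=True)
    return parser.parse_args(argv)


def build_command(infile, outbase, lang):
    """Build the tesseract call: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", infile, outbase, "-l", lang]


def run_ocr(argv):
    """Run tesseract, then move its result to the exact file xsane expects."""
    args = parse_args(argv)
    # tesseract writes OUTPUTBASE + ".txt", so use a temporary base name
    # (the ".tess" suffix is an arbitrary choice for this sketch).
    outbase = args.outfile + ".tess"
    subprocess.run(build_command(args.infile, outbase, args.lang), check=True)
    shutil.move(outbase + ".txt", args.outfile)
```

Installed as an executable script, it would end with a call such as `run_ocr(sys.argv[1:])`. The rename step is the main point: xsane reads the text from the file named after `-o`, while tesseract chooses its own file extension.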

But...
How can we make it available to users?

We could add this script to the tesseract package ...
But where should it be installed?
Perhaps /usr/share/tesseract/configs/ would be a good place?

But once it is installed, how would users know what to do with it?

Maybe add a notice to tesseract explaining what to do?

Or could it be done automatically by a conditional post-install script that modifies xsane.rc if it exists?
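
As a rough illustration of the last idea, here is a minimal Python sketch of the edit such a script would have to make. It assumes the xsane.rc layout shown above, where each quoted key line is followed by its value on the next line. The function name `set_ocr_command` is hypothetical, and the sketch only transforms text: locating each user's xsane.rc and deciding whether a package script should touch it is left open.

```python
def set_ocr_command(rc_text, command):
    """Return rc_text with the value following "ocr-command" replaced.

    Assumes the key/value-on-alternating-lines layout shown in this report.
    If the key is absent, the text is returned unchanged apart from
    ensuring a trailing newline.
    """
    lines = rc_text.splitlines()
    for index, line in enumerate(lines[:-1]):
        if line.strip() == '"ocr-command"':
            # The value sits on the line right after the key.
            lines[index + 1] = '"' + command + '"'
            break
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = '"ocr-command"\n"gocr"\n"ocr-inputfile-option"\n"-i"\n'
    print(set_ocr_command(sample, "xsane2tess -l fra"), end="")
```

Only the value line changes, so the other OCR options in the file are left as the user set them.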

I hope the maintainers will find it useful.

Thanks
Comment 1 Philippe Didier 2019-03-03 21:32:06 CET
Created attachment 10834 [details]
script to use tesseract with xsane

script to use tesseract correctly with xsane
(it needs to be called by xsane: the OCR command must be set in the OCR config tab)
Philippe Didier 2019-03-03 21:34:32 CET

Summary: OCR badly done with xsane and tesseract it needs a little trick => OCR badly done with xsane and tesseract : to be OK it needs a little trick

Comment 2 Philippe Didier 2019-03-03 21:44:21 CET
> 
> We can add this script in the tesseract package ...
> But where to install it 
> /usr/share/tesseract/configs/  would be perhaps a good place ?


Sorry, my mistake! I installed it in /usr/bin after making it executable,
and it is found automatically when it is called from the xsane config file.


> But if it's installed,  how to know what to do with it ?
> 
> Maybe adding a notice to tesseract explaining what to do ?
> 
> Or can it be magically done by using a conditional post install script that
> modify the xsane.rc if it exists ?
> 
> Hoping that the maintainers will find it useful
> 
> Thanks
Comment 3 Marja Van Waes 2019-03-04 08:43:19 CET
Assigning to the xsane maintainer, CC'ing the tesseract maintainer.

Feel free to set the Severity back to enhancement, as the reporter did; I just felt this issue shouldn't happen!

Severity: enhancement => normal
Assignee: bugsquad => lists.jjorge
Keywords: (none) => PATCH
Summary: OCR badly done with xsane and tesseract : to be OK it needs a little trick => OCR badly done with xsane and tesseract : to be OK it needs a little Perl script
CC: (none) => marja11, zen25000

Comment 4 José Jorge 2019-03-04 09:20:03 CET
I feel this should be upstreamed to the xsane project, as it is an enhancement to its OCR plugin.
Comment 5 Lewis Smith 2019-03-04 20:16:14 CET
I think this is just an Xsane affair, not tesseract at all.
I have played with OCR & tesseract (on existing scanned images); did not know you could do it directly from Xsane!

CC: (none) => lewyssmith

Manuel Hiebel 2021-03-04 22:04:59 CET

Assignee: lists.jjorge => pkg-bugs

Florian Hubold 2021-03-07 18:08:23 CET

CC: (none) => doktor5000

Morgan Leijström 2021-03-07 21:01:57 CET

CC: (none) => fri