Bug 26160 - tesseract man page display HTML code
Summary: tesseract man page display HTML code
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: 7
Hardware: All Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: QA Team
QA Contact:
URL:
Whiteboard: MGA7-64-OK
Keywords: advisory, validated_update
Depends on:
Blocks:
 
Reported: 2020-02-01 11:29 CET by papoteur
Modified: 2020-02-04 12:08 CET

See Also:
Source RPM: tesseract-4.0.0-1.mga7.src.rpm
CVE:
Status comment:


Description papoteur 2020-02-01 11:29:48 CET
Description of problem:
man tesseract
I got
<!DOCTYPE html PUBLIC "‐//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">         <html
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <head>  <meta
http‐equiv="Content‐Type"         content="application/xhtml+xml;
charset=UTF‐8" /> <meta name="generator" content="AsciiDoc 8.6.9"
/>  <title>TESSERACT(1)</title> <style type="text/css"> /* Shared
CSS for AsciiDoc xhtml11 and html5 backends */

/* Default font. */ body {
  font‐family: Georgia,serif; }

/* Title font. */ h1, h2, h3, h4, h5, h6, div.title,  caption.ti‐
tle, thead, p.table.header, #toctitle, #author, #revnumber, #rev‐
date, #revremark, #footer {
  font‐family: Arial,Helvetica,sans‐serif; }

body {
  margin: 1em 5% 1em 5%; }

a {
  color: blue;
  text‐decoration: underline; } a:visited {
  color: fuchsia; }
....

Version-Release number of selected component (if applicable):

tesseract-4.0.0-1.mga7
Comment 1 Lewis Smith 2020-02-01 21:19:46 CET
> tesseract man page display HTML code
It does indeed!
But it is a really good program.

 $ tesseract --help
shows enough information to use it.
 $ tesseract --help-extra
shows even more.
 
Assigning to barjac as the registered & active maintainer.

Assignee: bugsquad => zen25000
Source RPM: tesseract => tesseract-4.0.0-1.mga7.src.rpm

Comment 2 David GEIGER 2020-02-03 09:21:05 CET
Should be fixed with upcoming tesseract-4.0.0-1.1.mga7 in Core/Updates_testing repo!

Please test it, thanks.

CC: (none) => geiger.david68210

Comment 3 papoteur 2020-02-03 10:13:39 CET
Hello,
Installed
- lib64tesseract4-4.0.0-1.1.mga7.x86_64
- tesseract-4.0.0-1.1.mga7.x86_64

Now, man is OK :)
I didn't test anything else
Comment 4 David GEIGER 2020-02-03 10:27:18 CET
Assigning to QA,


Advisory:
=============================

This update fixes man page generation: in the current tesseract package the man pages display raw HTML code instead of formatted text. To see the problem, run:

$ man tesseract


=============================


Packages in 7/core/updates_testing:
========================
tesseract-4.0.0-1.1.mga7.x86_64.rpm
lib64tesseract4-4.0.0-1.1.mga7.x86_64.rpm
lib64tesseract-devel-4.0.0-1.1.mga7.x86_64.rpm
tesseract-osd-4.0.0-1.1.mga7.x86_64.rpm

tesseract-4.0.0-1.1.mga7.i586.rpm
libtesseract4-4.0.0-1.1.mga7.i586.rpm
libtesseract-devel-4.0.0-1.1.mga7.i586.rpm
tesseract-osd-4.0.0-1.1.mga7.i586.rpm

tesseract-afr-4.0.0-1.1.mga7.noarch.rpm
tesseract-amh-4.0.0-1.1.mga7.noarch.rpm
tesseract-ara-4.0.0-1.1.mga7.noarch.rpm
tesseract-asm-4.0.0-1.1.mga7.noarch.rpm
tesseract-aze_cyrl-4.0.0-1.1.mga7.noarch.rpm
tesseract-aze-4.0.0-1.1.mga7.noarch.rpm
tesseract-bel-4.0.0-1.1.mga7.noarch.rpm
tesseract-ben-4.0.0-1.1.mga7.noarch.rpm
tesseract-bod-4.0.0-1.1.mga7.noarch.rpm
tesseract-bos-4.0.0-1.1.mga7.noarch.rpm
tesseract-bul-4.0.0-1.1.mga7.noarch.rpm
tesseract-cat-4.0.0-1.1.mga7.noarch.rpm
tesseract-ceb-4.0.0-1.1.mga7.noarch.rpm
tesseract-ces-4.0.0-1.1.mga7.noarch.rpm
tesseract-chi_sim-4.0.0-1.1.mga7.noarch.rpm
tesseract-chi_tra-4.0.0-1.1.mga7.noarch.rpm
tesseract-chr-4.0.0-1.1.mga7.noarch.rpm
tesseract-cym-4.0.0-1.1.mga7.noarch.rpm
tesseract-dan_frak-4.0.0-1.1.mga7.noarch.rpm
tesseract-dan-4.0.0-1.1.mga7.noarch.rpm
tesseract-deu_frak-4.0.0-1.1.mga7.noarch.rpm
tesseract-deu-4.0.0-1.1.mga7.noarch.rpm
tesseract-dzo-4.0.0-1.1.mga7.noarch.rpm
tesseract-ell-4.0.0-1.1.mga7.noarch.rpm
tesseract-eng-4.0.0-1.1.mga7.noarch.rpm
tesseract-enm-4.0.0-1.1.mga7.noarch.rpm
tesseract-epo-4.0.0-1.1.mga7.noarch.rpm
tesseract-equ-4.0.0-1.1.mga7.noarch.rpm
tesseract-est-4.0.0-1.1.mga7.noarch.rpm
tesseract-eus-4.0.0-1.1.mga7.noarch.rpm
tesseract-fas-4.0.0-1.1.mga7.noarch.rpm
tesseract-fin-4.0.0-1.1.mga7.noarch.rpm
tesseract-fra-4.0.0-1.1.mga7.noarch.rpm
tesseract-frk-4.0.0-1.1.mga7.noarch.rpm
tesseract-frm-4.0.0-1.1.mga7.noarch.rpm
tesseract-gle-4.0.0-1.1.mga7.noarch.rpm
tesseract-glg-4.0.0-1.1.mga7.noarch.rpm
tesseract-grc-4.0.0-1.1.mga7.noarch.rpm
tesseract-guj-4.0.0-1.1.mga7.noarch.rpm
tesseract-hat-4.0.0-1.1.mga7.noarch.rpm
tesseract-heb-4.0.0-1.1.mga7.noarch.rpm
tesseract-hin-4.0.0-1.1.mga7.noarch.rpm
tesseract-hrv-4.0.0-1.1.mga7.noarch.rpm
tesseract-hun-4.0.0-1.1.mga7.noarch.rpm
tesseract-iku-4.0.0-1.1.mga7.noarch.rpm
tesseract-ind-4.0.0-1.1.mga7.noarch.rpm
tesseract-isl-4.0.0-1.1.mga7.noarch.rpm
tesseract-ita_old-4.0.0-1.1.mga7.noarch.rpm
tesseract-ita-4.0.0-1.1.mga7.noarch.rpm
tesseract-jav-4.0.0-1.1.mga7.noarch.rpm
tesseract-jpn-4.0.0-1.1.mga7.noarch.rpm
tesseract-kan-4.0.0-1.1.mga7.noarch.rpm
tesseract-kat_old-4.0.0-1.1.mga7.noarch.rpm
tesseract-kat-4.0.0-1.1.mga7.noarch.rpm
tesseract-kaz-4.0.0-1.1.mga7.noarch.rpm
tesseract-khm-4.0.0-1.1.mga7.noarch.rpm
tesseract-kir-4.0.0-1.1.mga7.noarch.rpm
tesseract-kor-4.0.0-1.1.mga7.noarch.rpm
tesseract-kur-4.0.0-1.1.mga7.noarch.rpm
tesseract-lao-4.0.0-1.1.mga7.noarch.rpm
tesseract-lat-4.0.0-1.1.mga7.noarch.rpm
tesseract-lav-4.0.0-1.1.mga7.noarch.rpm
tesseract-lit-4.0.0-1.1.mga7.noarch.rpm
tesseract-mal-4.0.0-1.1.mga7.noarch.rpm
tesseract-mar-4.0.0-1.1.mga7.noarch.rpm
tesseract-mkd-4.0.0-1.1.mga7.noarch.rpm
tesseract-mlt-4.0.0-1.1.mga7.noarch.rpm
tesseract-msa-4.0.0-1.1.mga7.noarch.rpm
tesseract-mya-4.0.0-1.1.mga7.noarch.rpm
tesseract-nep-4.0.0-1.1.mga7.noarch.rpm
tesseract-nld-4.0.0-1.1.mga7.noarch.rpm
tesseract-nor-4.0.0-1.1.mga7.noarch.rpm
tesseract-ori-4.0.0-1.1.mga7.noarch.rpm
tesseract-pan-4.0.0-1.1.mga7.noarch.rpm
tesseract-pol-4.0.0-1.1.mga7.noarch.rpm
tesseract-por-4.0.0-1.1.mga7.noarch.rpm
tesseract-pus-4.0.0-1.1.mga7.noarch.rpm
tesseract-ron-4.0.0-1.1.mga7.noarch.rpm
tesseract-rus-4.0.0-1.1.mga7.noarch.rpm
tesseract-san-4.0.0-1.1.mga7.noarch.rpm
tesseract-sin-4.0.0-1.1.mga7.noarch.rpm
tesseract-slk_frak-4.0.0-1.1.mga7.noarch.rpm
tesseract-slk-4.0.0-1.1.mga7.noarch.rpm
tesseract-slv-4.0.0-1.1.mga7.noarch.rpm
tesseract-spa_old-4.0.0-1.1.mga7.noarch.rpm
tesseract-spa-4.0.0-1.1.mga7.noarch.rpm
tesseract-sqi-4.0.0-1.1.mga7.noarch.rpm
tesseract-srp_latn-4.0.0-1.1.mga7.noarch.rpm
tesseract-srp-4.0.0-1.1.mga7.noarch.rpm
tesseract-swa-4.0.0-1.1.mga7.noarch.rpm
tesseract-swe-4.0.0-1.1.mga7.noarch.rpm
tesseract-syr-4.0.0-1.1.mga7.noarch.rpm
tesseract-tam-4.0.0-1.1.mga7.noarch.rpm
tesseract-tel-4.0.0-1.1.mga7.noarch.rpm
tesseract-tgk-4.0.0-1.1.mga7.noarch.rpm
tesseract-tgl-4.0.0-1.1.mga7.noarch.rpm
tesseract-tha-4.0.0-1.1.mga7.noarch.rpm
tesseract-tir-4.0.0-1.1.mga7.noarch.rpm
tesseract-tur-4.0.0-1.1.mga7.noarch.rpm
tesseract-uig-4.0.0-1.1.mga7.noarch.rpm
tesseract-ukr-4.0.0-1.1.mga7.noarch.rpm
tesseract-urd-4.0.0-1.1.mga7.noarch.rpm
tesseract-uzb_cyrl-4.0.0-1.1.mga7.noarch.rpm
tesseract-uzb-4.0.0-1.1.mga7.noarch.rpm
tesseract-vie-4.0.0-1.1.mga7.noarch.rpm
tesseract-yid-4.0.0-1.1.mga7.noarch.rpm


Source RPM: 
========================
tesseract-4.0.0-1.1.mga7.src.rpm

Assignee: zen25000 => qa-bugs

Comment 5 Len Lawrence 2020-02-03 16:38:47 CET
Installed the four main packages plus tesseract-eng.

Confirmed the problem with `man tesseract`.
Updated the packages and checked man pages again.
All OK.

Looking to see if this can be tested further.
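
For anyone wanting to script the man page check above rather than eyeball it, here is a minimal sketch in Python. The helper names are made up for illustration, and the install path mentioned in the note below is an assumption, not something confirmed in this report. It only looks at whether the page source starts like an HTML/XHTML document instead of roff.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Prefixes that suggest an HTML/XHTML document rather than roff source.
# A roff man page normally starts with a macro or comment line such as ".TH".
HTML_MARKERS = ("<!doctype html", "<html", "<?xml")


def looks_like_html(text: str) -> bool:
    """Return True if the text starts like an HTML/XHTML document."""
    head = text.lstrip().lower()
    return head.startswith(HTML_MARKERS)


def read_man_source(path: str) -> str:
    """Read a man page source file, decompressing by file extension if needed."""
    p = Path(path)
    openers = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}
    opener = openers.get(p.suffix, open)  # fall back to plain text
    with opener(p, "rt", encoding="utf-8", errors="replace") as fh:
        return fh.read()
```

Usage would be something like looks_like_html(read_man_source("/usr/share/man/man1/tesseract.1.xz")), where the path and compression suffix are guesses; `man -w tesseract` prints the real location on a given system.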

CC: (none) => tarazed25

Comment 6 Len Lawrence 2020-02-03 17:21:38 CET
Had a stab at this.  Converted an A4 PostScript page to PNG format and ran:
$ tesseract abc3.png -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 328
----------------------------------
George,

M a f y J a n e t :’chjhirlffamily

----------------------------------
The dashed lines have been added for clarity.
This was a tough test of label print output.
Three groups of characters in different colours in a plain italic font:
              George
 Mary  Janet  Michelle
              and Family

Not a bad effort really.

Tried black on white with Bitstream Charter 15 and the OCR looked close to perfect, just two errors.
$ tesseract Dodgson.png -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 137
‘Twas brillig and the slithy toves
Did gyre and gimble in the wabe
All mimsy were the borogoves

And the mome raths outgrabe.

These are elementary tests that do not exercise any of the options, but they should be enough to pass this update, given that the bug has nothing to do with the OCR processing.
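
The comparison done by eye above could be roughly automated along these lines. This is only a sketch: the function names are hypothetical, the command is the same `tesseract <image> -` form used in the tests above, and it assumes the resolution warnings go to stderr so that stdout holds only the recognised text.

```python
import difflib
import subprocess


def normalize(text: str) -> str:
    """Collapse all runs of whitespace (including newlines) to single spaces."""
    return " ".join(text.split())


def similarity(expected: str, actual: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two texts, ignoring whitespace layout."""
    matcher = difflib.SequenceMatcher(None, normalize(expected), normalize(actual))
    return matcher.ratio()


def run_ocr(image_path: str) -> str:
    """Run tesseract on an image and return the text it writes to stdout."""
    result = subprocess.run(
        ["tesseract", image_path, "-"],  # "-" sends the OCR text to stdout
        capture_output=True,
        text=True,
        check=True,  # raise if tesseract exits with an error
    )
    return result.stdout
```

With the known source text in a variable, similarity(source_text, run_ocr("Dodgson.png")) gives a single number to compare before and after an update. What counts as a pass is a judgment call; no threshold is implied here.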

Whiteboard: (none) => MGA7-64-OK

Comment 7 Thomas Andrews 2020-02-03 22:21:52 CET
Brillig indeed. Validating. Advisory in Comment 4.

Keywords: (none) => validated_update
CC: (none) => andrewsfarm, sysadmin-bugs

Thomas Backlund 2020-02-04 11:38:26 CET

CC: (none) => tmb
Keywords: (none) => advisory

Comment 8 Mageia Robot 2020-02-04 12:08:28 CET
An update for this issue has been pushed to the Mageia Updates repository.

https://advisories.mageia.org/MGAA-2020-0044.html

Resolution: (none) => FIXED
Status: NEW => RESOLVED

