Bug 32044 - html2text, core dumped: ominous backtrace
Summary: html2text, core dumped: ominous backtrace
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: David GEIGER
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-23 13:20 CEST by Elmar Stellnberger
Modified: 2023-07-15 11:29 CEST

See Also:
Source RPM: html2text-2.0.0-2.mga9.src.rpm
CVE:
Status comment:


Attachments
HSAHEC6K2 - html page from dpaste as downloaded by curl/wget (12.87 KB, text/html)
2023-06-23 13:22 CEST, Elmar Stellnberger
installed packages for i586 and amd64 tests (1.43 KB, text/plain)
2023-06-23 13:25 CEST, Elmar Stellnberger
gdb-debug.log (55.22 KB, text/plain)
2023-06-23 13:35 CEST, Elmar Stellnberger
html2text-2.0.0 miniroot for OpenSuSE Leap 15.4, amd64 (793 bytes, text/plain)
2023-06-24 11:02 CEST, Elmar Stellnberger
html2text-2.0.0 miniroot for OpenSuSE Leap 15.4, amd64 (+repourls) (1.04 KB, text/plain)
2023-06-24 11:07 CEST, Elmar Stellnberger

Description Elmar Stellnberger 2023-06-23 13:20:00 CEST
html2text-2.0.0-2 dumps core when invoked on the attached HTML page from dpaste. Initially discovered under Mageia 9 i586. Verified to dump core with html2text-2.0.0-2 x86_64 mga9 as well. Works with html2text-1.3.2a-16.mga8/x64.
Comment 1 Elmar Stellnberger 2023-06-23 13:22:18 CEST
Created attachment 13889
HSAHEC6K2 - html page from dpaste as downloaded by curl/wget

viewable with Lynx if you add the .html suffix
invoke: html2text HSAHEC6K2
Comment 2 Elmar Stellnberger 2023-06-23 13:25:58 CEST
Created attachment 13890
installed packages for i586 and amd64 tests

have a look at: Bug 32045
Comment 3 Elmar Stellnberger 2023-06-23 13:35:00 CEST
Created attachment 13891
gdb-debug.log

  The program segfaults when calling isspace() with an invalid character value greater than 0xFF (the isspace man page prohibits this). So far, so good. However, when I go up one frame, the source code shown by gdb indicates that isspace is called with the 3rd character of the string "???#104", so that call ought not to fail. When I disassembled the code that invokes isspace, it read a 32-bit value from the stack above the BP (where the first parameter should be). However, that value is of course itself an invalid value for a character (you can't dereference it either):
(gdb) p/c *0x2300949c
Cannot access memory at address 0x2300949c
143					istr &s(*value_return->strinG);
144					string::size_type x = s.length();
145					while (x > 0 && isspace(s[x - 1]))
  Seems to me as if the program was translated/compiled wrongly. Please have a look!
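
For reference, a minimal sketch of a trailing-whitespace loop that stays inside the range isspace() accepts. It rests on assumptions: that the string elements can hold values wider than one byte (as the value above 0xFF suggests), and the names is_space_safe and rtrim are hypothetical helpers, not taken from the html2text sources. It is not the upstream fix. The C standard only defines isspace() for EOF and values representable as unsigned char, and glibc implements it as a table lookup, so an out-of-range argument can index invalid memory.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of html2text.
// std::isspace is only defined for EOF and values representable as
// unsigned char, so anything above 0xFF is rejected before the call.
inline bool is_space_safe(char32_t c)
{
    return c <= 0xFF && std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Strip trailing whitespace with the same loop shape as the gdb listing,
// but routed through the range-checked helper.
inline void rtrim(std::u32string &s)
{
    std::u32string::size_type x = s.length();
    while (x > 0 && is_space_safe(s[x - 1]))
        --x;
    s.erase(x);
}
```

With this guard, a wide value such as 0x2300949c is simply treated as non-whitespace instead of being passed on to isspace().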
Comment 4 Elmar Stellnberger 2023-06-23 13:43:24 CEST
> urpmq --list-url | grep media/core
Core Release https://ftp.fi.muni.cz/pub/linux/mageia/distrib/9/i586/media/core/release
Core Updates https://ftp.fi.muni.cz/pub/linux/mageia/distrib/9/i586/media/core/updates
( for those who want to reproduce on x64 using the minimal chroot )
Comment 5 Elmar Stellnberger 2023-06-23 15:19:06 CEST
strange, I don't really understand it:
html2text-2.0.0-2.mga9
glibc-2.36-43.mga9
glibc-devel-2.36-43.mga9
html2text-debugsource-2.0.0-2.mga9
html2text-debuginfo-2.0.0-2.mga9
glibc-debugsource-2.36-43.mga9
glibc-debuginfo-2.36-43.mga9
Comment 6 Morgan Leijström 2023-06-23 21:50:34 CEST
I don't get what you are trying to say with comments 4 and 5, but for the rest:

Thank you for the investigation

This package has no registered maintainer, so assigning to all

Assignee: bugsquad => pkg-bugs
CC: (none) => fri

Comment 7 Lewis Smith 2023-06-23 22:01:51 CEST
Thank you for your deep investigations.

Cauldron system, x64, Cinnamon.
I installed html2text-2.0.0-2.mga9; tried attachment 13889, the web page in question, in a browser - it works (points to another page).

 $ html2text [downloaded]HSAHEC6K2.html 
 Segmentation fault (core dumped)
Clear enough.

The application web site:
 https://github.com/grobian/html2text
says little; I doubt there is a newer version. If it is an upstream problem,
 https://github.com/grobian/html2text/issues
offers raising an issue, for which you have to "Sign up for Github" - if it comes to that (for Elmar).
"Seems to me as if the program was translated/compiled wrongly"

Morgan has already assigned this globally.

CC: (none) => lewyssmith

Comment 8 Elmar Stellnberger 2023-06-24 11:02:16 CEST
Created attachment 13892
html2text-2.0.0 miniroot for OpenSuSE Leap 15.4, amd64

The problem is apparently not specific to Mageia 9: the same version of html2text segfaults under a minimal chroot for OpenSuSE Leap 15.4. I am going to have a look at the application web site.
Comment 9 Elmar Stellnberger 2023-06-24 11:07:09 CEST
Created attachment 13893
html2text-2.0.0 miniroot for OpenSuSE Leap 15.4, amd64 (+repourls)

oops, forgot to include repo-urls

Attachment 13892 is obsolete: 0 => 1

Comment 10 Elmar Stellnberger 2023-06-24 13:10:57 CEST
opened an issue upstreams: https://github.com/grobian/html2text/issues/54
Let us see what they will respond.
Comment 11 Elmar Stellnberger 2023-06-24 16:39:12 CEST
Lewis, can you please tell me whether html2text was compiled with GCC or CLang? What compiler version was used? Grobian has explained well why later versions of html2text don't segfault, yet I don't dare to expect that he would comment on the assembly code I have seen.
Comment 12 David GEIGER 2023-06-24 17:50:53 CEST
Please test new html2text-2.1.1-1.mga9 in Core/Updates_testing repo!

CC: (none) => geiger.david68210

Comment 13 Lewis Smith 2023-06-24 21:10:49 CEST
David, first thanks for this instant new version.
I was going to try it now, but on a Cauldron system I cannot see html2text in updates_testing; nor can I find the right argument for urpmi --media ... ; it did not like 'core/updates_testing' or "updates_testing".

Are you able to tell Elmar "whether html2text was compiled with GCC or CLang"?

@Elmar
Please try the new version if you can.
Comment 14 Lewis Smith 2023-06-25 19:57:00 CEST
Still cannot see it.
Comment 15 David GEIGER 2023-06-25 20:23:55 CEST
# LC_ALL=C urpmi --test html2text
Marking html2text as manually installed, it won't be auto-orphaned


    $MIRRORLIST: media/core/updates_testing/html2text-2.1.1-1.mga9.x86_64.rpm
installing html2text-2.1.1-1.mga9.x86_64.rpm from /var/cache/urpmi/rpms
Preparing...                     ########################################################################################################################
Installation is possible
Comment 17 David GEIGER 2023-06-26 18:16:12 CEST
Now 2.1.1 as an update in Core/Release!
Comment 18 Lewis Smith 2023-06-26 21:48:59 CEST
Thanks, found it at last:
 html2text-2.1.1-1.mga9

Trying it - NO crash.
I tried 2 ways:
- 'saving as' the viewed page, then html2text'ing that; the result was meaningless if textually correct.
 $ html2text dpaste_HSAHEC6K2.html > dpaste_HSAHEC6K2.txt

- right-clicking the page link, saving that as... then html2text'ing that; the result was OK (see below for comparison of the O/P & the viewed page).
 $ html2text HSAHEC6K2.html > HSAHEC6K2.txt

1) html2text O/P of page right-clicked 'save as'...

Sign in New API Help About
114 bytes of Plain text
Created 1 day, 9 hours ago — expires in 6 days
Viewed 4 times
https://dpaste.com/HSAHEC6K2
COPY TO CLIPBOARD ✔ SOFT WRAP RAW TEXT DUPLICATE DIFF
1 https://www.derstandard.at/story/2000132770775/1-200-transistoren-in-
  handarbeit-student-baut-eigene-prozessoren-in
https://www.derstandard.at/story/2000132770775/1-200-transistoren-in-
handarbeit-student-baut-eigene-prozessoren-in
===============================================================================
Share:

===============================================================================

2) X mouse select/paste of the GUI page:

SIGN IN NEW API HELP ABOUT
114 bytes of Plain text
Created 1 day, 9 hours ago — expires in 6 days
Viewed 4 times
COPY TO CLIPBOARD SOFT WRAP RAW TEXT DUPLICATE DIFF
1
https://www.derstandard.at/story/2000132770775/1-200-transistoren-in-handarbeit-student-baut-eigene-prozessoren-in
Share:   

Well done David. This can go on its way.

(In reply to Elmar Stellnberger from comment #10)
> opened an issue upstreams: https://github.com/grobian/html2text/issues/54
> Let us see what they will respond.
Perhaps you can close that?

Assignee: pkg-bugs => geiger.david68210
CC: lewyssmith => (none)

Comment 19 David GEIGER 2023-07-15 11:29:48 CEST
Closing as fixed!

Resolution: (none) => FIXED
Status: NEW => RESOLVED

