Bug 4724 - text installer segfaults since switching from perl-5.12.x to 5.14.x
Summary: text installer segfaults since switching from perl-5.12.x to 5.14.x
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer
Version: Cauldron
Hardware: All Linux
Priority: release_blocker critical
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-27 20:22 CET by AL13N
Modified: 2012-03-12 08:07 CET

See Also:
Source RPM: drakx-installer-stage2
CVE:
Status comment:


Attachments
mageia text installer bug (3.97 KB, image/png)
2012-03-01 23:40 CET, AL13N
mageia text installer bug (14.11 KB, image/png)
2012-03-02 21:21 CET, AL13N
GDB trace (4.56 KB, image/png)
2012-03-05 18:18 CET, Thierry Vignaud
Actually we either segfault in perl or in ncurses/perl-Curses (4.99 KB, image/png)
2012-03-05 18:47 CET, Thierry Vignaud
text installer (9.07 KB, image/png)
2012-03-07 00:24 CET, AL13N
GDB trace always segfaulting in perl now (6.39 KB, image/png)
2012-03-07 18:16 CET, Thierry Vignaud
smaller bug reproducer (5.85 KB, application/force-download)
2012-03-08 11:47 CET, Thierry Vignaud
smaller bug reproducer (5.94 KB, application/force-download)
2012-03-09 12:09 CET, Thierry Vignaud
text installer (3.56 KB, image/png)
2012-03-09 17:45 CET, AL13N
rescue (4.52 KB, image/png)
2012-03-11 08:32 CET, AL13N

Description AL13N 2012-02-27 20:22:49 CET
How to reproduce in Mageia 2 Beta 1:

1. install using DVD
2. press ESC (to go to text bootloader: due to bug 2038)
3. type in 'text' and hit the RETURN key
4. it loads the program
5. you immediately get an error:

"/bin/unicode_start: /usr/bin/tty: not found

umounting partitions
you may safely reboot or halt your system"

supposedly the above is now fixed by tv (which means that if you use the boot.iso with a netinstall, you get the same as Mageia 1 Alpha 3):

after loading, the screen refreshes and at the top i get:

exited abnormally :-( -- received signal 6

tv speculated that this is likely due to the perl migration from 5.12 to 5.14
AL13N 2012-02-27 20:23:18 CET

CC: (none) => alien, thierry.vignaud
Priority: Normal => release_blocker

Thierry Vignaud 2012-02-27 20:40:56 CET

Source RPM: (none) => drakx-installer-stage2

Comment 1 AL13N 2012-03-01 23:40:40 CET
Created attachment 1669 [details]
mageia text installer bug

with the new update for text installer tty being installed, i now get some kind of udev related bug
Thierry Vignaud 2012-03-02 08:54:00 CET

Depends on: (none) => 4768

Comment 2 AL13N 2012-03-02 21:21:21 CET
Created attachment 1672 [details]
mageia text installer bug

after the sh change, now i get this:

Attachment 1669 is obsolete: 0 => 1

Comment 3 Thierry Vignaud 2012-03-05 17:01:00 CET
I cannot reproduce this.
What's the output of lspcidrake -v (to be attached, not pasted) on that system?
Comment 4 Thierry Vignaud 2012-03-05 18:18:01 CET
Jerome, the drakx installer is broken since we switched from perl-5.12 to perl-5.14. It's segfaulting.

See attachment from a patched stage2

Summary: text installer doesn't work => text installer segfaults since switching from perl-5.12.x to 5.14.x
CC: (none) => jquelin
Depends on: 4768 => (none)

Comment 5 Thierry Vignaud 2012-03-05 18:18:22 CET
Created attachment 1679 [details]
GDB trace
Comment 6 Thierry Vignaud 2012-03-05 18:47:34 CET
Created attachment 1680 [details]
Actually we either segfault in perl or in ncurses/perl-Curses

That's the other trace. We get either one randomly
Comment 7 Jerome Quelin 2012-03-06 09:21:49 CET
unfortunately, i must confess that my c-foo is way too weak nowadays to really help you on this topic.

note that curses perl module doesn't have a bug reported that looks like that, and i couldn't find mention of such a bug in fedora or in debian.
Comment 8 AL13N 2012-03-06 20:25:53 CET
i could take a peek, but i'd rather have a simple perl-ncurses reproducer, i've been looking, but sadly the screenshots only contain "bt" not "bt full"

after looking i suspect at the line it does have an issue, it seems like it's running out of height boundary...

i'd like to have a reproducer in the perl code, so i can do a full valgrind on it, or check more in gdb...
Comment 9 AL13N 2012-03-07 00:24:25 CET
Created attachment 1689 [details]
text installer

I get this now... after the new drakx-installer release...

how did you get a gdb trace?
Comment 10 Thierry Vignaud 2012-03-07 18:16:16 CET
Created attachment 1697 [details]
GDB trace always segfaulting in perl now

Jerome: After updating & patching ncurses, it now always segfaults in perl when mallocing ...
Comment 11 Thierry Vignaud 2012-03-07 18:24:57 CET
(In reply to comment #8)
> i could take a peek, but i'd rather have a simple perl-ncurses reproducer, i've
> been looking, but sadly the screenshots only contain "bt" not "bt full"
> 
> after looking i suspect at the line it does have an issue, it seems like it's
> running out of height boundary...
> 
> i'd like to have a reproducer in the perl code, so i can do a full valgrind on
> it, or check more in gdb...

The whole point is to actually end up with such a reproducer.
Note that stage2 environment is different (less packages, less files, less
environment variables set, ...)

(In reply to comment #9)
> how did you get a gdb trace?

I rename /usr/bin/runinstall2 to /usr/bin/runinstall2.pl, call "gdb --args
perl /usr/bin/runinstall2.pl" from runinstall2 and add everything I need to the
squashfs.

You can see the old
https://wiki.mageia.org/en/Drakx-installer_tips_and_tricks#modifying_the_stage2
page that Pixel & me wrote in the old days.

I think it would be easier for you by first rebuilding drakx-installer-stage2
with that:

%build
export DEBUG_INSTALL=1

You may also want to alter perl-install/install/share/list.xml in order to
include more stuff (bash, ...)

Then you can run gdb, extract a core file and put it on some disk after
mounting it.

Or you can also add the debug packages' content to the squashfs image (either
by running unsquashfs/mksquashfs manually or by using misc/mdkinst_stage2_tool
--uncompress / misc/mdkinst_stage2_tool --compress).
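
A minimal sketch of the manual unsquashfs/mksquashfs route, assuming squashfs-tools is installed. The repack_stage2 function, its extra_dir argument and the DRY_RUN switch are hypothetical helpers for illustration, not part of drakx:

```shell
#!/bin/sh
# Sketch: unpack mdkinst.sqfs, copy extra files (e.g. debug package content)
# into the tree, then build a new image next to the original one.
# Set DRY_RUN=1 to only print the commands instead of running them.

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

repack_stage2() {
    image=$1      # path to mdkinst.sqfs
    extra_dir=$2  # hypothetical directory holding the files to add
    root=squashfs-root

    run unsquashfs -d "$root" "$image" &&
    run cp -a "$extra_dir/." "$root/" &&
    run mksquashfs "$root" "$image.new" -noappend
}
```

Calling "repack_stage2 mdkinst.sqfs /path/to/debug-files" would leave the rebuilt image in mdkinst.sqfs.new, so the original stays available for comparison.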
Comment 12 AL13N 2012-03-07 18:48:43 CET
argh....

is there currently a way to reproduce it in a normal environment?
Comment 13 AL13N 2012-03-07 18:53:43 CET
looking at the malloc bt, it looks really weird... the addresses are > 32bit, so i wonder if i586 has this issue?

also, i'm not certain, but it looks like it wants to malloc in the same region as the code?

also the addresses are hopefully some kind of hardware mapping, because i doubt you have 128TB of memory on this machine...
Comment 14 Thierry Vignaud 2012-03-08 11:47:41 CET
Created attachment 1704 [details]
smaller bug reproducer

Bug can be reproduced with a much smaller script:
- unsquashfs mdkinst.sqfs
- cd squashfs-root
- tar xf /where/it/was/downloaded/bug2.tgz
- gdb -q --args chroot squashfs-root.dbg  /usr/bin/run4 --text
- type "gcore"
- exit gdb
- gdb -q perl core.XXXXX

Alternatively, you can run the following if you've a proper env (which I doubt):
chroot squashfs-root gdb -q --args perl /usr/bin/run4 --text
Comment 15 Thierry Vignaud 2012-03-08 11:51:37 CET
s/squashfs-root.dbg/squashfs-root/

BTW, you 'll also need to run:
mkdir squashfs-root/proc
mount -t proc none squashfs-root/proc
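
Put together with the steps from comment 14, the chroot setup could be sketched roughly as below. This is only an illustration: the repro_in_chroot function and the DRY_RUN switch are hypothetical, the tarball path is whatever you downloaded, and mount/chroot need root:

```shell
#!/bin/sh
# Sketch: prepare the unpacked stage2 tree and run the reproducer under gdb.
# Set DRY_RUN=1 to only print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

repro_in_chroot() {
    root=$1      # unpacked image, e.g. squashfs-root
    tarball=$2   # the attached reproducer tarball

    run tar xf "$tarball" -C "$root" &&
    run mkdir -p "$root/proc" &&
    run mount -t proc none "$root/proc" &&
    run gdb -q --args chroot "$root" /usr/bin/run4 --text
    status=$?
    # Always try to unmount; this fails harmlessly if the mount never happened.
    run umount "$root/proc"
    return "$status"
}
```

Inside gdb, "gcore" then dumps the core file that can be inspected later with "gdb -q perl core.XXXXX".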


What is strange is despite having /dev/tty, strace -e file sh shows:

open("/dev/tty", O_RDWR) = -1 ENXIO (No such device or address)
sh: 0 can't access tty, job control off

which may be related (hint, hint)
Comment 16 Thierry Vignaud 2012-03-08 12:22:06 CET
Colin, I tried including /lib/udev/devices /lib/udev/rules.d/{10-console,50-udev-default,75-tty-description}.rules /lib/udev/{console_check,console_init}, to no avail.
It may indeed be the switch to udev that is breaking the text installer.

CC: (none) => mageia

Comment 17 Thierry Vignaud 2012-03-08 14:33:36 CET
I've updated the installer stage2.
One can now rebuild it with debug set to 1 in the spec file; the generated rpm will then contain an mdkinst.sqfs with gdb, bash, ...

For example, if you've a local mirror, you can overwrite install/stage2/mdkinst.sqfs with the one from the rpm you just generated, then boot your boot.iso.
Once in the stage2, you can run:
gdb -q --args perl /usr/lib/libDrakX/install/install2

Alternatively, you can just boot boot.iso in your favourite emulator and just set up a web server with /install/stage2/mdkinst.sqfs tree.
Comment 18 Colin Guthrie 2012-03-08 15:02:02 CET
Cool. I'll try and play with the installer a bit this evening, as there are a few other tests I want to do too, relating to LVMs and whether we really need to generate host-only initrds from the installer.

Is this only happening in 32-bit installs?
Comment 19 Thierry Vignaud 2012-03-08 15:17:42 CET
Both arches are affected
Comment 20 Pascal Terjan 2012-03-09 01:52:09 CET
(In reply to comment #15)
> open("/dev/tty", O_RDWR) = -1 ENXIO (No such device or address)
> sh: 0 can't access tty, job control off

Doesn't this mean that there is no controlling tty?
AFAIK /dev/tty is pointing to the controlling tty of the process opening it
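
A quick way to check this from a shell, as a sketch (has_ctty is a hypothetical helper name): opening /dev/tty only succeeds when the process has a controlling terminal.

```shell
#!/bin/sh
# Report whether the current process has a controlling terminal by trying
# to open /dev/tty; without one the open fails (ENXIO, as in the strace
# output quoted above).
has_ctty() {
    # The subshell keeps a failed redirection from aborting the caller.
    if ( : </dev/tty ) 2>/dev/null; then
        echo yes
    else
        echo no
    fi
}
```

Called from a normal login shell, has_ctty should print "yes"; in a session without a controlling terminal (for instance one started through setsid) it should print "no".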


I tried to run in chroot with your tarball from comment 14 and was getting:

unicode_start skipped on not a tty
[Inferior 1 (process 4175) exited with code 01]

Then I mounted /dev and /dev/pts in the chroot and got:

unicode_start skipped on /dev/pts/0
[Inferior 1 (process 4236) exited with code 01]

But no segfault.

CC: (none) => pterjan

Remco Rijnders 2012-03-09 09:03:01 CET

CC: (none) => remco

Comment 21 Arnaud Patard 2012-03-09 09:41:13 CET
I doubt that the ENXIO error is the source of the problem. Modify runinstall2 with something based on this hack :

@@ -4,4 +4,6 @@ echo "Starting Udev\n"
 perl -I/usr/lib/libDrakX -Minstall::install2 -e "install::install2::start_udev()"
 echo "You can start the installer by running install2"
 echo "You can run it in GDB by running gdb-inst"
-exec /usr/bin/busybox sh
+ln -s /usr/bin/busybox /tmp/setsid
+ln -s /usr/bin/busybox /tmp/cttyhack
+exec /tmp/setsid /tmp/cttyhack /usr/bin/busybox sh

and the error will be gone... but not the crash.

I'm getting the crash in a qemu vm in some curses code (same as in comment #6), which was supposed to be fixed according to comment #10.
Is the package not up to date, or something like that? I've noticed on svn a patch called ncurses-fix-segfault.diff which may fix it (even if I'm not sure I understand what it does). Was the package never submitted, or is the fix not enough or fixing the bug in the wrong place, and thus never submitted?

CC: (none) => arnaud.patard

Comment 22 Thierry Vignaud 2012-03-09 11:13:56 CET
(In reply to comment #20)
> But no segfault.

Note that you need to set up TERM=linux. Do you get the dialog instead of the segfault?

(In reply to comment #21)
> I doubt that the ENXIO error is the source of the problem. Modify runinstall2
> with something based on this hack :

gdb shows ncurses having tons of ioctl failures on /dev/tty* before the segfault so...
 
> I'm getting the crash in a qemu vm in some curses code (same as in comment #6),
> which was supposed to be fixed according to comment #10.

I didn't upload the patched ncurses because we still got a segfault within perl afterwards, and it looks like the recent switch to udev is the cause.
Comment 23 Thierry Vignaud 2012-03-09 12:02:24 CET
I've tried with a stage2 w/o udev and run4 still explodes either in perl or in ncurses so it's new perl+ncurses
Comment 24 Thierry Vignaud 2012-03-09 12:09:56 CET
Created attachment 1708 [details]
smaller bug reproducer

Updated reproducer (smaller, resynced with install2.pm) & include gdb-run4 for easier debugging
Thierry Vignaud 2012-03-09 12:09:58 CET

Attachment 1704 is obsolete: 0 => 1

Comment 25 Colin Guthrie 2012-03-09 12:12:23 CET
Glad I'm (for now at least) off the hook :)

Forgive the ignorance here, but are we talking about libncurses here? if so, could libncursesw be used instead?

I ask because of details on bug #2156 (that I became aware of indirectly via bug #4372).

Perhaps the fixes in #2156 are to blame for this? Just random, uninformed guesses, but it should be easy to test with an updated libreadline.
Comment 26 Thierry Vignaud 2012-03-09 12:26:10 CET
Indeed perl-Curses uses libncursesw
Comment 27 Thierry Vignaud 2012-03-09 12:33:29 CET
_BUT_: librpm.so is linked with liblua.so.5.1... which is linked with libncurses.so :-(

Which can be checked:
# ldd /bin/rpm|fgrep curses
        libncurses.so.5 => /lib64/libncurses.so.5 (0x00007ff279f3f000)
        libncursesw.so.5 => /usr/lib64/libncursesw.so.5 (0x00007ff279472000)

So this may explain it (but not why we can't reproduce it out of the installer env)
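
The same check can be scripted for any binary. A small sketch (curses_flavours is a hypothetical helper that reads ldd output on stdin):

```shell
#!/bin/sh
# Read `ldd` output on stdin and report which ncurses flavours are linked.
# Prints "both" when the narrow and the wide library are loaded together.
curses_flavours() {
    awk '
        /libncursesw\.so/ { wide = 1; next }
        /libncurses\.so/  { narrow = 1 }
        END {
            if (wide && narrow) print "both"
            else if (wide)      print "wide"
            else if (narrow)    print "narrow"
            else                print "none"
        }'
}
```

For example, "ldd /bin/rpm | curses_flavours" would print "both" for the output shown above.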
Comment 28 Thierry Vignaud 2012-03-09 12:44:07 CET
lua is pretty simple to fix.
And indeed once fixed, install runs fine.
Comment 29 Colin Guthrie 2012-03-09 12:45:32 CET
\o/ Awesome, good work Thierry :)
Comment 30 Thierry Vignaud 2012-03-09 14:28:24 CET
Well, thank you for the hint :-)

Resolution: (none) => FIXED
Status: NEW => RESOLVED

Comment 31 AL13N 2012-03-09 17:45:53 CET
Created attachment 1710 [details]
text installer

this may be related...
Comment 32 AL13N 2012-03-11 08:32:41 CET
Created attachment 1718 [details]
rescue

actually, since this change, the rescue now has the same bug... i think it is likely related to the change between curses libraries...
Comment 33 Pascal Terjan 2012-03-11 12:26:09 CET
Looks like a charset problem

─ (U+2500) is C4 in PC437/PC850 but does not exist in ISO-8859-1 or ISO-8859-15, where C4 is Ä
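
This can be checked with iconv, assuming an iconv build that knows the CP437 charset (decode_c4 is a hypothetical helper name):

```shell
#!/bin/sh
# Decode the single byte 0xC4 (octal 304) from a given charset into UTF-8.
decode_c4() {
    printf '\304' | iconv -f "$1" -t UTF-8
}

decode_c4 CP437; echo        # box-drawing horizontal line, U+2500
decode_c4 ISO-8859-1; echo   # A with diaeresis, U+00C4
```

The same byte therefore draws a frame line on a PC437 console font but shows up as a letter when it is interpreted as ISO-8859-1.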
Comment 34 Thierry Vignaud 2012-03-11 15:32:02 CET
Please open a new bug report for that.
That has nothing to do with the segfault.
It's reproducible with drakboot with TERM=linux when
locale is not UTF-8, but not with TERM=screen:

TERM     | UTF-8 | no-UTF-8
---------+-------+---------
linux[1] |  OK   |  BUG
screen   |  OK   |  OK

[1] like in the installer
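
To see which cell of that table a given console falls into, something like the following sketch can be used. frame_status is a hypothetical helper that only encodes the table above; it does not test the terminal itself:

```shell
#!/bin/sh
# Map a TERM value and a locale charmap to the result observed in the table.
frame_status() {
    case "$1:$2" in
        linux:UTF-8) echo OK ;;
        linux:*)     echo BUG ;;
        screen:*)    echo OK ;;
        *)           echo unknown ;;
    esac
}
```

On a live system it could be called as: frame_status "$TERM" "$(locale charmap)"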
Comment 35 AL13N 2012-03-11 22:22:24 CET
I don't 100% follow what exactly is the bug and why it suddenly appears:

A) the installer cursesw mode doesn't use UTF-8 while the old curses one did use UTF-8 ?

B) or did the old installer use screen mode? and there never was UTF-8?

C) or does the currently used cursesw library have a bug producing non-UTF-8 codes?

D) or did cursesw always have issues with this, and we only see it because we switched?

E) or is the bug that we suddenly have a non-UTF-8 locale in the installer?

i'll try and make a bug, but atm i have no idea what to file it against or what the bug even is...
Comment 36 Pascal Terjan 2012-03-11 23:23:28 CET
(In reply to comment #35)
> I don't 100% follow what exactly is the bug and why it suddenly appears:
> 
> A) the installer cursesw mode doesn't use UTF-8 while the old curses one did
> use UTF-8 ?

The difference between curses and cursesw is that cursesw handles multibyte characters.
Comment 37 Thierry Vignaud 2012-03-12 06:52:58 CET
As already asked, please stop commenting here and open a new bug report. As for perl-Curses, it was always linked with libncursesw
Comment 38 AL13N 2012-03-12 07:38:17 CET
(In reply to comment #37)
> As already asked, please stop commenting here and open a new bug report. As for
> perl-Curses, it was always linked with libncursesw

but against what? how can i file a bug if i don't know what is wrong???

or should i file it against the installer?
Comment 39 Thierry Vignaud 2012-03-12 08:07:00 CET
You know what's wrong: the text installer
