How to reproduce in Mageia 2 Beta 1:
1. install using DVD
2. press ESC (to go to the text bootloader, due to bug 2038)
3. type in 'text' and hit the RETURN key
4. it loads the program
5. you immediately get an error: "/bin/unicode_start: /usr/bin/tty: not found umounting partitions you may safely reboot or halt your system"

supposedly the above is now fixed by tv (which means that if you use the boot.iso with a netinstall, you get the same as in Mageia 1 Alpha 3): after loading, the screen refreshes and at the top i get:
  exited abnormally :-( -- received signal 6
tv speculated that this is likely due to the perl migration from 5.12 to 5.14
CC: (none) => alien, thierry.vignaud
Priority: Normal => release_blocker
Source RPM: (none) => drakx-installer-stage2
Created attachment 1669 [details]
mageia text installer bug

with the new update for the text installer (tty now being installed), i now get some kind of udev-related bug
Depends on: (none) => 4768
Created attachment 1672 [details]
mageia text installer bug

after the sh change, now i get this:
Attachment 1669 is obsolete: 0 => 1
I cannot reproduce this. What's the output of lspcidrake -v (to be attached, not pasted) on that system?
Jerome, the drakx installer is broken since we switched from perl-5.12 to perl-5.14. It's segfaulting. See attachment from a patched stage2
Summary: text installer doesn't work => text installer segfaults since switching from perl-5.12.x to 5.14.x
CC: (none) => jquelin
Depends on: 4768 => (none)
Created attachment 1679 [details] GDB trace
Created attachment 1680 [details]
Actually we either segfault in perl or in ncurses/perl-Curses

That's the other trace. We get either one randomly.
unfortunately, i must confess that my c-foo is way too weak nowadays to really help you on this topic. note that curses perl module doesn't have a bug reported that looks like that, and i couldn't find mention of such a bug in fedora or in debian.
i could take a peek, but i'd rather have a simple perl-ncurses reproducer. i've been looking, but sadly the screenshots only contain "bt", not "bt full".

after looking i suspect at the line it does have an issue, it seems like it's running out of height boundary...

i'd like to have a reproducer in the perl code, so i can do a full valgrind on it, or check more in gdb...
Created attachment 1689 [details]
text installer

I get this now... after the new drakx-installer release...

how did you get a gdb trace?
Created attachment 1697 [details]
GDB trace always segfaulting in perl now

Jerome: After updating & patching ncurses, it now always segfaults in perl when mallocing ...
(In reply to comment #8)
> i could take a peek, but i'd rather have a simple perl-ncurses reproducer, i've
> been looking, but sadly the screenshots only contain "bt" not "bt full"
>
> after looking i suspect at the line it does have an issue, it seems like it's
> running out of height boundary...
>
> i'd like to have a reproducer in the perl code, so i can do a full valgrind on
> it, or check more in gdb...

The whole point is to actually end up having such a reproducer.
Note that the stage2 environment is different (fewer packages, fewer files, fewer environment variables set, ...)

(In reply to comment #9)
> how did you get a gdb trace?

I renamed /usr/bin/runinstall2 to /usr/bin/runinstall2.pl, called "gdb --args perl /usr/bin/runinstall2.pl" from runinstall2 and added everything I needed to the squashfs.
You can see the old https://wiki.mageia.org/en/Drakx-installer_tips_and_tricks#modifying_the_stage2 page that Pixel & I wrote in the old days.

I think it would be easier for you to first rebuild drakx-installer-stage2 with this:
  %build
  export DEBUG_INSTALL=1

You may also want to alter perl-install/install/share/list.xml in order to include more stuff (bash, ...)

Then you can run gdb, extract a core file and put it on some disk after mounting it.
Or you can also add the debug packages' content to the squashfs image (either by running unsquashfs/mksquashfs manually or by using misc/mdkinst_stage2_tool --uncompress / misc/mdkinst_stage2_tool --compress).
argh.... is there currently a way to reproduce it in a normal environment?
looking at the malloc bt, it looks really weird... the addresses are > 32bit, so i wonder if i586 has this issue? also, i'm not certain, but it looks like it wants to malloc in the same region as the code? also the addresses are hopefully some kind of hardware mapping, because i doubt you have 128TB of memory on this machine...
Created attachment 1704 [details]
smaller bug reproducer

Bug can be reproduced with a much smaller script:
- unsquashfs mdkinst.sqfs
- cd squashfs-root
- tar xf /where/it/was/downloaded/bug2.tgz
- gdb -q --args chroot squashfs-root.dbg /usr/bin/run4 --text
- type "gcore"
- exit gdb
- gdb -q perl core.XXXXX

Alternatively, you can run the following if you've a proper env (which I doubt):
  chroot squashfs-root gdb -q --args perl /usr/bin/run4 --text
s/squashfs-root.dbg/squashfs-root/

BTW, you'll also need to run:
  mkdir squashfs-root/proc
  mount -t proc none squashfs-root/proc

What is strange is that, despite having /dev/tty, strace -e file sh shows:
  open("/dev/tty", O_RDWR) = -1 ENXIO (No such device or address)
  sh: 0 can't access tty, job control off
which may be related (hint, hint)
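For reference, the chroot preparation described in the two comments above can be collected into a small script. This is only a sketch under assumptions: `ROOT`, `DRY_RUN`, `run`, `prepare_chroot` and `start_reproducer` are names made up for this example, and `mkdir -p` is used instead of plain `mkdir` so that re-running is harmless. By default it only prints the commands instead of executing them.

```shell
# Sketch only: ROOT and DRY_RUN are hypothetical names, not part of the installer.
ROOT=${ROOT:-squashfs-root}   # unpacked mdkinst.sqfs tree
DRY_RUN=${DRY_RUN:-1}         # 1 = print the commands, anything else = run them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Provide /proc inside the chroot, as requested in the comment above.
prepare_chroot() {
    run mkdir -p "$ROOT/proc"
    run mount -t proc none "$ROOT/proc"
}

# Start the reproducer under gdb (uses the corrected directory name).
start_reproducer() {
    run gdb -q --args chroot "$ROOT" /usr/bin/run4 --text
}
```

Calling `prepare_chroot` and then `start_reproducer` with `DRY_RUN=0` (as root) would execute the real commands; whether that is enough to trigger the crash outside the installer is not established in this report.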
Colin, I tried including
  /lib/udev/devices
  /lib/udev/rules.d/{10-console,50-udev-default,75-tty-description}.rules
  /lib/udev/{console_check,console_init}
to no avail. It may indeed be the switch to udev that is breaking the text installer.
CC: (none) => mageia
I've updated the installer stage2. One can now set debug to 1 in the spec file and rebuild it; the generated rpm will then contain a mdkinst.sqfs with gdb, bash, ...
For example, if you have a local mirror, you can overwrite install/stage2/mdkinst.sqfs with the one from the rpm you just generated, then boot your boot.iso. Once in the stage2, you can run:
  gdb -q --args perl /usr/lib/libDrakX/install/install2
Alternatively, you can just boot boot.iso in your favourite emulator and set up a web server serving the /install/stage2/mdkinst.sqfs tree.
Cool. I'll try and play with the installer a bit this evening, as there are a few other tests I want to do too, relating to LVMs and whether we really need to generate host-only initrds from the installer. Is this only happening in 32-bit installs?
Both arches are affected
(In reply to comment #15)
> open("/dev/tty", O_RDWR) = -1 ENXIO (No such device or address)
> sh: 0 can't access tty, job control off

Doesn't this mean that there is no controlling tty? AFAIK /dev/tty points to the controlling tty of the process opening it.

I tried to run in a chroot with your tarball from comment 14 and was getting:
  unicode_start skipped on not a tty
  [Inferior 1 (process 4175) exited with code 01]

Then I mounted /dev and /dev/pts in the chroot and got:
  unicode_start skipped on /dev/pts/0
  [Inferior 1 (process 4236) exited with code 01]

But no segfault.
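A quick way to test the "no controlling tty" theory from a shell is to try opening /dev/tty and look only at whether the open succeeds. This is a minimal sketch: `can_open` is a hypothetical helper name, and it opens the device read-only rather than with O_RDWR as in the strace output.

```shell
# Hypothetical helper: succeed if the given device (default /dev/tty) can be opened.
# The open is attempted in a subshell so a failed redirection cannot abort the caller,
# and the shell's own error message is discarded.
can_open() {
    ( : <"${1:-/dev/tty}" ) 2>/dev/null
}

if can_open; then
    echo "/dev/tty can be opened: this process has a controlling tty"
else
    echo "/dev/tty cannot be opened: no controlling tty, or the node is missing"
fi
```

The helper cannot tell ENXIO apart from a missing device node; strace remains the tool for seeing the exact errno.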
CC: (none) => pterjan
CC: (none) => remco
I doubt that the ENXIO error is the source of the problem. Modify runinstall2 with something based on this hack:

@@ -4,4 +4,6 @@ echo "Starting Udev\n"
 perl -I/usr/lib/libDrakX -Minstall::install2 -e "install::install2::start_udev()"
 echo "You can start the installer by running install2"
 echo "You can run it in GDB by running gdb-inst"
-exec /usr/bin/busybox sh
+ln -s /usr/bin/busybox /tmp/setsid
+ln -s /usr/bin/busybox /tmp/cttyhack
+exec /tmp/setsid /tmp/cttyhack /usr/bin/busybox sh

and the error will be gone... but not the crash.

I'm getting the crash in a qemu vm in some curses code (same as in comment #6), which was supposed to be fixed according to comment #10. Is the package not up to date, or something like that? I've noticed on svn a patch called ncurses-fix-segfault.diff which may fix it (even if I'm not sure I understand what it does). Was the package never submitted, or is the fix not enough / trying to fix the bug in the wrong place, and thus never submitted?
CC: (none) => arnaud.patard
(In reply to comment #20)
> But no segfault.

Note that you need to set TERM=linux.
Do you get the dialog instead of the segfault?

(In reply to comment #21)
> I doubt that the ENXIO error is the source of the problem. Modify runinstall2
> with something based on this hack :

gdb shows ncurses having tons of ioctl failures on /dev/tty* before the segfault, so...

> I'm getting the crash in a qemu vm in some curses code (same as in comment #6),
> which was supposed to be fixed according to comment #10.

I didn't upload the patched ncurses b/c we still got a segfault within perl then, and it looks like the recent switch to udev is the cause.
I've tried with a stage2 w/o udev and run4 still explodes either in perl or in ncurses so it's new perl+ncurses
Created attachment 1708 [details]
smaller bug reproducer

Updated reproducer (smaller, resynced with install2.pm) & includes gdb-run4 for easier debugging
Attachment 1704 is obsolete: 0 => 1
Glad I'm (for now at least) off the hook :) Forgive the ignorance here, but are we talking about libncurses here? if so, could libncursesw be used instead? I ask because of details on bug #2156 (that I became aware of indirectly via bug #4372). Perhaps the fixes in #2156 are to blame for this? Just random, uninformed guesses, but it should be easy to test with an updated libreadline.
Indeed, perl-Curses uses libncursesw
_BUT_: librpm.so is linked with liblua.so.5.1... which is linked with libncurses.so :-(
Which can be checked:
# ldd /bin/rpm|fgrep curses
        libncurses.so.5 => /lib64/libncurses.so.5 (0x00007ff279f3f000)
        libncursesw.so.5 => /usr/lib64/libncursesw.so.5 (0x00007ff279472000)

So this may explain it (but not why we can't reproduce it outside of the installer env)
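The same check can be scripted for other binaries or shared objects. The following is a sketch only; `curses_flavours` is a hypothetical helper name. It reads `ldd` output on standard input and reports whether the narrow library, the wide one, or both show up.

```shell
# Hypothetical helper: classify ldd output (read from stdin) by ncurses flavour.
# Prints one of: both, wide, narrow, none.
curses_flavours() {
    awk '
        /libncursesw\.so/ { wide = 1; next }   # wide-character library
        /libncurses\.so/  { narrow = 1 }       # narrow library (no "w" before ".so")
        END {
            if (wide && narrow) print "both"
            else if (wide)      print "wide"
            else if (narrow)    print "narrow"
            else                print "none"
        }'
}
```

For example, `ldd /bin/rpm | curses_flavours` would print `both` for the output quoted above; a result of `both` means the two library variants end up in the same process.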
lua is pretty simple to fix. And indeed once fixed, install runs fine.
\o/ Awesome, good work Thierry :)
Well, thank you for the hint :-)
Resolution: (none) => FIXED
Status: NEW => RESOLVED
Created attachment 1710 [details]
text installer

this may be related...
Created attachment 1718 [details]
rescue

actually, since this change, the rescue now has the same bug... i think it is likely related to the change between curses libraries...
Looks like a charset problem: ─ (U+2500) is C4 in PC437/PC850 but does not exist in ISO-8859-1 or ISO-8859-15, where C4 is Ä
Please open a new bug report for that.
That has nothing to do with the segfault.
It's reproducible with drakboot with TERM=linux when the locale is not UTF-8, but not with TERM=screen:

TERM     | UTF-8 | no-UTF-8
---------+-------+---------
linux[1] | OK    | BUG
screen   | OK    | OK

[1] like in the installer
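To see which cell of that table a given session falls into, something like the following could be used. It is a sketch that encodes only the four observations above and nothing more general; `bug_expected` and `current_locale` are hypothetical names.

```shell
# Hypothetical helper encoding only the four observations in the table:
# the bug was seen with TERM=linux and a non-UTF-8 locale, nowhere else.
# $1 = TERM value, $2 = effective locale name (e.g. en_US.UTF-8, C).
bug_expected() {
    case "$2" in
        *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) return 1 ;;  # UTF-8 locale: OK in both rows
    esac
    [ "$1" = linux ]   # non-UTF-8 locale: only TERM=linux showed the bug
}

# Effective character-set locale: LC_ALL overrides LC_CTYPE, which overrides LANG.
current_locale=${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}

if bug_expected "${TERM:-}" "$current_locale"; then
    echo "TERM=${TERM:-} with locale $current_locale: the failing combination"
else
    echo "TERM=${TERM:-} with locale $current_locale: not the failing combination"
fi
```

Terminal types other than `linux` and `screen` were not tested in this report, so the helper treats them as not failing only for lack of data.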
I don't 100% follow what exactly the bug is and why it suddenly appears:

A) the installer cursesw mode doesn't use UTF-8 while the old curses one did use UTF-8?
B) or did the old installer use screen mode, and there never was UTF-8?
C) or does the currently used cursesw library have a bug producing non-UTF-8 codes?
D) or did cursesw always have issues with this, and we only see it because we switched?
E) or is the bug that we suddenly have a non-UTF-8 locale in the installer?

i'll try and file a bug, but atm i have no idea what to file it against or what the bug even is...
(In reply to comment #35)
> I don't 100% follow what exactly is the bug and why it suddenly appears:
>
> A) the installer cursesw mode doesn't use UTF-8 while the old curses one did
> use UTF-8 ?

The difference between curses and cursesw is that cursesw handles multibyte characters.
As already asked, please stop commenting here and open a new bug report. As for perl-Curses, it was always linked with libncursesw
(In reply to comment #37)
> As already asked, please stop commenting here and open a new bug report. As for
> perl-Curses, it was always linked with libncursesw

but against what? how can i file a bug if i don't know what is wrong???
or should i file it against the installer?
You know what's wrong: the text installer