Bug 11953 - First reboot after install hangs forever at "Checking for new hardware"
Summary: First reboot after install hangs forever at "Checking for new hardware"
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Installer (show other bugs)
Version: Cauldron
Hardware: All Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Mageia Bug Squad
QA Contact:
URL:
Whiteboard: 4RC
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-11 14:44 CET by Dick Gevers
Modified: 2014-01-18 17:25 CET (History)
6 users (show)

See Also:
Source RPM: Mageia-4-RC-*-DVD.iso
CVE:
Status comment:


Attachments
/root/drakx/report.bug.gz from 2 bit iso (252.71 KB, application/octet-stream)
2013-12-11 14:56 CET, Dick Gevers
Details
system journal (265.03 KB, application/octet-stream)
2013-12-29 12:05 CET, Dick Gevers
Details
xz compressed journal (24.39 KB, application/octet-stream)
2013-12-29 12:51 CET, Thierry Vignaud
Details
systemd journal of first reboot only (32 bit iso) (168.81 KB, application/octet-stream)
2013-12-29 17:58 CET, Dick Gevers
Details
dmesg.old as per #c14 (17.27 KB, application/octet-stream)
2013-12-30 21:18 CET, Dick Gevers
Details
journal till 2nd reboot for 32 bit install (2nd prerelease round) (161.04 KB, application/octet-stream)
2013-12-31 14:11 CET, Dick Gevers
Details
screenshot of screen when hanging forever (971.28 KB, image/jpeg)
2013-12-31 17:24 CET, Dick Gevers
Details

Description Dick Gevers 2013-12-11 14:44:21 CET
Description of problem:

After 64 bit install waited for ~1 hour (after 32 bit install waited ~ 1/2 hour) but first reboot is hanging endlessly at the poijt "Checking for new hardware" and needs to be reset.

My hardware: Notebook entry # 3 at: 
https://wiki.mageia.org/en/QA_iso_hardware_list#Notebooks

Will attach bug report in next comment.

Ref. irc with ennael on qa minutes ago.

Reproducible: 

Steps to Reproduce:
Dick Gevers 2013-12-11 14:44:45 CET

Whiteboard: (none) => 4beta2

Comment 1 Dick Gevers 2013-12-11 14:56:23 CET
Created attachment 4601 [details]
/root/drakx/report.bug.gz from 2 bit iso

s/poijt/point
Comment 2 Dick Gevers 2013-12-11 15:05:17 CET
Remark: The second reboot is as quick as normal.
Manuel Hiebel 2013-12-22 13:51:14 CET

CC: (none) => mageia, thierry.vignaud

Comment 3 Dick Gevers 2013-12-28 23:52:42 CET
Still applies

Source RPM: Mageia-4-beta2-*-DVD prerelease round #4 => Mageia-4-RC-x86_64-DVD.iso prerelease round #1
Whiteboard: 4beta2 => 4RC

Comment 4 Thierry Vignaud 2013-12-29 11:41:38 CET
We would need the boot logs (see journalctl)

Keywords: (none) => NEEDINFO
CC: (none) => mageia

Comment 5 Dick Gevers 2013-12-29 12:05:08 CET
Created attachment 4678 [details]
system journal

I trust this will be enough (if not please ask)
Dick Gevers 2013-12-29 12:05:23 CET

Keywords: NEEDINFO => (none)

Comment 6 Thierry Vignaud 2013-12-29 12:16:05 CET
Note that:
1) xz usually compress better than bzip2
2) you don't have to create a tarball for one single file,
   you can just compress it:
   xz < system.journal > /tmp/system.journal.xz

Anssi, any chance that your /sbin/display_driver_helper could block?

I doubt it since I see message about the dm service.
Are you sure your logs include your failed boots?

CC: (none) => anssi.hannula

Thierry Vignaud 2013-12-29 12:16:19 CET

Keywords: (none) => NEEDINFO

Comment 7 Thierry Vignaud 2013-12-29 12:51:17 CET
Created attachment 4679 [details]
xz compressed journal

What's more it would have best to do as asked and to attach the compressed output of journalctl (25kb vs 266kb for your tarball)

Attachment 4678 is obsolete: 0 => 1

Comment 8 Dick Gevers 2013-12-29 12:57:15 CET
sure i know xz compresses better, but i am not accustomed to use it: sorry.

Yes I'm sure the boot which was hanging forever was included in this journal.

Keywords: NEEDINFO => (none)

Comment 9 Colin Guthrie 2013-12-29 13:27:07 CET
The log appears to show only a 5minute wait before a reboot request is initiated.

Could you reproduce when passing systemd.log_level=debug on the command line to get more debug logging.

What is strange is that on the bad boot it says this:


Ker  28 23:46:38 dvglt.homepower service_harddrake[1212]: ### Program is starting ###
Ker  28 23:46:39 dvglt.homepower service_harddrake[1212]: running: dmidecode
Ker  28 23:46:39 dvglt.homepower service_harddrake[1212]: modified file /etc/sysconfig/harddrake2/kernel
Ker  28 23:46:40 dvglt.homepower service_harddrake[1212]: created file /etc/sysconfig/harddrake2/previous_hw
Ker  28 23:46:40 dvglt.homepower service_harddrake[1212]: launched command: display_driver_helper --check-loaded
Ker  28 23:46:40 dvglt.homepower systemd[1]: Started Reconfigure the system on administrator request.


Note the last line (which implies a "systemctl --system daemon-reload" being issued. At that point service_harddrake appears to run a second time:

Ker  28 23:46:40 dvglt.homepower service_harddrake[1274]: ### Program is starting ###
Ker  28 23:46:41 dvglt.homepower service_harddrake[1274]: running: dmidecode
Ker  28 23:46:41 dvglt.homepower service_harddrake[1274]: modified file /etc/sysconfig/harddrake2/kernel
Ker  28 23:46:42 dvglt.homepower service_harddrake[1274]: created file /etc/sysconfig/harddrake2/previous_hw
Ker  28 23:46:42 dvglt.homepower service_harddrake[1274]: launched command: display_driver_helper --check-loaded

On the second boot, it only runs once (although the "Started Reconfigure the system on administrator request." still happens).

Perhaps this is related to the kind of stuff that is only done on first boot?

We could also be hitting the old issue where we used to get deadlocks due to using systemctl commands inside the harddrake code without the --no-block argument. Actually we used to go via /sbin/service which in turn called systemctl. I added an env var SYSTEMCTL_NO_BLOCK which we can (and do) set to ensure that calling /sbin/service from harddrake ultimately called systemctl with --no-block. If the harddrake code has been updated to call systemctl directly it might now be missing this --no-block argument and deadlocking.

That said, such a deadlock should only be subject to a 5min timeout at most anyway.

So not really sure what would cause this problem from looking at the logs. I guess more analysis of the "first time" only stuff is needed.
Comment 10 Dick Gevers 2013-12-29 13:53:29 CET
In reply to coling:

>The log appears to show only a 5minute wait before a reboot request is initiated.

I did not want to wait longer, as I already expected it to hang endlessly ;)

>Could you reproduce when passing systemd.log_level=debug on the command line to >get more debug logging.

In progress: 3 minutes and counting. Will post it after a longer break and after first boot only.
Comment 11 Colin Guthrie 2013-12-29 14:18:18 CET
Awesome thanks :)
Comment 12 Dick Gevers 2013-12-29 17:58:07 CET
Created attachment 4680 [details]
systemd journal of first reboot only (32 bit iso)

The reboot hangs at about 12:54h (UTC) and the laptop is reset at about 13:42h
Dick Gevers 2013-12-29 17:58:40 CET

Source RPM: Mageia-4-RC-x86_64-DVD.iso prerelease round #1 => Mageia-4-RC-*-DVD.iso prerelease round #1

Comment 13 Colin Guthrie 2013-12-30 14:26:53 CET
This doesn't seem to show any kind of hang. The logs show the system has booted successfully and prints the line:

Dec 29 12:49:57 dvglt.homepower systemd[1]: Startup finished in 16.290s (kernel) + 39.121s (userspace) = 55.412s.


This implies that userspace considers itself done!

When things "hang", do you get a login prompt on tty2? That said, I doubt you will I see no reference to either graphical.target or multi-user.target (nor, importantly getty.target) in the log. This seems to imply that the target systemd is aiming for is somewhat undefined. e.g. perhaps it's just aiming for sysinit.target... knowing the kernel command line from the first boot would help...

It would be very interesting to see what (if anything) the /etc/systemd/system/default.target symlink says *before* the first boot and if passing systemd.target=multi-user.target helps in any way.

The journal itself seems somewhat lacking too. It's like it's not everything in it... e.g. it should start by explaining what the command line is and then shown all the kernel dmesg output. It's likely due to systemd-journal-flush.service not running!

So something quite odd is happening here and the logs appear to be somewhat incomplete... :s
Comment 14 Dick Gevers 2013-12-30 21:12:14 CET
Replying to #c13 from coling:

The hang is apparent from the time difference: there's a 50 minute wait till I rebooted.

Frankly I did not look to try and get into tty2 (will look into it on a next occasion this occurs).

The command line must have been very similar to this:
BOOT_IMAGE=M4RC root=UUID=<blikidname>  splash quiet resume=UUID=<blkidname> vga=788


I still have the dmesg.old relating to the same install and boot: 
'stat' says 2013-12-29 12:49:35, but the first couple of lines is gone, incl. that showing actual command line.

In case it helps, I will attach it here.
Comment 15 Dick Gevers 2013-12-30 21:18:48 CET
Created attachment 4689 [details]
dmesg.old as per #c14

dmesg.old as per #c14
Comment 16 Dick Gevers 2013-12-31 13:42:38 CET
Next try: 

tty2 is where I see (twice) "Checking for new hardware", in between is one line:
"Starting netprofile: The default is the current profile"

tty1 has the Mageia blue splashscreen, if I press Esc it rerolls the end of dmesg to "Checking for new hardware"

tty3 prints:
 * PCMCIA: probing PCI bus..
      not found

etcetera

No working tty can be reached.

Also [ Alt + Ctrl + 2x BS ] gives the same end result.

I will print the command line from /boot/grub/menu.lst and look out for a better journal than before.
Comment 17 Dick Gevers 2013-12-31 14:00:37 CET
Upon 2nd reboot after finishing install I agree that neither dmesg nor *old containd Command line (very peculiar !)

but relevant entry of /boot/grub/menu.lst is:
title m4rc32
lock
kernel (hd0,5)/vmlinuz BOOT_IMAGE=m4rc32 root=UUID=<blkidname>  splash quiet resume=UUID=<blkidname> vga-788
root (hd0,5)
initrd /inird.img
Comment 18 Dick Gevers 2013-12-31 14:11:58 CET
Created attachment 4696 [details]
journal till 2nd reboot for 32 bit install (2nd prerelease round)

I hope this journal gives some more clues than earlier ones. It's after the 2nd reboot and shows a 21 minute wait from 1st reboot.
Comment 19 Dick Gevers 2013-12-31 17:24:44 CET
Created attachment 4700 [details]
screenshot of screen when hanging forever
Dick Gevers 2014-01-08 12:04:47 CET

Source RPM: Mageia-4-RC-*-DVD.iso prerelease round #1 => Mageia-4-RC-*-DVD.iso prerelease round #4

Comment 20 Dick Gevers 2014-01-11 12:42:49 CET
Valid for 5th round of isos.

Source RPM: Mageia-4-RC-*-DVD.iso prerelease round #4 => Mageia-4-RC-*-DVD.iso prerelease round #5

Comment 21 Sander Lepik 2014-01-11 13:34:00 CET
I'd point out that the same hang is happening on some computers with LiveDVDs - somehow hardware specific?

CC: (none) => mageia

Comment 22 Dick Gevers 2014-01-12 19:48:13 CET
True. First reboot after Live install from some Live disks hangs forever and does not allow any mouse/mousepad/keyboard action in order to login.

Summary: First reboot after classical install hangs forever at "Checking for new hardware" => First reboot after install hangs forever at "Checking for new hardware"
Source RPM: Mageia-4-RC-*-DVD.iso prerelease round #5 => Mageia-4-RC-*-DVD.iso

Comment 23 Dick Gevers 2014-01-15 06:53:16 CET
Valid for 8th round of isos.
Comment 24 Colin Guthrie 2014-01-15 10:13:50 CET
If it's convenient, can you test if applying this patch to the installed /usr/lib/libDrakx/services.pm before the first reboot solves the problem?

http://gitweb.mageia.org/software/drakx/commit/?id=341ee9fe5e9810a8cc1a7d7156da0158f78dce68 (obviously ignore the NEWS changes...)

I think it should (it certainly helps the Live ISOs at the same point and it's more or less the same bug).
Comment 25 Dick Gevers 2014-01-15 11:58:20 CET
In answer to Colin Guthrie #C24:

I wouldn't mind trying, but I'm no hacker and I don't have any idea how I should apply the patch between/during finishing install and first reboot: sorry.
Comment 26 Sander Lepik 2014-01-15 12:35:24 CET
If you have working live media you can insert this after install finishes and boot into that system, then mount your root partition and edit /usr/lib/libDrakX/services.pm - 4 lines to change. After that boot up your installed system.
Comment 27 Dick Gevers 2014-01-15 12:42:50 CET
Thanks, I'll try that on install now running.
Comment 28 Colin Guthrie 2014-01-15 13:04:33 CET
It's actually even easier. At the end of the install when you enter a password for users or on the services page, you can just flip to tty2 and tweak your install via the /mnt tree (e.g. in this case you'd want to patch /mnt/usr/lib/libDrakx/services.pm). You can then run wget/curl + patch etc. (or just apply the changes manually with vi - there are only four changes really I think)
claire robinson 2014-01-15 13:40:38 CET

CC: (none) => eeeemail

Comment 29 Dick Gevers 2014-01-15 14:18:04 CET
32 bit classical install completed.

But there's no vi and there was no connection yet.

So I rebooted to Cauldron (other partition) and made the changes from there to the newly installed file.

Now first reboot after install completes ok to kdm and login is okay.

So IMHO this patch resolves this bug.
Comment 30 Colin Guthrie 2014-01-15 14:25:06 CET
Nice, really good to know :)
Comment 31 Sander Lepik 2014-01-15 14:26:28 CET
Colin, new release on the way then? :P
Comment 32 Dick Gevers 2014-01-15 17:47:18 CET
For good order's sake I have to remark that when the install is an upgrade from Mageia 3.1 there is no hand on first boot afterwards.
Comment 33 Dick Gevers 2014-01-18 17:25:04 CET
Now fixed. Thanks

Status: NEW => RESOLVED
Resolution: (none) => FIXED


Note You need to log in before you can comment on or make changes to this bug.