Bug 16684 - First execution of harddrake2 on fresh system takes forever
Summary: First execution of harddrake2 on fresh system takes forever
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages (show other bugs)
Version: Cauldron
Hardware: x86_64 Linux
Priority: release_blocker major
Target Milestone: ---
Assignee: Colin Guthrie
QA Contact:
URL:
Whiteboard:
Keywords: PATCH
: 17299 17584 (view as bug list)
Depends on:
Blocks:
 
Reported: 2015-09-01 22:33 CEST by Frank Griffin
Modified: 2017-01-17 10:40 CET (History)
7 users (show)

See Also:
Source RPM: harddrake, systemd
CVE:
Status comment:


Attachments
Debug shell for systemd (22.94 KB, image/png)
2016-02-24 12:20 CET, Anne Nicolas
Details

Description Frank Griffin 2015-09-01 22:33:13 CEST
I've noticed this on several fresh installs on two different machines since cauldron reopened.  After the install, the first boot appears to hang at "Checking for new hardware".  It's not a true deadlock, as I let one of them sit and the boot actually proceeded after about 10-15 minutes, but a 10-15 minute timeout for anything is insane.

If I CTRL-ALT-DELETE to force a reboot, systemd takes over the shutdown, and it happens normally.  On subsequent boots, "Checking for new hardware" completes within 5 seconds.

No idea what causes this, or where to look for diagnostics.

Reproducible: 

Steps to Reproduce:
Comment 1 Thierry Vignaud 2015-09-02 12:24:51 CEST
Look at journalctl output (search for service_harddrake)

Keywords: (none) => NEEDINFO
CC: (none) => thierry.vignaud
Source RPM: drakx-tools => harddrake

Comment 2 Frank Griffin 2015-09-02 15:14:47 CEST
Here's the output from the first boot:

*************************************************************************
Sep 01 13:28:04 localhost.localdomain service_harddrake[821]: ### Program is starting ###
Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: dmidecode
Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: modified file /etc/sysconfig/harddrake2/kernel
Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: Autoconfiguring laptop tools since we switched between laptop and desktop systems
Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: /sbin/chkconfig --list --type xinetd
Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: /bin/mountpoint -q /sys/fs/cgroup/systemd
Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: /bin/systemctl disable --no-reload apmd.service
Sep 01 13:28:10 localhost.localdomain kernel: ACPI Warning: \_SB_.PCI0.PEG0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150410/nsarguments-95)
Sep 01 13:28:10 localhost.localdomain kernel: ACPI Warning: \_SB_.PCI0.PEG0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150410/nsarguments-95)
Sep 01 13:28:10 localhost.localdomain kernel: nouveau E[    PBUS][0000:01:00.0] MMIO write of 0x10000000 FAULT at 0x002a00 [ !ENGINE ]
Sep 01 13:28:12 localhost.localdomain systemd[1]: Received SIGRTMIN+20 from PID 287 (plymouthd).
Sep 01 13:28:31 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little.
Sep 01 13:28:32 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little.
Sep 01 13:28:33 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little.

...
repeats every second or so
...

Sep 01 13:29:06 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little.
Sep 01 13:29:07 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little.
Sep 01 13:29:35 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little.
Sep 01 13:29:36 localhost.localdomain systemd[1]: Received SIGINT.
Sep 01 13:29:36 localhost.localdomain systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
Sep 01 13:29:36 localhost.localdomain systemd[1]: systemd-networkd.socket: Failed to queue service startup job (Maybe the service file is missing or not a non-template unit?): Transaction is destructive.
Sep 01 13:29:36 localhost.localdomain systemd[1]: systemd-networkd.socket: Unit entered failed state.
Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped Save boot dmesg content.
Sep 01 13:29:36 localhost.localdomain kernel: audit_printk_skb: 3 callbacks suppressed
Sep 01 13:29:36 localhost.localdomain kernel: audit: type=1131 audit(1441128576.919:39): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=mandriva-save-dmesg comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped target Sound Card.
Sep 01 13:29:36 localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=mandriva-save-dmesg comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 01 13:29:36 localhost.localdomain service_harddrake[821]: running: /sbin/chkconfig --list --type xinetd
...
Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped Login Service.
Sep 01 13:29:36 localhost.localdomain service_harddrake[821]: running: /bin/mountpoint -q /sys/fs/cgroup/systemd
...
Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped Name Service Cache Daemon.
Sep 01 13:29:36 localhost.localdomain service_harddrake[821]: running: /bin/systemctl enable --no-reload laptop-mode.service
*************************************************************************

The "Stopped" line is probably the start of the CTRL-ALT-DELETE initiating shutdown.

On the reboot, we have:

*************************************************************************
Sep 01 13:30:12 localhost.localdomain service_harddrake[802]: ### Program is starting ###
Sep 01 13:30:13 localhost.localdomain service_harddrake[802]: running: dmidecode
Sep 01 13:30:13 localhost.localdomain service_harddrake[802]: modified file /etc/sysconfig/harddrake2/kernel
Sep 01 13:30:14 localhost.localdomain service_harddrake[802]: created file /etc/sysconfig/harddrake2/previous_hw
Sep 01 13:30:14 localhost.localdomain service_harddrake[802]: launched command: display_driver_helper --check-loaded
*************************************************************************

and that's all.

Keywords: NEEDINFO => (none)

Comment 3 Thierry Vignaud 2015-09-02 15:38:44 CEST
@Colin: it looks like if harddrake issues some systemctl commands during boot, systemd goes wild

CC: (none) => mageia
Source RPM: harddrake => harddrake, systemd

Comment 4 Thierry Vignaud 2015-09-02 15:39:17 CEST
@Franck: please attach such info instead of pasting, as then it's wrapped and thus unreadable :-(
Comment 5 Colin Guthrie 2015-09-02 15:47:24 CEST
Yeah, the newer systemd versions have reworked a bit of the non-blocking stuff (although it could be the enable/disable logic too) that I put in a while back to avoid such situations. We should probably try and refactor as much of this as possible before MGA6 to try and avoid some of the calls rather than the sticking plaster I had in place previously.

I'll try and take a look as I think there is also an issue about enabling/disabling when there is both a native systemd unit and a legacy sysvinit script (the latter should be dropped anyway as it has no value to us, but I think it's still a bug and thus should be fixed!)
Comment 6 Frank Griffin 2015-09-02 16:52:31 CEST
(In reply to Thierry Vignaud from comment #4)
> @Franck: please attach such info instead of pasting, as then it's wrapped
> and thus unreadable :-(

Good point.  I usually look at these things with vi, so I'm too used to seeing it wrapped :-)
Comment 7 Frank Griffin 2015-11-03 03:18:10 CET
Still happening in current cauldron.
claire robinson 2015-11-24 14:30:02 CET

CC: (none) => eeeemail, tmb

Comment 8 Frank Griffin 2015-11-25 14:32:58 CET
It occurs to me that I've seen something similar (at least in symptom) in the past.  I didn't do extensive diagnosis at the time, but it may have some bearing on this...

A laptop of mine died because the power pin connection to the mainboard came detached and it couldn't charge or run from AC.  As an interim solution, I simply transferred the hard drive to another laptop with different graphics, sound card, etc., specifications.

When I tried to boot, it hung in the same way described here.  I hadn't held out much hope that it would work, but I let it sit for about an hour.  To my surprise, when I came back to it, the DM login screen was up, the system ran perfectly, and subsequent boots went fine.

I assumed that harddrake2 had detected the hardware mismatch, and had triggered some sort of massive hardware analysis that configured all of the driver load specifications to the new hardware.  I remember being pretty impressed with this.

Could something be triggering this process unnecessarily here ?
Florian Hubold 2015-12-07 20:11:05 CET

CC: (none) => doktor5000

Comment 9 Florian Hubold 2015-12-09 19:55:41 CET
*** Bug 17299 has been marked as a duplicate of this bug. ***

CC: (none) => paiiou

Florian Hubold 2015-12-09 19:57:17 CET

Blocks: (none) => 15527

Comment 10 Georges Eckenschwiller 2015-12-18 15:33:47 CET
I have two remarks

1/ When I install Mageia, I have this message at the end of the installation of the packages:
script failed for systemd-227-3mga6
script failed for pm-fallback-policy-0.1-14.mga5

2/ I also have a rather long wait (approximately 30 s) between the end of plymouth and the screen of sddm
Comment 11 Florian Hubold 2015-12-18 16:15:12 CET
(In reply to Georges Eckenschwiller from comment #10)
> 1/ When I install Mageia, I have this message at the end of the installation
> of the packages:
> script failed for systemd-227-3mga6
> script failed for pm-fallback-policy-0.1-14.mga5

Hi, that doesn't belong here, it's covered in bug 17217 .

> 2/ I also have a rather long wait (approximately 30 s) between the end of
> plymouth and the screen of sddm

That also doesn't really belong here, you should open a separate bug report if it's reproducible.
Comment 12 Georges Eckenschwiller 2015-12-23 11:49:43 CET
I do not know if it is important, but

1/ As indicated in bug 17299, It is mandriva-everytime which lasts a little more than 6 minutes

2/ I notice that my partitions sda6(/home), sda7-/var/www) and sda8(/var/ftp) aren't mounted at the end of the boot.
Comment 13 Florian Hubold 2015-12-23 18:31:24 CET
(In reply to Georges Eckenschwiller from comment #12)
> 1/ As indicated in bug 17299

That's why it has been marked as a duplicate of this bug.

> 2/ I notice that my partitions sda6(/home), sda7-/var/www) and
> sda8(/var/ftp) aren't mounted at the end of the boot.

That does not belong here.
Comment 14 Georges Eckenschwiller 2016-01-19 11:53:49 CET
Problem still present with systemd 228
Comment 15 Samuel Verschelde 2016-01-19 12:08:40 CET
Assigning to systemd maintainer and setting as blocker for Mageia 6.

Priority: Normal => release_blocker
Assignee: bugsquad => mageia
Severity: normal => major

Comment 16 Anne Nicolas 2016-02-21 22:43:34 CET
Just tested as requested by coling with debug option for systemd. The point is it freezes very early in boot so we cannot see anything in a console.

I'm using a test iso using last version of systemd which still fails in post-install:
"ddebug.log:* urpmi error: ERROR: 'script' failed for systemd-229-1.mga6"

CC: (none) => ennael1

Comment 17 Anne Nicolas 2016-02-24 12:20:29 CET
Created attachment 7491 [details]
Debug shell for systemd
Comment 18 Thierry Vignaud 2016-02-24 13:18:08 CET
We could switch some "checked_on_boot => 1" into "checked_on_boot => 0" in http://gitweb.mageia.org/software/drakx/tree/perl-install/harddrake/data.pm

Those days, there's quite a lot of things that are autoloaded by udev (sata, raid, pata, card reader, firewire, scsi, usb, cd/dvd, ...)
Some are even obsolete (zip, ...)

I'll push some fixes for this tonight

Obviously those who take time are those who have a configurator and thus sg to do when hardware changes.

We could also alter stage2 so that it dumps a dummy /etc/sysconfig/harddrake2/previous_hw so that we don't do anything on first boot if hw hasn't actually changed since installation.
Comment 19 Thierry Vignaud 2016-03-05 10:11:08 CET
I cannot reproduce this bug but I've pushed a patched harddrake service that will log what takes time.
Please test with drakxtools-17.19-2.mga6
Comment 20 Anne Nicolas 2016-03-06 12:13:26 CET
Working on it. I'll post here the result today
Comment 21 Anne Nicolas 2016-03-06 21:51:26 CET
So I had to update release tag that was not updated so I used drakxtools-17.19-3.mga6 in test iso.

Following logs for this. First boot will end at 21:21:01.
Comment 22 Anne Nicolas 2016-03-06 21:55:24 CET
Logs can be found here: http://people.hupstream.com/~ennael/journal.log
Comment 23 Florian Hubold 2016-03-07 00:08:51 CET
This seems to be the issue and apmd is a legacy sysv init script: /etc/rc.d/init.d/apmd


mars 06 21:21:02 localhost service_harddrake[686]: running: /bin/systemctl disable --no-reload apmd.service
mars 06 21:21:04 localhost systemd[1]: Received SIGRTMIN+20 from PID 273 (plymouthd).
mars 06 21:21:07 localhost systemd[1]: Received SIGRTMIN+20 from PID 273 (plymouthd).
mars 06 21:23:02 localhost service_harddrake[686]: ERROR: killing runaway process (process=/bin/systemctl, pid=710, args=disable --no-reload apmd.service, error=ALARM at /usr/lib/libDrakX/run_program.pm line 241.
                                                   )
mars 06 21:23:02 localhost service_harddrake[686]: running: /sbin/chkconfig --list --type xinetd
mars 06 21:23:02 localhost service_harddrake[686]: running: /bin/mountpoint -q /sys/fs/cgroup/systemd
mars 06 21:23:02 localhost service_harddrake[686]: running: /bin/systemctl disable --no-reload laptop-mode.service
-- Reboot --

Although not sure why it runs twice ...


If we disable apmd at every boot via harddrake, why do we enable it during package installation in the first place? Can't we simply not enable it during apmd package installation and stop harddrake disabling it?

But the systemd issue that coling meant still needs to be fixed.
Comment 24 Thierry Vignaud 2016-03-07 10:17:53 CET
We can make the installer to store a value for IS_LAPTOP in /etc/sysconfig/harddrake2/kernel so that harddrake doesn't play with that on first boot (thus no 2m timeout for apmd -- though we should investigate if it's OK on a laptop after that change)
Comment 25 Thierry Vignaud 2016-03-07 10:43:19 CET
I'm uploading a new stage2 with an untested fix
Comment 26 Thierry Vignaud 2016-03-07 17:59:04 CET
It looks good for me.
I'll push my git tree later.

Keywords: (none) => PATCH
Status: NEW => ASSIGNED

Comment 27 Mageia Robot 2016-03-08 01:28:41 CET
commit 7e2d5f1447ad5e269ce8f5eab4a8b8ade7e96f85
Author: Thierry Vignaud <thierry.vignaud@...>
Date:   Mon Mar 7 10:25:32 2016 +0100

    prevent slow boot on first boot (mga#16684)
---
 Commit Link:
   http://gitweb.mageia.org/software/drakx/commit/?id=7e2d5f1447ad5e269ce8f5eab4a8b8ade7e96f85
Comment 28 Anne Nicolas 2016-03-08 08:41:51 CET
Fixed in last set of test isos
Comment 29 Thierry Vignaud 2016-03-08 10:26:59 CET
We would to check on a laptop though...
Comment 30 Rémi Verschelde 2016-03-14 12:59:06 CET
*** Bug 17584 has been marked as a duplicate of this bug. ***

CC: (none) => westel

Comment 31 Thierry Vignaud 2016-06-02 15:23:55 CEST
Presumed fixed

Status: ASSIGNED => RESOLVED
Resolution: (none) => FIXED

Samuel Verschelde 2017-01-17 10:29:39 CET

Blocks: 15527 => (none)

Georges Eckenschwiller 2017-01-17 10:40:45 CET

CC: paiiou => (none)


Note You need to log in before you can comment on or make changes to this bug.