| Summary: | First execution of harddrake2 on fresh system takes forever | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Frank Griffin <ftg> |
| Component: | RPM Packages | Assignee: | Colin Guthrie <mageia> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | major | ||
| Priority: | release_blocker | CC: | doktor5000, eeeemail, ennael1, mageia, thierry.vignaud, tmb, westel |
| Version: | Cauldron | Keywords: | PATCH |
| Target Milestone: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Source RPM: | harddrake, systemd | CVE: | |
| Status comment: | |||
| Attachments: | Debug shell for systemd | ||
|
Description
Frank Griffin
2015-09-01 22:33:13 CEST
Look at journalctl output (search for service_harddrake) Keywords:
(none) =>
NEEDINFO Here's the output from the first boot: ************************************************************************* Sep 01 13:28:04 localhost.localdomain service_harddrake[821]: ### Program is starting ### Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: dmidecode Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: modified file /etc/sysconfig/harddrake2/kernel Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: Autoconfiguring laptop tools since we switched between laptop and desktop systems Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: /sbin/chkconfig --list --type xinetd Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: /bin/mountpoint -q /sys/fs/cgroup/systemd Sep 01 13:28:05 localhost.localdomain service_harddrake[821]: running: /bin/systemctl disable --no-reload apmd.service Sep 01 13:28:10 localhost.localdomain kernel: ACPI Warning: \_SB_.PCI0.PEG0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150410/nsarguments-95) Sep 01 13:28:10 localhost.localdomain kernel: ACPI Warning: \_SB_.PCI0.PEG0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150410/nsarguments-95) Sep 01 13:28:10 localhost.localdomain kernel: nouveau E[ PBUS][0000:01:00.0] MMIO write of 0x10000000 FAULT at 0x002a00 [ !ENGINE ] Sep 01 13:28:12 localhost.localdomain systemd[1]: Received SIGRTMIN+20 from PID 287 (plymouthd). Sep 01 13:28:31 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. Sep 01 13:28:32 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. Sep 01 13:28:33 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. ... repeats every second or so ... Sep 01 13:29:06 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. Sep 01 13:29:07 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. Sep 01 13:29:35 localhost.localdomain systemd[1]: Looping too fast. Throttling execution a little. Sep 01 13:29:36 localhost.localdomain systemd[1]: Received SIGINT. Sep 01 13:29:36 localhost.localdomain systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive. Sep 01 13:29:36 localhost.localdomain systemd[1]: systemd-networkd.socket: Failed to queue service startup job (Maybe the service file is missing or not a non-template unit?): Transaction is destructive. Sep 01 13:29:36 localhost.localdomain systemd[1]: systemd-networkd.socket: Unit entered failed state. Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped Save boot dmesg content. Sep 01 13:29:36 localhost.localdomain kernel: audit_printk_skb: 3 callbacks suppressed Sep 01 13:29:36 localhost.localdomain kernel: audit: type=1131 audit(1441128576.919:39): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=mandriva-save-dmesg comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped target Sound Card. Sep 01 13:29:36 localhost.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=mandriva-save-dmesg comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Sep 01 13:29:36 localhost.localdomain service_harddrake[821]: running: /sbin/chkconfig --list --type xinetd ... Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped Login Service. Sep 01 13:29:36 localhost.localdomain service_harddrake[821]: running: /bin/mountpoint -q /sys/fs/cgroup/systemd ... Sep 01 13:29:36 localhost.localdomain systemd[1]: Stopped Name Service Cache Daemon. Sep 01 13:29:36 localhost.localdomain service_harddrake[821]: running: /bin/systemctl enable --no-reload laptop-mode.service ************************************************************************* The "Stopped" line is probably the start of the CTRL-ALT-DELETE initiating shutdown. On the reboot, we have: ************************************************************************* Sep 01 13:30:12 localhost.localdomain service_harddrake[802]: ### Program is starting ### Sep 01 13:30:13 localhost.localdomain service_harddrake[802]: running: dmidecode Sep 01 13:30:13 localhost.localdomain service_harddrake[802]: modified file /etc/sysconfig/harddrake2/kernel Sep 01 13:30:14 localhost.localdomain service_harddrake[802]: created file /etc/sysconfig/harddrake2/previous_hw Sep 01 13:30:14 localhost.localdomain service_harddrake[802]: launched command: display_driver_helper --check-loaded ************************************************************************* and that's all. Keywords:
NEEDINFO =>
(none) @Colin: it looks like if harddrake issues some systemctl commands during boot, systemd goes wild CC:
(none) =>
mageia @Franck: please attach such info instead of pasting, as then it's wrapped and thus unreadable :-( Yeah, the newer systemd versions have reworked a bit of the non-blocking stuff (although it could be the enable/disable logic too) that I put in a while back to avoid such situations. We should probably try and refactor as much of this as possible before MGA6 to try and avoid some of the calls rather than the sticking plaster I had in place previously. I'll try and take a look as I think there is also an issue about enabling/disabling when there is both a native systemd unit and a legacy sysvinit script (the latter should be dropped anyway as it has no value to us, but I think it's still a bug and thus should be fixed!) (In reply to Thierry Vignaud from comment #4) > @Franck: please attach such info instead of pasting, as then it's wrapped > and thus unreadable :-( Good point. I usually look at these things with vi, so I'm too used to seeing it wrapped :-) Still happening in current cauldron.
claire robinson
2015-11-24 14:30:02 CET
CC:
(none) =>
eeeemail, tmb It occurs to me that I've seen something similar (at least in symptom) in the past. I didn't do extensive diagnosis at the time, but it may have some bearing on this... A laptop of mine died because the power pin connection to the mainboard came detached and it couldn't charge or run from AC. As an interim solution, I simply transferred the hard drive to another laptop with different graphics, sound card, etc., specifications. When I tried to boot, it hung in the same way described here. I hadn't held out much hope that it would work, but I let it sit for about an hour. To my surprise, when I came back to it, the DM login screen was up, the system ran perfectly, and subsequent boots went fine. I assumed that harddrake2 had detected the hardware mismatch, and had triggered some sort of massive hardware analysis that configured all of the driver load specifications to the new hardware. I remember being pretty impressed with this. Could something be triggering this process unnecessarily here ?
Florian Hubold
2015-12-07 20:11:05 CET
CC:
(none) =>
doktor5000
Florian Hubold
2015-12-09 19:57:17 CET
Blocks:
(none) =>
15527 I have two remarks 1/ When I install Mageia, I have this message at the end of the installation of the packages: script failed for systemd-227-3mga6 script failed for pm-fallback-policy-0.1-14.mga5 2/ I also have a rather long wait (approximately 30 s) between the end of plymouth and the screen of sddm (In reply to Georges Eckenschwiller from comment #10) > 1/ When I install Mageia, I have this message at the end of the installation > of the packages: > script failed for systemd-227-3mga6 > script failed for pm-fallback-policy-0.1-14.mga5 Hi, that doesn't belong here, it's covered in bug 17217 . > 2/ I also have a rather long wait (approximately 30 s) between the end of > plymouth and the screen of sddm That also doesn't really belong here, you should open a separate bug report if it's reproducible. I do not know if it is important, but 1/ As indicated in bug 17299, It is mandriva-everytime which lasts a little more than 6 minutes 2/ I notice that my partitions sda6(/home), sda7-/var/www) and sda8(/var/ftp) aren't mounted at the end of the boot. (In reply to Georges Eckenschwiller from comment #12) > 1/ As indicated in bug 17299 That's why it has been marked as a duplicate of this bug. > 2/ I notice that my partitions sda6(/home), sda7-/var/www) and > sda8(/var/ftp) aren't mounted at the end of the boot. That does not belong here. Problem still present with systemd 228 Assigning to systemd maintainer and setting as blocker for Mageia 6. Priority:
Normal =>
release_blocker Just tested as requested by coling with debug option for systemd. The point is it freezes very early in boot so we cannot see anything in a console. I'm using a test iso using last version of systemd which still fails in post-install: "ddebug.log:* urpmi error: ERROR: 'script' failed for systemd-229-1.mga6" CC:
(none) =>
ennael1 Created attachment 7491 [details]
Debug shell for systemd
We could switch some "checked_on_boot => 1" into "checked_on_boot => 0" in http://gitweb.mageia.org/software/drakx/tree/perl-install/harddrake/data.pm Those days, there's quite a lot of things that are autoloaded by udev (sata, raid, pata, card reader, firewire, scsi, usb, cd/dvd, ...) Some are even obsolete (zip, ...) I'll push some fixes for this tonight Obviously those who take time are those who have a configurator and thus sg to do when hardware changes. We could also alter stage2 so that it dumps a dummy /etc/sysconfig/harddrake2/previous_hw so that we don't do anything on first boot if hw hasn't actually changed since installation. I cannot reproduce this bug but I've pushed a patched harddrake service that will log what takes time. Please test with drakxtools-17.19-2.mga6 Working on it. I'll post here the result today So I had to update release tag that was not updated so I used drakxtools-17.19-3.mga6 in test iso. Following logs for this. First boot will end at 21:21:01. Logs can be found here: http://people.hupstream.com/~ennael/journal.log This seems to be the issue and apmd is a legacy sysv init script: /etc/rc.d/init.d/apmd
mars 06 21:21:02 localhost service_harddrake[686]: running: /bin/systemctl disable --no-reload apmd.service
mars 06 21:21:04 localhost systemd[1]: Received SIGRTMIN+20 from PID 273 (plymouthd).
mars 06 21:21:07 localhost systemd[1]: Received SIGRTMIN+20 from PID 273 (plymouthd).
mars 06 21:23:02 localhost service_harddrake[686]: ERROR: killing runaway process (process=/bin/systemctl, pid=710, args=disable --no-reload apmd.service, error=ALARM at /usr/lib/libDrakX/run_program.pm line 241.
)
mars 06 21:23:02 localhost service_harddrake[686]: running: /sbin/chkconfig --list --type xinetd
mars 06 21:23:02 localhost service_harddrake[686]: running: /bin/mountpoint -q /sys/fs/cgroup/systemd
mars 06 21:23:02 localhost service_harddrake[686]: running: /bin/systemctl disable --no-reload laptop-mode.service
-- Reboot --
Although not sure why it runs twice ...
If we disable apmd at every boot via harddrake, why do we enable it during package installation in the first place? Can't we simply not enable it during apmd package installation and stop harddrake disabling it?
But the systemd issue that coling meant still needs to be fixed.
We can make the installer to store a value for IS_LAPTOP in /etc/sysconfig/harddrake2/kernel so that harddrake doesn't play with that on first boot (thus no 2m timeout for apmd -- though we should investigate if it's OK on a laptop after that change) I'm uploading a new stage2 with an untested fix It looks good for me. I'll push my git tree later. Keywords:
(none) =>
PATCH commit 7e2d5f1447ad5e269ce8f5eab4a8b8ade7e96f85
Author: Thierry Vignaud <thierry.vignaud@...>
Date: Mon Mar 7 10:25:32 2016 +0100
prevent slow boot on first boot (mga#16684)
---
Commit Link:
http://gitweb.mageia.org/software/drakx/commit/?id=7e2d5f1447ad5e269ce8f5eab4a8b8ade7e96f85
Fixed in last set of test isos We would to check on a laptop though... Presumed fixed Status:
ASSIGNED =>
RESOLVED
Samuel Verschelde
2017-01-17 10:29:39 CET
Blocks:
15527 =>
(none)
Georges Eckenschwiller
2017-01-17 10:40:45 CET
CC:
paiiou =>
(none) |